Just putting this out there for discussion, but I'm thinking it would be really useful to record the basis on which the `cores`, `mem`, and `gpu` values in the shared database were determined, whenever possible. For example, the basis could be something like: "Tested on an input size of 100GB and did not require more than 3GB of RAM. Does not scale with increased CPUs." Just having some notes would allow an admin to determine whether the resource allocation is a potential problem. For example, perhaps a newer version of that tool does support multiple cores. Right now, we have no way to know whether some values are just guesses or hard constraints (e.g., not a multi-core tool).
Another thing we could do is query the usegalaxy.* databases to find the largest input sizes each tool has successfully run against, plus the actual max RAM and cores used, and maybe include those in the notes for reference?
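As a rough illustration, here is a minimal sketch of what such a query could look like against a Galaxy PostgreSQL database. It assumes the standard `job` and `job_metric_numeric` tables and the usual metric names (`memory.max_usage_in_bytes` from the cgroup plugin, `galaxy_slots` from the core plugin); plugin names and schema details vary between Galaxy releases, so treat this as a starting point rather than a drop-in query.

```python
# Hypothetical sketch: summarise peak resource usage per tool from a Galaxy
# PostgreSQL database. Table, column, and metric names are assumptions based
# on Galaxy's job metrics schema and may differ between releases.
# Input sizes would additionally need a join through job_to_input_dataset
# and dataset, which is omitted here to keep the example short.
import psycopg2

QUERY = """
SELECT j.tool_id,
       MAX(CASE WHEN m.metric_name = 'memory.max_usage_in_bytes'
                THEN m.metric_value END) AS max_memory_bytes,
       MAX(CASE WHEN m.metric_name = 'galaxy_slots'
                THEN m.metric_value END) AS max_cores
FROM job j
JOIN job_metric_numeric m ON m.job_id = j.id
WHERE j.state = 'ok'
GROUP BY j.tool_id
ORDER BY j.tool_id;
"""

def peak_usage(dsn: str):
    """Return (tool_id, max_memory_bytes, max_cores) for successful jobs."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(QUERY)
        return cur.fetchall()

if __name__ == "__main__":
    # Example DSN; point it at a read replica of the Galaxy database.
    for tool_id, mem_bytes, cores in peak_usage("dbname=galaxy"):
        print(tool_id, mem_bytes, cores)
```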
The notes could also be a more formalized structure like:
```yaml
resources:
  - date: 4th May 2023
    max_input_tested: 100GB
    max_cores_logged: 1
    max_ram_logged: 24GB
    source: usegalaxy.eu
    notes: tool is not a multicore tool as of v0.1
```
Would this be useful or mostly just noise? Alternatively, it could live in a separate file.
I like the idea, and certainly feel keeping this info in the same file is the better option. However, I am a bit skeptical that a short sentence would often be adequate, because there are often many parameters that impact this decision (e.g., a tool may use 3GB of memory when run against a human reference but 1GB when run with the fly genome, largely irrespective of the input dataset size). Perhaps do this for a sample of tools first to test feasibility? Particularly the tools that require large resources, or the ones we know won't run with less than specified.