Introduce a notes field for each entry #23

Open
nuwang opened this issue May 4, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@nuwang
Member

nuwang commented May 4, 2023

Just putting this out there for discussion: I think it would be really useful to record, whenever possible, the basis on which the cores, mem, and gpu values in the shared database were determined. For example, the basis could be something like: "Tested on an input size of 100GB and did not require more than 3GB of RAM. Does not scale with increased CPUs". Even brief notes would let an admin judge whether a resource allocation is a potential problem; for example, a newer version of the tool may now support multiple cores. Right now, we have no way to know whether a value is just a guess or a hard constraint (e.g. not a multi-core tool).

Another thing we could do is query the usegalaxy.* databases to find the largest input sizes that the tool has successfully run against, along with the actual max RAM and cores used, and maybe include those in the notes for reference?
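
A rough sketch of that aggregation follows. It assumes job records have already been exported from a usegalaxy.* database into plain dictionaries; the field names (`tool_id`, `state`, `input_bytes`, `max_ram_bytes`, `cores_used`) and the `"ok"` success marker are hypothetical placeholders, not the actual Galaxy schema.

```python
from collections import defaultdict


def summarize_jobs(jobs):
    """Return, per tool, the largest values seen across successful jobs.

    `jobs` is an iterable of dicts with hypothetical keys:
    tool_id, state, input_bytes, max_ram_bytes, cores_used.
    """
    summary = defaultdict(
        lambda: {"max_input_bytes": 0, "max_ram_bytes": 0, "max_cores": 0}
    )
    for job in jobs:
        # Only successful runs tell us what the tool can actually handle.
        if job["state"] != "ok":
            continue
        entry = summary[job["tool_id"]]
        entry["max_input_bytes"] = max(entry["max_input_bytes"], job["input_bytes"])
        entry["max_ram_bytes"] = max(entry["max_ram_bytes"], job["max_ram_bytes"])
        entry["max_cores"] = max(entry["max_cores"], job["cores_used"])
    return dict(summary)
```

Failed jobs are skipped so that the summary reflects only inputs the tool is known to have completed.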

The notes could also be a more formalized structure like:

resources:
 -  date: 4th May 2023
    max_input_tested: 100GB
    max_cores_logged: 1
    max_ram_logged: 24GB
    source: usegalaxy.eu
    notes: tool is not a multicore tool as of v0.1

Would this be useful or mostly just noise? Alternatively, it could live in a separate file.
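
To make the formalized structure above a bit more concrete, here is a minimal Python sketch of how one such record might be represented and compared against a configured allocation. The `ResourceNote` class, the `parse_size` helper, and the `allocation_exceeds_logged` check are all hypothetical names made up for illustration; sizes are assumed to use decimal units (1GB = 10^9 bytes).

```python
import re
from dataclasses import dataclass

# Decimal units are an assumption; the proposal does not say GB vs GiB.
_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Convert a string such as '24GB' into a number of bytes."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*(KB|MB|GB|TB)\s*", text, re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit.upper()])


@dataclass
class ResourceNote:
    """One entry of the proposed `resources` list (fields mirror the example)."""
    date: str
    max_input_tested: str
    max_cores_logged: int
    max_ram_logged: str
    source: str
    notes: str = ""

    def allocation_exceeds_logged(self, cores: int, mem_gb: float) -> bool:
        """True if the configured allocation is above anything ever logged."""
        return (
            cores > self.max_cores_logged
            or mem_gb * 10**9 > parse_size(self.max_ram_logged)
        )


note = ResourceNote(
    date="4th May 2023",
    max_input_tested="100GB",
    max_cores_logged=1,
    max_ram_logged="24GB",
    source="usegalaxy.eu",
    notes="tool is not a multicore tool as of v0.1",
)
```

With a record like this, an admin could flag entries where, say, 4 cores are allocated to a tool that has only ever been logged using 1.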

@nuwang nuwang added the enhancement New feature or request label May 4, 2023
@afgane

afgane commented May 4, 2023

I like the idea, and certainly feel that keeping this info in the same file is the better option. However, I am a bit skeptical that a short sentence would often be adequate, because there are often many parameters that impact this decision (e.g., a tool may use 3GB of memory when run against a human reference but 1GB when run with the fly genome, and that's largely irrespective of the dataset input size). Perhaps we could do this for a sample of tools to test feasibility? Particularly the tools that require large resources, or the ones we know won't run with less than specified.
