Update bioboxes image list #4
Comments
**Problem:** One tool/image could implement multiple interfaces. With the
current list we would have to list the image once for each interface.
My preference is for each biobox to implement one type of interface, to
simplify the management of biobox images, especially the image tasks. Having
one interface per biobox image means that each task is scoped by the
interface too. For example, the `default` task for an assembler should imply
what the author believes is the best possible assembly given a wide variety of
inputs, which is what the CLI runs if `--task` is not specified. A careful
task implies trading assembly size for accuracy.
Were we to have different interfaces in the same biobox, it would be hard to
determine which interface the default task relates to. There also could not be
a separate default task for each interface unless the tasks were namespaced in
some way.
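To make the scoping argument concrete, here is a minimal sketch of how task resolution might look with one interface per image, compared with a namespaced variant for an image that implements several interfaces. The image names, dictionary layout, and function names are all hypothetical and are not taken from the bioboxes CLI or the image list.

```python
# Hypothetical sketch: the names and layout are illustrative assumptions,
# not the actual bioboxes CLI or image list format.
DEFAULT_TASK = "default"

# One interface per image: every task name is implicitly scoped by the interface.
single_interface_image = {
    "image": "example/assembler",  # hypothetical image name
    "interface": "assembler",
    "tasks": ["default", "careful"],
}

# Several interfaces in one image: tasks have to be namespaced by interface,
# otherwise "default" would be ambiguous.
multi_interface_image = {
    "image": "example/binning-evaluation",  # hypothetical image name
    "tasks": {
        "taxonomic": ["default"],
        "non_taxonomic": ["default"],
    },
}


def resolve_task(image, task=None):
    """Pick the task to run, falling back to the default when none is given."""
    chosen = task or DEFAULT_TASK
    if chosen not in image["tasks"]:
        raise ValueError(f"unknown task {chosen!r} for {image['image']}")
    return chosen


def resolve_namespaced_task(image, interface, task=None):
    """Same lookup, but the caller must also say which interface is meant."""
    if interface not in image["tasks"]:
        raise ValueError(f"unknown interface {interface!r} for {image['image']}")
    chosen = task or DEFAULT_TASK
    if chosen not in image["tasks"][interface]:
        raise ValueError(f"unknown task {chosen!r} for interface {interface!r}")
    return f"{interface}/{chosen}"
```

In the single-interface case the user only ever names a task. In the multi-interface case every call needs an extra `interface` argument, which is the added complexity I would like to avoid.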
2. @michaelbarton suggested adding SHA IDs of the bioboxes images.
3. I suggest adding a field called 'tags'. This field should allow us to add categories for each biobox.
I have been using SHA256 digests so far because this is supported
by the docker client. For example:
`docker run repo/image@sha256:digest`
I prefer this approach because it allows different images of the same name to
be used, differentiating them by their digest. This is what I am using in
nucleotid.es at the JGI because it allows us, for example, to benchmark
spades v3.9 vs. v3.10. I think it would also help reproducible research where
the ideal case would be bioinformaticians listing the exact digest of the
image they used in their methods.
I also prefer this over using docker tags because docker tags can be removed
and changed in the docker repo so there is no guarantee that a tag will always
point to the same build. A digest on the other hand is explicitly tied to a
build. I think the tags you mentioned refer to a different use case though, is
that correct?
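As a rough illustration of pinning by digest, the sketch below builds a `repo/image@sha256:...` reference and the matching `docker run` argument list. The function names are hypothetical, and the check assumes the digest is written as 64 lowercase hexadecimal characters, which is how the docker client usually prints sha256 digests.

```python
import re

# Assumption: a sha256 digest is given as 64 lowercase hexadecimal characters.
_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def pinned_reference(repository, digest_hex):
    """Build an image reference that is pinned to one specific build."""
    if not _SHA256_HEX.fullmatch(digest_hex):
        raise ValueError("expected a 64-character lowercase hex sha256 digest")
    return f"{repository}@sha256:{digest_hex}"


def docker_run_command(repository, digest_hex):
    """Return the argument list for running the pinned image."""
    return ["docker", "run", pinned_reference(repository, digest_hex)]
```

Two builds of the same repository name then get two distinct references, so a benchmark record can state exactly which one was run.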
4. @fernandomeyer added the velour biobox but our
[page](http://bioboxes.org/available-bioboxes/) on bioboxes.org is not
updated. Maybe we should implement this listing by using javascript which
allows us to fetch the list each time the webpage is opened.
I believe we could update the circle.yml for the data repository to
automatically request a rebuild of the website every time a pull request is
merged into master.
I agree this would be difficult if we want to specify a default task for each interface.
Ok, so using or listing the digest does make sense if you are referencing a specific biobox from a different service, like nucleotid.es or CAMI.
Yes, with tags I do not mean docker tags. I think it would be useful in our current bioboxes listing (http://bioboxes.org/available-bioboxes/) to have a field called 'tags' or 'metatags'. In this field we could categorize our containers. Tags could be for example 'CAMI' or 'nucleotid.es'.
Sounds great! Could you update the repo?
> My preference is for each biobox to implement one type of interface, to
> simplify the management of biobox images, especially the image tasks.
> Having one interface per biobox image means that each task is
> scoped by the interface too. For example, the `default` task for an assembler
> should imply what the author believes is the best possible assembly
> given a wide variety of inputs, which is what the CLI runs if `--task` is
> not specified. A careful task implies trading assembly size for accuracy.
I agree this would be difficult if we want to specify a default task for
each interface. In CAMI we have, for example, binning evaluation tools that
can be used with both taxonomic and non-taxonomic binning files. That's why
they implement both the taxonomic and the non-taxonomic binning evaluation
interfaces.
I think this is not ideal because it means maintaining two versions of
essentially the same image. It does seem to me to be the best balance between
maintainability for us and simplicity in the user interface. If we can think
of a way to keep the interface as simple as possible for the users, or even
simplify it further, I would be interested in exploring this.
I guess we will have to build two different images that fetch the
same library/GitHub repository.
As a workaround, we could build a common base image of the tool and create
the separate biobox images on top. I think a goal for bioboxes would be to ask
developers and authors to maintain the Docker images while we maintain the
interfaces. That's wishful thinking for the time being.
> I have been using SHA256 digests so far because this is supported by the
> docker client. For example the command: docker run
> repo/image@sha256:digest
Ok, so using or listing the digest does make sense if you are referencing a
specific biobox from a different service, like nucleotid.es or CAMI.
I think we might be talking at cross purposes here. I'm currently using
bioboxes images in nucleotides and CAMI is as well. Using the digest would
allow us to specify exactly which image was benchmarked, and that could be the
same image in both CAMI and nucleotides. I think if CAMI and nucleotides are
generating metrics for the same biobox image, that is very good because it
helps standardise the benchmarking process between groups.
> I think the tags you mentioned refer to a different use case though, is
> that correct?
Yes, with tags I do not mean docker tags. I think it would be useful in our
current bioboxes listing (http://bioboxes.org/available-bioboxes/) to have a
field called 'tags' or 'metatags'. In this field we could categorize our
containers. Tags could be for example 'CAMI' or 'nucleotid.es'.
I'm not sure what the use case would be. If it would be to link to the
benchmarked data, then I think that would be a great idea. For example, we
could link to all benchmarking data that's available for the specific image.
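For illustration only, a 'tags' field on each listing entry could be used to filter the available bioboxes as sketched below. The entries, image names, and the `with_tag` helper are hypothetical, and the real list format may differ.

```python
# Hypothetical listing entries: image names and layout are assumptions,
# not the actual bioboxes image list.
bioboxes = [
    {"image": "example/assembler-a", "tags": ["nucleotid.es"]},
    {"image": "example/binner-b", "tags": ["CAMI"]},
    {"image": "example/assembler-c", "tags": ["CAMI", "nucleotid.es"]},
    {"image": "example/untagged-d"},  # an entry without any tags
]


def with_tag(entries, tag):
    """Return the image names of all entries that carry the given tag."""
    return [entry["image"] for entry in entries if tag in entry.get("tags", [])]
```

Here `with_tag(bioboxes, "CAMI")` would select the two entries tagged 'CAMI', and entries without a `tags` field are simply skipped.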
> I believe we could update the circle.yml for the data repository to
> automatically request a rebuild of the website every time a pull request
> is merged into master.
Sounds great! Could you update the repo?
Yes, I'll look into setting this up.
Yes, that is something I would like to implement in future. But for now I think it would be enough to add a tag with a link to the benchmarking website. I could set up a PR. To show benchmarking results for bioinformatics software, we would have to define something like a common REST API used by nucleotid.es, CAMI, and maybe other evaluation/benchmarking websites. Other websites that list bioinformatics software could use this API too. But this is independent of bioboxes. If you are interested in working on such an API with me, we should discuss this somewhere else.
Great. I will create a separate issue, so that we do not forget.
I suggest the following changes:
Problem: One tool/image could implement multiple interfaces. With the current list we would have to list the image once for each interface.
Solution: in PR #2 I suggest to change the list to the following format where we state for each task the corresponding interface:
Interfaces are listed in a separate file: interfaces.yml
@michaelbarton suggested adding SHA IDs of the bioboxes images.
I suggest adding a field called 'tags'. This field should allow us to add categories for each biobox.
Example: