Update bioboxes image list #4

pbelmann · 2016-12-07T09:32:47Z

I suggest the following changes:

Problem: One tool/image could implement multiple interfaces. With the current list we would have to list an image once for each interface.

Solution: in PR #2 I suggest changing the list to the following format, where we state the corresponding interface for each task:

    title: velvet
    image:
      dockerhub: bioboxes/velvet
      repo: https://github.com/bioboxes/velvet
      source: https://github.com/dzerbino/velvet
    pmid: 18349386
    homepage: https://www.ebi.ac.uk/~zerbino/velvet/
    mailing_list: http://listserver.ebi.ac.uk/mailman/listinfo/velvet-users
    description:
      The velvet assembler was one of the first assemblers created for short read sequencing. Velvet was developed at the European Bioinformatics Institute.
    tasks:
      - name: default
        interface: assembler
      - name: careful
        interface: assembler

Interfaces are listed in a separate file: interfaces.yml
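
As a rough illustration of how a consumer of this format might use the per-task `interface` field, here is a minimal Python sketch. It assumes the entry has already been parsed from YAML into a plain dict; the helper names `tasks_by_interface` and `unknown_interfaces` are hypothetical and not part of any bioboxes tooling, and the set of known interfaces stands in for whatever interfaces.yml ends up containing.

```python
# Hypothetical in-memory form of one list entry, as a YAML parser would return it.
velvet = {
    "title": "velvet",
    "image": {"dockerhub": "bioboxes/velvet"},
    "tasks": [
        {"name": "default", "interface": "assembler"},
        {"name": "careful", "interface": "assembler"},
    ],
}


def tasks_by_interface(entry):
    """Group task names by the interface each task implements."""
    grouped = {}
    for task in entry.get("tasks", []):
        grouped.setdefault(task["interface"], []).append(task["name"])
    return grouped


def unknown_interfaces(entry, known_interfaces):
    """Return, sorted, the interfaces an entry uses that are not in the known set."""
    return sorted(set(tasks_by_interface(entry)) - set(known_interfaces))
```

A check like `unknown_interfaces(velvet, ["assembler"])` could run in CI to catch entries that name an interface missing from the separate interfaces file.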

@michaelbarton suggested adding SHA ids of the bioboxes images.
I suggest adding a field called 'tags'. This field would let us add categories for each biobox.

Example:

    title: ray
    image:
      dockerhub: bioboxes/ray
      repo: https://github.com/sebhtml/ray
      source:
    pmid: 20958248
    homepage: http://gatb.inria.fr/
    tags: [nucleotid.es, CAMI]
    mailing_list: https://www.biostars.org/t/gatb/
    description: 
      Ray is a parallel software that computes de novo genome assemblies with next-generation sequencing data.
    tasks:
      - name: default
        interface: assembler
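
To make the intent of the 'tags' field concrete, here is a small Python sketch of filtering the listing by tag. It assumes entries are already parsed into dicts and that an entry without a `tags` key simply has no tags; the function name `titles_with_tag` is hypothetical.

```python
# Hypothetical parsed entries; only the fields needed for filtering are shown.
bioboxes = [
    {"title": "velvet"},  # no 'tags' key, treated as untagged
    {"title": "ray", "tags": ["nucleotid.es", "CAMI"]},
]


def titles_with_tag(entries, tag):
    """Return the titles of all entries that carry the given tag."""
    return [entry["title"] for entry in entries if tag in entry.get("tags", [])]
```

For the two entries above, `titles_with_tag(bioboxes, "CAMI")` selects only `ray`.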

@fernandomeyer added the velour biobox, but our page on bioboxes.org has not been updated. Maybe we should implement this listing in JavaScript, which would let us fetch the list each time the webpage is opened.

michaelbarton · 2016-12-07T20:51:34Z

> **Problem:** One tool/image could implement multiple interfaces. By using the current list we would have to list an image for each interface.

My preference is for each biobox to implement one type of interface to simplify the management of biobox images, especially the image tasks. Having one interface per biobox image means that each task is scoped by the interface too. For example, the `default` task for an assembler should imply what the author believes is the best possible assembly given a wide variety of inputs, which is what the CLI runs if `--task` is not specified. A careful task implies trading assembly size for accuracy. Were we to have different interfaces in the same biobox, it would be hard to determine what the default task would relate to. There also could not be a default task for each interface unless the tasks were namespaced in some way.
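
The ambiguity described above can be shown with a short Python sketch. This is not how the bioboxes CLI resolves tasks; `resolve_task` and the interface names `interface_a` and `interface_b` are hypothetical and only illustrate why two `default` tasks in one image would need namespacing.

```python
def resolve_task(entry, requested=None):
    """Pick the task to run, falling back to 'default' when none is requested."""
    name = requested or "default"
    matches = [task for task in entry["tasks"] if task["name"] == name]
    if not matches:
        raise KeyError(f"no task named {name!r}")
    if len(matches) > 1:
        interfaces = ", ".join(task["interface"] for task in matches)
        raise ValueError(f"task {name!r} is ambiguous across interfaces: {interfaces}")
    return matches[0]


# One interface per image: 'default' resolves unambiguously.
single = {"tasks": [
    {"name": "default", "interface": "assembler"},
    {"name": "careful", "interface": "assembler"},
]}

# Two interfaces in one image: 'default' no longer identifies a single task.
mixed = {"tasks": [
    {"name": "default", "interface": "interface_a"},
    {"name": "default", "interface": "interface_b"},
]}
```

With `single`, omitting the task name selects the assembler's `default` task; with `mixed`, the same call raises because nothing says which interface the default belongs to.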

> 2. @michaelbarton suggested to add SHA ids of the bioboxes images.
> 3. I suggest to add a field called 'tags'. This field should allow us to add categories for each biobox.

I have been using SHA256 digests so far because this is supported by the docker client. For example, the command: `docker run repo/image@sha256:digest`

I prefer this approach because it allows different images of the same name to be used, differentiating them by their digest. This is what I am using in nucleotid.es at the JGI because it allows us, for example, to benchmark spades v3.9 vs. v3.10. I think it would also help reproducible research, where the ideal case would be bioinformaticians listing the exact digest of the image they used in their methods.

I also prefer this over using docker tags because docker tags can be removed and changed in the docker repo, so there is no guarantee that a tag will always point to the same build. A digest, on the other hand, is explicitly tied to a build. I think the tags you mentioned refer to a different use case though, is that correct?
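
As a sketch of how a listing or benchmarking service might build such a pinned reference, the Python below joins an image name and a digest into the `repo/image@sha256:digest` form and rejects anything that does not look like a SHA256 digest (64 lowercase hex characters after the `sha256:` prefix). The function name `pinned_reference` is hypothetical, and the digest in the usage note is a made-up placeholder, not the digest of any real image.

```python
import re

# A SHA256 digest is the literal prefix 'sha256:' followed by 64 hex characters.
_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def pinned_reference(image, digest):
    """Build an 'image@sha256:...' reference, validating the digest format."""
    if not _DIGEST.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{image}@{digest}"
```

For example, `pinned_reference("bioboxes/velvet", "sha256:" + "a" * 64)` yields a string that can be passed to `docker run` in place of a mutable tag.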

> 4. @fernandomeyer added the velour biobox but our [page](http://bioboxes.org/available-bioboxes/) on bioboxes.org is not updated. Maybe we should implement this listing by using javascript which allows us to fetch the list each time the webpage is opened.

I believe we could update the circle.yml for the data repository to automatically request a rebuild of the website every time a pull request is merged into master.

pbelmann · 2016-12-08T11:48:33Z

> Problem: One tool/image could implement multiple interfaces. By using the
> current list we would have to list an image for each interface.
>
> My preference is for each biobox to implement one type of interface to
> simplify the management of biobox images, especially the image tasks. Having
> one interface per biobox image means that each task is scoped by the
> interface too. For example, the `default` task for an assembler should imply
> what the author believes is the best possible assembly given a wide variety of
> inputs, which is what the CLI runs if `--task` is not specified. A careful
> task implies trading assembly size for accuracy.

I agree this would be difficult if we want to specify a default task for each interface.
In CAMI we have, for example, binning evaluation tools that can be used with both taxonomic and non-taxonomic binning files. That's why they implement both the taxonomic and the non-taxonomic binning evaluation interfaces.
But I guess we will have to build two different images that fetch the same library/GitHub repository.

> I have been using SHA256 digests so far because this is supported
> by the docker client. For example the command:
> `docker run repo/image@sha256:digest`

Ok, so using or listing the digest makes sense if you are referencing a specific biobox from a different service, like nucleotid.es or CAMI.

> I think the tags you mentioned refer to a different use case though, is
> that correct?

Yes, with tags I do not mean docker tags. I think it would be useful in our current bioboxes listing (http://bioboxes.org/available-bioboxes/) to have a field called 'tags' or 'metatags'. In this field we could categorize our containers. Tags could be for example 'CAMI' or 'nucleotid.es'.

> @fernandomeyer added the velour biobox but our
> [page](http://bioboxes.org/available-bioboxes/) on bioboxes.org is not
> updated. Maybe we should implement this listing by using javascript which
> allows us to fetch the list each time the webpage is opened.
>
> I believe we could update the circle.yml for the data repository to
> automatically request a rebuild of the website every time a pull request is
> merged into master.

Sounds great! Could you update the repo?

michaelbarton · 2016-12-09T00:37:33Z

> My preference is for each biobox to implement one type of interface to simplify the management of biobox images, especially the image tasks. Having one interface per biobox image means that each task is scoped by the interface too. For example, the `default` task for an assembler should imply what the author believes is the best possible assembly given a wide variety of inputs, which is what the CLI runs if `--task` is not specified. A careful task implies trading assembly size for accuracy.
>
> I agree this would be difficult if we want to specify a default task for each interface. In CAMI we have for example binning evaluation tools that could be used with taxonomic and non-taxonomic binning files. That's why they implement the taxonomic and non-taxonomic binning evaluation interface.

I think this is not ideal because it means maintaining two versions of essentially the same image. It does seem to me to be the best balance between maintainability for us and simplicity in the user interface. If we can think of a way to keep the interface as simple as possible for the users, or even simplify it further, I would be interested in exploring this.

> I guess we will have to build two different images that are fetching the same library/github repository.

As a workaround, we could build a common base image of the tool and create the separate biobox images on top. I think a goal for bioboxes would be to ask developers and authors to maintain the Docker images while we maintain the interfaces. That's wishful thinking for the time being.

> I have been using SHA256 digests so far because this is supported by the docker client. For example the command: `docker run repo/image@sha256:digest`
>
> Ok, so using or listing the digest makes sense if you are referencing a specific biobox from a different service, like nucleotid.es or CAMI.

I think we might be talking at cross purposes here. I'm currently using bioboxes images in nucleotides, and CAMI is as well. Using the digest would allow us to specify exactly which image was benchmarked, and that could be the same image in both CAMI and nucleotides. I think if CAMI and nucleotides are generating metrics for the same biobox image, that is very good because it helps standardise the benchmarking process between groups.
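
One way to picture this: if each service records the digest-pinned reference of every image it benchmarks, finding the images with metrics in both places is a set intersection. The Python below is only a sketch under that assumption; `benchmarked_by_both` is a hypothetical name and the digests are made-up placeholders.

```python
def benchmarked_by_both(first_service, second_service):
    """Return, sorted, the pinned image references present in both collections."""
    return sorted(set(first_service) & set(second_service))


# Placeholder digests, not those of any real image.
digest_a = "sha256:" + "a" * 64
digest_b = "sha256:" + "b" * 64

nucleotides_images = {f"bioboxes/velvet@{digest_a}", f"bioboxes/ray@{digest_b}"}
cami_images = {f"bioboxes/velvet@{digest_a}"}

shared = benchmarked_by_both(nucleotides_images, cami_images)
```

Because the comparison is on the full `name@digest` string, two builds published under the same image name are never confused with each other.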

> I think the tags you mentioned refer to a different use case though, is that correct?
>
> Yes, with tags I do not mean docker tags. I think it would be useful in our current bioboxes listing (http://bioboxes.org/available-bioboxes/) to have a field called 'tags' or 'metatags'. In this field we could categorize our containers. Tags could be for example 'CAMI' or 'nucleotid.es'.

I'm not sure what the use case would be. If it would be to link to the benchmarked data, then I think that would be a great idea. For example, we could link to all benchmarking data that's available for the specific image.

> I believe we could update the circle.yml for the data repository to automatically request a rebuild of the website every time a pull request is merged into master.
>
> Sounds great! Could you update the repo?

Yes, I'll look into setting this up.

pbelmann · 2017-01-10T12:08:33Z

> I think the tags you mentioned refer to a different use case though, is
> that correct?
>
> Yes, with tags I do not mean docker tags. I think it would be useful in our
> current bioboxes listing (http://bioboxes.org/available-bioboxes/) to have a
> field called 'tags' or 'metatags'. In this field we could categorize our
> containers. Tags could be for example 'CAMI' or 'nucleotid.es'.
>
> I'm not sure what the use case would be. If it would be to link to the
> benchmarked data, then I think that would be a great idea. For example, we
> could link to all benchmarking data that's available for the specific image.

Yes, that is something I would like to implement in the future. But for now I think it would be enough to add a tag with a link to the benchmarking website. I could set up a PR.

For showing benchmarking results for bioinformatics software we would have to define something like a common REST API that should be used by nucleotid.es and CAMI and maybe other evaluation/benchmarking websites. Other websites that are listing bioinformatics software could use this API. But this is something that is independent of bioboxes. If you are interested in working on such an API with me we should discuss this somewhere else.

> I believe we could update the circle.yml for the data repository to
> automatically request a rebuild of the website every time a pull request
> is merged into master.
>
> Sounds great! Could you update the repo?
>
> Yes, I'll look into setting this up.

Great. I will create a separate issue, so that we do not forget.

pbelmann mentioned this issue Jan 10, 2017

List of bioboxes on bioboxes.org #5

Open
