Genomes page #12

Open
3 tasks
Smeds opened this issue Jan 29, 2025 · 0 comments

Smeds commented Jan 29, 2025

This page should list all available assemblies for one or more selected species.

| Column Name | json key | Always visible | Visible by default | Could be empty | Type | Description |
|---|---|---|---|---|---|---|
| | | Y | Y | N | checkbox | Option used to select one or more assemblies/genomes |
| | | Y | Y | N | Action buttons | Buttons for performing different tasks; configurable |
| Assembly ID | accession | Y | Y | N | String | Genus and Species name |
| Name | name | Y | Y | N | String | Genus and Species name |
| Taxid | taxon_id | Y | Y | N | Number | Taxonomic identifier |
| Type | assembly_type | Y | Y | N | String | Describes the type of assembly |
| Release Date | | Y | Y | Y | Date | Date when the assembly was published |
| N50 | n50 | N | Y | Y | Number | Scaffold N50 value |
| Scaffolds | scaffolds | N | Y | Y | Number | Number of scaffolds |
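
As a rough illustration of how the column table could drive the page, the sketch below encodes each data column with its visibility and emptiness flags. The `Column` class and the helpers `visible_columns` and `missing_required` are hypothetical names, not part of any existing code. The release date column has no json key in the table, so it is left as `None` here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    label: str                # header text shown in the table
    json_key: Optional[str]   # key in the assembly record; None where the table leaves it blank
    always_visible: bool      # True: the user cannot hide this column
    visible_by_default: bool  # True: shown before the user changes anything
    nullable: bool            # True: the value could be empty


# Data columns from the table (checkbox and action-button columns omitted).
COLUMNS = [
    Column("Assembly ID", "accession", True, True, False),
    Column("Name", "name", True, True, False),
    Column("Taxid", "taxon_id", True, True, False),
    Column("Type", "assembly_type", True, True, False),
    Column("Release date", None, True, True, True),
    Column("N50", "n50", False, True, True),
    Column("Scaffolds", "scaffolds", False, True, True),
]


def visible_columns(hidden: set) -> list:
    """Columns to render; always-visible columns ignore the user's hidden set."""
    return [c for c in COLUMNS if c.always_visible or c.label not in hidden]


def missing_required(record: dict) -> list:
    """Json keys that must not be empty but are absent or None in the record."""
    return [
        c.json_key
        for c in COLUMNS
        if c.json_key and not c.nullable and record.get(c.json_key) is None
    ]
```

With this shape, hiding "Name" has no effect because it is always visible, while hiding "N50" removes it from the rendered columns.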

Suggestion of possible action buttons:

  1. Analyze = Forward to a page where a workflow can be selected to run on Galaxy
  2. UCSC = Link to the UCSC Genome Browser
  3. NCBI = Link to NCBI Datasets
  4. EBI = Link to EBI

Some thoughts:

  1. For the first column, another option is to let the user select multiple rows in the table by clicking on a row.
  2. The action buttons should be configurable. We could maybe have entries in the data structure for each assembly, like {'name': 'NCBI', 'url': 'address to open'}
  3. When selecting multiple assemblies/genomes, the user should have the option to enter a comparative page. The question is how we provide this option. A few possibilities are:
    1. The user clicks on the Analyze button of one of the selected assemblies/genomes
    2. The Analyze buttons of the selected assemblies/genomes have their name changed to Comparative analyze, to make it easier to see that a comparative page will be entered.
    3. A Comparative Analyze button pops up somewhere on the page
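
To make thoughts 2 and 3 a bit more concrete, here is a minimal sketch of per-assembly button entries combined with the second possibility (renaming the Analyze button when several assemblies are selected). The `actions` key, the function names and the example URL are assumptions for illustration only.

```python
def analyze_label(selected_count: int) -> str:
    """Label for the Analyze button, depending on how many rows are selected."""
    return "Comparative analyze" if selected_count > 1 else "Analyze"


def action_buttons(assembly: dict, selected_count: int) -> list:
    """Build the button list for one row.

    The Analyze button is always first; the rest come from a hypothetical
    'actions' list on the assembly record, one {'name', 'url'} dict per button.
    """
    buttons = [{"name": analyze_label(selected_count), "url": None}]  # None: internal navigation
    buttons += [dict(entry) for entry in assembly.get("actions", [])]
    return buttons


# Example record with one configured external link (placeholder URL).
example = {
    "accession": "GCF_008822105.2",
    "actions": [{"name": "NCBI", "url": "https://example.org/address-to-open"}],
}
print([b["name"] for b in action_buttons(example, selected_count=2)])
```

Keeping the external links as plain data means a new button can be added by editing the data blob, without touching the page code.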

Assembly information can be retrieved from /genome/accession/{accessions}/dataset_report

{
  "reports": [
    {
      "accession": "GCF_008822105.2",
      "assembly_info": {
        "release_date": "2020-02-12",
        "assembly_type": "haploid"
      },
      "assembly_stats": {
        "total_number_of_chromosomes": 32,
        "total_sequence_length": "1068971253",
        "total_ungapped_length": "1047402540",
        "number_of_contigs": 1052,
        "contig_n50": 4378277,
        "contig_l50": 63,
        "number_of_scaffolds": 204,
        "scaffold_n50": 70879221,
        "scaffold_l50": 6,
        "number_of_component_sequences": 204,
        "genome_coverage": "82.5x",
        "number_of_organelles": 1
      }
    }
  ]
}
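
The sketch below shows one way the data extraction script could build the request URL and flatten a report into a table row. The base URL and the comma-separated accession list are assumptions about the NCBI Datasets REST API and should be checked against its documentation. The output keys `n50`, `scaffolds` and `release_date` are chosen to line up with the column table and are hypothetical as well.

```python
from urllib.parse import quote

# Assumed base URL of the NCBI Datasets REST API; verify before use.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2"


def dataset_report_url(accessions: list) -> str:
    """Build the dataset_report URL for one or more accessions (comma-separated)."""
    joined = ",".join(accessions)
    return f"{BASE_URL}/genome/accession/{quote(joined, safe=',')}/dataset_report"


def report_to_row(report: dict) -> dict:
    """Flatten one entry of 'reports' into the fields the table needs.

    Optional fields use .get() so a missing value becomes None instead of raising.
    """
    info = report.get("assembly_info", {})
    stats = report.get("assembly_stats", {})
    return {
        "accession": report["accession"],
        "assembly_type": info.get("assembly_type"),
        "release_date": info.get("release_date"),
        "n50": stats.get("scaffold_n50"),
        "scaffolds": stats.get("number_of_scaffolds"),
    }


# Trimmed copy of the example response.
sample = {
    "reports": [
        {
            "accession": "GCF_008822105.2",
            "assembly_info": {"release_date": "2020-02-12", "assembly_type": "haploid"},
            "assembly_stats": {"number_of_scaffolds": 204, "scaffold_n50": 70879221},
        }
    ]
}
rows = [report_to_row(r) for r in sample["reports"]]
```

Note that some numeric fields in the response (for example `total_sequence_length`) arrive as strings, so any column built from them would need an explicit conversion.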

I have divided the implementation into three phases:

  • Initial Setup – The goal of this phase is to get a basic version up and running. This will make it possible to implement the analysis and comparative analysis page in the next step.
  • Enhancements using NCBI API – Once we have a simple page working, we will expand it by incorporating additional information from the NCBI REST API.
  • Additional Data Integration – In this final phase, we will add assemblies found on the VGP S3 bucket.

Initial Setup (Minimum feature)

The following columns should be included:

  1. Assembly ID
  2. Name
  3. Type (would be nice but could be moved to next phase)

It should also be possible to select one or more assemblies/genomes and then move on to the analyze pages.

Enhancements using NCBI API

Add the remaining columns to the table including action buttons.

In this step we need to extend the data extraction script to fetch more information from the NCBI API.

Additional Data Integration

We also want to make it possible to select assemblies/genomes stored on the VGP S3 bucket. Preferably it should behave like this:

  1. Show only the NCBI assemblies/genomes if they exist.
  2. If no NCBI assemblies/genomes exist, show the VGP (S3 bucket) assemblies.
  3. If both NCBI and VGP (S3 bucket) assemblies/genomes exist:
    1. Show the NCBI ones by default.
    2. Make it possible to also show the VGP (S3 bucket) ones, for example by clicking a button.
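
The three rules above could be expressed roughly as follows. This is only a sketch: the function names and the `include_vgp` flag (standing in for the state of the button) are hypothetical.

```python
def assemblies_to_show(ncbi: list, vgp: list, include_vgp: bool = False) -> list:
    """Pick the assemblies to list for one species.

    - No NCBI assemblies: fall back to the VGP (S3 bucket) ones.
    - NCBI assemblies exist: show them, and append VGP only when requested.
    """
    if not ncbi:
        return list(vgp)
    if include_vgp:
        return list(ncbi) + list(vgp)
    return list(ncbi)


def can_toggle_vgp(ncbi: list, vgp: list) -> bool:
    """The 'show VGP' button only makes sense when both sources have entries."""
    return bool(ncbi) and bool(vgp)
```

For example, `assemblies_to_show(["n1"], ["v1"])` returns only the NCBI entry, while passing `include_vgp=True` returns both.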

The script for generating a data blob for VGP data could be a new one that outputs a JSON/YAML file. I don't have a strong preference on whether we create a separate script to merge VGP data with NCBI or modify the existing NCBI data script to take the new file as input.
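
As one illustration of the "separate merge script" option, the sketch below groups records from both sources by species. It assumes that each record in both blobs carries a `taxon_id` key, which is not confirmed for the VGP data. The function name and output shape are hypothetical.

```python
from collections import defaultdict


def merge_by_taxon(ncbi_records: list, vgp_records: list) -> dict:
    """Group records by taxon_id, keeping the two sources in separate lists."""
    merged = defaultdict(lambda: {"ncbi": [], "vgp": []})
    for rec in ncbi_records:
        merged[rec["taxon_id"]]["ncbi"].append(rec)
    for rec in vgp_records:
        merged[rec["taxon_id"]]["vgp"].append(rec)
    return dict(merged)
```

Keeping the sources apart in the merged output would let the page decide per species which list to show, rather than baking that choice into the data.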
