feature/bmeg #9

Open. Wants to merge 5 commits into base: feature/fhir-aggregator


bwalsh (Contributor) commented Feb 20, 2025

This PR:

  • adds bmeg extract config & tests
  • is still a WIP
  • needs a schema for the YAML files (see tests/fixtures/bmeg.yaml)
  • needs testing with TES, pending a py-tes update
  • other? (comments appreciated)
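
Since the YAML schema is still to be written, here is a minimal sketch of the kind of check it might encode. It assumes each extract entry is a mapping with `id`, `extract_commands`, and `outputs`, as in the ccle entry quoted later in this description. The function name `validate_extract` and the exact rules are hypothetical, not part of this PR, and tests/fixtures/bmeg.yaml may contain fields not covered here.

```python
from typing import Any, List

# Hypothetical required list-valued keys, inferred from the ccle entry in this PR
# description; the real schema has not been written yet.
REQUIRED_LIST_FIELDS = ("extract_commands", "outputs")


def validate_extract(entry: Any) -> List[str]:
    """Return a list of problems; an empty list means the entry looks valid."""
    if not isinstance(entry, dict):
        return ["entry must be a mapping"]
    problems = []
    extract_id = entry.get("id")
    if not isinstance(extract_id, str) or not extract_id:
        problems.append("id must be a non-empty string")
    for field_name in REQUIRED_LIST_FIELDS:
        value = entry.get(field_name)
        if not isinstance(value, list) or not value:
            problems.append(f"{field_name} must be a non-empty list")
        elif not all(isinstance(item, str) and item.strip() for item in value):
            problems.append(f"{field_name} must contain only non-empty strings")
    return problems
```

A test could load the fixture with a YAML parser and assert that `validate_extract` returns an empty list for every entry. A JSON Schema document would be an equally reasonable way to express the same rules.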

dags/bmeg_pipeline.py will produce this:

image

tags are used to narrow the scope of the admin UI:
image

known issues:

For multi-step DAGs (e.g. the ccle extract), the DAG generator is inconsistent about when the dataset is updated. Need to discuss: should we require that the complexity of this type of pipeline be hidden from the DAG, i.e. should the extract author provide a single command?

image
# config snippet
- id: ccle
  extract_commands:
  - curl -L -o source/ccle/CCLE_depMap_19Q1_TPM_transcripts.csv https://depmap.org/portal/download/api/download/external?file_name=ccle%2Fdepmap-rnaseq-expression-data-ccd0.12%2FCCLE_depMap_19Q1_TPM_transcripts.csv
  - python3 transform/ccle/cellline_lookups.py
  - wget https://data.broadinstitute.org/ccle/CCLE_DepMap_18q3_RNAseq_RPKM_20180718.gct
    -O source/ccle/CCLE_DepMap_18q3_RNAseq_RPKM_20180718.gct
  - curl -L -o source/ccle/CCLE_depMap_19Q1_TPM.csv https://depmap.org/portal/download/api/download/external?file_name=ccle%2Fdepmap-rnaseq-expression-data-ccd0.12%2FCCLE_depMap_19Q1_TPM.csv
  - wget https://depmap.org/portal/download/api/download/external?file_name=processed_portal_downloads%2Fdepmap-public-cell-line-metadata-183e.4%2FDepMap-2019q1-celllines_v2.csv
    -O source/ccle/DepMap-2019q1-celllines.csv_v2.csv
  - curl -L -o source/ccle/sample_info.csv https://ndownloader.figshare.com/files/22629137
  - wget https://data.broadinstitute.org/ccle/CCLE_DepMap_18q3_maf_20180718.txt -O
    source/ccle/CCLE_DepMap_18q3_maf_20180718.txt
  outputs:
  - source/ccle/CCLE_depMap_19Q1_TPM_transcripts.csv
  - source/ccle/cellline_id_lookup.tsv
  - source/ccle/cellline_phenotype_lookup.tsv
  - source/ccle/cellline_properties_lookup.tsv
  - source/ccle/CCLE_DepMap_18q3_RNAseq_RPKM_20180718.gct
  - source/ccle/CCLE_depMap_19Q1_TPM.csv
  - source/ccle/DepMap-2019q1-celllines.csv_v2.csv
  - source/ccle/sample_info.csv
  - source/ccle/CCLE_DepMap_18q3_maf_20180718.txt
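
To make the "single command" option from the known issue concrete, one possible approach is to collapse `extract_commands` into one shell command before the DAG generator sees it. This is only a sketch for discussion: `as_single_command` is a hypothetical helper, not something in this PR, and it assumes the commands are independent shell one-liners that can safely be chained.

```python
from typing import Iterable


def as_single_command(extract_commands: Iterable[str]) -> str:
    """Chain commands with `&&` so the shell stops at the first failing step."""
    # Drop blank entries and surrounding whitespace; the commands themselves
    # are passed through untouched (no quoting or escaping is attempted).
    cleaned = [cmd.strip() for cmd in extract_commands if cmd.strip()]
    if not cleaned:
        raise ValueError("extract_commands is empty")
    return " && ".join(cleaned)
```

With this, the ccle entry would yield a single task whose success means all outputs exist, at the cost of losing per-step visibility and per-step retries in the UI.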

bwalsh (Contributor, Author) commented Feb 21, 2025

Dataset updates are now associated with the last task in a multi-task DAG.
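
A rough sketch of that rule, in plain Python rather than the actual generator code: each command becomes one task chained to the previous one, and only the final task carries the dataset outlets. `TaskSpec` and `build_task_specs` are hypothetical names used for illustration. If the DAGs are Airflow DAGs (an assumption; the PR does not name the scheduler), this would correspond to passing `outlets` only to the last operator.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskSpec:
    task_id: str
    command: str
    upstream: Optional[str] = None          # task_id of the previous step, if any
    outlets: List[str] = field(default_factory=list)  # datasets marked updated


def build_task_specs(extract_id: str, extract_commands: List[str],
                     outputs: List[str]) -> List[TaskSpec]:
    """One task per command, run in order; only the last task updates the dataset."""
    specs: List[TaskSpec] = []
    last_index = len(extract_commands) - 1
    for index, command in enumerate(extract_commands):
        specs.append(TaskSpec(
            task_id=f"{extract_id}_step_{index}",
            command=command,
            upstream=specs[-1].task_id if specs else None,
            # Intermediate steps declare no outlets, so downstream consumers
            # are only triggered once every command has succeeded.
            outlets=list(outputs) if index == last_index else [],
        ))
    return specs
```

Attaching the outlets to the last task means a failure in any earlier step never marks the dataset as updated.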

image image
