DC-1238: Add dummy genomic data to test OMOP datasets #1824
base: develop
Looks reasonable
See below:
@@ -0,0 +1,2 @@
{"collaborator_participant_id": "77", "collaborator_sample_id": "SM-77", "contamination_rate": 0.001, "exome_gvcf_index_path": "gs://fc-00000000-0000-0000-0000-000000000000/sample.wes.hard-filtered.gvcf.gz.tbi", "exome_gvcf_md5_path": "gs://fc-00000000-0000-0000-0000-000000000000/sample.wes.hard-filtered.gvcf.gz.md5", "exome_gvcf_path": "gs://fc-00000000-0000-0000-0000-000000000000/sample.wes.hard-filtered.gvcf.gz", "genome_crai_path": "gs://fc-00000000-0000-0000-0000-000000000000/sample.cram.crai", "genome_cram_md5_path": "gs://fc-00000000-0000-0000-0000-000000000000/sample.cram.md5", "genome_cram_path": "gs://fc-00000000-0000-0000-0000-000000000000/sample.cram", "mapped_percentage": 99.86, "mean_off_target_coverage": 3.46, "mean_target_coverage": 53.8, "percent_target_bases_at_10x": 98.36, "percent_wgs_bases_at_1x": 94.77, "reblocked_gvcf": "gs://fc-00000000-0000-0000-0000-000000000000/sample.rb.g.vcf.gz", "reblocked_gvcf_index": "gs://fc-00000000-0000-0000-0000-000000000000/sample.rb.g.vcf.gz.tbi", "total_bases": 23772304683}
This looks like a `jsonl`?
Yes, all the data tables are `jsonl`. I found a number of changes like this that would be an improvement, but decided to limit the work done in this PR. I'll make another tech debt ticket to further clean up this code.
The file extension is `json`, should it be `jsonl`?
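For readers unfamiliar with the distinction being raised, here is a minimal Python sketch of how a JSON Lines file differs from a plain JSON file. The `read_jsonl` helper and the two-row sample are hypothetical and are not code from this PR; the field names are borrowed from the sample row shown above.

```python
import io
import json


def read_jsonl(stream):
    """Parse a JSON Lines stream: one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


# Hypothetical two-row table using field names from the sample data.
text = (
    '{"collaborator_participant_id": "77", "collaborator_sample_id": "SM-77"}\n'
    '{"collaborator_participant_id": "78", "collaborator_sample_id": "SM-78"}\n'
)
rows = read_jsonl(io.StringIO(text))

# The same text is not one valid JSON document, so a plain JSON parser
# rejects it once it reaches the second record.
try:
    json.loads(text)
    whole_file_is_json = True
except json.JSONDecodeError:
    whole_file_is_json = False
```

This is why the extension matters to tooling: a file with several records parses line by line but fails as a single `.json` document.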
Looks reasonable to me. My nagging concern here is the tight coupling of the test classes (integration, service, and DAO tests) to the single dataset file, which is how these tests were originally written.
Is your concern about having multiple tests depend on the same files, or having tests depend on files outside the test itself, or something else? For tests that require a lot of assets to set up, I think it makes sense to use external data files like this, rather than embedding the json in the test code, or adding objects that define the models for the json data. But I agree that this separation means that you have to be extra careful when updating the data files.
this looks good! are the datasets in dev/prod going to be updated as a part of this work? I'm excited to see the new table in action
My concern is that each of the different test classes is now dependent on the same underlying data, such that any change to the underlying data affects multiple tests in potentially different ways. Essentially, the last point that you make.
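One possible way to make that shared dependency easier to manage, sketched here in Python purely as an illustration: validate the shared data file in one place, so a change that drops a field fails in a single obvious check rather than in several unrelated tests. `REQUIRED_SAMPLE_FIELDS` and `find_missing_fields` are hypothetical names, and the required set is an assumption; nothing like this is part of the PR.

```python
import json

# Assumed subset of the sample table's columns; the real table has more fields.
REQUIRED_SAMPLE_FIELDS = {
    "collaborator_participant_id",
    "collaborator_sample_id",
    "total_bases",
}


def find_missing_fields(rows, required):
    """Return (row_index, missing_field_names) for each incomplete row."""
    problems = []
    for index, row in enumerate(rows):
        missing = sorted(required - set(row))
        if missing:
            problems.append((index, missing))
    return problems


# Hypothetical file contents: the second row lacks total_bases.
lines = [
    '{"collaborator_participant_id": "77", "collaborator_sample_id": "SM-77", "total_bases": 23772304683}',
    '{"collaborator_participant_id": "78", "collaborator_sample_id": "SM-78"}',
]
rows = [json.loads(line) for line in lines]
problems = find_missing_fields(rows, REQUIRED_SAMPLE_FIELDS)
```

A check like this does not remove the coupling, but it would make an accidental change to the shared data visible before the dependent tests run.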
Jira ticket: https://broadworkbench.atlassian.net/browse/DC-1238
Addresses
Update test OMOP data for integration / connected tests and for local dataset setup to include a `sample` table with dummy genomic data.
Summary of changes
`setup_tdr_resources.py`
`.jsonl` file to `.json`
Testing Strategy
`setup_tdr_resources.py` locally