-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missed call due to high coverage #121
Comments
Also, it looks like we've also missed a clear ETH call Full read set "embB_M306V-ATG4247429GTG": {
"variant": "ref-M306V?var_name=ATG4247429GTG&num_alts=1&ref=NC_000962.3&enum=0&gene=embB&mut=M306V",
"genotype": [
1,
1
],
"genotype_likelihoods": [
-14407.132009802384,
-426.5460799265036
],
"info": {
"coverage": {
"reference": {
"percent_coverage": 100.0,
"median_depth": 4,
"min_non_zero_depth": 3,
"kmer_count": 82,
"klen": 21
},
"alternate": {
"percent_coverage": 100.0,
"median_depth": 105,
"min_non_zero_depth": 82,
"kmer_count": 2203,
"klen": 21
}
},
"expected_depths": [
160
],
"contamination_depths": [],
"filter": [
"LOW_GT_CONF"
],
"conf": 13981
},
"_cls": "Call.VariantCall"
}, Filtered read set "embB_M306V-ATG4247429GTG": {
"variant": "ref-M306V?var_name=ATG4247429GTG&num_alts=1&ref=NC_000962.3&enum=0&gene=embB&mut=M306V",
"genotype": [
1,
1
],
"genotype_likelihoods": [
-5175.289309876459,
-89.3627577744604
],
"info": {
"coverage": {
"reference": {
"percent_coverage": 0.0,
"median_depth": 0,
"min_non_zero_depth": 0,
"kmer_count": 0,
"klen": 21
},
"alternate": {
"percent_coverage": 100.0,
"median_depth": 25,
"min_non_zero_depth": 19,
"kmer_count": 515,
"klen": 21
}
},
"expected_depths": [
41.0
],
"contamination_depths": [],
"filter": [
"LOW_GT_CONF"
],
"conf": 5086
},
"_cls": "Call.VariantCall"
}, Maybe we need to have - in addition to the GT percentile stuff - some kind of overriding "hard-coded" rules that say something like if Ref depth is less than 2 and total depth is > 15 then ignore filters? (numbers are for illustration here) |
There seem to be two issues here.
1 I can believe and might be fixable by having a GCP threshold which drops when coverage is high. There is another issue related to that, can't find on my phone right now. |
I have an example where we miss a very clear resistant call due to LOW_GT but the depth on alt vs ref is very obvious. I appreciate it is hard when we are drawing lines based on percentiles. Anyway, I thought this might be a nice example to do some testing/debugging with.
This is a sample from Sylvianne in Madagascar. It was one of 5 samples in a multiplexed nanopore run. It has a HUGE amount of reads (fastq file is 3.3Gb - compressed!) They are testing out MGIT in the lab and were using mykrobe to compare results. The MGIT and phenotyping for this sample show resistance to Isoniazid, Rifampicin, and Ethambutol.
When I run mykrobe ok the full set of reads, it returns resistant to INH only. When I randomly subsample the file to 150x depth mykrobe says resistant to INH and RIF. When I filter out anything (from the subsampled file) that doesn't map to H37Rv I also get resistant to RIF and INH.
The variant in question is
rpoB_S450L-TCG761154TTG
. The information for this variant from the full and filtered runs are:Full read set
Filtered read set
All of the data from this run is stored on noah. I'll send through the paths on Slack.
The text was updated successfully, but these errors were encountered: