-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
species and lineage predictions #181
Comments
Hi there, @martinghunt, who recently updated our lineage scheme , is off sick at the moment, apologies for the delay, but we will reply ASAP. |
Thank you for your prompt response. There is no need for any apologies. I genuinely appreciate your time and assistance thus far. Please take all the time you need, and I hope @martinghunt recover quickly. Thank you once again for your support. |
Thanks @alantsangmb. I'll have a look into it this week ... |
Thanks for reporting these bugs. I'm still working on it, but here's an update... Yes you're right, the africanum probes are lineage 6. It's turned out to be non-trivial to fix but will get there. We've got an automated method to build from GTDB genomes, but simply forcing a lineage 5 sample in there resulted in breaking a bunch of other species calls. It's going to need a fair bit more work and testing. Re the canetti run ERR266120: are you using the CSV file? I'd recommend using the JSON file instead because you get more details. This does indeed need fixing, but is clearer what is happening from the JSON. The SNP call that defines lineage4 is low quality and fails mykrobe's filters, but is incorrectly being used to call the lineage, which means it ends up in the CSV file. The JSON has this:
Similar comments for your question about "lineage_per_covg" and "lineage depth": I would use the JSON file instead because it has the proper details. We're considering removing those two columns from the CSV file. |
For the record (and future me reading this), attaching JSON files of mykrobe version 0.13.0 run using the at time of writing latest panel 20230928/202309. ERR751348.myk_0.13.0.json |
Thank you so much for the detailed explanation. I am currently trying to comprehend the contents of the JSON files. I was wondering if the percent_coverage for the reference and alternative under lineage call should always add up to 100%. It seems that is not the case for ERR1023217 and ERR266120. Regarding ERR266120, I noticed it has 40% for the alternative and 0% for the reference. Do you happen to know where the remaining 60% went? Moreover, I am interested in understanding the conditions that would trigger the filters for "LOW_PERCENT_COVERAGE" and "LOW_TOTAL_DEPTH." I am also curious about the reason behind the median depth being 0, while the minimum non-zero depth is 141 for the "alternate". Sorry for hitting you with so many questions at once. |
Quick comment here.
Definitely not. These numbers are telling what what % of the reference probe has been observed, and what % of the alt probe has been observed. The probe is a sequence, converted to a list of kmers, and we ask what % of the kmers in that sequence have been seen in the data.
It means probably that the alt is present, but there is a SNP nearby
If the probe is 60 kmers say, and 59 of them are never seen (so coverage on those kmers is zero) but the last kmer is seen 141 times, then the median depth is zero , but the minimum non-zero depth is 141 |
Hello, I have recently obtained data on several Mtb lineage 5 and M. canetti strains, and I have been using Mykrobe v0.13.0 to predict their species and lineage.
In the case of the Mtb lineage 5 samples (ERR702413, ERR751348 and ERR1023217), Mykrobe consistently predicted the species as Mycobacterium tuberculosis instead of Mycobacterium_tuberculosis_variant_africanum, with a lineage assignment of lineage 5. I noticed that the reported species_per_covg for these samples was relatively low, around 22 to 24. I am wondering if this discrepancy could be attributed to the targeted probe used for detecting M. africanum, which might have been designed based on lineage 6. Is there a parameter available that would allow me to set a cutoff for the species_per_covg?
Additionally, I examined the M. canetti data (ERR266120) using Mykrobe, and I found that it correctly identified it as Mycobacterium canettii. However, the predicted lineage was reported as lineage 4 instead of an "unknown" lineage, as I have seen with other M. canetti strains. I would greatly appreciate it if you could shed some light on the possible reasons for this discrepancy. I must admit that I am not very familiar with the lineage prediction algorithm employed by Mykrobe.
Furthermore, I observed that the "lineage_per_covg" and "lineage depth" were displayed as NA for all samples I have tested. I am interested in obtaining a value for this parameter. Could you kindly advise me on how I can acquire and interpret this information?
Thank you very much for your assistance, and I look forward to your response.
The text was updated successfully, but these errors were encountered: