sequencing pileup #98
base: main
Alrighty. So now I have a mutation caller that I think works. Now I just need to populate it with real data.
Aight, so now I have a wtf moment. The pileup and sequence analysis from |
Now, here is the problem, and how I am thinking about it: bcftools and samtools implement variant calling at the base level, using base quality scores, mapping quality, read depth, bias, etc. However, most of those don't matter to me:
So really, the difference is that VCF-style tooling lets us do allele analysis in greater depth. But we're not looking for variants; we're checking whether something was sequenced correctly with nanopore sequencing. With the newer R10 flow cells, the errors are usually not random, so effectively I don't care about random errors. I do care about systematic nanopore sequencing errors. The question then is: how do the quality scores of non-random nanopore errors compare to the scores of correct bases? Let's do an analysis of this line:
Now the problem is that the base-level quality scores aren't that different:
Furthermore:
Essentially, the basecaller has no fucking clue: it still assigns a 99.6% correctness score to DNA which is clearly not correct. The base quality information, in this case, is simply not useful. Thus, we gain more information from the strand analysis. I believe plasmidcall is sufficient, but I'm now looking at doing a little more robust testing.
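For reference, a minimal sketch of how a percentage like 99.6% relates to a Phred base quality, using the standard definition Q = -10 * log10(P_error). The function names are hypothetical and this is not code from the PR; the ASCII offset of 33 is an assumption that matches the usual FASTQ/pileup encoding.

```python
import math


def phred_to_accuracy(q):
    """Probability that a base call is correct, given its Phred score q."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(p_correct):
    """Phred score corresponding to a probability of a correct call."""
    return -10.0 * math.log10(1.0 - p_correct)


def decode_quality_string(quals, offset=33):
    """Decode an ASCII quality string into integer Phred scores.

    offset=33 is assumed here (the common Phred+33 encoding).
    """
    return [ord(ch) - offset for ch in quals]


if __name__ == "__main__":
    # Under this definition, a 99.6% correctness score is roughly Q24.
    print(round(accuracy_to_phred(0.996), 1))
    print(round(phred_to_accuracy(24), 4))
```

The scale is logarithmic, so a few points of difference between a correct base and a systematic error barely moves the implied accuracy, which is consistent with the scores above not separating the two cases.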
Ok, now I have it mostly working with pileup files, though there are some edge cases that don't matter as much. I think the current version is working well enough to deploy:
This PR just needs some finishing details and should be ready.
This is code for doing real pileup sequence mutation calling. Attached are over 100 test sequencing pileups from a real run of mine, so this is what ACTUAL data will look like, with manual annotation of what I think is going on.
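For anyone not used to the pileup format, here is a rough sketch of how the read-bases column of a `samtools mpileup` line can be tallied per allele and per strand. It is not the plasmidcall implementation from this PR; `count_pileup_bases` and `strand_support` are hypothetical names, and it assumes the usual conventions: `.`/`,` for a reference match on the forward/reverse strand, upper/lower case letters for mismatches, `^` plus one mapping-quality character for a read start, `$` for a read end, and `+N`/`-N` followed by N bases for indels.

```python
from collections import Counter


def count_pileup_bases(ref_base, bases):
    """Tally (allele, strand) observations from a pileup read-bases string."""
    ref = ref_base.upper()
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start: '^' is followed by one mapping-quality character.
            i += 2
        elif c == "$":
            # Read end marker carries no base information.
            i += 1
        elif c in "+-":
            # Indel: sign, decimal length, then that many bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            seq = bases[j:j + length]
            strand = "fwd" if seq.isupper() else "rev"
            counts[(c + seq.upper(), strand)] += 1
            i = j + length
        else:
            if c == ".":
                counts[(ref, "fwd")] += 1
            elif c == ",":
                counts[(ref, "rev")] += 1
            elif c in "ACGTN":
                counts[(c, "fwd")] += 1
            elif c in "acgtn":
                counts[(c.upper(), "rev")] += 1
            elif c in "*#":
                # Deleted-base placeholder; strand encoding varies between
                # samtools versions, so it is not split by strand here.
                counts[("*", "any")] += 1
            # Anything else (e.g. '>' or '<' reference skips) is ignored.
            i += 1
    return counts


def strand_support(counts, allele):
    """Return (forward, reverse) read counts supporting an allele."""
    return counts[(allele, "fwd")], counts[(allele, "rev")]


if __name__ == "__main__":
    example = count_pileup_bases("C", "..,,^I.A$a+2AG-1t*")
    print(strand_support(example, "C"))
```

Splitting counts by strand is what makes a strand-level check possible at all: an allele seen on only one strand can be flagged without consulting base qualities.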