Notice partial gene deletion that spans start codon #24

Closed
mbhall88 opened this issue Jan 6, 2023 · 8 comments
Comments

@mbhall88
Owner

mbhall88 commented Jan 6, 2023

    I've been through all of the drprg PZA FNs that are called by at least one other tool.

There are two overarching problems drprg has:

  1. A lot of the missed calls are minor allele calls for variants not covered by anything in the PRG. So, because they're minor alleles, they don't get discovered as novel. The pncA PRG is quite sparse, so it might be worth adding some more PZA-resistant isolates to the reference PRG to try and capture more of the population variation. And where the minor alleles are covered by the PRG, they seem to fail the GAPS threshold of 0.3.
  2. There are some big deletions that knock out the start codon, and some. We (surprisingly) discover the deletion, but get no coverage on it (or the ref)
pncA    1       .       GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATTCCT        G,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCGGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATATCTT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCCGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACCTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATATCTT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGC
TGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACGACTTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCAGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACTACTTCTCCGGCACACCGGACTATTCCT,GTCATGTTCGCGATCGTCGCGGCGTCATGGACCCTATATCTGTGGCTGCCGCGTCGGTAGGCAAACTGCCCGGGCAGTCGCCCGAACGTATGGTGGACGTATGCGGGCGTTGATCATCGTCGACGTGCCGAACGACTTCTGCGAGGGTGGCTCGCTGGCGGTAACCGGTGGCGCCGCGCTGGCCCGCGCCATCAGCGACTACCTGGCCGAAGCGGCGGACTACCATCACGTCGTGGCAACCAAGGACTTCCACATCGACCCGGGTGACCACTTCTCCGGCACACCGGACTATTCCT .       .       VC=INDEL;GRAPHTYPE=SIMPLE       GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:0,0,0,0,0,0,0,0,0,0:1,1,1,1,1,1,1,1,1,1:-488,-488,-488,-488,-488,-488,-488,-488,-488,-488:0
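
To make the record above easier to read, here is a minimal Python sketch that unpacks its FORMAT and sample columns into per-allele values. The FORMAT string and sample values are copied from the record; the helper names (`parse_sample`, `is_null_call`, `has_no_coverage`) are hypothetical and are not part of drprg or pandora.

```python
# FORMAT column, copied from the pncA record above.
FORMAT = (
    "GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:"
    "SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF"
)

# Sample column, rebuilt from the record above: a null genotype, six coverage
# fields of ten zeros each (REF plus nine ALT alleles), GAPS of 1 for every
# allele, a likelihood of -488 for every allele, and a GT_CONF of 0.
ZEROS = ",".join(["0"] * 10)
SAMPLE = ":".join(
    ["."] + [ZEROS] * 6 + [",".join(["1"] * 10), ",".join(["-488"] * 10), "0"]
)


def parse_sample(fmt, sample):
    """Map each FORMAT key to its list of comma-separated values."""
    return {
        key: value.split(",")
        for key, value in zip(fmt.split(":"), sample.split(":"))
    }


def is_null_call(fields):
    """True when the genotype is missing ('.')."""
    return fields["GT"] == ["."]


def has_no_coverage(fields):
    """True when every coverage field is zero for every allele."""
    return all(
        float(v) == 0
        for key, values in fields.items()
        if key.endswith("_COVG")
        for v in values
    )


fields = parse_sample(FORMAT, SAMPLE)
print(is_null_call(fields), has_no_coverage(fields), fields["GAPS"])
```

Read this way, the record has a null genotype, zero coverage on the reference and on every alternate allele (including the deletion), and identical likelihoods, so there is nothing to distinguish the alleles.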

One way around this could be to notice when we have more than n consecutive VCF entries with a failed/null call and just call resistant? Or, to be more precise, notice when we have one or more failed positions that span the start codon, and then call resistant if it is one of the genes where gene deletion causes resistance.
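
A rough Python sketch of the more precise variant of this idea, under stated assumptions: the `Record` type, the `start_codon_pos` argument (1-based position of the start codon within the gene's reference sequence), and the contents of `DELETION_CONFERS_RESISTANCE` are all hypothetical and only illustrate the check. This is not how drprg implements it.

```python
from dataclasses import dataclass


@dataclass
class Record:
    chrom: str     # gene name, as in the CHROM column
    pos: int       # 1-based POS
    ref: str       # REF allele
    genotype: str  # "." for a failed/null call


# Illustrative only: genes where losing the gene is taken to cause resistance.
DELETION_CONFERS_RESISTANCE = {"pncA", "katG"}


def spans(record, start, end):
    """True if the record's REF interval overlaps [start, end] (1-based, inclusive)."""
    record_end = record.pos + len(record.ref) - 1
    return record.pos <= end and record_end >= start


def start_codon_lost(records, gene, start_codon_pos):
    """True if any null call in `gene` overlaps the three bases of the start codon."""
    codon_end = start_codon_pos + 2
    return any(
        r.chrom == gene
        and r.genotype == "."
        and spans(r, start_codon_pos, codon_end)
        for r in records
    )


def predict(records, gene, start_codon_pos):
    """Return 'resistant' when the rule fires, otherwise None (no call from this rule)."""
    if gene in DELETION_CONFERS_RESISTANCE and start_codon_lost(
        records, gene, start_codon_pos
    ):
        return "resistant"
    return None


# A 300 bp null call starting at POS 1, with the start codon assumed at 101.
records = [Record("pncA", 1, "A" * 300, ".")]
print(predict(records, "pncA", 101))
```

Returning `None` rather than "susceptible" when the rule does not fire keeps this check separate from the normal variant-based prediction.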

Originally posted by @mbhall88 in mbhall88/drprg-paper#2

@mbhall88
Owner Author

mbhall88 commented Jan 6, 2023

In particular, this issue is concerned with solving problem 2 above.

We can detect whole-gene deletions, but not partial deletions that knock out the start codon, which effectively amount to the same thing.

I've seen this issue in both pncA and katG.

@mbhall88
Owner Author

mbhall88 commented Jan 6, 2023

The implementation of this feature will likely need to change when/if iqbal-lab-org/pandora#316 is closed.

mbhall88 added a commit that referenced this issue Jan 6, 2023
@iqbal-lab
Collaborator

iqbal-lab commented Jan 6, 2023

Do you mean that there is a potential fix even without a fix for iqbal-lab-org/pandora#316 ?

@mbhall88
Owner Author

mbhall88 commented Jan 8, 2023

Well, noticing failed variants that span the start codon would be a kind of band-aid fix. The proper fix would be the resolution of that pandora issue.

mbhall88 added a commit that referenced this issue Jan 9, 2023
mbhall88 added a commit that referenced this issue Jan 12, 2023
@mbhall88
Owner Author

Should we also be detecting when the stop codon is lost? There are two INH FNs that we miss because we don't detect stop loss, whereas tbprofiler does call it. We have null genotypes spanning the stop codon in both of these samples.

@iqbal-lab
Collaborator

My guess is yes, we should.

@mbhall88
Owner Author

I made a change to the partial gene deletion code and also removed the GT_CONF filter. The (Illumina) diff I get from these changes is:

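| Tool  | Drug         | ΔFN | ΔFP |
|-------|--------------|-----|-----|
| drprg | Amikacin     | 0   | 0   |
| drprg | Capreomycin  | 0   | 0   |
| drprg | Delamanid    | 0   | 0   |
| drprg | Ethambutol   | -1  | 1   |
| drprg | Ethionamide  | -13 | 1   |
| drprg | Isoniazid    | 0   | 0   |
| drprg | Kanamycin    | 0   | 0   |
| drprg | Levofloxacin | -1  | 0   |
| drprg | Linezolid    | 0   | 0   |
| drprg | Moxifloxacin | 0   | 0   |
| drprg | Ofloxacin    | 0   | 0   |
| drprg | Pyrazinamide | -1  | 0   |
| drprg | Rifampicin   | -1  | 0   |
| drprg | Streptomycin | 0   | 4   |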

Most of these FPs are also FPs for tbprofiler.


Regarding stop loss, Miranda made a good point: maybe we just flag it as an unknown mutation?

@iqbal-lab
Collaborator

Good idea, flag as unknown seems safest.
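
One way the "flag as unknown" suggestion could look, as a hedged Python sketch: given the intervals covered by null genotypes in a gene, report "unknown" when any of them overlaps the stop codon. The function names, the interval representation (1-based, inclusive `(start, end)` tuples), and the example coordinates are assumptions for illustration, not drprg's actual data model.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def flag_stop_loss(null_intervals, stop_codon_start):
    """Return 'unknown' if a null-genotype interval overlaps the stop codon.

    null_intervals: iterable of (start, end) tuples for null calls in the gene.
    stop_codon_start: position of the first base of the stop codon (assumed known).
    Returns None when no null call touches the stop codon.
    """
    stop_codon_end = stop_codon_start + 2
    for start, end in null_intervals:
        if overlaps(start, end, stop_codon_start, stop_codon_end):
            return "unknown"
    return None


# Made-up coordinates: a null call covering positions 2200-2330, stop codon at 2321.
print(flag_stop_loss([(2200, 2330)], 2321))
```

Reporting "unknown" rather than "resistant" here reflects the point made above: a null genotype over the stop codon shows that something is wrong at that locus, but not what the effect on the protein is.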
