Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscalling of SNPs and GQ value of 0 #9086

Open
Sgornard opened this issue Feb 3, 2025 · 1 comment
Open

Miscalling of SNPs and GQ value of 0 #9086

Sgornard opened this issue Feb 3, 2025 · 1 comment

Comments

@Sgornard
Copy link

Sgornard commented Feb 3, 2025

Problem report

Affected tool(s) or class(es)

HaplotypeCaller

Affected version : 4.1

Description

We used gatk HaplotypeCaller to find SNPs in our samples, following this command line:
gatk HaplotypeCaller --java-options "-Xmx20g -XX:ParallelGCThreads=10" -R ref_genome.fa -I sample.bam -ERC GVCF -O output_sample.g.vcf.gz

Afterward, we also used GenomicsDBImport, GenotypeGVCFs and VariantFiltration.

However, we noticed quickly some genotypes had a GQ tag equal to 0 despite a very good DP. Looking at the PL, it seemed like gatk could not decide between a homozygous and a heterozygous site, but still called a homozygous site and displayed all reads mapping to the reference allele. Looking at the read mapping in IGV, we noticed that those sites were in fact heterozygous, and therefore miscalled.

Here is an example (we could provide more if needed) :
Site :

<style> </style>
                  ZCh209E ZCh210C ZCh212C ZChGA78441 ZChGA78478 ZChGA78479 ZChGA78481 ZChGA78483 ZChGA78529
Chr2 136076952 . C G 29209.6 PASS AC=13;AF=0.952;AN=18;BaseQRankSum=0;DP=881;ExcessHet=0.2205;FS=0;InbreedingCoeff=0.4627;MLEAC=167;MLEAF=1;MQ=59.06;MQRankSum=0;QD=30.27;ReadPosRankSum=1.91;SOR=0.714 GT:AD:DP:GQ:PGT:PID:PL:PS 1/1:6,30:36:46:.:.:1036,46,0:. 0/0:35,0:35:0:.:.:0,0,209:. 0/1:12,15:27:99:.:.:545,0,408:. 0/1:17,11:28:99:.:.:360,0,438:. 1/1:3,23:26:53:.:.:846,53,0:. 1/1:0,32:32:96:.:.:1325,96,0:. 1/1:5,21:26:14:.:.:710,14,0:. 0/1:9,16:25:99:.:.:593,0,301:. 1/1:0,23:23:69:.:.:949,69,0:.

Here, genotype of the individual ZCh210C is said to be homozygous (GT 0/0) and have all reads mapping the reference allele (AD 35:0). The GQ is 0, and PL is 0 for both homozygous and heterozygous genotypes.
The reads in IGV for this individual map as follows:

Image
Which clearly shows reads mapping on the alternative allele.

Why is gatk having this behavior?

Thank you for your anwsers,
Best,

Samuel

@gokalpcelik
Copy link
Contributor

Have you tried with a recent version such as 4.6.1.0? 4.1 is quite old and it has been a long time since then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants