recovering multiallelics with bcftools norm is changing reference alleles to missing #2339

Open
bwspitzer opened this issue Dec 16, 2024 · 2 comments

@bwspitzer

I'm using bcftools 1.20 to split multiallelic variants in one .vcf, merge it with a second .vcf, and then recover the multiallelic variants. For some variants, reference alleles are being counted as missing in the final file. Here are the ACs for one position along the way:

file_A.vcf.gz
chr21:1000000:G:T,GT
allele: G GT T missing
AC: 20431 5617 2 0

file_B.vcf.gz
chr21:1000000:G:GT
allele: G GT missing
AC: 1775 387 0

bcftools norm -a --atom-overlaps . --check-ref s -f reference_fasta.fa -m -both --multi-overlaps 0 -o file_A2.vcf.gz -O z file_A.vcf.gz

file_A2.vcf.gz
chr21:1000000:G:GT
allele: G GT missing
AC: 20433 5617 0
chr21:1000000:G:T
allele: G T missing
AC: 26048 2 0

bcftools norm --check-ref s -f reference_fasta.fa -o file_B2.vcf.gz -O z file_B.vcf.gz

file_B2.vcf.gz
chr21:1000000:G:GT
allele: G GT missing
AC: 1775 387 0

(I create a file named 'file_list.txt' with the names of file_A2.vcf.gz and file_B2.vcf.gz)
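One way to build that list, as a minimal sketch (the post does not show how the file was actually created; only the two file names are taken from it):

```shell
# Write one input path per line, in the order the files should be merged.
printf '%s\n' file_A2.vcf.gz file_B2.vcf.gz > file_list.txt
```

bcftools merge reads this list through `-l`, as in the command that follows.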

bcftools merge -m none -O z -o file_C.vcf.gz -l file_list.txt

file_C.vcf.gz:
chr21:1000000:G:GT
allele: G GT missing
AC: 22208 6004 0
chr21:1000000:G:T
allele: G T missing
AC: 26048 2 2162

bcftools norm -m +any -o file_C2.vcf.gz -O z file_C.vcf.gz

file_C2.vcf.gz
chr21:1000000:G:T,GT
allele: G T GT missing
AC: 20431 2 6004 1775

I'm seeing this same pattern (the ref alleles from file_B appear as missing in file_C2) for a number of variants. Is there a way that I can get bcftools to keep them as actual ref alleles? It's likely that I just need to use the correct options, but I've tried many combinations without success.
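To make the discrepancy explicit, here is a small arithmetic check using only the counts quoted above. It assumes the two files contain disjoint sets of samples, so the expected counts in the merged file are simple sums; the variable names are illustrative.

```shell
# Allele counts copied from the post.
a_ref=20431; a_gt=5617; a_t=2    # file_A.vcf.gz: G, GT, T
b_ref=1775;  b_gt=387            # file_B.vcf.gz: G, GT

# Expected counts if file_B's reference calls stayed reference after the merge.
exp_ref=$((a_ref + b_ref))
exp_gt=$((a_gt + b_gt))
exp_t=$a_t

# Counts observed in file_C2.vcf.gz.
obs_ref=20431; obs_missing=1775

echo "expected: G=$exp_ref GT=$exp_gt T=$exp_t missing=0"
echo "reference alleles reported as missing: $((exp_ref - obs_ref))"
```

The shortfall in G equals the missing count in file_C2, which is exactly the number of reference alleles in file_B.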

@pd3
Member

pd3 commented Jan 2, 2025

Can you provide a small test case in the form of VCF files, please?

@bwspitzer
Author

Certainly. The attached .zip contains two files (file_A.vcf.gz and file_B.vcf.gz) with one variant, chr21:14483696, that displays the behavior described.

The .zip also contains the files that were created during normalization and merging. The exact commands that I used are included in the attached .txt file, along with the code I used to count alleles. The reference file is too big to attach (of course), but it can be found here.
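For readers without the attachment, allele counts like the ones above could be tallied roughly as follows. This is only a sketch and not the script from the attached .txt: `count_alleles` is a hypothetical helper, and it assumes genotypes arrive one per line, for example from `bcftools query` with a `[%GT\n]` format string.

```shell
# Hypothetical helper: tally allele indices from GT strings read on stdin.
# Input lines look like 0/1, 1|2 or ./. (one genotype per line).
count_alleles() {
  awk '{
    n = split($0, a, /[\/|]/)            # split on unphased (/) or phased (|) separator
    for (i = 1; i <= n; i++) count[a[i]]++
  }
  END {
    for (k in count) print k, count[k]   # "." is the missing allele
  }' | sort
}

# Assumed usage with bcftools (requires an indexed file; region is a placeholder):
#   bcftools query -r chr21:1000000 -f '[%GT\n]' file_C2.vcf.gz | count_alleles
printf '0/1\n0/0\n./.\n1|2\n' | count_alleles
```

Each output line is an allele index followed by its count, with `.` standing for a missing allele, so reference alleles turning into missing ones show up as a drop in `0` and a matching rise in `.`.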

Thank you very much for your assistance!

20250102_bcftools_issue.txt
bcftools_issue.zip
