the order to use the tools #4
It depends a lot on your input VCF file. Sorting the VCF file in the same order as the BAM file is not always needed; it is only needed when the VCF file is not already in that order. The VCF filtering steps are only needed if you want to create your own specific VCF file (e.g. if you can't use the 1000 Genomes VCF file because you work with a different species, or because you have the mutations for all the cell lines you are interested in).
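To make the sort step concrete, here is a minimal Python sketch that reorders VCF records so their contigs follow the order of the '@SQ' lines in the BAM header, then sorts by position. The helper names (`contigs_from_sam_header`, `sort_vcf_like_bam`) are hypothetical and this is not the sort VCF script from this repository. It assumes records on contigs missing from the BAM header can be dropped, and it holds all records in memory, so it only illustrates the ordering rule rather than a practical way to sort a multi-gigabyte file.

```python
from typing import Iterable, List


def contigs_from_sam_header(header_lines: Iterable[str]) -> List[str]:
    """Return contig names (SN: field) from '@SQ' lines, in header order."""
    contigs = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    contigs.append(field[3:])
                    break
    return contigs


def sort_vcf_like_bam(vcf_lines: Iterable[str], bam_contigs: List[str]) -> List[str]:
    """Keep VCF header lines first, then order records by BAM contig order and POS.

    Assumption: records on contigs absent from the BAM header are dropped.
    """
    rank = {name: i for i, name in enumerate(bam_contigs)}
    header, records = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
        else:
            chrom, pos = line.split("\t", 2)[:2]
            if chrom in rank:
                records.append((rank[chrom], int(pos), line))
    # Python's sort is stable, so records with equal (contig, POS) keep input order.
    records.sort(key=lambda rec: (rec[0], rec[1]))
    return header + [rec[2] for rec in records]
```

The header lines for `contigs_from_sam_header` can be taken from the output of `samtools view -H` on the BAM file.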
Hi, thank you for your very clear answer. One quick question regarding sorting the VCF: do you have any idea how long it usually takes? I have a 1000 Genomes VCF of roughly 16 GB, and sorting it has been running for 48 h (only about 4 MB of output so far). I can start a new job with a longer time limit before the HPC aborts this one. Do you have any suggestions to make it run a little faster? Cheers,
Hi @XiaofeiSunUCSF, maybe you're using the wrong 1000 Genomes VCF? You can use the VCF with the full set of sites but without genotypes (assuming this is for freemuxlet). For GRCh38 I used:
and it's only ~850 MB. Filtering to remove rare variants reduced it to <100 MB. For me this ran quite fast.
@XiaofeiSunUCSF Sorting the VCF file shouldn't take that long, even for such a big file. What is the exact command you ran? What is the output of
Hi @cflerin, yes! My ultimate goal is to use freemuxlet to genotype human samples. Thank you for this useful information; I will try your VCF. Could you help me check whether my procedure is correct?
Thank you in advance!
Hi @ghuls,
I checked the VCF file:
and the BAM file:
@ghuls, can I just use
It looks like your BAM file and VCF file use the same chromosome naming convention (e.g. a "chr" prefix). Which tool generated your BAM file? The SAM header looks a bit weird.
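A quick way to compare naming conventions is to collect the contig names on each side and look at the overlap. The sketch below uses hypothetical helpers (`vcf_contigs`, `naming_report`) that are not part of these tools; it assumes you already have the BAM contig names (for example from the '@SQ' lines of the header) and treats a "chr" prefix present on only one side as a likely mismatch.

```python
from typing import Dict, Iterable, List, Set, Union


def vcf_contigs(vcf_lines: Iterable[str]) -> Set[str]:
    """Collect CHROM values from VCF data lines (header lines start with '#')."""
    return {line.split("\t", 1)[0] for line in vcf_lines if line and not line.startswith("#")}


def naming_report(vcf_names: Set[str], bam_names: Set[str]) -> Dict[str, Union[List[str], bool]]:
    """Summarise how VCF and BAM contig names relate."""
    shared = vcf_names & bam_names
    # A "chr" prefix on one side only (e.g. "1" vs "chr1") usually means names will not match.
    vcf_chr = any(name.startswith("chr") for name in vcf_names)
    bam_chr = any(name.startswith("chr") for name in bam_names)
    return {
        "shared": sorted(shared),
        "only_in_vcf": sorted(vcf_names - bam_names),
        "chr_prefix_mismatch": vcf_chr != bam_chr,
    }
```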
I added a check that will print an error when '@SQ' header lines are not found in the BAM header. According to the SAM specification, lowercase header tags are reserved for local use. I think your BAM file cannot be indexed or sorted by samtools, as it does not have those '@SQ' header lines.
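The idea behind that check can be sketched in Python as follows. This is an illustration only, not the check actually added to the tools: `check_sq_lines` is a hypothetical helper that reads header text (such as the output of `samtools view -H`), returns the '@SQ' contig names, and raises an error when there are none, mentioning any lowercase '@sq'-style lines it found instead.

```python
from typing import Iterable, List


def check_sq_lines(header_lines: Iterable[str]) -> List[str]:
    """Return contig names from '@SQ' lines; raise ValueError if there are none.

    Lowercase variants such as '@sq' are counted separately: they are not
    the '@SQ' record type defined by the SAM specification.
    """
    names: List[str] = []
    lowercase = 0
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@SQ":
            names += [f[3:] for f in fields[1:] if f.startswith("SN:")]
        elif fields[0].upper() == "@SQ":
            lowercase += 1
    if not names:
        msg = "no '@SQ' header lines found in the BAM header"
        if lowercase:
            msg += f" ({lowercase} lowercase '@sq'-style lines found instead)"
        raise ValueError(msg)
    return names
```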
@XiaofeiSunUCSF, for freemuxlet you don't need to merge and process the full genotype files, but I would still filter the sites VCF to remove rare variants. This is what I did:
I didn't need to use the sort VCF script, since my BAM was already in the proper sort order.
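The exact filtering commands used above were not preserved, so the following is only a hedged sketch of removing rare variants from a sites VCF. It assumes the allele frequency is stored under an `AF` key in the INFO column (check the `##INFO` header lines of your file), and it additionally keeps only biallelic SNVs, which is an extra assumption. The helper names (`info_af`, `filter_common_snvs`) and whatever `min_af` threshold you pass are illustrative, not values from this thread.

```python
from typing import Iterable, Iterator, Optional


def info_af(info: str) -> Optional[float]:
    """Return the first value of the INFO 'AF' key, or None if it is absent or invalid."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "AF" and value:
            try:
                return float(value.split(",")[0])
            except ValueError:
                return None
    return None


def filter_common_snvs(vcf_lines: Iterable[str], min_af: float) -> Iterator[str]:
    """Yield header lines plus biallelic SNV records whose AF is >= min_af."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt, info = fields[3], fields[4], fields[7]
        # Skip indels, multi-allelic sites ("A,C") and records without an ALT allele.
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue
        af = info_af(info)
        if af is not None and af >= min_af:
            yield line
```

Because `filter_common_snvs` is a generator, it can stream lines from an open file without loading the whole VCF into memory.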
Hi,
Thank you for the wonderful tools.
Is there a recommended order in which to use these three tools (filter.bam, filter.vcf, sort.vcf)?
Thanks
Xiaofei