Commit

Update ref transcript mismatch reporter to handle protein coding variants
susannasiebert committed Feb 5, 2024
1 parent c594cf6 commit e61dac2
Showing 4 changed files with 324 additions and 10 deletions.
@@ -0,0 +1,149 @@
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=FAIL,Description="Fail the site if all alleles fail but for different reasons.">
##FILTER=<ID=base_qual,Description="alt median base quality">
##FILTER=<ID=clustered_events,Description="Clustered events observed in the tumor">
##FILTER=<ID=contamination,Description="contamination">
##FILTER=<ID=duplicate,Description="evidence for alt allele is overrepresented by apparent duplicates">
##FILTER=<ID=fragment,Description="abs(ref - alt) median fragment length">
##FILTER=<ID=germline,Description="Evidence indicates this site is germline, not somatic">
##FILTER=<ID=haplotype,Description="Variant near filtered variant on same haplotype.">
##FILTER=<ID=low_allele_frac,Description="Allele fraction is below specified threshold">
##FILTER=<ID=map_qual,Description="ref - alt median mapping quality">
##FILTER=<ID=multiallelic,Description="Site filtered because too many alt alleles pass tumor LOD">
##FILTER=<ID=n_ratio,Description="Ratio of N to alt exceeds specified ratio">
##FILTER=<ID=normal_artifact,Description="artifact_in_normal">
##FILTER=<ID=orientation,Description="Orientation bias detected by the orientation bias mixture model">
##FILTER=<ID=panel_of_normals,Description="Blacklisted site in panel of normals">
##FILTER=<ID=position,Description="median distance of alt variants from end of reads">
##FILTER=<ID=slippage,Description="Variant near filtered variant on same haplotype.">
##FILTER=<ID=strand_bias,Description="Evidence for alt allele comes from one read direction only">
##FILTER=<ID=strict_strand,Description="Evidence for alt allele is not represented in both directions">
##FILTER=<ID=weak_evidence,Description="Mutation does not meet likelihood threshold">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##INFO=<ID=AS_FilterStatus,Number=1,Type=String,Description="Filter status for each allele, as assessed by ApplyRecalibration. Note that the VCF filter field will reflect the most lenient/sensitive status across all alleles.">
##INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="Allele-specific forward/reverse read counts for strand bias tests. Includes the reference and alleles separated by |.">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ECNT,Number=1,Type=Integer,Description="Number of events in this haplotype">
##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not germline variants">
##INFO=<ID=MBQ,Number=R,Type=Integer,Description="median base quality">
##INFO=<ID=MFRL,Number=R,Type=Integer,Description="median fragment length">
##INFO=<ID=MMQ,Number=R,Type=Integer,Description="median mapping quality">
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="median distance from end of read">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="Negative log 10 odds of artifact in normal with same allele fraction as tumor">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="Normal log 10 likelihood ratio of diploid het or hom alt genotypes">
##INFO=<ID=PON,Number=0,Type=Flag,Description="site found in panel of normals">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="negative log 10 population allele frequencies of alt alleles">
##INFO=<ID=ROQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to read orientation artifact">
##INFO=<ID=RPA,Number=R,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
##INFO=<ID=STRQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles in STRs are not polymerase slippage errors">
##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log 10 likelihood ratio score of variant existing versus not existing">
##SentieonCommandLine.TNfilter=<ID=TNfilter,Version="sentieon-genomics-202112.05",Date="2024-01-31T08:17:32Z",CommandLine="/sga_dev/zb-liaowanjun/sentieon-genomics-202112.05/libexec/driver -r /data2/data_share/pzx/reference/hs37d5/hs37d5.fa --algo TNfilter --tumor_sample T-4032 --normal_sample PB-4032 -v /sga_dev/zb-liaowanjun/sample36_joint/T-4032_PB-4032/matched_tmp/T-4032_jointTMP.vcf /sga_dev/zb-liaowanjun/sample36_joint/T-4032_PB-4032/matched_tmp/T-4032_jointunfiltered.vcf">
##SentieonCommandLine.TNhaplotyper2=<ID=TNhaplotyper2,Version="sentieon-genomics-202112.05",Date="2024-01-31T06:48:28Z",CommandLine="/sga_dev/zb-liaowanjun/sentieon-genomics-202112.05/libexec/driver -t 15 -r /data2/data_share/pzx/reference/hs37d5/hs37d5.fa -i /data2/dev_projects/xmy/TNB/validation_data/test1/PRJNA298330/T-4032/realigned/T-4032_final.bam -i /data2/dev_projects/xmy/TNB/validation_data/test1/PRJNA298330/PB-4032/realigned/PB-4032_final.bam --interval /sga_dev/panel_validation/V710_panel/bed/sort_KST700_v3_pd100_merged.bed --algo TNhaplotyper2 --call_germline_sites --min_init_tumor_lod 0 --min_tumor_lod 0.5 --prune_factor -1 --min_normal_lod 0 --tumor_sample T-4032 --normal_sample PB-4032 /sga_dev/zb-liaowanjun/sample36_joint/T-4032_PB-4032/matched_tmp/T-4032_jointTMP.vcf">
##contig=<ID=chr1,length=249250621,assembly=b37>
##contig=<ID=chr2,length=243199373,assembly=b37>
##contig=<ID=chr3,length=198022430,assembly=b37>
##contig=<ID=chr4,length=191154276,assembly=b37>
##contig=<ID=chr5,length=180915260,assembly=b37>
##contig=<ID=chr6,length=171115067,assembly=b37>
##contig=<ID=chr7,length=159138663,assembly=b37>
##contig=<ID=chr8,length=146364022,assembly=b37>
##contig=<ID=chr9,length=141213431,assembly=b37>
##contig=<ID=chr10,length=135534747,assembly=b37>
##contig=<ID=chr11,length=135006516,assembly=b37>
##contig=<ID=chr12,length=133851895,assembly=b37>
##contig=<ID=chr13,length=115169878,assembly=b37>
##contig=<ID=chr14,length=107349540,assembly=b37>
##contig=<ID=chr15,length=102531392,assembly=b37>
##contig=<ID=chr16,length=90354753,assembly=b37>
##contig=<ID=chr17,length=81195210,assembly=b37>
##contig=<ID=chr18,length=78077248,assembly=b37>
##contig=<ID=chr19,length=59128983,assembly=b37>
##contig=<ID=chr20,length=63025520,assembly=b37>
##contig=<ID=chr21,length=48129895,assembly=b37>
##contig=<ID=chr22,length=51304566,assembly=b37>
##contig=<ID=chrX,length=155270560,assembly=b37>
##contig=<ID=chrY,length=59373566,assembly=b37>
##contig=<ID=chrM,length=16569,assembly=b37>
##contig=<ID=GL000207.1,length=4262,assembly=b37>
##contig=<ID=GL000226.1,length=15008,assembly=b37>
##contig=<ID=GL000229.1,length=19913,assembly=b37>
##contig=<ID=GL000231.1,length=27386,assembly=b37>
##contig=<ID=GL000210.1,length=27682,assembly=b37>
##contig=<ID=GL000239.1,length=33824,assembly=b37>
##contig=<ID=GL000235.1,length=34474,assembly=b37>
##contig=<ID=GL000201.1,length=36148,assembly=b37>
##contig=<ID=GL000247.1,length=36422,assembly=b37>
##contig=<ID=GL000245.1,length=36651,assembly=b37>
##contig=<ID=GL000197.1,length=37175,assembly=b37>
##contig=<ID=GL000203.1,length=37498,assembly=b37>
##contig=<ID=GL000246.1,length=38154,assembly=b37>
##contig=<ID=GL000249.1,length=38502,assembly=b37>
##contig=<ID=GL000196.1,length=38914,assembly=b37>
##contig=<ID=GL000248.1,length=39786,assembly=b37>
##contig=<ID=GL000244.1,length=39929,assembly=b37>
##contig=<ID=GL000238.1,length=39939,assembly=b37>
##contig=<ID=GL000202.1,length=40103,assembly=b37>
##contig=<ID=GL000234.1,length=40531,assembly=b37>
##contig=<ID=GL000232.1,length=40652,assembly=b37>
##contig=<ID=GL000206.1,length=41001,assembly=b37>
##contig=<ID=GL000240.1,length=41933,assembly=b37>
##contig=<ID=GL000236.1,length=41934,assembly=b37>
##contig=<ID=GL000241.1,length=42152,assembly=b37>
##contig=<ID=GL000243.1,length=43341,assembly=b37>
##contig=<ID=GL000242.1,length=43523,assembly=b37>
##contig=<ID=GL000230.1,length=43691,assembly=b37>
##contig=<ID=GL000237.1,length=45867,assembly=b37>
##contig=<ID=GL000233.1,length=45941,assembly=b37>
##contig=<ID=GL000204.1,length=81310,assembly=b37>
##contig=<ID=GL000198.1,length=90085,assembly=b37>
##contig=<ID=GL000208.1,length=92689,assembly=b37>
##contig=<ID=GL000191.1,length=106433,assembly=b37>
##contig=<ID=GL000227.1,length=128374,assembly=b37>
##contig=<ID=GL000228.1,length=129120,assembly=b37>
##contig=<ID=GL000214.1,length=137718,assembly=b37>
##contig=<ID=GL000221.1,length=155397,assembly=b37>
##contig=<ID=GL000209.1,length=159169,assembly=b37>
##contig=<ID=GL000218.1,length=161147,assembly=b37>
##contig=<ID=GL000220.1,length=161802,assembly=b37>
##contig=<ID=GL000213.1,length=164239,assembly=b37>
##contig=<ID=GL000211.1,length=166566,assembly=b37>
##contig=<ID=GL000199.1,length=169874,assembly=b37>
##contig=<ID=GL000217.1,length=172149,assembly=b37>
##contig=<ID=GL000216.1,length=172294,assembly=b37>
##contig=<ID=GL000215.1,length=172545,assembly=b37>
##contig=<ID=GL000205.1,length=174588,assembly=b37>
##contig=<ID=GL000219.1,length=179198,assembly=b37>
##contig=<ID=GL000224.1,length=179693,assembly=b37>
##contig=<ID=GL000223.1,length=180455,assembly=b37>
##contig=<ID=GL000195.1,length=182896,assembly=b37>
##contig=<ID=GL000212.1,length=186858,assembly=b37>
##contig=<ID=GL000222.1,length=186861,assembly=b37>
##contig=<ID=GL000200.1,length=187035,assembly=b37>
##contig=<ID=GL000193.1,length=189789,assembly=b37>
##contig=<ID=GL000194.1,length=191469,assembly=b37>
##contig=<ID=GL000225.1,length=211173,assembly=b37>
##contig=<ID=GL000192.1,length=547496,assembly=b37>
##contig=<ID=NC_007605,length=171823,assembly=b37>
##contig=<ID=hs37d5,length=35477943,assembly=b37>
##reference=/xx/hs37d5.fa
##tumor_sample=Tumor-666
##normal_sample=Normal-666
##bcftools_filterVersion=1.11+htslib-1.11
##VEP="v103" time="2024-02-01 13:52:04" cache=/xx/homo_sapiens_refseq/103_GRCh37" ensembl-variation=103.06320c4 ensembl=103.4c8d44a ensembl-io=103.353f93a ensembl-funcgen=103.b53bef4 1000genomes="phase3" COSMIC="90" ClinVar="201912" ESP="20141103" HGMD-PUBLIC="20194" assembly="GRCh37.p13" dbSNP="153" gencode="GENCODE 19" genebuild="2011-04" gnomAD="r2.1" polyphen="2.2.2" refseq="2019-10-24 23:10:14 - GCF_000001405.25_GRCh37.p13_genomic.gff" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID|TSL|REFSEQ_MATCH|REFSEQ_OFFSET|GIVEN_REF|USED_REF|BAM_EDIT|HGVS_OFFSET|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|CLIN_SIG|SOMATIC|PHENO|FrameshiftSequence|WildtypeProtein">
##FrameshiftSequence=Predicted sequence for frameshift mutations
##WildtypeProtein=The normal, non-mutated protein sequence
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Normal-666 Tumor-666
chr12 48238361 . G GCCTCAATGAGGAGCACTCCAAGCAGTACCGCTGCCTCTCCTTCCAGCC . clustered_events AS_FilterStatus=SITE;AS_SB_TABLE=101,4|1,6;DP=118;ECNT=5;GERMQ=93;MBQ=37,34;MFRL=52,194;MMQ=60,60;MPOS=49;NALOD=1.48;NLOD=8.75;POPAF=6;TLOD=19.3;CSQ=CCTCAATGAGGAGCACTCCAAGCAGTACCGCTGCCTCTCCTTCCAGCC|stop_gained&protein_altering_variant|HIGH|VDR|7421|Transcript|NM_001364085.1|protein_coding|10/10||NM_001364085.1:c.1451_1452insGGCTGGAAGGAGAGGCAGCGGTACTGCTTGGAGTGCTCCTCATTGAGG|NP_001351014.1:p.Asn484delinsLysAlaGlyArgArgGlySerGlyThrAlaTrpSerAlaProHisTer|1611-1612|1451-1452|484|N/KAGRRGSGTAWSAPH*G|aac/aaGGCTGGAAGGAGAGGCAGCGGTACTGCTTGGAGTGCTCCTCATTGAGGc|||-1||EntrezGene|||rseq_mrna_nonmatch&rseq_5p_mismatch||||OK|||||||||||||||MEAMAASTSLPDPGDFDRNVPRICGVCGDRATGFHFNAMTCEGCKGFFRRSMKRKALFTCPFNGDCRITKDNRRHCQACRLKRCVDIGMMKEFILTDEEVQRKREMILKRKEEEALKDSLRPKLSEEQQRIIAILLDAHHKTYDPTYSDFCQFRPPVRVNDGGGSHPSRPNSRHTPSFSGDSSSSCSDHCITSSDMMDSSSFSNLDLSEEDSDDPSVTLELSQLSMLPHLADLVSYSIQKVIGFAKMIPGFRDLTSEDQIVLLKSSAIEVIMLRSNESFTMDDMSWTCGNQDYKYRVSDVTKAGHSLELIEPLIKFQVGLKKLNLHEEEHVLLMAICIVSPDRPGVQDAALIEAIQDRLSNTLQTYIRCRHPPPGSHLLYAKMIQKLADLRSLNEEHSKQYRCLSFQPECSMKLTPLVLEVFGNEISLGQPVAVPGWGCSSRATCQARGWRLLSSPPHPVWGSAPPLPPPLSTQPILSPVQPNPFPAGFSPVP GT:AD:AF:DP:F1R2:F2R1:SB 0/0:29,0:0.0318:29:17,0:12,0:29,0,0,0 0/1:76,7:0.0936:83:57,1:19,0:72,4,1,6
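
For readers inspecting this fixture, the sketch below shows one way to split a VEP `CSQ` annotation into named fields using the `Format:` list from the `##INFO=<ID=CSQ,...>` header line, and then read the `REFSEQ_MATCH` flags (`rseq_mrna_nonmatch&rseq_5p_mismatch` in the record above). It is a minimal illustration, not the reporter code changed by this commit; the function names `parse_csq_format` and `parse_csq_entries` are hypothetical, and the header and INFO strings in the usage lines are shortened stand-ins for the real ones.

```python
def parse_csq_format(header_line):
    """Return the CSQ field names listed after 'Format: ' in the header line."""
    description = header_line.split("Format: ", 1)[1]
    # Drop the trailing '">' that closes the Description attribute.
    description = description.rstrip().rstrip(">").rstrip('"')
    return description.split("|")


def parse_csq_entries(info, fields):
    """Return one dict per comma-separated CSQ entry in an INFO string."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            return [
                dict(zip(fields, entry.split("|")))
                for entry in item[len("CSQ="):].split(",")
            ]
    return []


# Shortened, hypothetical header and INFO values for illustration only.
header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|Feature|REFSEQ_MATCH">'
)
info = (
    "DP=118;ECNT=5;"
    "CSQ=T|stop_gained|VDR|NM_001364085.1|rseq_mrna_nonmatch&rseq_5p_mismatch"
)

fields = parse_csq_format(header)
entries = parse_csq_entries(info, fields)
# VEP joins multiple REFSEQ_MATCH flags with '&'.
flags = entries[0]["REFSEQ_MATCH"].split("&")
```

Looking fields up by name from the header, rather than by a fixed position, keeps the lookup valid if the annotation is regenerated with a different set of VEP plugins or options.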