Local alleles merging, produce LAA and LPL when requested.
This is a draft implementation of samtools/hts-specs#434;
haploid Number=G tags are not handled yet.
pd3 committed Aug 14, 2020
1 parent 0a26dc9 commit e645749
Showing 14 changed files with 669 additions and 2 deletions.
11 changes: 11 additions & 0 deletions doc/bcftools.txt
@@ -1618,6 +1618,17 @@ For "vertical" merge take a look at *<<concat,bcftools concat>>* or *<<norm,bcft
*-l, --file-list* 'FILE'::
Read file names from 'FILE', one file name per line.

*-L, --local-alleles* 'INT'::
Sites with many alternate alleles can require extremely large amounts of storage, which
can exceed the 2GB size limit representable by BCF. This is caused
by Number=G tags (such as FORMAT/PL), which store a value for each combination of reference
and alternate alleles. The *-L, --local-alleles* option makes it possible to replace such tags
with a localized tag (FORMAT/LPL) which includes only the subset of alternate alleles relevant
for that sample. A new FORMAT/LAA tag is added which lists the 1-based indices of the
alternate alleles relevant (local) for the current sample. The number 'INT' gives the
maximum number of alternate alleles that can be included in the PL tag. The default value
is 0, which disables the feature and outputs values for all alternate alleles.
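
To illustrate why Number=G tags grow so quickly, here is a minimal Python sketch; it is not part of the committed documentation. The helper name `genotype_count` is hypothetical. It uses the standard count of unordered genotypes, so a diploid sample at a site with 3 or 6 alternate alleles needs 10 or 28 values, which matches the PL vector lengths in test/merge.LPL.1.out.

```python
from math import comb


def genotype_count(n_alt: int, ploidy: int = 2) -> int:
    """Number of values a Number=G tag holds for one sample.

    n_alt is the number of ALT alleles at the site; the reference allele
    is added here. The count is the number of unordered genotypes, i.e.
    multisets of size `ploidy` drawn from the alleles.
    """
    n_alleles = n_alt + 1
    return comb(n_alleles + ploidy - 1, ploidy)


if __name__ == "__main__":
    # The diploid count grows quadratically with the number of ALT alleles.
    for n_alt in (1, 3, 6, 100, 1000):
        print(n_alt, genotype_count(n_alt))
```

Because the count is per sample, the total size of a merged record scales with the number of samples times this value. This is the growth that restricting each sample to a few local alleles is meant to avoid.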

*-m, --merge* 'snps'|'indels'|'both'|'all'|'none'|'id'::
The option controls what types of multiallelic records can be created:
----
17 changes: 17 additions & 0 deletions test/merge.LPL.1.out
@@ -0,0 +1,17 @@
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype Likelihoods">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihoods">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic Depths">
##FORMAT=<ID=DF,Number=R,Type=Float,Description="Dummy">
##FORMAT=<ID=DD,Number=A,Type=Integer,Description="Dummy">
##contig=<ID=1,assembly=b37,length=249250621>
##reference=ref.fa
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A 2:A 3:A
1 30000 . C T . . . GT:PL:GL:AD:DF:DD 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1
1 30001 . C CA,CC,CG . . . GT:PL:GL:AD:DF:DD 0/1:10,0,10,.,.,.,.,.,.,.:10,0,10,.,.,.,.,.,.,.:1,2,.,.:0.1,0.2,.,.:1,.,. 0/2:10,.,.,0,.,10,.,.,.,.:10,.,.,0,.,10,.,.,.,.:1,.,2,.:0.1,.,0.2,.:.,1,. 0/3:10,.,.,.,.,.,0,.,.,10:10,.,.,.,.,.,0,.,.,10:1,.,.,2:0.1,.,.,0.2:.,.,1
1 30002 . C CA,CAA,CC,CCC,CG,CGG . . . GT:PL:GL:AD:DF:DD 1/2:20,20,20,10,0,10,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.:20,20,20,10,0,10,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.:0,1,2,.,.,.,.:0,0.1,0.2,.,.,.,.:1,2,.,.,.,. 3/4:20,.,.,.,.,.,20,.,.,20,10,.,.,0,10,.,.,.,.,.,.,.,.,.,.,.,.,.:20,.,.,.,.,.,20,.,.,20,10,.,.,0,10,.,.,.,.,.,.,.,.,.,.,.,.,.:0,.,.,1,2,.,.:0,.,.,0.1,0.2,.,.:.,.,1,2,.,. 5/6:20,.,.,.,.,.,.,.,.,.,.,.,.,.,.,20,.,.,.,.,20,10,.,.,.,.,0,10:20,.,.,.,.,.,.,.,.,.,.,.,.,.,.,20,.,.,.,.,20,10,.,.,.,.,0,10:0,.,.,.,.,1,2:0,.,.,.,.,0.1,0.2:.,.,.,.,1,2
1 30003 . C CA,CAA,CAAA . . . GT:PL:GL:AD:DF:DD 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3 ./.:.:.:.:.:. ./.:.:.:.:.:.
1 30004 . C CC,CCC,CCCC . . . GT:PL:GL:AD:DF:DD ./.:.:.:.:.:. 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3 ./.:.:.:.:.:.
1 30005 . C CG,CGG,CGGG . . . GT:PL:GL:AD:DF:DD ./.:.:.:.:.:. ./.:.:.:.:.:. 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3
8 changes: 8 additions & 0 deletions test/merge.LPL.2.a.vcf
@@ -0,0 +1,8 @@
##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype Likelihood">
##contig=<ID=1,length=248956422>
##reference=file:///home/dnanexus/genome.fa
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT s1
1 10000001 . A AC . . . GT:PL 0/1:10,0,10
1 10000002 . A AC,ACC . . . GT:PL 1/2:20,20,20,10,0,10
23 changes: 23 additions & 0 deletions test/merge.LPL.2.out
@@ -0,0 +1,23 @@
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype Likelihoods">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihoods">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic Depths">
##FORMAT=<ID=DF,Number=R,Type=Float,Description="Dummy">
##FORMAT=<ID=DD,Number=A,Type=Integer,Description="Dummy">
##contig=<ID=1,assembly=b37,length=249250621>
##reference=ref.fa
##FORMAT=<ID=LAA,Number=.,Type=Integer,Description="Localized alleles: subset of alternate alleles relevant for each sample">
##FORMAT=<ID=LPL,Number=.,Type=Integer,Description="Localized field: Genotype Likelihoods">
##FORMAT=<ID=LGL,Number=.,Type=Float,Description="Localized field: Genotype Likelihoods">
##FORMAT=<ID=LAD,Number=.,Type=Integer,Description="Localized field: Allelic Depths">
##FORMAT=<ID=LDF,Number=.,Type=Float,Description="Localized field: Dummy">
##FORMAT=<ID=LDD,Number=.,Type=Integer,Description="Localized field: Dummy">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A 2:A 3:A
1 30000 . C T . . . GT:PL:GL:AD:DF:DD 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1
1 30001 . C CA,CC,CG . . . GT:LPL:LGL:LAD:LDF:LDD:LAA 0/1:10,0,10:10,0,10:1,2:0.1,0.2:0:1 0/2:10,0,10:10,0,10:1,2:0.1,0.2:0:2 0/3:10,0,10:10,0,10:1,2:0.1,0.2:0:3
1 30002 . C CA,CAA,CC,CCC,CG,CGG . . . GT:LPL:LGL:LAD:LDF:LDD:LAA 1/2:20,10,10:20,10,10:0,2:0,0.2:0:2 3/4:20,10,10:20,10,10:0,2:0,0.2:0:4 5/6:20,10,10:20,10,10:0,2:0,0.2:0:6
1 30003 . C CA,CAA,CAAA . . . GT:LPL:LGL:LAD:LDF:LDD:LAA 1/2:20,10,10:20,10,10:0,2:0,0.2:3:2 ./.:.:.:.:.:.:. ./.:.:.:.:.:.:.
1 30004 . C CC,CCC,CCCC . . . GT:LPL:LGL:LAD:LDF:LDD:LAA ./.:.:.:.:.:.:. 1/2:20,10,10:20,10,10:0,2:0,0.2:3:2 ./.:.:.:.:.:.:.
1 30005 . C CG,CGG,CGGG . . . GT:LPL:LGL:LAD:LDF:LDD:LAA ./.:.:.:.:.:.:. ./.:.:.:.:.:.:. 1/2:20,10,10:20,10,10:0,2:0,0.2:3:2
23 changes: 23 additions & 0 deletions test/merge.LPL.3.out
@@ -0,0 +1,23 @@
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype Likelihoods">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihoods">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic Depths">
##FORMAT=<ID=DF,Number=R,Type=Float,Description="Dummy">
##FORMAT=<ID=DD,Number=A,Type=Integer,Description="Dummy">
##contig=<ID=1,assembly=b37,length=249250621>
##reference=ref.fa
##FORMAT=<ID=LAA,Number=.,Type=Integer,Description="Localized alleles: subset of alternate alleles relevant for each sample">
##FORMAT=<ID=LPL,Number=.,Type=Integer,Description="Localized field: Genotype Likelihoods">
##FORMAT=<ID=LGL,Number=.,Type=Float,Description="Localized field: Genotype Likelihoods">
##FORMAT=<ID=LAD,Number=.,Type=Integer,Description="Localized field: Allelic Depths">
##FORMAT=<ID=LDF,Number=.,Type=Float,Description="Localized field: Dummy">
##FORMAT=<ID=LDD,Number=.,Type=Integer,Description="Localized field: Dummy">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A 2:A 3:A
1 30000 . C T . . . GT:PL:GL:AD:DF:DD 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1
1 30001 . C CA,CC,CG . . . GT:LPL:LGL:LAD:LDF:LDD:LAA 0/1:10,0,10:10,0,10:1,2:0.1,0.2:0:1 0/2:10,0,10:10,0,10:1,2:0.1,0.2:0:2 0/3:10,0,10:10,0,10:1,2:0.1,0.2:0:3
1 30002 . C CA,CAA,CC,CCC,CG,CGG . . . GT:LPL:LGL:LAD:LDF:LDD:LAA 1/2:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,0:1,2 3/4:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,0:3,4 5/6:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,0:5,6
1 30003 . C CA,CAA,CAAA . . . GT:LPL:LGL:LAD:LDF:LDD:LAA 1/2:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,3:1,2 ./.:.:.:.:.:.:.,. ./.:.:.:.:.:.:.
1 30004 . C CC,CCC,CCCC . . . GT:LPL:LGL:LAD:LDF:LDD:LAA ./.:.:.:.:.:.:. 1/2:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,3:1,2 ./.:.:.:.:.:.:.,2
1 30005 . C CG,CGG,CGGG . . . GT:LPL:LGL:LAD:LDF:LDD:LAA ./.:.:.:.:.:.:. ./.:.:.:.:.:.:.,. 1/2:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,3:1,2
23 changes: 23 additions & 0 deletions test/merge.LPL.4.out
@@ -0,0 +1,23 @@
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype Likelihoods">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihoods">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic Depths">
##FORMAT=<ID=DF,Number=R,Type=Float,Description="Dummy">
##FORMAT=<ID=DD,Number=A,Type=Integer,Description="Dummy">
##contig=<ID=1,assembly=b37,length=249250621>
##reference=ref.fa
##FORMAT=<ID=LAA,Number=.,Type=Integer,Description="Localized alleles: subset of alternate alleles relevant for each sample">
##FORMAT=<ID=LPL,Number=.,Type=Integer,Description="Localized field: Genotype Likelihoods">
##FORMAT=<ID=LGL,Number=.,Type=Float,Description="Localized field: Genotype Likelihoods">
##FORMAT=<ID=LAD,Number=.,Type=Integer,Description="Localized field: Allelic Depths">
##FORMAT=<ID=LDF,Number=.,Type=Float,Description="Localized field: Dummy">
##FORMAT=<ID=LDD,Number=.,Type=Integer,Description="Localized field: Dummy">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A 2:A 3:A
1 30000 . C T . . . GT:PL:GL:AD:DF:DD 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1
1 30001 . C CA,CC,CG . . . GT:PL:GL:AD:DF:DD 0/1:10,0,10,.,.,.,.,.,.,.:10,0,10,.,.,.,.,.,.,.:1,2,.,.:0.1,0.2,.,.:1,.,. 0/2:10,.,.,0,.,10,.,.,.,.:10,.,.,0,.,10,.,.,.,.:1,.,2,.:0.1,.,0.2,.:.,1,. 0/3:10,.,.,.,.,.,0,.,.,10:10,.,.,.,.,.,0,.,.,10:1,.,.,2:0.1,.,.,0.2:.,.,1
1 30002 . C CA,CAA,CC,CCC,CG,CGG . . . GT:LPL:LGL:LAD:LDF:LDD:LAA 1/2:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,0:1,2 3/4:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,0:3,4 5/6:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,0:5,6
1 30003 . C CA,CAA,CAAA . . . GT:PL:GL:AD:DF:DD 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3 ./.:.:.:.:.:. ./.:.:.:.:.:.
1 30004 . C CC,CCC,CCCC . . . GT:PL:GL:AD:DF:DD ./.:.:.:.:.:. 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3 ./.:.:.:.:.:.
1 30005 . C CG,CGG,CGGG . . . GT:PL:GL:AD:DF:DD ./.:.:.:.:.:. ./.:.:.:.:.:. 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3
23 changes: 23 additions & 0 deletions test/merge.LPL.5.out
@@ -0,0 +1,23 @@
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype Likelihoods">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihoods">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic Depths">
##FORMAT=<ID=DF,Number=R,Type=Float,Description="Dummy">
##FORMAT=<ID=DD,Number=A,Type=Integer,Description="Dummy">
##contig=<ID=1,assembly=b37,length=249250621>
##reference=ref.fa
##FORMAT=<ID=LAA,Number=.,Type=Integer,Description="Localized alleles: subset of alternate alleles relevant for each sample">
##FORMAT=<ID=LPL,Number=.,Type=Integer,Description="Localized field: Genotype Likelihoods">
##FORMAT=<ID=LGL,Number=.,Type=Float,Description="Localized field: Genotype Likelihoods">
##FORMAT=<ID=LAD,Number=.,Type=Integer,Description="Localized field: Allelic Depths">
##FORMAT=<ID=LDF,Number=.,Type=Float,Description="Localized field: Dummy">
##FORMAT=<ID=LDD,Number=.,Type=Integer,Description="Localized field: Dummy">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A 2:A 3:A
1 30000 . C T . . . GT:PL:GL:AD:DF:DD 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1
1 30001 . C CA,CC,CG . . . GT:PL:GL:AD:DF:DD 0/1:10,0,10,.,.,.,.,.,.,.:10,0,10,.,.,.,.,.,.,.:1,2,.,.:0.1,0.2,.,.:1,.,. 0/2:10,.,.,0,.,10,.,.,.,.:10,.,.,0,.,10,.,.,.,.:1,.,2,.:0.1,.,0.2,.:.,1,. 0/3:10,.,.,.,.,.,0,.,.,10:10,.,.,.,.,.,0,.,.,10:1,.,.,2:0.1,.,.,0.2:.,.,1
1 30002 . C CA,CAA,CC,CCC,CG,CGG . . . GT:LPL:LGL:LAD:LDF:LDD:LAA 1/2:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,0:1,2 3/4:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,0:3,4 5/6:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,0:5,6
1 30003 . C CA,CAA,CAAA . . . GT:PL:GL:AD:DF:DD 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3 ./.:.:.:.:.:. ./.:.:.:.:.:.
1 30004 . C CC,CCC,CCCC . . . GT:PL:GL:AD:DF:DD ./.:.:.:.:.:. 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3 ./.:.:.:.:.:.
1 30005 . C CG,CGG,CGGG . . . GT:PL:GL:AD:DF:DD ./.:.:.:.:.:. ./.:.:.:.:.:. 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3
23 changes: 23 additions & 0 deletions test/merge.LPL.6.out
@@ -0,0 +1,23 @@
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype Likelihoods">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihoods">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic Depths">
##FORMAT=<ID=DF,Number=R,Type=Float,Description="Dummy">
##FORMAT=<ID=DD,Number=A,Type=Integer,Description="Dummy">
##contig=<ID=1,assembly=b37,length=249250621>
##reference=ref.fa
##FORMAT=<ID=LAA,Number=.,Type=Integer,Description="Localized alleles: subset of alternate alleles relevant for each sample">
##FORMAT=<ID=LPL,Number=.,Type=Integer,Description="Localized field: Genotype Likelihoods">
##FORMAT=<ID=LGL,Number=.,Type=Float,Description="Localized field: Genotype Likelihoods">
##FORMAT=<ID=LAD,Number=.,Type=Integer,Description="Localized field: Allelic Depths">
##FORMAT=<ID=LDF,Number=.,Type=Float,Description="Localized field: Dummy">
##FORMAT=<ID=LDD,Number=.,Type=Integer,Description="Localized field: Dummy">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A 2:A 3:A
1 30000 . C T . . . GT:PL:GL:AD:DF:DD 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1
1 30001 . C CA,CC,CG . . . GT:PL:GL:AD:DF:DD 0/1:10,0,10,.,.,.,.,.,.,.:10,0,10,.,.,.,.,.,.,.:1,2,.,.:0.1,0.2,.,.:1,.,. 0/2:10,.,.,0,.,10,.,.,.,.:10,.,.,0,.,10,.,.,.,.:1,.,2,.:0.1,.,0.2,.:.,1,. 0/3:10,.,.,.,.,.,0,.,.,10:10,.,.,.,.,.,0,.,.,10:1,.,.,2:0.1,.,.,0.2:.,.,1
1 30002 . C CA,CAA,CC,CCC,CG,CGG . . . GT:LPL:LGL:LAD:LDF:LDD:LAA 1/2:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,0:1,2 3/4:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,0:3,4 5/6:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:2,0:5,6
1 30003 . C CA,CAA,CAAA . . . GT:PL:GL:AD:DF:DD 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3 ./.:.:.:.:.:. ./.:.:.:.:.:.
1 30004 . C CC,CCC,CCCC . . . GT:PL:GL:AD:DF:DD ./.:.:.:.:.:. 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3 ./.:.:.:.:.:.
1 30005 . C CG,CGG,CGGG . . . GT:PL:GL:AD:DF:DD ./.:.:.:.:.:. ./.:.:.:.:.:. 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3
14 changes: 14 additions & 0 deletions test/merge.LPL.a.vcf
@@ -0,0 +1,14 @@
##fileformat=VCFv4.3
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype Likelihoods">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihoods">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic Depths">
##FORMAT=<ID=DF,Number=R,Type=Float,Description="Dummy">
##FORMAT=<ID=DD,Number=A,Type=Integer,Description="Dummy">
##contig=<ID=1,assembly=b37,length=249250621>
##reference=ref.fa
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A
1 30000 . C T . . . GT:PL:GL:AD:DF:DD 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1
1 30001 . C CA . . . GT:PL:GL:AD:DF:DD 0/1:10,0,10:10,0,10:1,2:0.1,0.2:1
1 30002 . C CA,CAA . . . GT:PL:GL:AD:DF:DD 1/2:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:1,2
1 30003 . C CA,CAA,CAAA . . . GT:PL:GL:AD:DF:DD 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3
14 changes: 14 additions & 0 deletions test/merge.LPL.b.vcf
@@ -0,0 +1,14 @@
##fileformat=VCFv4.3
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype Likelihoods">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihoods">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic Depths">
##FORMAT=<ID=DF,Number=R,Type=Float,Description="Dummy">
##FORMAT=<ID=DD,Number=A,Type=Integer,Description="Dummy">
##contig=<ID=1,assembly=b37,length=249250621>
##reference=ref.fa
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A
1 30000 . C T . . . GT:PL:GL:AD:DF:DD 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1
1 30001 . C CC . . . GT:PL:GL:AD:DF:DD 0/1:10,0,10:10,0,10:1,2:0.1,0.2:1
1 30002 . C CC,CCC . . . GT:PL:GL:AD:DF:DD 1/2:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:1,2
1 30004 . C CC,CCC,CCCC . . . GT:PL:GL:AD:DF:DD 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3
14 changes: 14 additions & 0 deletions test/merge.LPL.c.vcf
@@ -0,0 +1,14 @@
##fileformat=VCFv4.3
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype Likelihoods">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihoods">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic Depths">
##FORMAT=<ID=DF,Number=R,Type=Float,Description="Dummy">
##FORMAT=<ID=DD,Number=A,Type=Integer,Description="Dummy">
##contig=<ID=1,assembly=b37,length=249250621>
##reference=ref.fa
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A
1 30000 . C T . . . GT:PL:GL:AD:DF:DD 0/1:20,0,20:20,0,20:1,2:0.1,0.2:1
1 30001 . C CG . . . GT:PL:GL:AD:DF:DD 0/1:10,0,10:10,0,10:1,2:0.1,0.2:1
1 30002 . C CG,CGG . . . GT:PL:GL:AD:DF:DD 1/2:20,20,20,10,0,10:20,20,20,10,0,10:0,1,2:0,0.1,0.2:1,2
1 30005 . C CG,CGG,CGGG . . . GT:PL:GL:AD:DF:DD 1/2:20,20,20,10,0,10,20,20,10,20:20,20,20,10,0,10,20,20,10,20:0,1,2,1:0,0.1,0.2,0.1:1,2,3
8 changes: 8 additions & 0 deletions test/merge.lpl.b.vcf
@@ -0,0 +1,8 @@
##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype Likelihood">
##contig=<ID=1,length=248956422>
##reference=file:///home/dnanexus/genome.fa
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT s1
1 10000001 . A AT . . . GT:PL 0/1:10,0,10
1 10000002 . A AT,ATT . . . GT:PL 1/2:20,20,20,10,0,10
6 changes: 6 additions & 0 deletions test/test.pl
@@ -57,6 +57,12 @@
test_vcf_isec($opts,in=>['isec-miss.1.1','isec-miss.1.2','isec-miss.1.3'],out=>'isec-miss.1.1.out',args=>'-R {PATH}/isec-miss.1.regs.txt -n +1');
test_vcf_isec($opts,in=>['isec-miss.2.1','isec-miss.2.2','isec-miss.2.3'],out=>'isec-miss.2.1.out',args=>'-n +1 -r 20:100,20:140,12:55,20:140,20:100');
test_vcf_isec($opts,in=>['isec-miss.2.1','isec-miss.2.2','isec-miss.2.3'],out=>'isec-miss.2.1.out',args=>'-R {PATH}/isec-miss.1.regs.txt -n +1');
test_vcf_merge($opts,in=>['merge.LPL.a','merge.LPL.b','merge.LPL.c'],out=>'merge.LPL.1.out',args=>'--force-samples');
test_vcf_merge($opts,in=>['merge.LPL.a','merge.LPL.b','merge.LPL.c'],out=>'merge.LPL.2.out',args=>'--force-samples -L 1');
test_vcf_merge($opts,in=>['merge.LPL.a','merge.LPL.b','merge.LPL.c'],out=>'merge.LPL.3.out',args=>'--force-samples -L 2');
test_vcf_merge($opts,in=>['merge.LPL.a','merge.LPL.b','merge.LPL.c'],out=>'merge.LPL.4.out',args=>'--force-samples -L 3');
test_vcf_merge($opts,in=>['merge.LPL.a','merge.LPL.b','merge.LPL.c'],out=>'merge.LPL.5.out',args=>'--force-samples -L 4');
test_vcf_merge($opts,in=>['merge.LPL.a','merge.LPL.b','merge.LPL.c'],out=>'merge.LPL.6.out',args=>'--force-samples -L 5');
test_vcf_merge($opts,in=>['merge.a','merge.b','merge.c'],out=>'merge.abc.out',args=>'--force-samples');
test_vcf_merge($opts,in=>['merge.a','merge.b','merge.c'],out=>'merge.abc.2.out',args=>'--force-samples -Fx');
test_vcf_merge($opts,in=>['merge.a','merge.b','merge.c'],out=>'merge.abc.3.out',args=>'--force-samples -0');
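
Some of the `-L 1` and `-L 2` expectations in test/merge.LPL.2.out and test/merge.LPL.3.out can be checked by hand with a small Python sketch. It assumes diploid samples and takes the LAA list as given; how bcftools chooses the local alleles is not shown in this commit excerpt, and Number=A tags are not covered. The function names (`gt_index`, `localize_g`, `localize_r`) are hypothetical and are not bcftools code. The genotype ordering is the standard VCF one, where genotype j/k with j <= k sits at index k(k+1)/2 + j.

```python
from typing import List, Optional, Sequence

Value = Optional[float]  # None stands for a missing value (".")


def gt_index(j: int, k: int) -> int:
    """Index of diploid genotype j/k in a Number=G vector (VCF ordering)."""
    if j > k:
        j, k = k, j
    return k * (k + 1) // 2 + j


def localize_g(values: Sequence[Value], laa: Sequence[int]) -> List[Value]:
    """Subset a diploid Number=G vector (e.g. PL) to REF plus the LAA alleles.

    laa holds 1-based ALT indices, as in the FORMAT/LAA tag.
    """
    alleles = [0] + list(laa)  # the reference allele is always kept
    out = []
    for b in range(len(alleles)):
        for a in range(b + 1):
            out.append(values[gt_index(alleles[a], alleles[b])])
    return out


def localize_r(values: Sequence[Value], laa: Sequence[int]) -> List[Value]:
    """Subset a Number=R vector (e.g. AD) to REF plus the LAA alleles."""
    return [values[i] for i in [0] + list(laa)]


if __name__ == "__main__":
    # Sample A at 1:30002 in test/merge.LPL.a.vcf, with LAA=2 as in merge.LPL.2.out
    print(localize_g([20, 20, 20, 10, 0, 10], [2]))
    print(localize_r([0, 1, 2], [2]))
```

With these inputs the sketch yields the LPL `20,10,10` and LAD `0,2` listed for sample A at position 30002 in test/merge.LPL.2.out. The same functions applied to the 28-value merged PL of sample 2:A with LAA `3,4` recover `20,20,20,10,0,10`, as in test/merge.LPL.3.out.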