diff --git a/README.md b/README.md
index 3abfbf16..fc18164a 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@ The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package int
 #### STEP 0: Python

-An installation of Python ([version 2.7.10 or higher](https://www.python.org/downloads/) is required to run PCGR. Check that Python is installed by typing `python --version` in a terminal window.
+A local installation of Python (it has been tested with [version 2.7.13](https://www.python.org/downloads/)) is required to run PCGR. Check that Python is installed by typing `python --version` in a terminal window.

 #### STEP 1: Installation of Docker
@@ -66,14 +66,14 @@ An installation of Python ([version 2.7.10 or higher](https://www.python.org/dow
 The PCGR workflow accepts two types of input files:

- * An unannotated, single-sample VCF file with called somatic variants (SNVs/InDels)
+ * An unannotated, single-sample VCF file (>= v4.2) with called somatic variants (SNVs/InDels)
  * A copy number segment file

 PCGR can be run with either or both of the two input files present.

 The following requirements __MUST__ be met by the input VCF for PCGR to work properly:

-1. Variants in the raw VCF that contain multiple alternative alleles (e.g. "multiple ALTs") must be split into variants with a single alternative allele. A description on how this can be done with the help of [vt](https://github.com/atks/vt) is described within the [documentation page for vcfanno](http://brentp.github.io/vcfanno/#preprocessing)
+1. Variants in the raw VCF that contain multiple alternative alleles (e.g. "multiple ALTs") must be split into variants with a single alternative allele. This can be done with the help of either [vt decompose](http://genome.sph.umich.edu/wiki/Vt#Decompose) or [vcflib's vcfbreakmulti](https://github.com/vcflib/vcflib#vcflib). We will add integrated support for this in an upcoming release
 2. The contents of the VCF must be sorted correctly (i.e. according to chromosomal order and chromosomal position). This can be obtained by [vcftools](https://vcftools.github.io/perl_module.html#vcf-sort).
  * We strongly recommend that the input VCF is compressed and indexed using [bgzip](http://www.htslib.org/doc/tabix.html) and [tabix](http://www.htslib.org/doc/tabix.html)
  * 'chr' must be stripped from the chromosome names
diff --git a/docs/_build/doctrees/environment.pickle b/docs/_build/doctrees/environment.pickle
index eea4cab9..1db5cb6a 100644
Binary files a/docs/_build/doctrees/environment.pickle and b/docs/_build/doctrees/environment.pickle differ
diff --git a/docs/_build/doctrees/getting_started.doctree b/docs/_build/doctrees/getting_started.doctree
index 4f9b09b7..0d56aa1a 100644
Binary files a/docs/_build/doctrees/getting_started.doctree and b/docs/_build/doctrees/getting_started.doctree differ
diff --git a/docs/_build/doctrees/output.doctree b/docs/_build/doctrees/output.doctree
index c214e788..3d9a5384 100644
Binary files a/docs/_build/doctrees/output.doctree and b/docs/_build/doctrees/output.doctree differ
diff --git a/docs/_build/html/_sources/getting_started.rst.txt b/docs/_build/html/_sources/getting_started.rst.txt
index 28f39266..0d0ec297 100644
--- a/docs/_build/html/_sources/getting_started.rst.txt
+++ b/docs/_build/html/_sources/getting_started.rst.txt
@@ -35,9 +35,9 @@ Installation of Docker
 Python
 ^^^^^^

-An installation of Python (version 2.7.10 or higher) is required to run
-PCGR. Check that Python is installed by typing ``python --version`` in
-your terminal window.
+An installation of Python (version 2.7.13) is required to run PCGR.
+Check that Python is installed by typing ``python --version`` in your
+terminal window.

 Download PCGR
 ^^^^^^^^^^^^^
@@ -60,7 +60,7 @@ Download PCGR
 have been produced

 - Pull the `PCGR Docker
- image `__ (3.2Gb) from
+ image `__ (3.5Gb) from
 DockerHub):

 - ``docker pull sigven/pcgr`` (PCGR annotation engine)
@@ -115,8 +115,9 @@ A tumor sample report is generated by calling the Python script
 overwrite of existing result files by using this flag (default: False)

-The *examples* folder contain sample files from TCGA. A report for a
-colorectal tumor case can be generated through the following command:
+The *examples* folder contain input files from two tumor samples
+sequenced within TCGA. A report for a colorectal tumor case can be
+generated by running the following command in your terminal window:

 ``python run_pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments``
 ``tumor_sample.COAD.cna.tsv ~/pcgr-X.X ~/pcgr-X.X/examples tumor_sample.COAD``
diff --git a/docs/_build/html/_sources/output.rst.txt b/docs/_build/html/_sources/output.rst.txt
index a83e3ae1..d52b9134 100644
--- a/docs/_build/html/_sources/output.rst.txt
+++ b/docs/_build/html/_sources/output.rst.txt
@@ -6,8 +6,8 @@ Input
 The PCGR workflow accepts two types of input files:

-- An unannotated, single-sample VCF file with called somatic variants
- (SNVs/InDels)
+- An unannotated, single-sample VCF file (>= v4.2) with called somatic
+ variants (SNVs/InDels)
 - A copy number segment file

 PCGR can be run with either or both of the two input files present.
@@ -23,10 +23,10 @@ work properly:
 1. Variants in the raw VCF that contain multiple alternative alleles
 (e.g. "multiple ALTs") must be split into variants with a single
- alternative allele. A description on how this can be done with the
- help of `vt `__ is described within the
- `documentation page for
- vcfanno `__
+ alternative allele. This can be done with the help of either `vt
+ decompose `__ or
+ `vcflib's vcfbreakmulti `__.
+ We will add integrated support for this in an upcoming release
 2. The contents of the VCF must be sorted correctly (i.e. according to
 chromosomal order and chromosomal position). This can be obtained by
 `vcftools `__.
diff --git a/docs/_build/html/getting_started.html b/docs/_build/html/getting_started.html
index 362cd109..0bc4d2ba 100644
--- a/docs/_build/html/getting_started.html
+++ b/docs/_build/html/getting_started.html
@@ -182,9 +182,9 @@ Installation of Docker
 Python
-An installation of Python (version 2.7.10 or higher) is required to run
-PCGR. Check that Python is installed by typing python --version in
-your terminal window.
+An installation of Python (version 2.7.13) is required to run PCGR.
+Check that Python is installed by typing python --version in your
+terminal window.
Download PCGR

@@ -207,7 +207,7 @@ Download PCGR
 Pull the PCGR Docker
-image (3.2Gb) from
+image (3.5Gb) from
 DockerHub):

  • docker pull sigven/pcgr (PCGR annotation engine)
@@ -263,8 +263,9 @@ Run test - generation of clinical report for a cancer genome
 (default: False)

-The examples folder contain sample files from TCGA. A report for a
-colorectal tumor case can be generated through the following command:
+The examples folder contain input files from two tumor samples
+sequenced within TCGA. A report for a colorectal tumor case can be
+generated by running the following command in your terminal window:
python run_pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-X.X ~/pcgr-X.X/examples tumor_sample.COAD

This command will run the Docker-based PCGR workflow and produce the
diff --git a/docs/_build/html/output.html b/docs/_build/html/output.html
index 5dc672dc..76702533 100644
--- a/docs/_build/html/output.html
+++ b/docs/_build/html/output.html
@@ -171,8 +171,8 @@ Input & output

The PCGR workflow accepts two types of input files:

-  • An unannotated, single-sample VCF file with called somatic variants
-(SNVs/InDels)
+  • An unannotated, single-sample VCF file (>= v4.2) with called somatic
+variants (SNVs/InDels)
  • A copy number segment file

PCGR can be run with either or both of the two input files present.

@@ -185,10 +185,10 @@ VCF
   • Variants in the raw VCF that contain multiple alternative alleles
 (e.g. “multiple ALTs”) must be split into variants with a single
-alternative allele. A description on how this can be done with the
-help of vt is described within the
-documentation page for
-vcfanno
+alternative allele. This can be done with the help of either vt
+decompose or
+vcflib’s vcfbreakmulti.
+We will add integrated support for this in an upcoming release
  • The contents of the VCF must be sorted correctly (i.e. according to chromosomal order and chromosomal position). This can be obtained by vcftools.
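
The input VCF requirements described in this change (VCF version >= v4.2, a single ALT allele per record, sorted records, no 'chr' prefix) can be checked before a run. The following is a minimal sketch and not part of PCGR: the function name `check_vcf_requirements` is hypothetical, it expects plain-text (uncompressed) VCF lines, and for sorting it only verifies that each chromosome forms one contiguous block with non-decreasing positions. It does not enforce a particular chromosome order and it does not split or fix anything; vt decompose, vcfbreakmulti and vcf-sort remain the tools for that.

```python
def check_vcf_requirements(lines):
    """Return a list of problems found in an iterable of VCF text lines.

    Hypothetical helper for illustration only; it is not part of PCGR.
    """
    problems = []
    current_chrom = None   # chromosome of the block being read
    finished = set()       # chromosomes whose block has already ended
    last_pos = None        # last position seen in the current block

    for n, line in enumerate(lines, 1):
        line = line.rstrip("\n")
        if line.startswith("##fileformat="):
            version = line.split("=", 1)[1]            # e.g. "VCFv4.2"
            try:
                numbers = tuple(int(p) for p in version.replace("VCFv", "").split("."))
            except ValueError:
                problems.append("line {}: cannot parse VCF version {}".format(n, version))
            else:
                if numbers < (4, 2):
                    problems.append("line {}: {} is older than v4.2".format(n, version))
            continue
        if not line or line.startswith("#"):
            continue                                   # other header lines

        fields = line.split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]

        if chrom.lower().startswith("chr"):
            problems.append("line {}: 'chr' prefix in chromosome name {}".format(n, chrom))
        if "," in alt:
            problems.append("line {}: multiple ALT alleles ({})".format(n, alt))

        if chrom != current_chrom:
            if chrom in finished:
                problems.append("line {}: chromosome {} is not contiguous".format(n, chrom))
            if current_chrom is not None:
                finished.add(current_chrom)
            current_chrom = chrom
        elif pos < last_pos:
            problems.append("line {}: position {} precedes {}".format(n, pos, last_pos))
        last_pos = pos

    return problems


# Small in-memory example with deliberately broken records.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t200\t.\tA\tG\t.\tPASS\t.\n",
    "1\t100\t.\tC\tT,G\t.\tPASS\t.\n",
    "chr2\t50\t.\tG\tA\t.\tPASS\t.\n",
]
for problem in check_vcf_requirements(example):
    print(problem)
```

An empty list means none of these particular checks failed; it does not guarantee that the file is otherwise valid.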
      diff --git a/docs/_build/html/searchindex.js b/docs/_build/html/searchindex.js index 08114d90..c94f4f62 100644 --- a/docs/_build/html/searchindex.js +++ b/docs/_build/html/searchindex.js @@ -1 +1 @@ -Search.setIndex({docnames:["about","annotation_resources","getting_started","index","output"],envversion:50,filenames:["about.rst","annotation_resources.rst","getting_started.rst","index.rst","output.rst"],objects:{},objnames:{},objtypes:{},terms:{"1000g":4,"1000genom":1,"12th":[],"16gb":[],"17gb":2,"17th":1,"1kg":[],"2016_03":1,"2016_09":1,"2020plu":2,"28th":[],"2gb":2,"5gb":2,"8th":1,"case":[2,4],"class":4,"default":2,"function":[2,3,4],"import":[0,2,4],"public":4,"short":[1,4],CDS:4,EAS:4,For:[2,4],IDs:4,POS:[],SAS:4,The:[0,2,4],There:0,These:[],_strong:[],abber:3,aberr:[0,2,4],about:3,abov:4,accept:4,acceptor:4,access:[0,4],accord:[2,4],acid:4,acquir:0,across:4,action:4,ada:[],adapt:4,adddit:4,addit:[0,4],adjust:4,advanc:2,af_norm:[],af_tumor:[],affect:4,affected_spl:[],affili:0,afr:4,afr_af_1kg:4,afr_af_exac:4,afr_af_gnomad:4,african:4,after:4,aggreg:4,aid:0,algorithm:4,align:4,all:[0,1],allel:4,allele_num:4,alon:0,alreadi:2,alt:4,alter:[2,4],altern:4,american:4,amino:4,amino_acid:4,among:[],amplif:2,amr:4,amr_af_1kg:4,amr_af_exac:4,amr_af_gnomad:4,analys:2,analysi:0,analyz:4,ani:[2,4],annot:[0,2,3],annotation_resourc:[],antineoplast:[1,4],antineoplastic_drug_interact:4,antineoplastic_drugs_dgidb:4,appli:4,applic:[0,4],appri:[1,4],approv:4,approx:2,april:1,argument:2,asian:4,assembl:4,assign:2,associ:2,attach:[0,4],aug:[],b147:1,base:[2,3,4],basic:[0,2,3,4],been:[0,2,4],below:4,benign:[],best:4,betweeen:1,bgzip:4,bind:4,biocomput:4,bioconductor:4,biologi:0,biomark:[0,1,2],biotyp:4,block:4,block_substitut:[],bm_citat:4,bm_clinical_signific:4,bm_disease_nam:4,bm_drug_nam:4,bm_evidence_direct:4,bm_evidence_level:4,bm_evidence_typ:4,bm_rate:4,bool:[],boost:4,both:[0,4],breast:4,browser:4,build:4,bundl:[0,2],cadd:4,call:2,call_confid:[],caller:4,can:[0,2,4],cancer
:4,cancer_census_germlin:4,cancer_census_somat:4,cancer_mutation_hotspot:4,cancer_typ:4,cancerhotspot:4,candid:4,canon:4,cap:4,caption:[],care:0,carri:4,catalog:[1,4],catalogu:1,categori:4,caus:4,causal:4,cbmdb:1,cbmdb_id:4,ccd:4,cdna:4,cdna_posit:4,cds:[],cds_chang:4,cds_end_nf:4,cds_posit:4,cds_start_nf:4,cell:4,cell_typ:4,cellular:[],cencu:1,censu:4,challeng:0,chang:[2,4],check:2,chr1:4,chr:4,chrom:4,chrome:4,chromosom:4,citat:[],cite:4,civic:[1,4],civic_id:4,civic_id_2:4,classif:4,clin:[],clin_sig:4,clinic:[0,3],clinvar:[1,4],clinvar_msid:4,clinvar_pmid:4,clinvar_sig:4,clinvar_variant_origin:4,cluster:4,cna:2,cna_seg:[2,4],cnminor:[],cntotal:[],cnv:[],coad:2,code:3,codon:4,codon_numb:4,cohort:1,colorect:[2,4],column:4,com:[],command:2,common:4,complet:[2,4],complex:0,composit:4,comprehens:0,compress:4,comput:2,confid:[],consensu:4,consequ:3,consid:4,consortium:[0,1,4],constitut:4,contain:[0,2,4],content:[3,4],context:0,contribut:[2,4],convent:4,coordin:4,copi:[0,2,3],correctli:4,cosmic:[1,4],cosmic_cancer_type_al:4,cosmic_cancer_type_gw:4,cosmic_codon_count_gw:4,cosmic_codon_frac_gw:4,cosmic_consequ:4,cosmic_count_gw:4,cosmic_drug_resist:4,cosmic_fathmm_pr:4,cosmic_mutation_id:4,cosmic_sample_sourc:4,cosmic_site_histolog:4,cosmic_vartyp:4,count:4,cover:4,cpu:2,creat:[],csq:4,curat:[1,4],current:4,damag:[],data:[0,2,4],databas:3,databundl:2,dataset:[],date:0,dbnsfp:[1,4],dbnsfp_consensus_lr:4,dbnsfp_consensus_svm:4,dbsnp:[1,4],dbsnp_mappingstatu:4,dbsnp_submiss:4,dbsnp_valid:4,dbsnpbuildid:4,dbsnprsid:4,dec:1,decompress:2,deconstructsig:4,dedic:[],defin:4,delet:[2,4],delin:4,denot:4,depend:0,depth:4,deriv:4,describ:4,descript:4,determin:[],develop:[0,4],dgidb:1,diagnosi:2,diagnost:[0,4],differ:[2,4],direct:[],directori:2,discov:1,diseas:4,distanc:4,distribut:4,dna:4,doc:[],docker:3,dockerhub:2,docm:1,docm_diseas:4,docm_pmid:4,document:4,domain:[3,4],done:4,donor:4,download:[],downstream:2,dp_normal:[],dp_tumor:[],drive:2,driver:[1,4],drug:[1,2,4],dure:2,each:4,ea
s_af_1kg:4,eas_af_exac:4,eas_af_gnomad:4,east:4,effect:[0,3],effect_predict:4,either:4,emerg:4,end:4,engin:2,ensembl:[0,4],ensembl_gene_id:4,ensembl_transcript_id:4,ensp:4,entrez:4,entrez_id:4,error:2,estim:2,etc:4,etiolog:[2,4],eur:4,eur_af_1kg:4,european:4,event:4,evid:[2,4],exac:[1,4],exampl:[2,4],exist:[2,4],existing_vari:4,exit:2,exom:[1,4],exon:4,experi:4,experienc:4,experiment:4,expert:0,explor:4,extend:0,facet:[],factor:4,fail:2,fall:4,fals:2,famili:1,fathmm:4,fathmm_mkl:4,featur:[3,4],feature_typ:4,feb:[],februai:1,februari:1,figur:0,file:[2,4],fin:[],fin_af_exac:4,fin_af_gnomad:4,find:[0,4],finnish:4,firefox:4,first:4,flag:[2,4],flag_pick_allel:4,flank:4,flexibl:0,focu:[],folder:2,follow:[2,4],forc:2,force_overwrit:2,fork:2,form:0,format:0,found:[2,4],four:4,frac:[],fraction:[],fraction_mut:4,frameshift:4,frequenc:3,from:[0,2,4],gain:4,gencod:[1,4],gencode_tag:4,gencode_transcript_typ:4,gencode_v19:4,gene:[0,2,3],gene_biotyp:4,gene_nam:4,gene_pheno:4,gene_symbol:4,gener:[0,3,4],genet:1,genindex:[],genom:[1,4],genome_vers:4,genomic_chang:4,genotyp:4,germlin:1,gerp:4,get:3,getting_start:[],given:4,global:4,global_af_1kg:4,global_af_exac:4,global_af_gnomad:4,gnomad:1,googl:[2,4],grch37:[2,4],great:0,guidelin:4,gwa:4,gwas_catalog_pmid:4,gwas_catalog_trait_uri:4,gz_:[],gzip:2,handl:4,has:[0,2,4],have:[0,2,4],hdiv:4,help:[2,4],here:4,hgnc:[],hgnc_id:4,hgv:4,hgvs_offset:4,hgvsc:4,hgvsp:4,hgvsp_short:4,high:4,high_inf_po:4,higher:2,highlight:0,histolog:4,hit:4,homozyg:2,hospit:0,host:2,hotspot:[1,4],how:4,html:[0,2,3],http:4,human:[1,4],humdiv:[],hvar:4,icgc:[1,4],icgc_project:4,identifi:[2,4],iii:0,imag:[0,2],impact:4,implic:4,improv:4,includ:4,incomplet:4,indel:[0,2,3],index:4,indic:4,individu:[0,4],inf:[],infer:4,inferenti:4,info:4,inform:1,initi:4,input:[2,3],input_cna_seg:2,input_vcf:2,insert:4,insilico:3,instal:0,institut:0,instruct:2,integr:0,intend:0,interact:[1,2,3],intern:1,interpret:[0,1,2],interrog:0,intersect:4,intogen:[1,4],intogen_driv:4,intogen_dri
ver_mut:4,intro:[],intron:4,isoform:[1,4],isol:0,item:[2,4],its:4,jan:4,june:[],knowledg:[0,3],knowledgebas:1,known:[2,4],lack:4,larg:[],latest:2,least:[],length:4,level:[0,4],librari:0,lies:4,like:[],line:[],link:4,linux:2,list:[],literatur:4,log:[2,4],logist:4,logr:4,logr_threshold_amplif:2,logr_threshold_homozygous_delet:2,lost:4,low:4,lrt:4,mac:2,machin:[2,4],maf:2,mai:[1,4],make:[],mani:4,mappabl:4,mappingstatu:[],march:1,marker:[],master:[],match:4,matter:4,maxdepth:[],measur:4,memori:2,messag:2,met:4,minim:2,minimum:[],minor:[],missens:4,mix:4,mkdir:[],mkl:[],modifi:4,modindex:[],modul:[],most:[0,4],motif:4,motif_nam:4,motif_po:4,motif_score_chang:4,motiffeatur:4,mozilla:4,mrna:4,msid:[],multipl:4,must:[2,4],mut:[],mutat:[0,1,2],mutational_signatur:[2,4],mutationassessor:4,mutationtast:4,mutect:[],mutpr:4,mutsigcv:2,name:4,navig:0,ncgc:[],need:0,nfe:[],nfe_af_exac:4,nfe_af_gnomad:4,non:[1,4],none:2,normal:4,norwegian:0,notat:4,note:4,nov:4,novemb:1,novo:4,now:2,nucleotid:[2,4],num:[],num_vcfanno_process:2,num_vep_fork:2,number:[0,2,3],numer:4,observ:4,obtain:4,oct:[],offset:[],oncogen:[1,4],oncolog:[0,2],oncologist:0,oncoscor:4,one:4,onli:[0,4],ontolog:4,option:[2,4],order:4,org:4,organ:[2,4],origin:4,oslo:0,osx:[],oth:[],oth_af_exac:4,oth_af_gnomad:4,other:2,out:4,output:[2,3],overlap:[2,4],overview:[],overwrit:2,packag:[0,4],page:4,pair:4,pars:4,part:4,particular:4,pass:4,pcgr:[3,4],pcgr_directori:2,pcgreport:[],percent:4,person:2,pfam:1,phase3:1,phase:4,pheno:4,phenotyp:4,phred:[],pick:4,pipelin:4,platform:2,pmid:4,point:4,polyphen2:4,portrai:4,pose:0,posit:[2,4],potenti:4,pre:4,precis:[0,2],pred:[],predict:[3,4],predictor:[0,1],predispos:4,predisposit:4,prefer:2,prefix:2,prerequisit:3,present:[0,4],primari:4,princip:4,prinicip:1,priorit:0,process:[2,4],produc:[0,2],product:4,profil:4,prognosi:2,prognost:[0,4],project:4,properli:4,propos:4,proposed_aetiolog:4,prot:4,protein:3,protein_chang:4,protein_domain:4,protein_posit:4,provean:4,provid:4,pubm:4,pull:2
,python:[],qualiti:4,queri:[2,4],quickstart:[],ram:2,rang:4,rate:4,ratio:[2,4],raw:4,recommend:4,record:4,ref:[],refer:[1,4],reflect:4,refseq:4,refseq_match:4,refut:4,regress:4,regulatori:4,regulatoryfeatur:4,rel:4,relat:[0,1],releas:[1,2,4],relev:[0,2,4],replac:2,report:[],repres:2,represent:4,requir:[0,2,4],research:0,resist:[2,4],resourc:[0,2,3],respect:4,restart:2,result:[0,2,4],retriev:[0,4],revel:4,rich:2,robust:4,root:[],rsid:4,run:[3,4],run_pcgr:2,safari:4,sampl:[2,4],sample_id:[2,4],sample_pair_identifi:[],sampleid:4,sas_af_1kg:4,sas_af_exac:4,sas_af_gnomad:4,scale:4,scarciti:0,scientist:0,score:4,screen:4,script:2,search:[],segment:2,segment_end:4,segment_length:4,segment_mean:4,segment_start:4,sensit:[2,4],sep:[],separ:2,sequenc:[1,4],set:[0,2,4],sever:0,shift:4,shortest:4,should:2,show:[2,4],sift:4,sig:[],sigantur:[],signatur:2,signature_id:4,signific:[1,4],sigven:2,similarli:2,singl:4,site:4,snv:[0,2,3],snvs_indel:[2,4],softwar:[0,2],somat:[0,1,2,3],sort:4,sourc:4,south:4,specif:4,sphinx:[],splice:4,splice_site_effect_ada:4,splice_site_effect_bool:[],splice_site_effect_rf:4,split:4,stabl:4,stand:0,standard:4,star:4,start:[3,4],statement:4,statist:1,statu:4,step:[],stop:4,strand:4,strelka:[],strip:4,strive:4,strong:[],strongli:4,structur:4,studi:4,subject:4,submiss:4,substitut:[],subtyp:4,support:4,suppressor:[1,4],svm:[],swiss:4,swissprot:[1,4],symbol:4,symbol_sourc:4,synonym:[1,4],systemat:0,tab:2,tabix:4,tabl:3,tag:4,take:2,taken:0,tar:2,target:4,tcga:[2,4],technolog:3,termin:2,test:[3,4],test_sampl:[],tfbp:4,tgz:2,therapeut:[0,4],therapi:4,thi:[2,4],through:[0,2,4],throughput:[],thu:0,tier:[0,2,4],tier_descript:4,toctre:[],todo:[],tool:[0,2],toolbar:2,total:4,trait:4,transcript:[1,2,4],transcript_end:4,transcript_overlap_perc:4,transcript_start:4,treatment:4,trembl:4,treshold:2,trial:4,trust:4,tsgene:[1,4],tsgene_oncogen:4,tsl:4,tsv:2,tumor:[0,1,2,4],tumor_sampl:2,tumor_suppressor:4,tumor_typ:4,tumorigenesi:4,two:4,type:[2,4],unannot:4,underli:[2,4],
uniform:4,uniparc:4,uniprot:[1,4],uniprot_featur:4,uniprot_id:4,uniprotkb:4,uniqu:4,univers:0,unix:2,unpack:2,untar:2,upon:[],upper:4,uri:4,usag:2,use:[2,3],used:2,user:4,using:[0,2,4],util:3,v15:1,v19:[1,4],v22:[],v23:1,v30:[],v31:1,v78:[],v80:1,v85:1,valid:4,valu:2,variabl:4,variant:[0,2,3],variant_class:4,variat:4,variou:4,vartyp:[],vcf:2,vcf_sample_id:4,vcfanno:[0,2,4],vcftool:4,vector:4,vep:[0,1,2],vep_all_consequ:4,veri:2,version:[2,4],view:4,virtual:2,weak:[],weak_mutect:[],weak_strelka:[],weight:4,what:3,where:4,whether:4,which:[0,2,4],why:3,wide:[1,4],window:2,within:[2,4],work:[2,4],workflow:[0,2,4],working_directori:2,wtsi:4,xvf:2,you:2,your:2,yyyymmdd:2},titles:["About","Annotation resources","Getting started","Welcome to Personal Cancer Genome Reporter’s documentation!","Input & output"],titleterms:{"function":1,abber:4,about:0,all:4,among:4,annot:[1,4],associ:4,base:[0,1],basic:1,biomark:4,both:[],call:4,cancer:[0,1,2,3],clinic:[1,2,4],code:[1,4],consequ:[1,4],copi:4,data:[],databas:[1,4],differ:[],dna:[],docker:[0,2],document:3,domain:1,download:2,drug:[],effect:[1,4],etc:[],exampl:[],featur:1,format:4,frequenc:[1,4],gene:[1,4],gener:2,genom:[0,2,3],germlin:4,get:2,hotspot:[],html:4,includ:[],indel:4,indic:[],inform:4,input:4,insilico:1,instal:2,interact:4,introduct:[],knowledg:1,list:4,marker:[],mutat:4,ncgc:[],number:4,oncovarexplor:[],other:4,output:4,packag:[],pcgr:[0,2],person:[0,3],predict:1,preprocess:[],prerequisit:2,protein:[1,4],python:2,report:[0,2,3,4],resourc:1,run:2,segment:4,sensit:[],separ:4,signatur:4,snv:4,somat:4,sourc:[],start:2,tab:4,tabl:[],technolog:0,test:2,tsv:4,tumor:[],type:[],use:0,util:1,valu:4,variant:[1,4],variat:[],vcf:4,vep:4,welcom:3,what:0,why:0}}) \ No newline at end of file 
+Search.setIndex({docnames:["about","annotation_resources","getting_started","index","output"],envversion:50,filenames:["about.rst","annotation_resources.rst","getting_started.rst","index.rst","output.rst"],objects:{},objnames:{},objtypes:{},terms:{"1000g":4,"1000genom":1,"12th":[],"16gb":[],"17gb":2,"17th":1,"1kg":[],"2016_03":1,"2016_09":1,"2020plu":2,"28th":[],"2gb":[],"5gb":2,"8th":1,"case":[2,4],"class":4,"default":2,"function":[2,3,4],"import":[0,2,4],"public":4,"short":[1,4],CDS:4,EAS:4,For:[2,4],IDs:4,POS:[],SAS:4,The:[0,2,4],There:0,These:[],_strong:[],abber:3,aberr:[0,2,4],about:3,abov:4,accept:4,acceptor:4,access:[0,4],accord:[2,4],acid:4,acquir:0,across:4,action:4,ada:[],adapt:4,add:4,adddit:4,addit:[0,4],adjust:4,advanc:2,af_norm:[],af_tumor:[],affect:4,affected_spl:[],affili:0,afr:4,afr_af_1kg:4,afr_af_exac:4,afr_af_gnomad:4,african:4,after:4,aggreg:4,aid:0,algorithm:4,align:4,all:[0,1],allel:4,allele_num:4,alon:0,alreadi:2,alt:4,alter:[2,4],altern:4,american:4,amino:4,amino_acid:4,among:[],amplif:2,amr:4,amr_af_1kg:4,amr_af_exac:4,amr_af_gnomad:4,analys:2,analysi:0,analyz:4,ani:[2,4],annot:[0,2,3],annotation_resourc:[],antineoplast:[1,4],antineoplastic_drug_interact:4,antineoplastic_drugs_dgidb:4,appli:4,applic:[0,4],appri:[1,4],approv:4,approx:2,april:1,argument:2,asian:4,assembl:4,assign:2,associ:2,attach:[0,4],aug:[],b147:1,base:[2,3,4],basic:[0,2,3,4],been:[0,2,4],below:4,benign:[],best:4,betweeen:1,bgzip:4,bind:4,biocomput:4,bioconductor:4,biologi:0,biomark:[0,1,2],biotyp:4,block:4,block_substitut:[],bm_citat:4,bm_clinical_signific:4,bm_disease_nam:4,bm_drug_nam:4,bm_evidence_direct:4,bm_evidence_level:4,bm_evidence_typ:4,bm_rate:4,bool:[],boost:4,both:[0,4],breast:4,browser:4,build:4,bundl:[0,2],cadd:4,call:2,call_confid:[],caller:4,can:[0,2,4],cancer:4,cancer_census_germlin:4,cancer_census_somat:4,cancer_mutation_hotspot:4,cancer_typ:4,cancerhotspot:4,candid:4,canon:4,cap:4,caption:[],care:0,carri:4,catalog:[1,4],catalogu:1,categori:4,caus:4,ca
usal:4,cbmdb:1,cbmdb_id:4,ccd:4,cdna:4,cdna_posit:4,cds:[],cds_chang:4,cds_end_nf:4,cds_posit:4,cds_start_nf:4,cell:4,cell_typ:4,cellular:[],cencu:1,censu:4,challeng:0,chang:[2,4],check:2,chr1:4,chr:4,chrom:4,chrome:4,chromosom:4,citat:[],cite:4,civic:[1,4],civic_id:4,civic_id_2:4,classif:4,clin:[],clin_sig:4,clinic:[0,3],clinvar:[1,4],clinvar_msid:4,clinvar_pmid:4,clinvar_sig:4,clinvar_variant_origin:4,cluster:4,cna:2,cna_seg:[2,4],cnminor:[],cntotal:[],cnv:[],coad:2,code:3,codon:4,codon_numb:4,cohort:1,colorect:[2,4],column:4,com:[],command:2,common:4,complet:[2,4],complex:0,composit:4,comprehens:0,compress:4,comput:2,confid:[],consensu:4,consequ:3,consid:4,consortium:[0,1,4],constitut:4,contain:[0,2,4],content:[3,4],context:0,contribut:[2,4],convent:4,coordin:4,copi:[0,2,3],correctli:4,cosmic:[1,4],cosmic_cancer_type_al:4,cosmic_cancer_type_gw:4,cosmic_codon_count_gw:4,cosmic_codon_frac_gw:4,cosmic_consequ:4,cosmic_count_gw:4,cosmic_drug_resist:4,cosmic_fathmm_pr:4,cosmic_mutation_id:4,cosmic_sample_sourc:4,cosmic_site_histolog:4,cosmic_vartyp:4,count:4,cover:4,cpu:2,creat:[],csq:4,curat:[1,4],current:4,damag:[],data:[0,2,4],databas:3,databundl:2,dataset:[],date:0,dbnsfp:[1,4],dbnsfp_consensus_lr:4,dbnsfp_consensus_svm:4,dbsnp:[1,4],dbsnp_mappingstatu:4,dbsnp_submiss:4,dbsnp_valid:4,dbsnpbuildid:4,dbsnprsid:4,dec:1,decompos:4,decompress:2,deconstructsig:4,dedic:[],defin:4,delet:[2,4],delin:4,denot:4,depend:0,depth:4,deriv:4,describ:4,descript:4,determin:[],develop:[0,4],dgidb:1,diagnosi:2,diagnost:[0,4],differ:[2,4],direct:[],directori:2,discov:1,diseas:4,distanc:4,distribut:4,dna:4,doc:[],docker:3,dockerhub:2,docm:1,docm_diseas:4,docm_pmid:4,document:[],domain:[3,4],done:4,donor:4,download:[],downstream:2,dp_normal:[],dp_tumor:[],drive:2,driver:[1,4],drug:[1,2,4],dure:2,each:4,eas_af_1kg:4,eas_af_exac:4,eas_af_gnomad:4,east:4,effect:[0,3],effect_predict:4,either:4,emerg:4,end:4,engin:2,ensembl:[0,4],ensembl_gene_id:4,ensembl_transcript_id:4,ensp:4,entrez:4,entre
z_id:4,error:2,estim:2,etc:4,etiolog:[2,4],eur:4,eur_af_1kg:4,european:4,event:4,evid:[2,4],exac:[1,4],exampl:[2,4],exist:[2,4],existing_vari:4,exit:2,exom:[1,4],exon:4,experi:4,experienc:4,experiment:4,expert:0,explor:4,extend:0,facet:[],factor:4,fail:2,fall:4,fals:2,famili:1,fathmm:4,fathmm_mkl:4,featur:[3,4],feature_typ:4,feb:[],februai:1,februari:1,figur:0,file:[2,4],fin:[],fin_af_exac:4,fin_af_gnomad:4,find:[0,4],finnish:4,firefox:4,first:4,flag:[2,4],flag_pick_allel:4,flank:4,flexibl:0,focu:[],folder:2,follow:[2,4],forc:2,force_overwrit:2,fork:2,form:0,format:0,found:[2,4],four:4,frac:[],fraction:[],fraction_mut:4,frameshift:4,frequenc:3,from:[0,2,4],gain:4,gencod:[1,4],gencode_tag:4,gencode_transcript_typ:4,gencode_v19:4,gene:[0,2,3],gene_biotyp:4,gene_nam:4,gene_pheno:4,gene_symbol:4,gener:[0,3,4],genet:1,genindex:[],genom:[1,4],genome_vers:4,genomic_chang:4,genotyp:4,germlin:1,gerp:4,get:3,getting_start:[],given:4,global:4,global_af_1kg:4,global_af_exac:4,global_af_gnomad:4,gnomad:1,googl:[2,4],grch37:[2,4],great:0,guidelin:4,gwa:4,gwas_catalog_pmid:4,gwas_catalog_trait_uri:4,gz_:[],gzip:2,handl:4,has:[0,2,4],have:[0,2,4],hdiv:4,help:[2,4],here:4,hgnc:[],hgnc_id:4,hgv:4,hgvs_offset:4,hgvsc:4,hgvsp:4,hgvsp_short:4,high:4,high_inf_po:4,higher:[],highlight:0,histolog:4,hit:4,homozyg:2,hospit:0,host:2,hotspot:[1,4],how:4,html:[0,2,3],http:4,human:[1,4],humdiv:[],hvar:4,icgc:[1,4],icgc_project:4,identifi:[2,4],iii:0,imag:[0,2],impact:4,implic:4,improv:4,includ:4,incomplet:4,indel:[0,2,3],index:4,indic:4,individu:[0,4],inf:[],infer:4,inferenti:4,info:4,inform:1,initi:4,input:[2,3],input_cna_seg:2,input_vcf:2,insert:4,insilico:3,instal:0,institut:0,instruct:2,integr:[0,4],intend:0,interact:[1,2,3],intern:1,interpret:[0,1,2],interrog:0,intersect:4,intogen:[1,4],intogen_driv:4,intogen_driver_mut:4,intro:[],intron:4,isoform:[1,4],isol:0,item:[2,4],its:4,jan:4,june:[],knowledg:[0,3],knowledgebas:1,known:[2,4],lack:4,larg:[],latest:2,least:[],length:4,level:[0,4],libra
ri:0,lies:4,like:[],line:[],link:4,linux:2,list:[],literatur:4,log:[2,4],logist:4,logr:4,logr_threshold_amplif:2,logr_threshold_homozygous_delet:2,lost:4,low:4,lrt:4,mac:2,machin:[2,4],maf:2,mai:[1,4],make:[],mani:4,mappabl:4,mappingstatu:[],march:1,marker:[],master:[],match:4,matter:4,maxdepth:[],measur:4,memori:2,messag:2,met:4,minim:2,minimum:[],minor:[],missens:4,mix:4,mkdir:[],mkl:[],modifi:4,modindex:[],modul:[],most:[0,4],motif:4,motif_nam:4,motif_po:4,motif_score_chang:4,motiffeatur:4,mozilla:4,mrna:4,msid:[],multipl:4,must:[2,4],mut:[],mutat:[0,1,2],mutational_signatur:[2,4],mutationassessor:4,mutationtast:4,mutect:[],mutpr:4,mutsigcv:2,name:4,navig:0,ncgc:[],need:0,nfe:[],nfe_af_exac:4,nfe_af_gnomad:4,non:[1,4],none:2,normal:4,norwegian:0,notat:4,note:4,nov:4,novemb:1,novo:4,now:2,nucleotid:[2,4],num:[],num_vcfanno_process:2,num_vep_fork:2,number:[0,2,3],numer:4,observ:4,obtain:4,oct:[],offset:[],oncogen:[1,4],oncolog:[0,2],oncologist:0,oncoscor:4,one:4,onli:[0,4],ontolog:4,option:[2,4],order:4,org:4,organ:[2,4],origin:4,oslo:0,osx:[],oth:[],oth_af_exac:4,oth_af_gnomad:4,other:2,out:4,output:[2,3],overlap:[2,4],overview:[],overwrit:2,packag:[0,4],page:[],pair:4,pars:4,part:4,particular:4,pass:4,pcgr:[3,4],pcgr_directori:2,pcgreport:[],percent:4,person:2,pfam:1,phase3:1,phase:4,pheno:4,phenotyp:4,phred:[],pick:4,pipelin:4,platform:2,pmid:4,point:4,polyphen2:4,portrai:4,pose:0,posit:[2,4],potenti:4,pre:4,precis:[0,2],pred:[],predict:[3,4],predictor:[0,1],predispos:4,predisposit:4,prefer:2,prefix:2,prerequisit:3,present:[0,4],primari:4,princip:4,prinicip:1,priorit:0,process:[2,4],produc:[0,2],product:4,profil:4,prognosi:2,prognost:[0,4],project:4,properli:4,propos:4,proposed_aetiolog:4,prot:4,protein:3,protein_chang:4,protein_domain:4,protein_posit:4,provean:4,provid:4,pubm:4,pull:2,python:[],qualiti:4,queri:[2,4],quickstart:[],ram:2,rang:4,rate:4,ratio:[2,4],raw:4,recommend:4,record:4,ref:[],refer:[1,4],reflect:4,refseq:4,refseq_match:4,refut:4,regress:4,reg
ulatori:4,regulatoryfeatur:4,rel:4,relat:[0,1],releas:[1,2,4],relev:[0,2,4],replac:2,report:[],repres:2,represent:4,requir:[0,2,4],research:0,resist:[2,4],resourc:[0,2,3],respect:4,restart:2,result:[0,2,4],retriev:[0,4],revel:4,rich:2,robust:4,root:[],rsid:4,run:[3,4],run_pcgr:2,safari:4,sampl:[2,4],sample_id:[2,4],sample_pair_identifi:[],sampleid:4,sas_af_1kg:4,sas_af_exac:4,sas_af_gnomad:4,scale:4,scarciti:0,scientist:0,score:4,screen:4,script:2,search:[],segment:2,segment_end:4,segment_length:4,segment_mean:4,segment_start:4,sensit:[2,4],sep:[],separ:2,sequenc:[1,2,4],set:[0,2,4],sever:0,shift:4,shortest:4,should:2,show:[2,4],sift:4,sig:[],sigantur:[],signatur:2,signature_id:4,signific:[1,4],sigven:2,similarli:2,singl:4,site:4,snv:[0,2,3],snvs_indel:[2,4],softwar:[0,2],somat:[0,1,2,3],sort:4,sourc:4,south:4,specif:4,sphinx:[],splice:4,splice_site_effect_ada:4,splice_site_effect_bool:[],splice_site_effect_rf:4,split:4,stabl:4,stand:0,standard:4,star:4,start:[3,4],statement:4,statist:1,statu:4,step:[],stop:4,strand:4,strelka:[],strip:4,strive:4,strong:[],strongli:4,structur:4,studi:4,subject:4,submiss:4,substitut:[],subtyp:4,support:4,suppressor:[1,4],svm:[],swiss:4,swissprot:[1,4],symbol:4,symbol_sourc:4,synonym:[1,4],systemat:0,tab:2,tabix:4,tabl:3,tag:4,take:2,taken:0,tar:2,target:4,tcga:[2,4],technolog:3,termin:2,test:[3,4],test_sampl:[],tfbp:4,tgz:2,therapeut:[0,4],therapi:4,thi:[2,4],through:[0,2,4],throughput:[],thu:0,tier:[0,2,4],tier_descript:4,toctre:[],todo:[],tool:[0,2],toolbar:2,total:4,trait:4,transcript:[1,2,4],transcript_end:4,transcript_overlap_perc:4,transcript_start:4,treatment:4,trembl:4,treshold:2,trial:4,trust:4,tsgene:[1,4],tsgene_oncogen:4,tsl:4,tsv:2,tumor:[0,1,2,4],tumor_sampl:2,tumor_suppressor:4,tumor_typ:4,tumorigenesi:4,two:[2,4],type:[2,4],unannot:4,underli:[2,4],uniform:4,uniparc:4,uniprot:[1,4],uniprot_featur:4,uniprot_id:4,uniprotkb:4,uniqu:4,univers:0,unix:2,unpack:2,untar:2,upcom:4,upon:[],upper:4,uri:4,usag:2,use:[2,3],used:2,us
er:4,using:[0,2,4],util:3,v15:1,v19:[1,4],v22:[],v23:1,v30:[],v31:1,v78:[],v80:1,v85:1,valid:4,valu:2,variabl:4,variant:[0,2,3],variant_class:4,variat:4,variou:4,vartyp:[],vcf:2,vcf_sample_id:4,vcfanno:[0,2],vcfbreakmulti:4,vcflib:4,vcftool:4,vector:4,vep:[0,1,2],vep_all_consequ:4,veri:2,version:[2,4],view:4,virtual:2,weak:[],weak_mutect:[],weak_strelka:[],weight:4,what:3,where:4,whether:4,which:[0,2,4],why:3,wide:[1,4],window:2,within:[2,4],work:[2,4],workflow:[0,2,4],working_directori:2,wtsi:4,xvf:2,you:2,your:2,yyyymmdd:2},titles:["About","Annotation resources","Getting started","Welcome to Personal Cancer Genome Reporter’s documentation!","Input & output"],titleterms:{"function":1,abber:4,about:0,all:4,among:4,annot:[1,4],associ:4,base:[0,1],basic:1,biomark:4,both:[],call:4,cancer:[0,1,2,3],clinic:[1,2,4],code:[1,4],consequ:[1,4],copi:4,data:[],databas:[1,4],differ:[],dna:[],docker:[0,2],document:3,domain:1,download:2,drug:[],effect:[1,4],etc:[],exampl:[],featur:1,format:4,frequenc:[1,4],gene:[1,4],gener:2,genom:[0,2,3],germlin:4,get:2,hotspot:[],html:4,includ:[],indel:4,indic:[],inform:4,input:4,insilico:1,instal:2,interact:4,introduct:[],knowledg:1,list:4,marker:[],mutat:4,ncgc:[],number:4,oncovarexplor:[],other:4,output:4,packag:[],pcgr:[0,2],person:[0,3],predict:1,preprocess:[],prerequisit:2,protein:[1,4],python:2,report:[0,2,3,4],resourc:1,run:2,segment:4,sensit:[],separ:4,signatur:4,snv:4,somat:4,sourc:[],start:2,tab:4,tabl:[],technolog:0,test:2,tsv:4,tumor:[],type:[],use:0,util:1,valu:4,variant:[1,4],variat:[],vcf:4,vep:4,welcom:3,what:0,why:0}}) \ No newline at end of file diff --git a/docs/getting_started.md b/docs/getting_started.md index 3a93cd12..29984604 100644 --- a/docs/getting_started.md +++ b/docs/getting_started.md @@ -19,7 +19,7 @@ #### Python -An installation of Python (version 2.7.10 or higher) is required to run PCGR. Check that Python is installed by typing `python --version` in your terminal window. 
+An installation of Python (version 2.7.13) is required to run PCGR. Check that Python is installed by typing `python --version` in your terminal window. #### Download PCGR @@ -31,7 +31,7 @@ An installation of Python (version 2.7.10 or higher) is required to run PCGR. Ch A _data/_ folder within the _pcgr-X.X_ software folder should now have been produced -* Pull the [PCGR Docker image](https://hub.docker.com/r/sigven/pcgr/) (3.2Gb) from DockerHub): +* Pull the [PCGR Docker image](https://hub.docker.com/r/sigven/pcgr/) (3.5Gb) from DockerHub): * `docker pull sigven/pcgr` (PCGR annotation engine) @@ -83,7 +83,7 @@ A tumor sample report is generated by calling the Python script __run_pcgr.py__, -The _examples_ folder contain sample files from TCGA. A report for a colorectal tumor case can be generated through the following command: +The _examples_ folder contain input files from two tumor samples sequenced within TCGA. A report for a colorectal tumor case can be generated by running the following command in your terminal window: `python run_pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments` `tumor_sample.COAD.cna.tsv ~/pcgr-X.X ~/pcgr-X.X/examples tumor_sample.COAD` diff --git a/docs/getting_started.rst b/docs/getting_started.rst index 28f39266..0d0ec297 100644 --- a/docs/getting_started.rst +++ b/docs/getting_started.rst @@ -35,9 +35,9 @@ Installation of Docker Python ^^^^^^ -An installation of Python (version 2.7.10 or higher) is required to run -PCGR. Check that Python is installed by typing ``python --version`` in -your terminal window. +An installation of Python (version 2.7.13) is required to run PCGR. +Check that Python is installed by typing ``python --version`` in your +terminal window. 
Download PCGR ^^^^^^^^^^^^^ @@ -60,7 +60,7 @@ Download PCGR have been produced - Pull the `PCGR Docker - image `__ (3.2Gb) from + image `__ (3.5Gb) from DockerHub): - ``docker pull sigven/pcgr`` (PCGR annotation engine) @@ -115,8 +115,9 @@ A tumor sample report is generated by calling the Python script overwrite of existing result files by using this flag (default: False) -The *examples* folder contain sample files from TCGA. A report for a -colorectal tumor case can be generated through the following command: +The *examples* folder contain input files from two tumor samples +sequenced within TCGA. A report for a colorectal tumor case can be +generated by running the following command in your terminal window: ``python run_pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments`` ``tumor_sample.COAD.cna.tsv ~/pcgr-X.X ~/pcgr-X.X/examples tumor_sample.COAD`` diff --git a/docs/output.md b/docs/output.md index a99676a6..1bb64b22 100644 --- a/docs/output.md +++ b/docs/output.md @@ -4,7 +4,7 @@ The PCGR workflow accepts two types of input files: - * An unannotated, single-sample VCF file with called somatic variants (SNVs/InDels) + * An unannotated, single-sample VCF file (>= v4.2) with called somatic variants (SNVs/InDels) * A copy number segment file PCGR can be run with either or both of the two input files present. @@ -15,7 +15,7 @@ __IMPORTANT NOTE__: Only the GRCh37 version of the human genome is currently sup The following requirements __MUST__ be met by the input VCF for PCGR to work properly: -1. Variants in the raw VCF that contain multiple alternative alleles (e.g. "multiple ALTs") must be split into variants with a single alternative allele. A description on how this can be done with the help of [vt](https://github.com/atks/vt) is described within the [documentation page for vcfanno](http://brentp.github.io/vcfanno/#preprocessing) +1. Variants in the raw VCF that contain multiple alternative alleles (e.g. 
"multiple ALTs") must be split into variants with a single alternative allele. This can be done with the help of either [vt decompose](http://genome.sph.umich.edu/wiki/Vt#Decompose) or [vcflib's vcfbreakmulti](https://github.com/vcflib/vcflib#vcflib). We will add integrated support for this in an upcoming release 2. The contents of the VCF must be sorted correctly (i.e. according to chromosomal order and chromosomal position). This can be obtained by [vcftools](https://vcftools.github.io/perl_module.html#vcf-sort). * We __strongly__ recommend that the input VCF is compressed and indexed using [bgzip](http://www.htslib.org/doc/tabix.html) and [tabix](http://www.htslib.org/doc/tabix.html) * 'chr' must be stripped from the chromosome names diff --git a/docs/output.rst b/docs/output.rst index a83e3ae1..d52b9134 100644 --- a/docs/output.rst +++ b/docs/output.rst @@ -6,8 +6,8 @@ Input The PCGR workflow accepts two types of input files: -- An unannotated, single-sample VCF file with called somatic variants - (SNVs/InDels) +- An unannotated, single-sample VCF file (>= v4.2) with called somatic + variants (SNVs/InDels) - A copy number segment file PCGR can be run with either or both of the two input files present. @@ -23,10 +23,10 @@ work properly: 1. Variants in the raw VCF that contain multiple alternative alleles (e.g. "multiple ALTs") must be split into variants with a single - alternative allele. A description on how this can be done with the - help of `vt `__ is described within the - `documentation page for - vcfanno `__ + alternative allele. This can be done with the help of either `vt + decompose `__ or + `vcflib's vcfbreakmulti `__. + We will add integrated support for this in an upcoming release 2. The contents of the VCF must be sorted correctly (i.e. according to chromosomal order and chromosomal position). This can be obtained by `vcftools `__. 
diff --git a/run_pcgr.py b/run_pcgr.py index b6f6ffe5..1032f8da 100755 --- a/run_pcgr.py +++ b/run_pcgr.py @@ -139,7 +139,8 @@ def run_pcgr(input_vcf, input_cna_segments, logR_threshold_amplification, logR_t vcf_validate_command = str(docker_command_run3) + "pcgr_check_input.py " + str(input_vcf) + " " + str(input_cna_segments) + "\"" check_subprocess(vcf_validate_command) logger.info('Finished') - + return + if not input_vcf is None: ## Run VEP + vcfanno + summarise diff --git a/src/pcgr.tgz b/src/pcgr.tgz index 2c19e7b3..b7902b1a 100644 Binary files a/src/pcgr.tgz and b/src/pcgr.tgz differ diff --git a/src/pcgr/pcgr_check_input.py b/src/pcgr/pcgr_check_input.py index f144f8da..fbd87358 100755 --- a/src/pcgr/pcgr_check_input.py +++ b/src/pcgr/pcgr_check_input.py @@ -136,7 +136,7 @@ def verify_input(input_vcf, input_cna_segments): for rec in vcf: chrom = rec.CHROM if chrom.startswith('chr'): - error_message_chrom = "'chr' must be stripped from chromosome names: " + str(rec.CHROM) + error_message_chrom = "'chr' must be stripped from chromosome names: " + str(rec.CHROM + ", see http://pcgr.readthedocs.io/en/latest/output.html#vcf-preprocessing") logger.error(error_message_chrom) return -1 POS = rec.start + 1 @@ -144,20 +144,23 @@ def verify_input(input_vcf, input_cna_segments): if len(rec.ALT) > 1: logger.error('') logger.error("Multiallelic site detected:" + str(rec.CHROM) + '\t' + str(POS) + '\t' + str(rec.REF) + '\t' + str(alt)) - logger.error('Alternative alleles must be decomposed and normalized, see http://pcgr.readthedocs.io/en/latest/output.html#vcf-preprocessing') + logger.error('Alternative alleles must be decomposed, see http://pcgr.readthedocs.io/en/latest/output.html#vcf-preprocessing') logger.error('') multiallelic_alt = 1 return -1 command_vcf_sample_free1 = 'egrep \'^##\' ' + str(input_vcf) + ' > ' + str(input_vcf_pcgr_ready) command_vcf_sample_free2 = 'egrep \'^#CHROM\' ' + str(input_vcf) + ' | cut -f1-8 >> ' + str(input_vcf_pcgr_ready) - 
command_vcf_sample_free3 = 'egrep -v \'^#\' ' + str(input_vcf) + ' | cut -f1-8 >> ' + str(input_vcf_pcgr_ready) + command_vcf_sample_free3 = 'egrep -v \'^#\' ' + str(input_vcf) + ' | cut -f1-8 | egrep -v \'^[XYM]\' | sort -k1,1n -k2,2n -k3,3 -k4,4 >> ' + str(input_vcf_pcgr_ready) + command_vcf_sample_free4 = 'egrep -v \'^#\' ' + str(input_vcf) + ' | cut -f1-8 | egrep \'^[XYM]\' | sort -k1,1 -k2,2n -k3,3 -k4,4 >> ' + str(input_vcf_pcgr_ready) if input_vcf.endswith('.gz'): command_vcf_sample_free1 = 'bgzip -dc ' + str(input_vcf) + ' | egrep \'^##\' > ' + str(input_vcf_pcgr_ready) command_vcf_sample_free2 = 'bgzip -dc ' + str(input_vcf) + ' | egrep \'^#CHROM\' | cut -f1-8 >> ' + str(input_vcf_pcgr_ready) - command_vcf_sample_free3 = 'bgzip -dc ' + str(input_vcf) + ' | egrep -v \'^#\' | cut -f1-8 >> ' + str(input_vcf_pcgr_ready) + command_vcf_sample_free3 = 'bgzip -dc ' + str(input_vcf) + ' | egrep -v \'^#\' | cut -f1-8 | egrep -v \'^[XYM]\' | sort -k1,1n -k2,2n -k3,3 -k4,4 >> ' + str(input_vcf_pcgr_ready) + command_vcf_sample_free4 = 'bgzip -dc ' + str(input_vcf) + ' | egrep -v \'^#\' | cut -f1-8 | egrep \'^[XYM]\' | sort -k1,1 -k2,2n -k3,3 -k4,4 >> ' + str(input_vcf_pcgr_ready) os.system(command_vcf_sample_free1) os.system(command_vcf_sample_free2) os.system(command_vcf_sample_free3) + os.system(command_vcf_sample_free4) os.system('bgzip -f ' + str(input_vcf_pcgr_ready)) os.system('tabix -p vcf ' + str(input_vcf_pcgr_ready) + '.gz')
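Note on the new sorting step in `pcgr_check_input.py`: the two added shell pipelines split the data records into chromosomes that do not start with `X`, `Y` or `M` (sorted numerically on column 1) and those that do (sorted lexically on column 1). Both groups are then ordered by position, ID and REF and appended in that order. The following is a minimal pure-Python sketch of the same ordering, for illustration only. The function name `sort_vcf_records` is hypothetical and not part of PCGR, and the handling of non-numeric contig names (treated as 0) only approximates what `sort -n` does.

```python
def sort_vcf_records(lines):
    """Order tab-separated VCF data lines: numeric chromosomes first, then X/Y/M contigs.

    Illustrative sketch only; it mirrors the intent of the two `sort` pipelines
    in the diff, not their exact locale-dependent behaviour.
    """
    def fields(line):
        # CHROM, POS, ID, REF are the first four VCF columns.
        cols = line.rstrip("\n").split("\t")
        return cols[0], int(cols[1]), cols[2], cols[3]

    numeric, other = [], []
    for line in lines:
        # Header lines are written separately in the diff, so skip them here.
        if not line.strip() or line.startswith("#"):
            continue
        # Same split as `egrep '^[XYM]'` versus `egrep -v '^[XYM]'`.
        (other if line[0] in "XYM" else numeric).append(line)

    def numeric_key(line):
        chrom, pos, variant_id, ref = fields(line)
        # Assumption: non-numeric names in this group compare as 0.
        chrom_num = int(chrom) if chrom.isdigit() else 0
        return (chrom_num, pos, variant_id, ref)

    # Numeric group first, then the X/Y/M group sorted lexically by name.
    return sorted(numeric, key=numeric_key) + sorted(other, key=fields)


if __name__ == "__main__":
    records = [
        "X\t100\t.\tA\tT\t.\t.\t.",
        "2\t50\t.\tG\tC\t.\t.\t.",
        "10\t5\t.\tT\tA\t.\t.\t.",
        "1\t200\t.\tC\tG\t.\t.\t.",
        "1\t20\t.\tA\tG\t.\t.\t.",
    ]
    for record in sort_vcf_records(records):
        print(record)
```

With a lexical sort on column 1, a mitochondrial contig named `MT` would be placed before `X` and `Y`. Whether that matches the order expected by downstream tools is not stated in the diff.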