Commit

v0.3
sigven committed Apr 12, 2017
1 parent 2153f7c commit 8d70d8d
Showing 27 changed files with 915 additions and 49,364 deletions.
103 changes: 53 additions & 50 deletions README.md
@@ -15,7 +15,7 @@ The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package int
[![Documentation Status](https://readthedocs.org/projects/pcgr/badge/?version=latest)](http://pcgr.readthedocs.io/en/latest/?badge=latest)


### Annotation resources included in PCGR (v0.2)
### Annotation resources included in PCGR (v0.3)

* [VEP v85](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor release 85 (GENCODE v19 as the gene reference dataset)
* [COSMIC v80](http://cancer.sanger.ac.uk/cosmic/) - Catalogue of somatic mutations in cancer (February 2017)
@@ -53,14 +53,15 @@ A local installation of Python (it has been tested with [version 2.7.13](https:/

#### STEP 2: Download PCGR

1. Download and unpack the [latest release](https://github.com/sigven/pcgr/releases/latest)
<font color="red"><b>April 12th 2017</b>: New release (v0.3)</font>
1. Download and unpack the [latest release (v0.3)](https://github.com/sigven/pcgr/releases/latest)
2. Download and unpack the data bundle (approx. 17Gb) in the PCGR directory
* Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number)
* Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g. `~/pcgr-0.3`)
* Unpack the data bundle, e.g. through the following Unix command: `gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`

A _data/_ folder within the _pcgr-X.X_ software folder should now have been produced
3. Pull the [PCGR Docker image](https://hub.docker.com/r/sigven/pcgr/) from DockerHub:
* `docker pull sigven/pcgr` (PCGR annotation engine)
3. Pull the [PCGR Docker image (v0.3)](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.1Gb):
* `docker pull sigven/pcgr:0.3` (PCGR annotation engine)
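
For readers who would rather script the data bundle part of STEP 2, the following is a minimal Python 3 sketch of what the `gzip -dc ... | tar xvf -` command does, using only the standard library. The function name `unpack_data_bundle` is an assumption made for illustration and is not part of PCGR; the check for a `data/` folder mirrors the expectation stated in the step above.

```python
import tarfile
from pathlib import Path


def unpack_data_bundle(bundle_path, pcgr_dir):
    """Extract a gzipped tar bundle into pcgr_dir and return the data/ folder.

    Hypothetical helper, not shipped with PCGR.
    """
    pcgr_dir = Path(pcgr_dir).expanduser()

    # "r:gz" decompresses and untars in one pass, like `gzip -dc ... | tar xvf -`.
    with tarfile.open(str(bundle_path), mode="r:gz") as bundle:
        bundle.extractall(path=str(pcgr_dir))

    # The README expects a data/ folder inside the PCGR software folder.
    data_dir = pcgr_dir / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError("no data/ folder found in {}".format(pcgr_dir))
    return data_dir
```

A call such as `unpack_data_bundle("pcgr.databundle.GRCh37.YYYYMMDD.tgz", "~/pcgr-0.3")` would be the scripted counterpart of the Unix command, with _YYYYMMDD_ replaced by the date in the downloaded file name. Only extract archives obtained from a source you trust.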

#### STEP 3: Input preprocessing

@@ -94,55 +95,57 @@ Here, _Chromosome_, _Start_, and _End_ denote the chromosomal segment (GRCh37),

#### STEP 4: Run example

A tumor sample report is generated by calling the Python script __run_pcgr.py__ in the PCGR software folder, which takes the following arguments and options:

usage: run_pcgr.py [-h] [--input_vcf INPUT_VCF]
[--input_cna_segments INPUT_CNA_SEGMENTS]
[--logR_threshold_amplification LOGR_THRESHOLD_AMPLIFICATION]
[--logR_threshold_homozygous_deletion LOGR_THRESHOLD_HOMOZYGOUS_DELETION]
[--num_vcfanno_processes NUM_VCFANNO_PROCESSES]
[--num_vep_forks NUM_VEP_FORKS] [--force_overwrite]
pcgr_directory working_directory sample_id

Personal Cancer Genome Reporter (PCGR) workflow for clinical interpretation of
somatic nucleotide variants and copy number aberration segments

positional arguments:
pcgr_directory PCGR base directory
working_directory Working directory - directory with input/output files
sample_id Tumor sample/cancer genome identifier - prefix for
output files

optional arguments:
-h, --help show this help message and exit
--input_vcf INPUT_VCF
VCF input file with somatic query variants
(SNVs/InDels) (default: None)
--input_cna_segments INPUT_CNA_SEGMENTS
Somatic copy number alteration segments (tab-separated
values) (default: None)
--logR_threshold_amplification LOGR_THRESHOLD_AMPLIFICATION
Log(2) ratio treshold for calling copy number
amplifications in HTML report (default: 0.8)
--logR_threshold_homozygous_deletion LOGR_THRESHOLD_HOMOZYGOUS_DELETION
Log(2) ratio treshold for calling homozygous deletions
in HTML report (default: -0.8)
--num_vcfanno_processes NUM_VCFANNO_PROCESSES
Number of processes used during vcfanno annotation
(default: 4)
--num_vep_forks NUM_VEP_FORKS
Number of forks (--forks option in VEP) used during
VEP annotation (default: 4)
--force_overwrite By default, the script will fail with an error if any
output file already exists. You can force the
overwrite of existing result files by using this flag
(default: False)

A tumor sample report is generated by calling the Python script __pcgr.py__ in the PCGR software folder, which takes the following arguments and options:

usage: pcgr.py [-h] [--input_vcf INPUT_VCF]
[--input_cna_segments INPUT_CNA_SEGMENTS]
[--logR_threshold_amplification LOGR_THRESHOLD_AMPLIFICATION]
[--logR_threshold_homozygous_deletion LOGR_THRESHOLD_HOMOZYGOUS_DELETION]
[--num_vcfanno_processes NUM_VCFANNO_PROCESSES]
[--num_vep_forks NUM_VEP_FORKS] [--force_overwrite]
[--version]
pcgr_dir output_dir sample_id

Personal Cancer Genome Reporter (PCGR) workflow for clinical interpretation of
somatic nucleotide variants and copy number aberration segments

positional arguments:
pcgr_dir PCGR base directory with accompanying data directory,
e.g. ~/pcgr-0.3
output_dir Output directory
sample_id Tumor sample/cancer genome identifier - prefix for
output files

optional arguments:
-h, --help show this help message and exit
--input_vcf INPUT_VCF
VCF input file with somatic query variants
(SNVs/InDels) (default: None)
--input_cna_segments INPUT_CNA_SEGMENTS
Somatic copy number alteration segments (tab-separated
values) (default: None)
--logR_threshold_amplification LOGR_THRESHOLD_AMPLIFICATION
Log(2) ratio treshold for calling copy number
amplifications in HTML report (default: 0.8)
--logR_threshold_homozygous_deletion LOGR_THRESHOLD_HOMOZYGOUS_DELETION
Log(2) ratio treshold for calling homozygous deletions
in HTML report (default: -0.8)
--num_vcfanno_processes NUM_VCFANNO_PROCESSES
Number of processes used during vcfanno annotation
(default: 4)
--num_vep_forks NUM_VEP_FORKS
Number of forks (--forks option in VEP) used during
VEP annotation (default: 4)
--force_overwrite By default, the script will fail with an error if any
output file already exists. You can force the
overwrite of existing result files by using this flag
(default: False)
--version show program's version number and exit
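
The help text above can be reproduced with a short `argparse` definition. The sketch below is reconstructed from the printed usage only and is not the actual source of `pcgr.py`: the `type=` converters, the abbreviated help strings, and the `0.3` version string are assumptions, while option names, positional names, and defaults are taken from the help output.

```python
import argparse


def build_parser():
    """Sketch of a parser matching the pcgr.py usage text (not the real source)."""
    parser = argparse.ArgumentParser(
        prog="pcgr.py",
        description="Personal Cancer Genome Reporter (PCGR) workflow for "
                    "clinical interpretation of somatic nucleotide variants "
                    "and copy number aberration segments",
        # Appends "(default: ...)" to every help string, as in the output above.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--input_vcf", default=None,
                        help="VCF input file with somatic query variants (SNVs/InDels)")
    parser.add_argument("--input_cna_segments", default=None,
                        help="Somatic copy number alteration segments (tab-separated values)")
    # Converting to float/int is an assumption; the help text shows only defaults.
    parser.add_argument("--logR_threshold_amplification", type=float, default=0.8,
                        help="Log(2) ratio threshold for calling copy number amplifications")
    parser.add_argument("--logR_threshold_homozygous_deletion", type=float, default=-0.8,
                        help="Log(2) ratio threshold for calling homozygous deletions")
    parser.add_argument("--num_vcfanno_processes", type=int, default=4,
                        help="Number of processes used during vcfanno annotation")
    parser.add_argument("--num_vep_forks", type=int, default=4,
                        help="Number of forks used during VEP annotation")
    parser.add_argument("--force_overwrite", action="store_true",
                        help="Overwrite existing result files instead of failing")
    # The version string is assumed from the release this commit announces.
    parser.add_argument("--version", action="version", version="%(prog)s 0.3")
    parser.add_argument("pcgr_dir",
                        help="PCGR base directory with accompanying data directory")
    parser.add_argument("output_dir", help="Output directory")
    parser.add_argument("sample_id",
                        help="Tumor sample/cancer genome identifier - prefix for output files")
    return parser
```

Calling `build_parser().parse_args([...])` with the three positional arguments and no options yields the defaults listed in the help text, for example `num_vep_forks == 4` and `force_overwrite == False`.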


The _examples_ folder contains sample files from TCGA. A report for a colorectal tumor case can be generated through the following command:

`python run_pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-X.X ~/pcgr-X.X/examples tumor_sample.COAD`
`python pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-0.3 ~/pcgr-0.3/examples tumor_sample.COAD`

This command will run the Docker-based PCGR workflow and produce the following output files in the _examples_ folder:

Binary file modified docs/_build/doctrees/annotation_resources.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/getting_started.doctree
Binary file not shown.
37 changes: 21 additions & 16 deletions docs/_build/html/_sources/annotation_resources.rst.txt
@@ -92,19 +92,24 @@ corresponds to variants reported with other HGVS nomenclature (e.g.
Other data quality concerns
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Clinical biomarkers** Clinical biomarkers included in PCGR is limited
to the following: \* Markers reported at the variant level (e.g. **BRAF
p.V600E**) \* Markers reported at the codon level (e.g. **KRAS p.G12**)
\* Markers reported at the exon level (e.g. **KIT exon 11 mutation**) \*
Within CBMDB, only markers collected from FDA/NCCN guidelines,
scientific literature and clinical trials are included (markers
collected from conference abstracts are not included)

**COSMIC variants** The COSMIC dataset that is part of the PCGR
annotation bundle is the subset of variants that satisfy the following
criteria: \* **Mutation somatic status** is either
'*confirmed\_somatic*' or
'*reported\_in\_another\_cancer\_sample\_as\_somatic*'. \*
**Site/histology** must be known and the sample must come from a
malignant tumor (i.e. not polyps/adenomas, which are also found in
COSMIC)
**Clinical biomarkers**

Clinical biomarkers included in PCGR are limited to the following:

- Markers reported at the variant level (e.g. **BRAF p.V600E**)
- Markers reported at the codon level (e.g. **KRAS p.G12**)
- Markers reported at the exon level (e.g. **KIT exon 11 mutation**)
- Within CBMDB, only markers collected from FDA/NCCN guidelines,
scientific literature and clinical trials are included (markers
collected from conference abstracts are not included)
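
The three marker resolutions listed above (variant, codon, exon) can be illustrated with a small matching routine. This is a sketch under stated assumptions, not PCGR's annotation logic: the dictionary keys (`gene`, `level`, `value`, `protein_change`, `exon`) and the function name `marker_matches` are hypothetical, and protein changes are assumed to use the single-letter form seen in the examples (e.g. `p.G12D`).

```python
import re

# Single-letter protein change such as "p.G12D":
# reference residue, codon position, alternate residue.
_PROTEIN_CHANGE = re.compile(r"^p\.([A-Z])(\d+)([A-Z*])$")


def marker_matches(marker, variant):
    """Return True if a query variant falls under a biomarker (hypothetical schema)."""
    if marker["gene"] != variant["gene"]:
        return False

    level = marker["level"]
    if level == "variant":
        # Exact amino acid change, e.g. BRAF p.V600E.
        return marker["value"] == variant.get("protein_change")
    if level == "codon":
        # Any substitution at the codon, e.g. KRAS p.G12 covers p.G12D and p.G12V.
        parsed = _PROTEIN_CHANGE.match(variant.get("protein_change") or "")
        if parsed is None:
            return False
        return marker["value"] == "p.{}{}".format(parsed.group(1), parsed.group(2))
    if level == "exon":
        # Any mutation within the exon, e.g. KIT exon 11.
        return marker["value"] == variant.get("exon")
    raise ValueError("unknown marker level: {}".format(level))
```

With a codon-level marker `{"gene": "KRAS", "level": "codon", "value": "p.G12"}`, a `p.G12D` variant matches while `p.G13D` does not, which is the distinction between the variant-level and codon-level entries in the list.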

**COSMIC variants**

The COSMIC dataset that is part of the PCGR annotation bundle is the
subset of variants that satisfy the following criteria:

- **Mutation somatic status** is either '*confirmed\_somatic*' or
'*reported\_in\_another\_cancer\_sample\_as\_somatic*'.
- **Site/histology** must be known and the sample must come from a
malignant tumor (i.e. not polyps/adenomas, which are also found in
COSMIC)
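
The two COSMIC inclusion criteria above amount to a record filter, sketched below. The record keys (`somatic_status`, `site`, `histology`), the placeholder values treated as "unknown", and the set of non-malignant histology labels are assumptions chosen for illustration; only the two accepted somatic status values are taken from the text.

```python
# Status values named in the criteria above.
ACCEPTED_SOMATIC_STATUS = {
    "confirmed_somatic",
    "reported_in_another_cancer_sample_as_somatic",
}

# Assumed placeholders for a missing site or histology.
UNKNOWN_VALUES = {None, "", "NS"}

# Assumed labels for the non-malignant lesions mentioned in the text.
NON_MALIGNANT_HISTOLOGIES = {"polyp", "adenoma"}


def keep_cosmic_record(record):
    """Return True if a COSMIC record passes both criteria (hypothetical schema)."""
    if record.get("somatic_status") not in ACCEPTED_SOMATIC_STATUS:
        return False

    site = record.get("site")
    histology = record.get("histology")
    if site in UNKNOWN_VALUES or histology in UNKNOWN_VALUES:
        return False

    # Site and histology are known; exclude non-malignant lesions.
    return histology not in NON_MALIGNANT_HISTOLOGIES
```

Applying `keep_cosmic_record` to each record, for example `[r for r in records if keep_cosmic_record(r)]`, would yield a subset of the kind described above.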
31 changes: 17 additions & 14 deletions docs/_build/html/_sources/getting_started.rst.txt
@@ -42,51 +42,53 @@ terminal window.
Download PCGR
^^^^^^^^^^^^^

- Download and unpack the `latest
release <https://github.com/sigven/pcgr/releases/latest>`__
April 12th 2017: New release (v0.3)

- Download and unpack the `latest
release (v0.3) <https://github.com/sigven/pcgr/releases/latest>`__

- Download and unpack the data bundle (approx. 17Gb) in the PCGR
directory

- Download `the latest data
bundle <https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/>`__
from Google Drive to ``~/pcgr-X.X`` (replace *X.X* with the
version number)
version number, e.g. ``~/pcgr-0.3``)
- Decompress and untar the bundle, e.g. through the following Unix
command:
``gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -``

A *data/* folder within the *pcgr-X.X* software folder should now
have been produced

- Pull the `PCGR Docker
image <https://hub.docker.com/r/sigven/pcgr/>`__ (3.5Gb) from
DockerHub):
- Pull the `PCGR Docker image -
v0.3 <https://hub.docker.com/r/sigven/pcgr/>`__ from DockerHub
(3.1Gb):

- ``docker pull sigven/pcgr`` (PCGR annotation engine)
- ``docker pull sigven/pcgr:0.3`` (PCGR annotation engine)

Run test - generation of clinical report for a cancer genome
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A tumor sample report is generated by calling the Python script
**run\_pcgr.py**, which takes the following arguments and options:
**pcgr.py**, which takes the following arguments and options:

::

usage: run_pcgr.py [-h] [--input_vcf INPUT_VCF]
usage: pcgr.py [-h] [--input_vcf INPUT_VCF]
[--input_cna_segments INPUT_CNA_SEGMENTS]
[--logR_threshold_amplification LOGR_THRESHOLD_AMPLIFICATION]
[--logR_threshold_homozygous_deletion LOGR_THRESHOLD_HOMOZYGOUS_DELETION]
[--num_vcfanno_processes NUM_VCFANNO_PROCESSES]
[--num_vep_forks NUM_VEP_FORKS] [--force_overwrite]
pcgr_directory working_directory sample_id
[--version]
pcgr_dir output_dir sample_id

Personal Cancer Genome Reporter (PCGR) workflow for clinical interpretation of
somatic nucleotide variants and copy number aberration segments

positional arguments:
pcgr_directory PCGR base directory
working_directory Working directory - directory with input/output files
pcgr_dir PCGR base directory with accompanying data directory,
e.g. ~/pcgr-0.3
output_dir Output directory
sample_id Tumor sample/cancer genome identifier - prefix for
output files

@@ -114,13 +116,14 @@ A tumor sample report is generated by calling the Python script
output file already exists. You can force the
overwrite of existing result files by using this flag
(default: False)
--version show program's version number and exit
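
The two `--logR_threshold_*` options listed above suggest a simple per-segment rule, sketched here in Python. The defaults (0.8 and -0.8) come from the help text; the function name, the returned labels, and the choice to treat the thresholds as inclusive are assumptions, since the help text does not say how boundary values are handled.

```python
def classify_cna_segment(log_r, amp_threshold=0.8, homdel_threshold=-0.8):
    """Label a copy number segment by its log(2) ratio (illustrative sketch).

    Inclusive comparisons are an assumption, not documented PCGR behaviour.
    """
    if log_r >= amp_threshold:
        return "amplification"
    if log_r <= homdel_threshold:
        return "homozygous_deletion"
    return "not_reported"
```

Under these assumptions a segment with a log(2) ratio of 1.2 would be labelled an amplification, one at -1.5 a homozygous deletion, and one at 0.3 would be left out of both categories.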

The *examples* folder contains input files from two tumor samples
sequenced within TCGA. A report for a colorectal tumor case can be
generated by running the following command in your terminal window:

``python run_pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments``
``tumor_sample.COAD.cna.tsv ~/pcgr-X.X ~/pcgr-X.X/examples tumor_sample.COAD``
``python pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna_segments``
``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3 ~/pcgr-0.3/examples tumor_sample.COAD``

This command will run the Docker-based PCGR workflow and produce the
following output files in the *examples* folder:
31 changes: 18 additions & 13 deletions docs/_build/html/annotation_resources.html
@@ -246,21 +246,26 @@ <h2>Genome mapping<a class="headerlink" href="#genome-mapping" title="Permalink
</div>
<div class="section" id="other-data-quality-concerns">
<h2>Other data quality concerns<a class="headerlink" href="#other-data-quality-concerns" title="Permalink to this headline"></a></h2>
<p><strong>Clinical biomarkers</strong> Clinical biomarkers included in PCGR is limited
to the following: * Markers reported at the variant level (e.g. <strong>BRAF
p.V600E</strong>) * Markers reported at the codon level (e.g. <strong>KRAS p.G12</strong>)
* Markers reported at the exon level (e.g. <strong>KIT exon 11 mutation</strong>) *
Within CBMDB, only markers collected from FDA/NCCN guidelines,
<p><strong>Clinical biomarkers</strong></p>
<p>Clinical biomarkers included in PCGR are limited to the following:</p>
<ul class="simple">
<li>Markers reported at the variant level (e.g. <strong>BRAF p.V600E</strong>)</li>
<li>Markers reported at the codon level (e.g. <strong>KRAS p.G12</strong>)</li>
<li>Markers reported at the exon level (e.g. <strong>KIT exon 11 mutation</strong>)</li>
<li>Within CBMDB, only markers collected from FDA/NCCN guidelines,
scientific literature and clinical trials are included (markers
collected from conference abstracts are not included)</p>
<p><strong>COSMIC variants</strong> The COSMIC dataset that is part of the PCGR
annotation bundle is the subset of variants that satisfy the following
criteria: * <strong>Mutation somatic status</strong> is either
&#8216;<em>confirmed_somatic</em>&#8216; or
&#8216;<em>reported_in_another_cancer_sample_as_somatic</em>&#8216;. *
<strong>Site/histology</strong> must be known and the sample must come from a
collected from conference abstracts are not included)</li>
</ul>
<p><strong>COSMIC variants</strong></p>
<p>The COSMIC dataset that is part of the PCGR annotation bundle is the
subset of variants that satisfy the following criteria:</p>
<ul class="simple">
<li><strong>Mutation somatic status</strong> is either &#8216;<em>confirmed_somatic</em>&#8217; or
&#8216;<em>reported_in_another_cancer_sample_as_somatic</em>&#8217;.</li>
<li><strong>Site/histology</strong> must be known and the sample must come from a
malignant tumor (i.e. not polyps/adenomas, which are also found in
COSMIC)</p>
COSMIC)</li>
</ul>
</div>
</div>
