
Commit

v0.3.2 - bug fix and input check
sigven committed Apr 19, 2017
1 parent 54c7b15 commit 8bc41f1
Showing 31 changed files with 1,012 additions and 788 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -15,7 +15,7 @@ The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package int
[![Documentation Status](https://readthedocs.org/projects/pcgr/badge/?version=latest)](http://pcgr.readthedocs.io/en/latest/?badge=latest)


### Annotation resources included in PCGR (v0.3)
### Annotation resources included in PCGR (v0.3.2)

* [VEP v85](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor release 85 (GENCODE v19 as the gene reference dataset)
* [COSMIC v80](http://cancer.sanger.ac.uk/cosmic/) - Catalogue of somatic mutations in cancer (February 2017)
@@ -53,16 +53,16 @@ A local installation of Python (it has been tested with [version 2.7.13](https:/

#### STEP 2: Download PCGR

<font color="red"><b>April 14th 2017</b>: New release (0.3.1)</font>
<font color="red"><b>April 19th 2017</b>: New release (0.3.2)</font>

1. Download and unpack the [latest release (0.3.1)](https://github.com/sigven/pcgr/releases/latest)
1. Download and unpack the [latest release (0.3.2)](https://github.com/sigven/pcgr/releases/latest)
2. Download and unpack the data bundle (approx. 17Gb) in the PCGR directory
* Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g `~/pcgr-0.3.1`)
* Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g `~/pcgr-0.3.2`)
* Unpack the data bundle, e.g. through the following Unix command: `gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`

A _data/_ folder within the _pcgr-X.X_ software folder should now have been produced
3. Pull the [PCGR Docker image (0.3.1)](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.1Gb):
* `docker pull sigven/pcgr:0.3.1` (PCGR annotation engine)
3. Pull the [PCGR Docker image (0.3.2)](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.1Gb):
* `docker pull sigven/pcgr:0.3.2` (PCGR annotation engine)

#### STEP 3: Input preprocessing

@@ -112,7 +112,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__ in t

positional arguments:
pcgr_dir PCGR base directory with accompanying data directory,
e.g. ~/pcgr-0.3.1
e.g. ~/pcgr-0.3.2
output_dir Output directory
sample_id Tumor sample/cancer genome identifier - prefix for
output files
@@ -146,7 +146,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__ in t

The _examples_ folder contain sample files from TCGA. A report for a colorectal tumor case can be generated through the following command:

`python pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-0.3.1 ~/pcgr-0.3.1/examples tumor_sample.COAD`
`python pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD`

This command will run the Docker-based PCGR workflow and produce the following output files in the _examples_ folder:

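The README's example command combines the release number with the three positional arguments (`pcgr_dir`, `output_dir`, `sample_id`), so every version-bearing path in it changes with each release. A minimal Python sketch, assuming Python 3.8+ (not the Python 2.7.13 setup the README was tested with) and the `~/pcgr-X.X` layout from STEP 2, that assembles the same argument list from a single version string. `build_pcgr_command` is a hypothetical helper for illustration, not part of PCGR; it only builds the list and does not run the workflow.

```python
import os
import shlex


def build_pcgr_command(version, sample_id, input_vcf, input_cna_segments=None):
    """Assemble a pcgr.py argument list for a release unpacked at ~/pcgr-<version>.

    Hypothetical helper: it mirrors the README example command only.
    """
    pcgr_dir = os.path.expanduser(f"~/pcgr-{version}")
    output_dir = os.path.join(pcgr_dir, "examples")
    cmd = ["python", "pcgr.py", "--input_vcf", input_vcf]
    if input_cna_segments is not None:
        cmd += ["--input_cna_segments", input_cna_segments]
    # Positional arguments, in order: pcgr_dir, output_dir, sample_id
    cmd += [pcgr_dir, output_dir, sample_id]
    return cmd


if __name__ == "__main__":
    cmd = build_pcgr_command(
        "0.3.2",
        "tumor_sample.COAD",
        "tumor_sample.COAD.vcf.gz",
        "tumor_sample.COAD.cna.tsv",
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` instead of a single shell string avoids quoting problems if a path contains spaces.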
Binary file modified docs/_build/doctrees/annotation_resources.doctree
Binary file modified docs/_build/doctrees/environment.pickle
Binary file modified docs/_build/doctrees/getting_started.doctree
Binary file modified docs/_build/doctrees/output.doctree
6 changes: 3 additions & 3 deletions docs/_build/html/_sources/annotation_resources.rst.txt
@@ -79,19 +79,19 @@ A requirement for all variant annotation datasets used in PCGR is that
they have been mapped unambiguously to the human genome (GRCh37). For
most datasets this is already the case (i.e. dbSNP, COSMIC, ClinVar
etc.). A significant proportion of variants in the annotation datasets
related to clinical interpretation, CIViC and CBMDB, are however not
related to clinical interpretation, CIViC and CBMDB, is however not
mapped to the genome. Whenever possible, we have utilized
`TransVar <http://bioinformatics.mdanderson.org/transvarweb/>`__ to
identify the actual genomic variants (e.g. *g.chr7:140453136A>T*) that
corresponds to variants reported with other HGVS nomenclature (e.g.
correspond to variants reported with other HGVS nomenclature (e.g.
*p.V600E*).

Other data quality concerns
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Clinical biomarkers**

Clinical biomarkers included in PCGR is limited to the following:
Clinical biomarkers included in PCGR are limited to the following:

- Markers reported at the variant level (e.g. **BRAF p.V600E**)
- Markers reported at the codon level (e.g. **KRAS p.G12**)
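The genome-mapping paragraph pairs a protein-level change (*p.V600E*) with a genomic one (*g.chr7:140453136A>T*). A minimal Python sketch, assuming the shorthand used in these docs rather than full HGVS (which would name a reference sequence), that splits such a genomic SNV string into chromosome, position and alleles, dropping the `chr` prefix as PCGR's VCF input requires. It is an illustration only: it is not TransVar and does not perform any protein-to-genome mapping. `GenomicSNV` and `parse_genomic_snv` are hypothetical names.

```python
import re
from typing import NamedTuple


class GenomicSNV(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Shorthand as written in the docs, e.g. "g.chr7:140453136A>T".
# Only single-nucleotide substitutions are recognised; an optional
# "chr" prefix is dropped so names match GRCh37 without "chr".
_SNV_RE = re.compile(r"^g\.(?:chr)?(\d+|X|Y|MT?):(\d+)([ACGT])>([ACGT])$")


def parse_genomic_snv(notation: str) -> GenomicSNV:
    """Split a genomic SNV string into chromosome, position, ref and alt."""
    match = _SNV_RE.match(notation)
    if match is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    chrom, pos, ref, alt = match.groups()
    return GenomicSNV(chrom, int(pos), ref, alt)


print(parse_genomic_snv("g.chr7:140453136A>T"))
# GenomicSNV(chrom='7', pos=140453136, ref='A', alt='T')
```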
14 changes: 7 additions & 7 deletions docs/_build/html/_sources/getting_started.rst.txt
@@ -42,18 +42,18 @@ terminal window.
Download PCGR
^^^^^^^^^^^^^

**April 14th 2017**: New release (0.3.1)
**April 19th 2017**: New release (0.3.2)

- Download and unpack the `latest release
(0.3.1) <https://github.com/sigven/pcgr/releases/latest>`__
(0.3.2) <https://github.com/sigven/pcgr/releases/latest>`__

- Download and unpack the data bundle (approx. 17Gb) in the PCGR
directory

- Download `the latest data
bundle <https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/>`__
from Google Drive to ``~/pcgr-X.X`` (replace *X.X* with the
version number, e.g. ``~/pcgr-0.3.1``)
version number, e.g. ``~/pcgr-0.3.2``)
- Decompress and untar the bundle, e.g. through the following Unix
command:
``gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -``
@@ -62,10 +62,10 @@ Download PCGR
have been produced

- Pull the `PCGR Docker image -
0.3.1 <https://hub.docker.com/r/sigven/pcgr/>`__ from DockerHub
0.3.2 <https://hub.docker.com/r/sigven/pcgr/>`__ from DockerHub
(3.1Gb) :

- ``docker pull sigven/pcgr:0.3.1`` (PCGR annotation engine)
- ``docker pull sigven/pcgr:0.3.2`` (PCGR annotation engine)

Run test - generation of clinical report for a cancer genome
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -89,7 +89,7 @@ A tumor sample report is generated by calling the Python script

positional arguments:
pcgr_dir PCGR base directory with accompanying data directory,
e.g. ~/pcgr-0.3
e.g. ~/pcgr-0.3.2
output_dir Output directory
sample_id Tumor sample/cancer genome identifier - prefix for
output files
@@ -125,7 +125,7 @@ sequenced within TCGA. A report for a colorectal tumor case can be
generated by running the following command in your terminal window:

``python pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna_segments``
``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.1 ~/pcgr-0.3.1/examples tumor_sample.COAD``
``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD``

This command will run the Docker-based PCGR workflow and produce the
following output files in the *examples* folder:
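The data-bundle step unpacks the archive with `gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`. When a shell pipeline is inconvenient, for example inside a setup script, Python's standard `tarfile` module can perform the same extraction. A minimal sketch, assuming the bundle is a gzip-compressed tar file as its `.tgz` name suggests; `unpack_data_bundle` is a hypothetical helper, and the demo builds a tiny stand-in archive instead of using the real 17Gb bundle.

```python
import os
import tarfile
import tempfile


def unpack_data_bundle(bundle_path, pcgr_dir):
    """Extract a .tgz data bundle into pcgr_dir and return the member names.

    Python counterpart of running `gzip -dc <bundle>.tgz | tar xvf -`
    inside pcgr_dir. Only use this on a bundle from a trusted source:
    extractall() writes wherever the archive's member paths point.
    """
    with tarfile.open(bundle_path, mode="r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=pcgr_dir)
    return names


if __name__ == "__main__":
    # Self-contained demo: build a stand-in bundle containing data/,
    # then unpack it into an empty "PCGR directory".
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "src", "data")
        os.makedirs(src)
        with open(os.path.join(src, "README.txt"), "w") as fh:
            fh.write("placeholder\n")
        bundle = os.path.join(workdir, "bundle.tgz")
        with tarfile.open(bundle, mode="w:gz") as tar:
            tar.add(src, arcname="data")
        pcgr_dir = os.path.join(workdir, "pcgr-0.3.2")
        os.makedirs(pcgr_dir)
        unpack_data_bundle(bundle, pcgr_dir)
        print(os.path.isdir(os.path.join(pcgr_dir, "data")))  # True
```

After extraction the docs expect a *data* folder inside the `pcgr-X.X` directory, which is what the demo checks for.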
19 changes: 13 additions & 6 deletions docs/_build/html/_sources/output.rst.txt
@@ -36,17 +36,24 @@ work properly:
`tabix <http://www.htslib.org/doc/tabix.html>`__
- 'chr' must be stripped from the chromosome names

**IMPORTANT NOTE**: Considering the VCF output for the `numerous somatic
SNV/InDel callers <https://www.biostars.org/p/19104/>`__ that have been
developed, we have a experienced a general lack of uniformity and
robustness for the representation of somatic variant genotype data (e.g.
variant allelic depths (tumor/normal), genotype quality etc.). In the
output results provided within the current version of PCGR, we are
**IMPORTANT NOTE 1**: Considering the VCF output for the `numerous
somatic SNV/InDel callers <https://www.biostars.org/p/19104/>`__ that
have been developed, we have a experienced a general lack of uniformity
and robustness for the representation of somatic variant genotype data
(e.g. variant allelic depths (tumor/normal), genotype quality etc.). In
the output results provided within the current version of PCGR, we are
considering PASSed variants only, and variant genotype data (i.e. as
found in the VCF SAMPLE columns) are not handled or parsed. As improved
standards for this matter may emerge, we will strive to include this
information in the annotated output files.

**IMPORTANT NOTE 2**: PCGR generates a number of VCF INFO annotation
tags that is appended to the query VCF. We will therefore encourage the
users to submit query VCF files that have not been subject to
annotations by other means, but rather a VCF file that comes directly
from variant calling. If not, there are likely to be INFO tags in the
query VCF file that coincide with those produced by PCGR.

Copy number segments
^^^^^^^^^^^^^^^^^^^^

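The VCF requirements include stripping 'chr' from chromosome names, and the first note states that only PASSed variants are considered. A minimal Python sketch of that preprocessing on plain-text VCF lines, assuming an uncompressed input; `preprocess_vcf_lines` is a hypothetical helper, not a PCGR function. Its output would still need to be compressed with `bgzip` and indexed with `tabix -p vcf` before being passed to PCGR.

```python
def preprocess_vcf_lines(lines):
    """Yield VCF lines with the 'chr' prefix removed and non-PASS records dropped.

    Header lines pass through, except that ##contig IDs are renamed to
    match the records. chrM becomes "M", whereas b37-style GRCh37
    references typically call it "MT"; that case is not handled here.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("##contig=<ID=chr"):
            yield line.replace("<ID=chr", "<ID=", 1)
        elif line.startswith("#"):
            yield line
        else:
            fields = line.rstrip("\n").split("\t")
            # Column 7 (index 6) is FILTER; keep PASSed variants only.
            if fields[6] != "PASS":
                continue
            if fields[0].startswith("chr"):
                fields[0] = fields[0][3:]
            yield "\t".join(fields) + "\n"


example = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=chr7,length=159138663>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr7\t140453136\t.\tA\tT\t.\tPASS\t.\n",
    "chr12\t25398284\t.\tC\tA\t.\tLowQual\t.\n",
]
for out_line in preprocess_vcf_lines(example):
    print(out_line, end="")
```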
6 changes: 3 additions & 3 deletions docs/_build/html/annotation_resources.html
@@ -234,17 +234,17 @@ <h2>Genome mapping<a class="headerlink" href="#genome-mapping" title="Permalink
they have been mapped unambiguously to the human genome (GRCh37). For
most datasets this is already the case (i.e. dbSNP, COSMIC, ClinVar
etc.). A significant proportion of variants in the annotation datasets
related to clinical interpretation, CIViC and CBMDB, are however not
related to clinical interpretation, CIViC and CBMDB, is however not
mapped to the genome. Whenever possible, we have utilized
<a class="reference external" href="http://bioinformatics.mdanderson.org/transvarweb/">TransVar</a> to
identify the actual genomic variants (e.g. <em>g.chr7:140453136A&gt;T</em>) that
corresponds to variants reported with other HGVS nomenclature (e.g.
correspond to variants reported with other HGVS nomenclature (e.g.
<em>p.V600E</em>).</p>
</div>
<div class="section" id="other-data-quality-concerns">
<h2>Other data quality concerns<a class="headerlink" href="#other-data-quality-concerns" title="Permalink to this headline"></a></h2>
<p><strong>Clinical biomarkers</strong></p>
<p>Clinical biomarkers included in PCGR is limited to the following:</p>
<p>Clinical biomarkers included in PCGR are limited to the following:</p>
<ul class="simple">
<li>Markers reported at the variant level (e.g. <strong>BRAF p.V600E</strong>)</li>
<li>Markers reported at the codon level (e.g. <strong>KRAS p.G12</strong>)</li>
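The clinical-biomarker list distinguishes markers reported at the variant level (**BRAF p.V600E**) from markers reported at the codon level (**KRAS p.G12**). A minimal Python sketch of what that distinction means when matching an observed protein change, assuming one-letter amino-acid codes and simple substitutions only; `marker_level` and `matches_marker` are hypothetical helpers, not PCGR's own matching logic.

```python
import re

# Protein-change shorthand as used in the docs, e.g. "p.V600E" or "p.G12".
_VARIANT_LEVEL = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")
_CODON_LEVEL = re.compile(r"^p\.([A-Z])(\d+)$")


def marker_level(protein_change):
    """Classify a marker as 'variant', 'codon' or 'other'."""
    if _VARIANT_LEVEL.match(protein_change):
        return "variant"
    if _CODON_LEVEL.match(protein_change):
        return "codon"
    return "other"


def matches_marker(marker, observed):
    """Return True if an observed substitution satisfies the marker."""
    level = marker_level(marker)
    if level == "variant":
        return observed == marker
    if level == "codon":
        obs = _VARIANT_LEVEL.match(observed)
        # Same reference residue and codon position; any alternate residue.
        return obs is not None and marker == f"p.{obs.group(1)}{obs.group(2)}"
    return False


print(matches_marker("p.G12", "p.G12D"))     # True
print(matches_marker("p.V600E", "p.V600K"))  # False
```

Under this reading a codon-level marker such as p.G12 is satisfied by any substitution at that codon (for example p.G12D or p.G12V), while a variant-level marker requires the exact change.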
14 changes: 7 additions & 7 deletions docs/_build/html/getting_started.html
@@ -189,18 +189,18 @@ <h3>Python<a class="headerlink" href="#python" title="Permalink to this headline
</div>
<div class="section" id="download-pcgr">
<h3>Download PCGR<a class="headerlink" href="#download-pcgr" title="Permalink to this headline"></a></h3>
<p><strong>April 14th 2017</strong>: New release (0.3.1)</p>
<p><strong>April 19th 2017</strong>: New release (0.3.2)</p>
<ul>
<li><p class="first">Download and unpack the <a class="reference external" href="https://github.com/sigven/pcgr/releases/latest">latest release
(0.3.1)</a></p>
(0.3.2)</a></p>
</li>
<li><p class="first">Download and unpack the data bundle (approx. 17Gb) in the PCGR
directory</p>
<ul class="simple">
<li>Download <a class="reference external" href="https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/">the latest data
bundle</a>
from Google Drive to <code class="docutils literal"><span class="pre">~/pcgr-X.X</span></code> (replace <em>X.X</em> with the
version number, e.g. <code class="docutils literal"><span class="pre">~/pcgr-0.3.1</span></code>)</li>
version number, e.g. <code class="docutils literal"><span class="pre">~/pcgr-0.3.2</span></code>)</li>
<li>Decompress and untar the bundle, e.g. through the following Unix
command:
<code class="docutils literal"><span class="pre">gzip</span> <span class="pre">-dc</span> <span class="pre">pcgr.databundle.GRCh37.YYYYMMDD.tgz</span> <span class="pre">|</span> <span class="pre">tar</span> <span class="pre">xvf</span> <span class="pre">-</span></code></li>
@@ -209,10 +209,10 @@ <h3>Download PCGR<a class="headerlink" href="#download-pcgr" title="Permalink to
have been produced</p>
</li>
<li><p class="first">Pull the <a class="reference external" href="https://hub.docker.com/r/sigven/pcgr/">PCGR Docker image -
0.3.1</a> from DockerHub
0.3.2</a> from DockerHub
(3.1Gb) :</p>
<ul class="simple">
<li><code class="docutils literal"><span class="pre">docker</span> <span class="pre">pull</span> <span class="pre">sigven/pcgr:0.3.1</span></code> (PCGR annotation engine)</li>
<li><code class="docutils literal"><span class="pre">docker</span> <span class="pre">pull</span> <span class="pre">sigven/pcgr:0.3.2</span></code> (PCGR annotation engine)</li>
</ul>
</li>
</ul>
@@ -236,7 +236,7 @@ <h2>Run test - generation of clinical report for a cancer genome<a class="header

<span class="n">positional</span> <span class="n">arguments</span><span class="p">:</span>
<span class="n">pcgr_dir</span> <span class="n">PCGR</span> <span class="n">base</span> <span class="n">directory</span> <span class="k">with</span> <span class="n">accompanying</span> <span class="n">data</span> <span class="n">directory</span><span class="p">,</span>
<span class="n">e</span><span class="o">.</span><span class="n">g</span><span class="o">.</span> <span class="o">~/</span><span class="n">pcgr</span><span class="o">-</span><span class="mf">0.3</span>
<span class="n">e</span><span class="o">.</span><span class="n">g</span><span class="o">.</span> <span class="o">~/</span><span class="n">pcgr</span><span class="o">-</span><span class="mf">0.3</span><span class="o">.</span><span class="mi">2</span>
<span class="n">output_dir</span> <span class="n">Output</span> <span class="n">directory</span>
<span class="n">sample_id</span> <span class="n">Tumor</span> <span class="n">sample</span><span class="o">/</span><span class="n">cancer</span> <span class="n">genome</span> <span class="n">identifier</span> <span class="o">-</span> <span class="n">prefix</span> <span class="k">for</span>
<span class="n">output</span> <span class="n">files</span>
@@ -272,7 +272,7 @@ <h2>Run test - generation of clinical report for a cancer genome<a class="header
sequenced within TCGA. A report for a colorectal tumor case can be
generated by running the following command in your terminal window:</p>
<p><code class="docutils literal"><span class="pre">python</span> <span class="pre">pcgr.py</span> <span class="pre">--input_vcf</span> <span class="pre">examples/tumor_sample.COAD.vcf.gz</span> <span class="pre">--input_cna_segments</span></code>
<code class="docutils literal"><span class="pre">examples/tumor_sample.COAD.cna.tsv</span> <span class="pre">~/pcgr-0.3.1</span> <span class="pre">~/pcgr-0.3.1/examples</span> <span class="pre">tumor_sample.COAD</span></code></p>
<code class="docutils literal"><span class="pre">examples/tumor_sample.COAD.cna.tsv</span> <span class="pre">~/pcgr-0.3.2</span> <span class="pre">~/pcgr-0.3.2/examples</span> <span class="pre">tumor_sample.COAD</span></code></p>
<p>This command will run the Docker-based PCGR workflow and produce the
following output files in the <em>examples</em> folder:</p>
<ol class="arabic simple">
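The `pcgr_dir` argument is described as the PCGR base directory with an accompanying data directory, and the query VCF is expected to be bgzip-compressed and tabix-indexed. A minimal Python sketch of a pre-flight check along those lines, run before launching the Docker workflow; `check_pcgr_inputs` is hypothetical and is not the input check introduced in this commit, whose code is not shown on this page. It checks file names only and does not verify that the file really is bgzip-compressed.

```python
import os
import tempfile


def check_pcgr_inputs(pcgr_dir, input_vcf):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    # The unpacked data bundle should have produced <pcgr_dir>/data/.
    if not os.path.isdir(os.path.join(pcgr_dir, "data")):
        problems.append(f"no data/ folder in {pcgr_dir}")
    # The query VCF should be a .vcf.gz file with a tabix index (.tbi) beside it.
    if not os.path.isfile(input_vcf):
        problems.append(f"input VCF not found: {input_vcf}")
    elif not input_vcf.endswith(".vcf.gz"):
        problems.append(f"input VCF is not a .vcf.gz file: {input_vcf}")
    elif not os.path.isfile(input_vcf + ".tbi"):
        problems.append(f"tabix index not found: {input_vcf}.tbi")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pcgr_dir:
        vcf = os.path.join(pcgr_dir, "tumor_sample.COAD.vcf.gz")
        print(check_pcgr_inputs(pcgr_dir, vcf))  # two problems: no data/, no VCF
        os.mkdir(os.path.join(pcgr_dir, "data"))
        for path in (vcf, vcf + ".tbi"):
            open(path, "wb").close()
        print(check_pcgr_inputs(pcgr_dir, vcf))  # []
```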
18 changes: 12 additions & 6 deletions docs/_build/html/output.html
@@ -200,16 +200,22 @@ <h3>VCF<a class="headerlink" href="#vcf" title="Permalink to this headline">¶</
</ul>
</li>
</ol>
<p><strong>IMPORTANT NOTE</strong>: Considering the VCF output for the <a class="reference external" href="https://www.biostars.org/p/19104/">numerous somatic
SNV/InDel callers</a> that have been
developed, we have a experienced a general lack of uniformity and
robustness for the representation of somatic variant genotype data (e.g.
variant allelic depths (tumor/normal), genotype quality etc.). In the
output results provided within the current version of PCGR, we are
<p><strong>IMPORTANT NOTE 1</strong>: Considering the VCF output for the <a class="reference external" href="https://www.biostars.org/p/19104/">numerous
somatic SNV/InDel callers</a> that
have been developed, we have a experienced a general lack of uniformity
and robustness for the representation of somatic variant genotype data
(e.g. variant allelic depths (tumor/normal), genotype quality etc.). In
the output results provided within the current version of PCGR, we are
considering PASSed variants only, and variant genotype data (i.e. as
found in the VCF SAMPLE columns) are not handled or parsed. As improved
standards for this matter may emerge, we will strive to include this
information in the annotated output files.</p>
<p><strong>IMPORTANT NOTE 2</strong>: PCGR generates a number of VCF INFO annotation
tags that is appended to the query VCF. We will therefore encourage the
users to submit query VCF files that have not been subject to
annotations by other means, but rather a VCF file that comes directly
from variant calling. If not, there are likely to be INFO tags in the
query VCF file that coincide with those produced by PCGR.</p>
</div>
<div class="section" id="copy-number-segments">
<h3>Copy number segments<a class="headerlink" href="#copy-number-segments" title="Permalink to this headline"></a></h3>
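The second note asks for query VCF files that come straight from variant calling, because INFO tags added by earlier annotation may coincide with those PCGR appends. A minimal Python sketch that lists such collisions from a VCF header, assuming the caller supplies the set of tags the annotation step will write. The `CSQ` tag in the demo is only an example (it is the INFO tag Ensembl VEP uses for consequence annotations), not a confirmed list of PCGR output tags; `info_ids` and `colliding_info_tags` are hypothetical helpers.

```python
import re

# Captures the ID of a header line such as ##INFO=<ID=DP,Number=1,...>
_INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def info_ids(header_lines):
    """Return the set of INFO IDs declared in a VCF header."""
    ids = set()
    for line in header_lines:
        match = _INFO_ID.match(line)
        if match:
            ids.add(match.group(1))
    return ids


def colliding_info_tags(header_lines, annotation_tags):
    """Return, sorted, the declared INFO IDs that an annotation step would also write."""
    return sorted(info_ids(header_lines) & set(annotation_tags))


header = [
    "##fileformat=VCFv4.2\n",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">\n',
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
# Hypothetical tag set, for illustration only.
print(colliding_info_tags(header, {"CSQ"}))  # ['CSQ']
```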