Update pVACsplice documentation
susannasiebert committed May 30, 2024
1 parent ff3ac39 commit 6374846
Showing 12 changed files with 927 additions and 20 deletions.
4 changes: 4 additions & 0 deletions docs/index.rst
@@ -13,6 +13,9 @@ tools:
**pVACfuse**
A tool for detecting neoantigens resulting from gene fusions.

**pVACsplice**
A tool for detecting neoantigens resulting from splice site variants.

**pVACvector**
A tool designed to aid specifically in the construction of DNA-based
cancer vaccines.
@@ -35,6 +38,7 @@ Contents
pvacseq
pvacbind
pvacfuse
pvacsplice
pvacvector
pvacview

13 changes: 5 additions & 8 deletions docs/pvacfuse/output_files.rst
@@ -22,15 +22,8 @@ created):

* - File Name
- Description
* - ``<sample_name>.tsv``
- An intermediate file with variant and transcript information parsed from the input file(s).
* - ``<sample_name>.tsv_<chunks>`` (multiple)
- The above file but split into smaller chunks for easier processing with IEDB.
* - ``<sample_name>.fasta``
- A fasta file with mutant peptide subsequences for all
processable fusion combinations.
* - ``<sample_name>.net_chop.fa``
- A fasta file with mutant peptide subsequences specific for use in running the net_chop tool.
- A fasta file with mutant peptide subsequences for each fusion.
* - ``<sample_name>.all_epitopes.tsv``
- A list of all predicted epitopes and their binding affinity scores, with
additional variant information from the ``<sample_name>.tsv``.
@@ -43,6 +36,10 @@ created):
* - ``<sample_name>.all_epitopes.aggregated.tsv.reference_matches`` (optional)
- A file outlining details of reference proteome matches

Additionally, each folder will contain subfolders, one for each selected
epitope length, that contain intermediate files specific to that
epitope length.

Filters applied to the filtered.tsv file
----------------------------------------

9 changes: 7 additions & 2 deletions docs/pvacseq/optional_downstream_analysis_tools.rst
@@ -35,6 +35,9 @@ section of the documentation on how to create this VCF.

The output may be limited to PASS variants only by setting the ``--pass-only``
flag and to mutant sequences by setting the ``--mutant-only`` flag.
Additionally, variants can be limited to specific transcript biotypes
using the ``--biotypes`` parameter, which is set to only include ``protein_coding``
transcripts by default.

The output can be further limited to only certain variants by providing
a pVACseq report file to the ``--input-tsv`` argument. Only the peptide sequences for the epitopes in the TSV
@@ -93,7 +96,8 @@ TSV. In its output, it adds to the TSV 3 columns: Best Cleavage Position, Best
Cleavage Sites list. Typically this step is done in the pVACseq run pipeline for the filtered output TSV
when specified. This tool provides a way to manually run this on pVACseq's generated filtered/all_epitopes
TSV files so that you can add this information when not present if desired.
You can view more about these columns for pVACseq in

You can view more information about these columns for pVACseq in
the :ref:`output file documentation <all_ep_and_filtered>`.

NetMHCStab Predict Stability
@@ -106,7 +110,8 @@ filtered/all_epitopes TSV. In its output, it adds to the TSV 4 columns: Predict
Stability Rank, and NetMHCStab Allele. Typically this step is done in the pVACseq run pipeline for the
filtered output TSV when specified. This tool provides a way to manually run this on pVACseq's generated
filtered/all_epitopes TSV files so that you can add this information when not present if desired.
You can view more about these columns for pVACseq in

You can view more information about these columns for pVACseq in
the :ref:`output file documentation <all_ep_and_filtered>`.

Identify Problematic Amino Acids
4 changes: 0 additions & 4 deletions docs/pvacseq/output_files.rst
@@ -277,10 +277,6 @@ total number of well-scoring epitopes for each variant, the number of
transcripts covered by those epitopes, as well as the HLA alleles that those
epitopes are well-binding to. Lastly, the report will bin variants into tiers
that offer suggestions as to the suitability of variants for use in vaccines.
Only epitopes meeting the ``--aggregate-inclusion-threshold`` are included in this report (default: 5000).
Whether the median or the lowest binding affinity metrics are output in the ``IC50 MT``,
``IC50 WT``, ``%ile MT``, and ``%ile WT`` columns is controlled by the
``--top-score-metric`` parameter.

Only epitopes meeting the ``--aggregate-inclusion-threshold`` are included in this report (default: 5000).
Whether the median or the lowest binding affinity metrics are output in the ``IC50 MT``,
9 changes: 7 additions & 2 deletions docs/pvacsplice.rst
@@ -1,11 +1,16 @@
pVACsplice
========================
==========

pVACsplice predicts neoantigens for novel junctions created from tumor-specific alternative splicing patterns.

.. toctree::
:glob:

pvacsplice/features
pvacsplice/input_file_prep
pvacsplice/getting_started
pvacsplice/run
pvacsplice/run
pvacsplice/output_files
pvacsplice/filter_commands
pvacsplice/additional_commands
pvacsplice/optional_downstream_analysis_tools
27 changes: 27 additions & 0 deletions docs/pvacsplice/additional_commands.rst
@@ -0,0 +1,27 @@
.. .. image:: ../images/pVACseq_logo_trans-bg_sm_v4b.png
    :align: right
    :alt: pVACseq logo

Additional Commands
===================

To make using pVACsplice easier, several convenience methods are included in the package.

.. _pvacsplice_example_data:

Download Example Data
---------------------

.. program-output:: pvacsplice download_example_data -h

.. _pvacsplice_valid_alleles:

List Valid Alleles
------------------

.. program-output:: pvacsplice valid_alleles -h

List Allele-Specific Cutoffs
----------------------------

.. program-output:: pvacsplice allele_specific_cutoffs -h
140 changes: 140 additions & 0 deletions docs/pvacsplice/features.rst
@@ -0,0 +1,140 @@
.. .. image:: ../images/pVACseq_logo_trans-bg_sm_v4b.png
    :align: right
    :alt: pVACsplice logo

Features
========

**Splice Site Analysis**

pVACsplice offers epitope binding predictions for splice site variants
predicted by RegTools.

**No local install of epitope prediction software needed**

pVACsplice utilizes the IEDB RESTful web interface. This means that none of the underlying prediction software, like NetMHC, needs to be installed locally.

.. warning::
   We only recommend using the RESTful API for small requests. If you use the
   RESTful API to process large VCFs or to make predictions for many alleles,
   epitope lengths, or prediction algorithms, you might overload their system.
   This can result in the blacklisting of your IP address by IEDB, causing
   403 errors when trying to use the RESTful API. In that case please open
   a ticket with `IEDB support <http://help.iedb.org/>`_ to have your IP
   address removed from the IEDB blacklist.

**Support for local installation of the IEDB Analysis Resources**

pVACsplice provides the option of using a local installation of the IEDB MHC
`class I <http://tools.iedb.org/mhci/download/>`_ and `class II <http://tools.iedb.org/mhcii/download/>`_
binding prediction tools.

.. warning::
   Using a local IEDB installation is strongly recommended for larger datasets
   or when making predictions for many alleles, epitope lengths, or
   prediction algorithms. More information on how to install IEDB locally can
   be found on the :ref:`Installation <iedb_install>` page (note: the pvactools
   docker image now contains IEDB).

**MHC Class I and Class II predictions**

Both MHC Class I and Class II predictions are supported. Simply choose the desired
prediction algorithms and HLA alleles during processing and Class I and Class II
prediction results will be written to their own respective subdirectories in your
output directory. pVACsplice supports binding affinity algorithms as well as elution
algorithms.

By using the IEDB RESTful web interface, pVACsplice leverages their extensive support of different prediction algorithms.

In addition to IEDB-supported prediction algorithms, we've also added support
for `MHCflurry <http://www.biorxiv.org/content/early/2017/08/09/174243>`_ and
`MHCnuggets <http://karchinlab.org/apps/appMHCnuggets.html>`_.

================================================= ======= ========================
MHC Class I Binding Affinity Prediction Algorithm Version Supports Percentile Rank
================================================= ======= ========================
NetMHCpan                                         4.1     yes
NetMHC                                            4.0     yes
NetMHCcons                                        1.1     yes
PickPocket                                        1.1     yes
SMM                                               1.0     yes
SMMPMBEC                                          1.0     yes
MHCflurry                                                 yes
MHCnuggets                                                no
================================================= ======= ========================

================================================== ======= ========================
MHC Class II Binding Affinity Prediction Algorithm Version Supports Percentile Rank
================================================== ======= ========================
NetMHCIIpan                                        4.1     yes
SMMalign                                           1.1     yes
NNalign                                            2.3     yes
MHCnuggets                                                 no
================================================== ======= ========================

======================================== ======= ========================
MHC Class I Elution Prediction Algorithm Version Supports Percentile Rank
======================================== ======= ========================
NetMHCpanEL                              4.1     yes
MHCflurryEL                                      | Processing Score: no;
                                                 | Presentation Score: yes
BigMHC_EL                                        no
======================================== ======= ========================

========================================= ======= ========================
MHC Class II Elution Prediction Algorithm Version Supports Percentile Rank
========================================= ======= ========================
NetMHCIIpanEL                             4.1     yes
========================================= ======= ========================

=============================================== ======= ========================
MHC Class I Immunogenicity Prediction Algorithm Version Supports Percentile Rank
=============================================== ======= ========================
BigMHC_IM                                               no
DeepImmuno                                              no
=============================================== ======= ========================

**Comprehensive filtering**

Automatic filtering on the binding affinity IC50 (nM) value narrows down the results to only include
"good" candidate peptides. The binding filter threshold can be adjusted by the user for each
pVACsplice run. pVACsplice also supports the option of filtering on allele-specific binding thresholds
as recommended by `IEDB <https://help.iedb.org/hc/en-us/articles/114094151811-Selecting-thresholds-cut-offs-for-MHC-class-I-and-II-binding-predictions>`_
as well as percentile ranks.
Additional filtering on the binding affinity can be done manually by running the
:ref:`standalone binding filter <pvacsplice_filter_commands>` on the filtered result file
to narrow down the candidate epitopes even further or on the unfiltered
all_epitopes file to apply different cutoffs.

Readcount and expression data are extracted from an annotated VCF to automatically filter with
adjustable thresholds on depth, VAF, and/or expression values. The user can also manually run
the :ref:`standalone coverage filter <pvacsplice_filter_commands>` to further narrow down their results
from the filtered output file.

pVACsplice will filter on the transcript support level to only keep high-confidence
transcripts of level 1. This filter can also be run :ref:`standalone
<pvacsplice_filter_commands>`.

As a last filtering step, pVACsplice applies the top score filter to only keep the top scoring epitope
for each variant. As with all previous filters, this filter can also be run
:ref:`standalone <pvacsplice_filter_commands>`. Please also see that section for more
details about how the top scoring epitope is determined.

**NetChop and NetMHCstab integration**

Cleavage position predictions are added with optional processing through NetChop.

Stability predictions can be added if desired by the user. These predictions are obtained via NetMHCstabpan.

**Reference proteome similarity analysis**

This optional feature will search for an epitope in the reference proteome
using BLAST or a reference proteome FASTA file to determine if the epitope occurs elsewhere in the proteome and
is, therefore, not tumor-specific.
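
The FASTA-based variant of this check can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the pVACsplice implementation: ``read_fasta`` and ``reference_matches`` are hypothetical helper names, and the sketch only looks for exact substring matches of the full epitope.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dict."""
    records = {}
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one first.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def reference_matches(epitope, proteome):
    """Return the headers of all reference proteins containing the epitope.

    Assumption: an exact substring match counts as a reference match.
    """
    return [name for name, seq in proteome.items() if epitope in seq]


example_fasta = [
    ">P1 example protein",
    "MKTAYIAKQR",
    "QISFVKSHFS",
    ">P2",
    "RQLEERLGLI",
]
proteome = read_fasta(example_fasta)
matches = reference_matches("KQRQISFV", proteome)
```

An epitope with at least one match would be flagged as occurring elsewhere in the proteome; an empty list means no exact match was found in the supplied FASTA.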

**Problematic amino acids**

This optional feature allows users to specify a list of amino acids that would
be considered problematic to occur either everywhere or at specific positions
in a neoepitope. This can be useful when certain amino acids would be
problematic during peptide manufacturing.
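
As a rough sketch of the idea, the check could look like the following Python snippet. The rule representation (a tuple of amino acid and optional 1-based position, with negative positions counted from the end of the peptide) is an assumption made for this illustration and is not taken from the pVACsplice command line syntax; ``problematic_positions`` is a hypothetical function name.

```python
def problematic_positions(peptide, rules):
    """Return the sorted 1-based positions in `peptide` that violate a rule.

    Each rule is (amino_acid, position). A position of None means the amino
    acid is problematic everywhere. Negative positions count from the end of
    the peptide (assumed convention for this sketch).
    """
    hits = set()
    for amino_acid, position in rules:
        if position is None:
            hits.update(i + 1 for i, aa in enumerate(peptide) if aa == amino_acid)
        else:
            index = position - 1 if position > 0 else len(peptide) + position
            if 0 <= index < len(peptide) and peptide[index] == amino_acid:
                hits.add(index + 1)
    return sorted(hits)


example_rules = [
    ("C", None),  # cysteine is problematic anywhere
    ("G", 2),     # glycine is problematic at position 2
    ("T", -1),    # threonine is problematic at the last position
    ("F", 1),     # phenylalanine is problematic at position 1
]
positions = problematic_positions("CGILGFVFT", example_rules)
```

A peptide with an empty result has no problematic positions under the given rules.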
110 changes: 110 additions & 0 deletions docs/pvacsplice/filter_commands.rst
@@ -0,0 +1,110 @@
.. .. image:: ../images/pVACseq_logo_trans-bg_sm_v4b.png
    :align: right
    :alt: pVACseq logo

.. _pvacsplice_filter_commands:

Filtering Commands
==================

pVACsplice currently offers four filters: a binding filter, a coverage filter,
a transcript support level filter, and a top score filter.

These filters are always run automatically as part
of the pVACsplice pipeline using default cutoffs.

All filters can also be run manually on the filtered.tsv file to narrow the results down further,
or they can be run on the all_epitopes.tsv file to apply different filtering thresholds.

The binding filter is used to remove neoantigen candidates that do not meet desired peptide:MHC binding criteria.
The coverage filter is used to remove variants that do not meet desired read count and VAF criteria (in normal DNA
and tumor DNA/RNA). The transcript support level filter is used to remove variant annotations based on low quality
transcript annotations. The top score filter is used to select the most promising peptide candidate for each variant.
Multiple candidate peptides from a single somatic variant can be caused by multiple peptide lengths, registers, HLA alleles,
and transcript annotations.

Further details on each of these filters are provided below.

.. note::

   The default values for filtering thresholds are suggestions only. While they are based on review of the literature
   and consultation with our clinical and immunology colleagues, your specific use case will determine the appropriate values.

Binding Filter
--------------

.. program-output:: pvacsplice binding_filter -h

The binding filter removes variants that don't pass the chosen binding threshold.
The user can choose whether to apply this filter to the ``lowest`` or the ``median`` binding
affinity score by setting the ``--top-score-metric`` flag. The ``lowest`` binding
affinity score is recorded in the ``Best MT IC50 Score`` column and represents the lowest
IC50 score of all prediction algorithms that were picked during the previous pVACsplice run.
The ``median`` binding affinity score is recorded in the ``Median MT IC50 Score`` column and
corresponds to the median IC50 score of all prediction algorithms used to create the report.
By default, the binding filter runs on the ``median`` binding affinity.

When the ``--allele-specific-binding-thresholds`` flag is set, binding cutoffs specific to each
prediction's HLA allele are used instead of the value set via the ``--binding-threshold`` parameter.
For HLA alleles where no allele-specific binding threshold is available, the
binding threshold is used as a fallback. Alleles with allele-specific
thresholds as well as the values of those thresholds can be printed by executing
the ``pvacsplice allele_specific_cutoffs`` command.

In addition to being able to filter on the IC50 score columns, the binding
filter also offers the ability to filter on the percentile score using the
``--percentile-threshold`` parameter. When the ``--top-score-metric`` is set
to ``lowest``, this threshold is applied to the ``Best MT Percentile`` column. When
it is set to ``median``, the threshold is applied to the ``Median MT
Percentile`` column.

By default, entries with ``NA`` values will be included in the output. This
behavior can be turned off by using the ``--exclude-NAs`` flag.
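
The decision logic described above can be summarized in a short Python sketch. This is an illustration only and not the pVACsplice source code: the function name, the ``HLA Allele`` column name, the placeholder default of 500, and the assumption that a value passes when it is less than or equal to the cutoff are all hypothetical. The score and percentile column names are the ones named in this section.

```python
def passes_binding_filter(row, binding_threshold=500.0, top_score_metric="median",
                          allele_cutoffs=None, percentile_threshold=None,
                          exclude_nas=False):
    """Decide whether one report row (a dict of column -> string) is kept."""
    # "lowest" uses the Best columns, "median" uses the Median columns.
    prefix = "Best" if top_score_metric == "lowest" else "Median"

    # Allele-specific cutoffs replace the general threshold when available;
    # otherwise the general threshold is the fallback.
    threshold = binding_threshold
    if allele_cutoffs is not None:
        threshold = allele_cutoffs.get(row["HLA Allele"], binding_threshold)

    checks = [(row[f"{prefix} MT IC50 Score"], threshold)]
    if percentile_threshold is not None:
        checks.append((row[f"{prefix} MT Percentile"], percentile_threshold))

    for value, cutoff in checks:
        if value == "NA":
            # NA entries are kept unless the caller asks to exclude them.
            if exclude_nas:
                return False
            continue
        if float(value) > cutoff:
            return False
    return True


example_row = {
    "HLA Allele": "HLA-A*02:01",
    "Best MT IC50 Score": "120.0",
    "Median MT IC50 Score": "650.0",
    "Best MT Percentile": "0.4",
    "Median MT Percentile": "NA",
}
```

With these example values the row is dropped under the ``median`` metric, kept under the ``lowest`` metric, and kept under ``median`` when a looser allele-specific cutoff exists for its allele.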

Coverage Filter
---------------

.. program-output:: pvacsplice coverage_filter -h

If the pVACsplice input VCF contains readcount and/or expression annotations, then the coverage filter
can be run again on the filtered.tsv report file to narrow down the results even further.
You can also run this filter again on the all_epitopes.tsv report file to apply different cutoffs.

The general goals of these filters are to limit variants for neoepitope prediction to those
with good read support and/or remove possible sub-clonal variants. In some cases the input
VCF may have already been filtered in this fashion. This filter also allows for removal of
variants that do not have sufficient evidence of RNA expression.

For more details on how to prepare input VCFs that contain all of these annotations, refer to
the :ref:`pvacsplice_prerequisites_label` section.

By default, entries with ``NA`` values will be included in the output. This
behavior can be turned off by using the ``--exclude-NAs`` flag.
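
A minimal Python sketch of this kind of filter follows. The column names and cutoff values in ``EXAMPLE_CUTOFFS`` are assumptions chosen for illustration and are not the pVACsplice defaults; ``passes_coverage_filter`` is a hypothetical function name.

```python
import operator

# Hypothetical columns and cutoffs: minimum depth, VAF, and expression in the
# tumor, and a maximum VAF in the normal sample.
EXAMPLE_CUTOFFS = {
    "Tumor DNA Depth": (operator.ge, 10),
    "Tumor DNA VAF": (operator.ge, 0.25),
    "Normal VAF": (operator.le, 0.02),
    "Gene Expression": (operator.ge, 1.0),
}


def passes_coverage_filter(row, cutoffs=None, exclude_nas=False):
    """Decide whether one report row (a dict of column -> string) is kept."""
    if cutoffs is None:
        cutoffs = EXAMPLE_CUTOFFS
    for column, (compare, cutoff) in cutoffs.items():
        value = row.get(column, "NA")
        if value == "NA":
            # NA entries are kept unless the caller asks to exclude them.
            if exclude_nas:
                return False
            continue
        if not compare(float(value), cutoff):
            return False
    return True


well_supported = {"Tumor DNA Depth": "35", "Tumor DNA VAF": "0.4",
                  "Normal VAF": "0.0", "Gene Expression": "NA"}
low_depth = {"Tumor DNA Depth": "5", "Tumor DNA VAF": "0.4",
             "Normal VAF": "0.0", "Gene Expression": "3.2"}
```

The first example row is kept by default because its only failing candidate, the missing expression value, is ``NA``; it would be removed if ``NA`` values were excluded.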

Transcript Support Level Filter
-------------------------------

.. program-output:: pvacsplice transcript_support_level_filter -h

This filter is used to eliminate variant annotations based on poorly-supported transcripts. By default,
only transcripts with a `transcript support level (TSL) <https://useast.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#tsl>`_
of <=1 are kept. This threshold can be adjusted using the ``--maximum-transcript-support-level``
parameter.

By default, entries with ``Not Supported`` values will be included in the output. These occur if VEP was run
without the ``--tsl`` flag or if data is aligned to GRCh37 or older.
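
The rule can be sketched in a few lines of Python. This is a simplified illustration, not the pVACsplice implementation; the function name and the ``Transcript Support Level`` column name are assumptions.

```python
def passes_tsl_filter(row, maximum_transcript_support_level=1):
    """Decide whether one report row (a dict of column -> string) is kept."""
    tsl = row["Transcript Support Level"]
    # Entries without TSL information are kept by default.
    if tsl == "Not Supported":
        return True
    return int(tsl) <= maximum_transcript_support_level


example_rows = [
    {"Transcript Support Level": "1"},
    {"Transcript Support Level": "2"},
    {"Transcript Support Level": "Not Supported"},
]
kept = [row for row in example_rows if passes_tsl_filter(row)]
```

Raising the maximum to 2 in this sketch would keep all three example rows.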

Top Score Filter
----------------

.. program-output:: pvacsplice top_score_filter -h

This filter picks the top epitope for each splice site variant. The top epitope is
determined by first selecting epitopes with no Problematic Positions
and among those selecting the one with lowest median/best MT IC50 score for
each splice site variant.

By default the ``--top-score-metric`` option is set to ``median`` which will apply this
filter to the ``Median MT IC50 Score`` column. If the ``--top-score-metric``
option is set to ``lowest``, the ``Best MT IC50 Score`` column is used
instead.
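
The selection can be sketched in Python as follows. This is an illustration under assumptions rather than the pVACsplice implementation: the ``Variant ID`` grouping column, the representation of "no problematic positions" as ``"None"`` or an empty value, and the fallback to all candidates when every epitope of a variant has problematic positions are all hypothetical.

```python
from collections import defaultdict


def top_score_filter(rows, top_score_metric="median", variant_column="Variant ID"):
    """Return one row per variant: the lowest-scoring epitope, preferring
    epitopes without problematic positions."""
    if top_score_metric == "lowest":
        score_column = "Best MT IC50 Score"
    else:
        score_column = "Median MT IC50 Score"

    # Group candidate epitopes by the variant they come from.
    by_variant = defaultdict(list)
    for row in rows:
        by_variant[row[variant_column]].append(row)

    selected = []
    for candidates in by_variant.values():
        clean = [r for r in candidates
                 if r["Problematic Positions"] in ("None", "", None)]
        # Assumed fallback: if no epitope is clean, consider all of them.
        pool = clean or candidates
        selected.append(min(pool, key=lambda r: float(r[score_column])))
    return selected


example_rows = [
    {"Variant ID": "v1", "Epitope": "A", "Median MT IC50 Score": "50",
     "Best MT IC50 Score": "10", "Problematic Positions": "2"},
    {"Variant ID": "v1", "Epitope": "B", "Median MT IC50 Score": "80",
     "Best MT IC50 Score": "60", "Problematic Positions": "None"},
    {"Variant ID": "v1", "Epitope": "C", "Median MT IC50 Score": "120",
     "Best MT IC50 Score": "30", "Problematic Positions": "None"},
    {"Variant ID": "v2", "Epitope": "D", "Median MT IC50 Score": "300",
     "Best MT IC50 Score": "200", "Problematic Positions": "1"},
]
top_by_median = [r["Epitope"] for r in top_score_filter(example_rows)]
top_by_lowest = [r["Epitope"] for r in top_score_filter(example_rows, "lowest")]
```

In the example, epitope A has the lowest scores for variant v1 but is skipped because it has a problematic position, so the choice between B and C depends on the selected metric.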