Update pVACsplice documentation

griffithlab · May 30, 2024 · 6374846 · 6374846
1 parent ff3ac39
commit 6374846
Showing 12 changed files with 927 additions and 20 deletions.
diff --git a/docs/index.rst b/docs/index.rst
@@ -13,6 +13,9 @@ tools:
 **pVACfuse**
    A tool for detecting neoantigens resulting from gene fusions.
 
+**pVACsplice**
+   A tool for detecting neoantigens resulting from splice site variants.
+
 **pVACvector**
    A tool designed to aid specifically in the construction of DNA-based
    cancer vaccines.
@@ -35,6 +38,7 @@ Contents
    pvacseq
    pvacbind
    pvacfuse
+   pvacsplice
    pvacvector
    pvacview
 

diff --git a/docs/pvacfuse/output_files.rst b/docs/pvacfuse/output_files.rst
@@ -22,15 +22,8 @@ created):
 
    * - File Name
      - Description
-   * - ``<sample_name>.tsv``
-     - An intermediate file with variant and transcript information parsed from the input file(s).
-   * - ``<sample_name>.tsv_<chunks>`` (multiple)
-     - The above file but split into smaller chunks for easier processing with IEDB.
    * - ``<sample_name>.fasta``
-     - A fasta file with mutant peptide subsequences for all
-       processable fusion combinations.
-   * - ``<sample_name>.net_chop.fa``
-     - A fasta file with mutant peptide subsequences specific for use in running the net_chop tool.
+     - A fasta file with mutant peptide subsequences for each fusion.
    * - ``<sample_name>.all_epitopes.tsv``
      - A list of all predicted epitopes and their binding affinity scores, with
        additional variant information from the ``<sample_name>.tsv``.
@@ -43,6 +36,10 @@ created):
    * - ``<sample_name>.all_epitopes.aggregated.tsv.reference_matches`` (optional)
      - A file outlining details of reference proteome matches
 
+Additionally, each folder will contain subfolders, one for each selected
+epitope length, that contain intermediate files specific to that
+epitope length.
+
 Filters applied to the filtered.tsv file
 ----------------------------------------
 

diff --git a/docs/pvacseq/optional_downstream_analysis_tools.rst b/docs/pvacseq/optional_downstream_analysis_tools.rst
@@ -35,6 +35,9 @@ section of the documentation on how to create this VCF.
 
 The output may be limited to PASS variants only by setting the ``--pass-only``
 flag and to mutant sequences by setting the ``--mutant-only`` flag.
+Additionally, variants can be limited to specific transcript biotypes
+using the ``--biotypes`` parameter, which is set to only include ``protein_coding``
+transcripts by default.
 
 The output can be further limited to only certain variants by providing
 a pVACseq report file to the ``--input-tsv`` argument. Only the peptide sequences for the epitopes in the TSV
@@ -93,7 +96,8 @@ TSV.  In its output, it adds to the TSV 3 columns: Best Cleavage Position, Best
 Cleavage Sites list.  Typically this step is done in the pVACseq run pipeline for the filtered output TSV
 when specified.  This tool provides a way to manually run this on pVACseq's generated filtered/all_epitopes
 TSV files so that you can add this information when not present if desired.
-You can view more about these columns for pVACseq in
+
+You can view more information about these columns for pVACseq in
 the :ref:`output file documentation <all_ep_and_filtered>`.
 
 NetMHCStab Predict Stability
@@ -106,7 +110,8 @@ filtered/all_epitopes TSV.  In its output, it adds to the TSV 4 columns: Predict
 Stability Rank, and NetMHCStab Allele.  Typically this step is done in the pVACseq run pipeline for the
 filtered output TSV when specified.  This tool provides a way to manually run this on pVACseq's generated
 filtered/all_epitopes TSV files so that you can add this information when not present if desired.
-You can view more about these columns for pVACseq in
+
+You can view more information about these columns for pVACseq in
 the :ref:`output file documentation <all_ep_and_filtered>`.
 
 Identify Problematic Amino Acids

diff --git a/docs/pvacseq/output_files.rst b/docs/pvacseq/output_files.rst
@@ -277,10 +277,6 @@ total number of well-scoring epitopes for each variant, the number of
 transcripts covered by those epitopes, as well as the HLA alleles that those
 epitopes are well-binding to. Lastly, the report will bin variants into tiers
 that offer suggestions as to the suitability of variants for use in vaccines.
-Only epitopes meeting the ``--aggregate-inclusion-threshold`` are included in this report (default: 5000).
-Whether the median or the lowest binding affinity metrics are output in the ``IC50 MT``,
-``IC50 WT``, ``%ile MT``, and ``%ile WT`` columns is controlled by the
-``--top-score-metric`` parameter.
 
 Only epitopes meeting the ``--aggregate-inclusion-threshold`` are included in this report (default: 5000).
 Whether the median or the lowest binding affinity metrics are output in the ``IC50 MT``,

diff --git a/docs/pvacsplice.rst b/docs/pvacsplice.rst
@@ -1,11 +1,16 @@
 pVACsplice
-========================
+==========
 
 pVACsplice predicts neoantigens for novel junctions created from tumor-specific alternative splicing patterns.
 
 .. toctree::
    :glob:
 
+   pvacsplice/features
    pvacsplice/input_file_prep
    pvacsplice/getting_started
-   pvacsplice/run
+   pvacsplice/run
+   pvacsplice/output_files
+   pvacsplice/filter_commands
+   pvacsplice/additional_commands
+   pvacsplice/optional_downstream_analysis_tools
diff --git a/docs/pvacsplice/additional_commands.rst b/docs/pvacsplice/additional_commands.rst
@@ -0,0 +1,27 @@
+.. .. image:: ../images/pVACseq_logo_trans-bg_sm_v4b.png
+    :align: right
+    :alt: pVACseq logo
+
+Additional Commands
+===================
+
+To make using pVACsplice easier, several convenience methods are included in the package.
+
+.. _pvacsplice_example_data:
+
+Download Example Data
+---------------------
+
+.. program-output:: pvacsplice download_example_data -h
+
+.. _pvacsplice_valid_alleles:
+
+List Valid Alleles
+------------------
+
+.. program-output:: pvacsplice valid_alleles -h
+
+List Allele-Specific Cutoffs
+----------------------------
+
+.. program-output:: pvacsplice allele_specific_cutoffs -h
diff --git a/docs/pvacsplice/features.rst b/docs/pvacsplice/features.rst
@@ -0,0 +1,140 @@
+.. .. image:: ../images/pVACseq_logo_trans-bg_sm_v4b.png
+    :align: right
+    :alt: pVACsplice logo
+
+Features
+========
+
+**Splice Site Analysis**
+
+pVACsplice offers epitope binding predictions for splice site variants
+predicted by RegTools.
+
+**No local install of epitope prediction software needed**
+
+pVACsplice utilizes the IEDB RESTful web interface. This means that none of the underlying prediction software, like NetMHC, needs to be installed locally.
+
+.. warning::
+   We only recommend using the RESTful API for small requests. If you use the
+   RESTful API to process large VCFs or to make predictions for many alleles,
+   epitope lengths, or prediction algorithms, you might overload their system.
+   This can result in the blacklisting of your IP address by IEDB, causing
+   403 errors when trying to use the RESTful API. In that case please open
+   a ticket with `IEDB support <http://help.iedb.org/>`_ to have your IP
+   address removed from the IEDB blacklist.
+
+**Support for local installation of the IEDB Analysis Resources**
+
+pVACsplice provides the option of using a local installation of the IEDB MHC
+`class I <http://tools.iedb.org/mhci/download/>`_ and `class II <http://tools.iedb.org/mhcii/download/>`_
+binding prediction tools.
+
+.. warning::
+   Using a local IEDB installation is strongly recommended for larger datasets
+   or when making predictions for many alleles, epitope lengths, or
+   prediction algorithms. More information on how to install IEDB locally can
+   be found on the :ref:`Installation <iedb_install>` page (note: the pvactools 
+   docker image now contains IEDB).
+
+**MHC Class I and Class II predictions**
+
+Both MHC Class I and Class II predictions are supported. Simply choose the desired
+prediction algorithms and HLA alleles during processing and Class I and Class II
+prediction results will be written to their own respective subdirectories in your
+output directory. pVACsplice supports binding affinity algorithms as well as elution
+algorithms.
+
+By using the IEDB RESTful web interface, pVACsplice leverages their extensive support of different prediction algorithms.
+
+In addition to IEDB-supported prediction algorithms, we've also added support
+for `MHCflurry <http://www.biorxiv.org/content/early/2017/08/09/174243>`_ and
+`MHCnuggets <http://karchinlab.org/apps/appMHCnuggets.html>`_.
+
+================================================= ======= ========================
+MHC Class I Binding Affinity Prediction Algorithm Version Supports Percentile Rank
+================================================= ======= ========================
+NetMHCpan                                         4.1     yes
+NetMHC                                            4.0     yes
+NetMHCcons                                        1.1     yes
+PickPocket                                        1.1     yes
+SMM                                               1.0     yes
+SMMPMBEC                                          1.0     yes
+MHCflurry                                                 yes
+MHCnuggets                                                no
+================================================= ======= ========================
+
+================================================== ======= ========================
+MHC Class II Binding Affinity Prediction Algorithm Version Supports Percentile Rank
+================================================== ======= ========================
+NetMHCIIpan                                        4.1     yes
+SMMalign                                           1.1     yes
+NNalign                                            2.3     yes
+MHCnuggets                                                 no
+================================================== ======= ========================
+
+======================================== ======= ========================
+MHC Class I Elution Prediction Algorithm Version Supports Percentile Rank
+======================================== ======= ========================
+NetMHCpanEL                              4.1     yes
+MHCflurryEL                                      | Processing Score: no;
+                                                 | Presentation Score: yes
+BigMHC_EL                                        no
+======================================== ======= ========================
+
+========================================= ======= ========================
+MHC Class II Elution Prediction Algorithm Version Supports Percentile Rank
+========================================= ======= ========================
+NetMHCIIpanEL                             4.1     yes
+========================================= ======= ========================
+
+=============================================== ======= ========================
+MHC Class I Immunogenicity Prediction Algorithm Version Supports Percentile Rank
+=============================================== ======= ========================
+BigMHC_IM                                               no
+DeepImmuno                                              no
+=============================================== ======= ========================
+
+**Comprehensive filtering**
+
+Automatic filtering on the binding affinity IC50 (nM) value narrows down the results to only include
+"good" candidate peptides. The binding filter threshold can be adjusted by the user for each
+pVACsplice run. pVACsplice also supports the option of filtering on allele-specific binding thresholds
+as recommended by `IEDB <https://help.iedb.org/hc/en-us/articles/114094151811-Selecting-thresholds-cut-offs-for-MHC-class-I-and-II-binding-predictions>`_
+as well as percentile ranks.
+Additional filtering on the binding affinity can be done manually by the user by running the
+:ref:`standalone binding filter <pvacsplice_filter_commands>` on the filtered result file
+to narrow down the candidate epitopes even further or on the unfiltered
+all_epitopes file to apply different cutoffs.
+
+Readcount and expression data are extracted from an annotated VCF to automatically filter with
+adjustable thresholds on depth, VAF, and/or expression values. The user can also manually run
+the :ref:`standalone coverage filter <pvacsplice_filter_commands>` to further narrow down their results
+from the filtered output file.
+
+pVACsplice will filter on the transcript support level to only keep high-confidence
+transcripts of level 1. This filter can also be run :ref:`standalone
+<pvacsplice_filter_commands>`.
+
+As a last filtering step, pVACsplice applies the top score filter to only keep the top scoring epitope
+for each variant. As with all previous filters, this filter can also be run
+:ref:`standalone <pvacsplice_filter_commands>`. Please also see that section for more
+details about how the top scoring epitope is determined.
+
+**NetChop and NetMHCstab integration**
+
+Cleavage position predictions are added with optional processing through NetChop.
+
+Stability predictions can be added if desired by the user. These predictions are obtained via NetMHCstabpan.
+
+**Reference proteome similarity analysis**
+
+This optional feature will search for an epitope in the reference proteome
+using BLAST or a reference proteome FASTA file to determine if the epitope occurs elsewhere in the proteome and
+is, therefore, not tumor-specific.
+
+**Problematic amino acids**
+
+This optional feature allows users to specify a list of amino acids that are
+considered problematic when they occur anywhere, or at specific positions,
+in a neoepitope. This can be useful when certain amino acids would be
+problematic during peptide manufacturing.
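As an illustration of the binding-affinity filtering described under "Comprehensive filtering" in the file above, the following Python sketch shows a threshold check with optional allele-specific cutoffs. It is not pVACsplice's implementation. The function name, the 500 nM default, the ``<=`` comparison, and the example cutoff table are all assumptions made for illustration; the actual cutoffs are the ones printed by ``pvacsplice allele_specific_cutoffs``.

```python
from typing import Dict, Optional

# Hypothetical cutoff table, for illustration only. Real values are
# printed by `pvacsplice allele_specific_cutoffs`.
EXAMPLE_ALLELE_CUTOFFS: Dict[str, float] = {"HLA-A*02:01": 255.0}


def passes_binding_filter(
    ic50: float,
    allele: str,
    binding_threshold: float = 500.0,  # assumed default, in nM
    allele_cutoffs: Optional[Dict[str, float]] = None,
) -> bool:
    """Return True if the predicted IC50 is at or below the applicable cutoff.

    When allele-specific cutoffs are supplied, the allele's own cutoff is
    used; alleles missing from the table fall back to `binding_threshold`.
    """
    threshold = binding_threshold
    if allele_cutoffs is not None:
        threshold = allele_cutoffs.get(allele, binding_threshold)
    return ic50 <= threshold
```

With the hypothetical table above, an IC50 of 300 nM passes under the generic threshold but fails for ``HLA-A*02:01`` once allele-specific cutoffs are enabled, while an allele without its own cutoff is still judged against the generic threshold.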
diff --git a/docs/pvacsplice/filter_commands.rst b/docs/pvacsplice/filter_commands.rst
@@ -0,0 +1,110 @@
+.. .. image:: ../images/pVACseq_logo_trans-bg_sm_v4b.png
+    :align: right
+    :alt: pVACseq logo
+
+.. _pvacsplice_filter_commands:
+
+Filtering Commands
+==================
+
+pVACsplice currently offers four filters: a binding filter, a coverage filter,
+a transcript support level filter, and a top score filter.
+
+These filters are always run automatically as part
+of the pVACsplice pipeline using default cutoffs.
+
+All filters can also be run manually on the filtered.tsv file to narrow the results down further,
+or they can be run on the all_epitopes.tsv file to apply different filtering thresholds.
+
+The binding filter is used to remove neoantigen candidates that do not meet desired peptide:MHC binding criteria.
+The coverage filter is used to remove variants that do not meet desired read count and VAF criteria (in normal DNA
+and tumor DNA/RNA). The transcript support level filter is used to remove variant annotations based on low quality
+transcript annotations. The top score filter is used to select the most promising peptide candidate for each variant.
+Multiple candidate peptides from a single somatic variant can be caused by multiple peptide lengths, registers, HLA alleles,
+and transcript annotations.
+
+Further details on each of these filters are provided below.
+
+.. note::
+
+   The default values for filtering thresholds are suggestions only. While they are based on review of the literature
+   and consultation with our clinical and immunology colleagues, your specific use case will determine the appropriate values.
+
+Binding Filter
+--------------
+
+.. program-output:: pvacsplice binding_filter -h
+
+The binding filter removes variants that don't pass the chosen binding threshold.
+The user can choose whether to apply this filter to the ``lowest`` or the ``median`` binding
+affinity score by setting the ``--top-score-metric`` flag. The ``lowest`` binding
+affinity score is recorded in the ``Best MT IC50 Score`` column and represents the lowest
+IC50 score of all prediction algorithms that were picked during the previous pVACsplice run.
+The ``median`` binding affinity score is recorded in the ``Median MT IC50 Score`` column and
+corresponds to the median IC50 score of all prediction algorithms used to create the report.
+By default, the binding filter runs on the ``median`` binding affinity.
+
+When the ``--allele-specific-binding-thresholds`` flag is set, binding cutoffs specific to each
+prediction's HLA allele are used instead of the value set via the ``--binding-threshold`` parameter.
+For HLA alleles where no allele-specific binding threshold is available, the
+binding threshold is used as a fallback. The alleles that have allele-specific
+thresholds, as well as the values of those thresholds, can be printed by executing
+the ``pvacsplice allele_specific_cutoffs`` command.
+
+In addition to being able to filter on the IC50 score columns, the binding
+filter also offers the ability to filter on the percentile score using the
+``--percentile-threshold`` parameter. When the ``--top-score-metric`` is set
+to ``lowest``, this threshold is applied to the ``Best MT Percentile`` column. When
+it is set to ``median``, the threshold is applied to the ``Median MT
+Percentile`` column.
+
+By default, entries with ``NA`` values will be included in the output. This
+behavior can be turned off by using the ``--exclude-NAs`` flag.
+
+Coverage Filter
+---------------
+
+.. program-output:: pvacsplice coverage_filter -h
+
+If the pVACsplice input VCF contains readcount and/or expression annotations, then the coverage filter
+can be run again on the filtered.tsv report file to narrow down the results even further.
+You can also run this filter again on the all_epitopes.tsv report file to apply different cutoffs.
+
+The general goals of these filters are to limit variants for neoepitope prediction to those
+with good read support and/or remove possible sub-clonal variants. In some cases the input
+VCF may have already been filtered in this fashion. This filter also allows for removal of
+variants that do not have sufficient evidence of RNA expression.
+
+For details on how to prepare input VCFs that contain all of these annotations, refer to
+the :ref:`pvacsplice_prerequisites_label` section.
+
+By default, entries with ``NA`` values will be included in the output. This
+behavior can be turned off by using the ``--exclude-NAs`` flag.
+
+Transcript Support Level Filter
+-------------------------------
+
+.. program-output:: pvacsplice transcript_support_level_filter -h
+
+This filter is used to eliminate variant annotations based on poorly-supported transcripts. By default,
+only transcripts with a `transcript support level (TSL) <https://useast.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#tsl>`_
+of <=1 are kept. This threshold can be adjusted using the ``--maximum-transcript-support-level``
+parameter.
+
+By default, entries with ``Not Supported`` values will be included in the output. These occur if VEP was run
+without the ``--tsl`` flag or if data is aligned to GRCh37 or older.
+
+Top Score Filter
+----------------
+
+.. program-output:: pvacsplice top_score_filter -h
+
+This filter picks the top epitope for each splice site variant. The top epitope is
+determined by first selecting epitopes with no Problematic Positions
+and among those selecting the one with the lowest median/best MT IC50 score for
+each splice site variant.
+
+By default the ``--top-score-metric`` option is set to ``median`` which will apply this
+filter to the ``Median MT IC50 Score`` column. If the ``--top-score-metric``
+option is set to ``lowest``, the ``Best MT IC50 Score`` column is used
+instead.
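The selection rule described for the top score filter can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pVACsplice implementation. The ``variant`` key and the list-valued ``Problematic Positions`` field are hypothetical stand-ins for the report's real columns. The fallback to all candidates when every epitope has problematic positions is also an assumption, since the documentation does not describe that case.

```python
from typing import Any, Dict, List


def pick_top_epitopes(
    rows: List[Dict[str, Any]], metric: str = "median"
) -> Dict[str, Dict[str, Any]]:
    """Pick one epitope per variant: prefer rows with no problematic
    positions, then take the lowest IC50 in the chosen score column."""
    column = "Median MT IC50 Score" if metric == "median" else "Best MT IC50 Score"

    # Group candidate epitopes by the (hypothetical) variant identifier.
    by_variant: Dict[str, List[Dict[str, Any]]] = {}
    for row in rows:
        by_variant.setdefault(row["variant"], []).append(row)

    top: Dict[str, Dict[str, Any]] = {}
    for variant, candidates in by_variant.items():
        clean = [r for r in candidates if not r["Problematic Positions"]]
        # Assumption: if every candidate is problematic, consider them all.
        pool = clean or candidates
        top[variant] = min(pool, key=lambda r: r[column])
    return top


rows = [
    {"variant": "v1", "Epitope": "AAA", "Median MT IC50 Score": 50.0,
     "Best MT IC50 Score": 10.0, "Problematic Positions": [1]},
    {"variant": "v1", "Epitope": "BBB", "Median MT IC50 Score": 120.0,
     "Best MT IC50 Score": 90.0, "Problematic Positions": []},
    {"variant": "v1", "Epitope": "CCC", "Median MT IC50 Score": 200.0,
     "Best MT IC50 Score": 30.0, "Problematic Positions": []},
]
top_by_median = pick_top_epitopes(rows, metric="median")
top_by_lowest = pick_top_epitopes(rows, metric="lowest")
```

In this toy input, ``AAA`` has the lowest scores but is set aside because it has a problematic position. The ``median`` metric then selects ``BBB`` and the ``lowest`` metric selects ``CCC``, which shows how the choice of ``--top-score-metric`` can change the reported epitope.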