Commit

incorporate suggestions from Chris Miller (on top of suggestions from Malachi and Susanna)
mhoang22 committed May 3, 2024
1 parent 8d3ac09 commit 1934b6b
Showing 1 changed file with 30 additions and 17 deletions.
47 changes: 30 additions & 17 deletions docs/pvacview/pvacseq_module/pvacseq_vignette.rst
@@ -12,7 +12,7 @@

Vignette
---------------
In this vignette, we will demonstrate evaluation of neoantigen candidates predicted by pVACseq with pVACview using the built-in demo data. The demonstration dataset includes Class I and Class II neoantigen candidate files generated from the HCC1395 breast cancer cell line and its matched lymphoblastoid cell line HCC1395BL. You can also download the demo data `here <https://github.com/griffithlab/pVACtools/tree/master/pvactools/tools/pvacview/data>`_.
In this vignette, we use demo data to show how pVACview can be used to evaluate neoantigen candidates predicted by pVACseq. The demonstration dataset includes Class I and Class II neoantigen candidate files generated from the HCC1395 breast cancer cell line and its matched lymphoblastoid cell line HCC1395BL. You can also download the demo data `here <https://github.com/griffithlab/pVACtools/tree/master/pvactools/tools/pvacview/data>`_.

:large:`Upload input data files`
____________________________
@@ -50,26 +50,37 @@ The top row of the page has 4 sections:
- Current Parameters for Tiering
- Add Comments for selected variant

pVACview prioritizes neoantigen candidates by ranking these peptides based on a set of rules (parameters for tiering), which include variant allele fraction cutoff, gene/transcript expression, binding affinity predictions and more, as discussed later. Based on routine criteria described in the literature, we provide a default set of parameters for tiering, detailed in the **Original Parameters for Tiering** section. The default is a good starting point, but as all samples are unique in terms of sample quality, sequencing quality, tumor purity, tumor mutation burden, HLA type, etc. you may also want to set your own parameters in the **Advanced Options: Regenerate Tiering with different parameters** section. To see the current set of rules applied to your data, see the **Current Parameters for Tiering** section.
pVACview prioritizes neoantigen candidates by ranking these peptides based on a set of rules (parameters for tiering), which include variant allele fraction cutoff, gene/transcript expression, binding affinity predictions and more, as discussed later. Based on criteria described in the literature, we provide a default set of parameters for tiering, detailed in the **Original Parameters for Tiering** section. The default is a good starting point, but because samples differ in sample quality, sequencing quality, tumor purity, tumor mutation burden, HLA type, etc., you may want to set your own parameters in the **Advanced Options: Regenerate Tiering with different parameters** section. To see the current set of rules applied to your data, see the **Current Parameters for Tiering** section.
Note: click the ``+``/ ``-`` in the right corner to expand/contract each section.

The original parameters rank candidates on multiple facets.
The first aspect is clonality. Cancer starts with a founding clone with tumor-initiating mutations which expand and drive malignancy. Descendents of the founding clone may acquire additional mutations. Clonal mutations are shared by all clones, whereas subclonal mutations are shared by some but not all cancer cells. Neoantigen candidates derived from clonal variants should be prioritized as it has been proposed that targeting such mutations will drive a better clinical response. pVACview uses the following parameters when determining clonality:
The original parameters rank candidates on multiple facets:

**Clonality**

Cancer starts with a founding clone with tumor-initiating mutations which expand and drive malignancy. Descendants of the founding clone may acquire additional mutations. The default tiering assumes that neoantigen candidates derived from clonal variants should be prioritized, as clonal mutations exist in every cell of the tumor, while subclonal mutations are shared by some but not all of the cancer cells. It has been proposed that targeting clonal mutations will drive a better clinical response.

pVACview uses the following parameters when determining clonality:

- ``Tumor Purity`` : a value between 0 and 1 indicating the fraction of tumor cells in the tumor sample. (default: None)
- ``VAF Clonal`` : Tumor DNA variant allele frequency (VAF) to determine whether the variant is clonal. By default, this value is determined automatically from the VAFs in the input data during the original pVACseq run unless the tumor purity parameter is set (see pVACseq docs for further details). This can be adjusted by the user in pVACview (see below).
- ``VAF Subclonal`` : Tumor DNA VAF cutoff to determine whether the variant is subclonal. This value is automatically calculated as half of ``VAF Clonal``.
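
The relationship between the two VAF cutoffs can be sketched in a few lines of Python. This is only an illustration, not pVACview's implementation: the function name is hypothetical, and treating a VAF exactly equal to the subclonal cutoff as clonal is an assumption.

```python
def classify_clonality(dna_vaf, vaf_clonal):
    """Label a variant 'clonal' or 'subclonal' from its tumor DNA VAF.

    Hypothetical helper for illustration only.
    """
    # VAF Subclonal is calculated as half of VAF Clonal.
    vaf_subclonal = vaf_clonal / 2
    # Assumption: a VAF below the subclonal cutoff marks the variant as subclonal.
    return "subclonal" if dna_vaf < vaf_subclonal else "clonal"
```

For instance, with ``VAF Clonal`` at 0.5 the subclonal cutoff is 0.25, so a variant with a DNA VAF of 0.316 would be labeled clonal by this sketch.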

The second aspect is expression. The ideal peptide candidate should be derived from a gene/transcript that is expressed robustly. We calculate allele expression by multiplying gene expression by the RNA VAF and set a default cutoff of 2.5. Variants with expression lower than this cutoff will be marked with low expression. Users can adjust this cutoff based on their own knowledge of the dataset being analyzed:
**Expression**

The ideal peptide candidate should be derived from a gene/transcript that is expressed robustly. We calculate allele expression by multiplying gene expression (often TPM or FPKM) by the RNA VAF and set a default cutoff of 2.5. Variants with expression lower than this cutoff will be marked with low expression. Users can adjust this cutoff based on their own knowledge of the dataset being analyzed:

- ``Allele Expression for Passing Variants`` : allele expression cutoff for passing variants. (default: 2.5 FPKM*VAF)
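
As a rough sketch of the arithmetic described above, allele expression is the product of gene expression and RNA VAF, compared against the cutoff. The function names are hypothetical and the code is not taken from pVACtools.

```python
DEFAULT_ALLELE_EXPR_CUTOFF = 2.5  # default cutoff described above


def allele_expression(gene_expression, rna_vaf):
    """Allele expression = gene expression (e.g. TPM or FPKM) * RNA VAF."""
    return gene_expression * rna_vaf


def has_low_expression(gene_expression, rna_vaf, cutoff=DEFAULT_ALLELE_EXPR_CUTOFF):
    """True when allele expression is lower than the cutoff."""
    return allele_expression(gene_expression, rna_vaf) < cutoff
```

A gene expressed at 10 with an RNA VAF of 0.5 gives an allele expression of 5.0, which is above the default cutoff; a gene expressed at 4 with the same RNA VAF gives 2.0 and would be marked as low expression.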

The third aspect is predicted binding affinity, which is measured by IC50 (peptide concentration required for 50% of displacement of a reference peptide to an MHC groove). Lower IC50 means a lower peptide concentration was required to achieve 50% displacement, which signifies better binding affinity. A common threshold for considering a peptide to be a strong binder is 500 nM. We also list the `Binding threshold` for inclusion in the Metric File. This parameter determines how many peptides the user wants to include in the peptide detailed view. Note that this parameter cannot be changed in the visualization component of pVACview but would need to be changed when generating the original aggregate report and metrics file. The default cutoff was set to 5000 nM to reasonably capture information about different peptide candidates from the same mutation but also to exclude those that have extremely poor binding.
**Predicted Binding Affinity**

Binding affinity is measured by IC50 (the peptide concentration required for 50% displacement of a reference peptide from an MHC groove). A lower IC50 means a lower peptide concentration was required to achieve 50% displacement, which signifies better binding affinity. A common threshold for considering a peptide to be a strong binder is 500 nM. We also list the ``Binding Threshold for Inclusion Into Metric File``. This parameter determines which peptides are included in the peptide detailed view. Note that this parameter cannot be changed in the visualization component of pVACview but would need to be changed when generating the original aggregate report and metrics file. The default cutoff was set to 5000 nM to reasonably capture information about different peptide candidates from the same mutation but also to exclude those that have extremely poor binding.

- ``Binding Threshold``: IC50 value cutoff for a passing neoantigen. (default: 500 nM)
- ``Binding Threshold for Inclusion Into Metric File``: IC50 value cutoff for neoantigens to be loaded to pVACview. This feature helps limit the number of neoantigens being loaded to pVACview. (default: 5000 nM)
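
The two IC50 cutoffs can be read as a three-way split, sketched below. The function name and category labels are hypothetical, and the handling of values exactly at a cutoff is an assumption rather than documented pVACtools behavior.

```python
BINDING_THRESHOLD_NM = 500.0      # default Binding Threshold
INCLUSION_THRESHOLD_NM = 5000.0   # default Binding Threshold for Inclusion Into Metric File


def binding_category(ic50_nm,
                     binding_threshold=BINDING_THRESHOLD_NM,
                     inclusion_threshold=INCLUSION_THRESHOLD_NM):
    """Classify a predicted IC50 (in nM) against the two cutoffs.

    Hypothetical labels: 'pass' (binder), 'included' (kept for review only),
    'excluded' (not loaded).
    """
    if ic50_nm < binding_threshold:
        return "pass"
    if ic50_nm < inclusion_threshold:
        return "included"
    return "excluded"
```

Under the defaults, an IC50 of 76.11 nM passes, 1200 nM is kept for review without passing, and 8000 nM is left out.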

The fourth aspect is Transcript Support Level (`TSL <https://useast.ensembl.org/info/genome/genebuild/transcript_quality_tags.html>`_). TSL highlights which transcript isoform is well/poorly-supported by alignment. The existing TSL levels are: TSL1, TSL2, TSL3, TSL4, TSL5, TSLNA, with TSL1 being the best TSL level. We suggest users using a higher TSL level cutoff (lower number) for higher confidence in the annotation of the targeted transcript. Default is set to be TSL1.
**Transcript Support Level**

The Transcript Support Level (`TSL <https://useast.ensembl.org/info/genome/genebuild/transcript_quality_tags.html>`_) indicates the degree to which a transcript isoform is supported by experimental evidence. The existing TSL levels are TSL1, TSL2, TSL3, TSL4, TSL5, and TSLNA, with TSL1 being the best-supported level. We suggest using a stricter TSL cutoff (a lower number) for higher confidence in the annotation of the targeted transcript. The default is TSL1.

- ``Maximum TSL`` : cutoff TSL level for a passing candidate. (default: 1)

@@ -87,6 +98,8 @@ An important advantage of using pVACseq to generate neoantigen predictions is th
- ``MT Top Score Metric`` : mutant top score metric. (default: Median)
- ``WT Top Score Metric`` : wildtype top score metric. (default: Median)

**Anchor Positions**

Anchor positions can influence whether a neoantigen candidate may be recognized by the patient’s immune system. Thus, another aspect to consider is anchor contribution. A subset of amino acid positions within the neoantigen candidate is more likely to face the TCR, while other positions are responsible for anchoring the peptide to the MHC. Anchor identity is determined by an anchor likelihood score (more information about how the score is calculated `here <https://www.science.org/doi/10.1126/sciimmunol.abg2200?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed>`_). The anchor identity of the mutated amino acid can influence whether the neoantigen candidate is expected to induce an immune response or be subject to central tolerance of the immune system, as elaborated in the 4 scenarios discussed later. To examine whether the mutated amino acid is located at an anchor position, we provide:

- ``Allele Specific Anchors Used`` : if TRUE, likelihood score is used to determine anchor position; if FALSE, positions 1, 2, n-1 and n are set as anchor positions. (default: TRUE)
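
When ``Allele Specific Anchors Used`` is FALSE, the fallback rule is simple enough to sketch directly. The helper names below are hypothetical, and positions are assumed to be 1-based, as the "1, 2, n-1 and n" wording suggests.

```python
def default_anchor_positions(peptide_length):
    """Fallback anchor positions (1-based) for a peptide of length n: 1, 2, n-1, n."""
    n = peptide_length
    return {1, 2, n - 1, n}


def mutation_at_anchor(mutation_position, peptide_length):
    """True when the mutated residue (1-based position) sits at a fallback anchor."""
    return mutation_position in default_anchor_positions(peptide_length)
```

For a 9-mer this gives positions 1, 2, 8 and 9, so a mutation at position 5 would not be treated as an anchor mutation under the fallback rule.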
@@ -125,7 +138,7 @@ To set your own Tier-setting parameters, expand the **Advanced Options: Regenera
and tailor the parameters as needed. Learn more about Advanced Options `here <https://pvactools.readthedocs.io/en/stable/pvacview/getting_started.html#regenerate-tiering>`_.


The second row of the page spans the **Aggregate Report of Best Candidates by Variant** section, which lists all neoantigen candidates in provided input. Candidates with higher Tier will be shown first, followed by candidates of lower Tiers (Order of Tiers: ``Pass``, ``Anchor``, ``Subclonal``, ``Low Expr``, ``NoExpr``, ``Poor``)(see Tiering criteria `here <https://pvactools.readthedocs.io/en/latest/pvacseq/output_files.html#tiers>`_). Genes that match with the user-input list of genes of interest will have a green box around them (for example, ARID1B and MSH6 are covered by a green box in this demo). This feature can be useful for highlighting neoantigens derived from cancer driver genes.
The second row of the page spans the **Aggregate Report of Best Candidates by Variant** section, which lists all neoantigen candidates in the provided input. Candidates in higher Tiers are shown first, followed by candidates in lower Tiers (Order of Tiers: ``Pass``, ``Anchor``, ``Subclonal``, ``Low Expr``, ``NoExpr``, ``Poor``) (see `Tiering criteria <https://pvactools.readthedocs.io/en/latest/pvacseq/output_files.html#tiers>`_). Genes that match the user-input genes of interest list will have a green box around them (for example, ARID1B and MSH6 are covered by a green box in this demo). This feature can be useful for highlighting neoantigens derived from cancer driver genes.

To view the variant, transcript, and peptide level information of a desired candidate, click on the ``Investigate`` button on the right side of the row for that candidate. The candidate currently under investigation will be framed in blue. The number of the row currently under investigation is indicated at the bottom of this section.

@@ -147,7 +160,7 @@ Example 1: a good candidate: KIF1C-S433F: TEFQIGPEEA

**Variant-level assessment:**

The variant has good DNA and RNA VAF (the DNA VAF is 0.316, higher than the Subclonal threshold of 0.25, thereby the variant is clonal) .
The variant has good DNA and RNA VAF (the DNA VAF is 0.316, higher than the Subclonal threshold of 0.25, so pVACseq assumes that the variant is clonal).

In this case, only 1 mutant transcript matches the user-provided RNAseq data (the ``Transcript Sets of Selected Variant`` tab shows only 1 result).

@@ -157,7 +170,7 @@ In this case, there’s only 1 mutant transcript matches with the user-provided
:alt: pVACview Vignette
:figclass: align-left

The predicted best peptide (neoantigen candidate) doesn’t have any match in the human proteome. This is ideal, since the candidate will more likely to be recognized by T cells due to central tolerance.
The predicted best peptide (neoantigen candidate) doesn’t have any match in the human proteome. This is ideal, since the candidate is more likely to be recognized by T cells and not ignored due to central tolerance.

.. figure:: ../../images/screenshots/vignette/KIF1C-new/KIF1C_2_ReferenceMatches.png
:width: 1000px
@@ -185,7 +198,7 @@ You can see the mutant (MT) and wildtype (WT) peptide sequence for this transcri

**Peptide-level assessment:**

The candidate in investigation has good binding affinity (median IC50 score is less than 500nM, percentile rank is less than 2%). Elution score varies with algorithms but overall the mutant peptide has better elution score than wildtype peptide, and the elution score is close to 1.
The candidate being investigated has good binding affinity (the median IC50 score is less than 500 nM and the percentile rank is less than 2%). The elution score varies between algorithms, but overall the mutant peptide has a better elution score than the wildtype peptide, and the elution score is close to 1.

.. figure:: ../../images/screenshots/vignette/KIF1C-new/KIF1C_7_IC50plot.png
:width: 1000px
@@ -232,7 +245,7 @@ Beside Class-I peptide, the best predicted Class-II peptide from user-input can

**Decision:**

Given all the information above, we can conclude that the reviewed Class I peptide is potentially a good binder and choose to Accept this candidate in the ``Eval`` drop-down menu.
Given all the information above, we may conclude that the reviewed Class I peptide is potentially a good binder and choose to Accept this candidate in the ``Eval`` drop-down menu.

.. figure:: ../../images/screenshots/vignette/KIF1C-new/KIF1C_11_Decision_1.png
:width: 1000px
@@ -257,11 +270,11 @@ Example 2: a good candidate derived from a variant with multiple transcript sets

**Variant-level assessment:**

The variant has good DNA and RNA VAF (the DNA VAF is 0.302, higher than the Subclonal threshold of 0.25, thereby the variant is clonal) .
The variant has good DNA and RNA VAF (the DNA VAF is 0.302, higher than the Subclonal threshold of 0.25, so the variant is assumed to be clonal).

**Transcript-level assessment:**

Here, there’re 2 transcript sets matching with the user-provided RNAseq data (``Transcript Sets of Selected Variant`` tab shows 2 results). The transcript set highlighted in green (Transcript Set 1 in this case) has the presumably best neoantigen candidate. Transcript Set 1 has 14 transcripts, all of which encode a stretch of amino acids (AERMGFTVVT) which gives rise to 3 different neoantigen candidates: AERMGFTVV, AERMGFTVVT, AERMGFTV. Transcript Set 2 has 1 transcript that encodes a stretch of amino acids (AERMGFTVLP), which gives rise to 3 different neoantigen candidates: AERMGFTVL, AERMGFTVLP, AERMGFTV.
Here, there are 2 transcript sets matching the user-provided RNAseq data (the ``Transcript Sets of Selected Variant`` tab shows 2 results). The transcript set highlighted in green (Transcript Set 1 in this case) contains the suggested best neoantigen candidate. Transcript Set 1 has 14 transcripts, all of which encode a stretch of amino acids (AERMGFTVVT) which gives rise to 3 different neoantigen candidates: AERMGFTVV, AERMGFTVVT, AERMGFTV. Transcript Set 2 has 1 transcript that encodes a stretch of amino acids (AERMGFTVLP), which gives rise to 3 different neoantigen candidates: AERMGFTVL, AERMGFTVLP, AERMGFTV.

.. figure:: ../../images/screenshots/vignette/ADAR/TranscriptSet1/ADAR_1_TranscriptSetsOfSelectedVariant_TranscriptSet1.png
:width: 1000px
@@ -289,7 +302,7 @@ The images below are transcripts in Transcript Set 1 (top-middle, 14 transcripts
:alt: pVACview Vignette
:figclass: align-left

The images below are the neoantigen candidates from Transcript Set 1 (top) and Transcript Set 2 (bottom). The best neoantigen candidate (AERMGFTVV) is highlighted in green. Here, candidates are ranked based on IC50 score - the best candidate has the lowest IC50 score. The Biotype, TSL, existence of problematic positions, and wether or not the peptide failed the anchor evaluation are also taken into account and candidates failing these criteria are deprioritized over candidates passing these criteria. As a result, a candidate with the lowest IC50 score is not always selected as the best peptide if these criteria aren't met.
The images below are the neoantigen candidates from Transcript Set 1 (top) and Transcript Set 2 (bottom). The best neoantigen candidate (AERMGFTVV) is highlighted in green. Here, candidates are ranked based on IC50 score - the best candidate has the lowest IC50 score. The Biotype, TSL, existence of problematic positions, and whether or not the peptide failed the anchor evaluation are also taken into account, and candidates failing these criteria are deprioritized relative to candidates passing them. As a result, the candidate with the lowest IC50 score is not always selected as the best peptide if it fails these criteria.
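
One way to picture this ordering is a two-part sort key: candidates that pass the extra criteria come first, and IC50 breaks ties within each group. The sketch below is an assumption-laden simplification, not the pVACseq selection code; the ``Candidate`` class and its single ``passes_filters`` flag (standing in for Biotype, TSL, problematic positions and anchor evaluation) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    ic50: float           # median predicted IC50 in nM
    passes_filters: bool  # hypothetical flag combining the extra criteria


def pick_best(candidates):
    """Return the candidate that passes the filters with the lowest IC50.

    Failing candidates are only chosen when nothing passes.
    """
    # False sorts before True, so passing candidates are preferred;
    # within each group the lowest IC50 wins.
    return min(candidates, key=lambda c: (not c.passes_filters, c.ic50))
```

With this key, a passing peptide at 76.11 nM would be chosen over a failing peptide at 40 nM, which mirrors the point that the lowest IC50 does not always win.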

.. figure:: ../../images/screenshots/vignette/ADAR/TranscriptSet1/ADAR_3_TranscriptSet1.png
:width: 1000px
@@ -305,7 +318,7 @@ The images below are the neoantigen candidates from Transcript Set 1 (top) and T

**Peptide-level assessment:**

For simplicity, we will review only the best peptide (AERMGFTVV) of the six candidates mentioned above. This candidate has good binding affinity (the median IC50 is 76.11nM, which is less than the 500nM cut-off; the median %ile is 0.125, which is less than recommended value of 2; the predictions from all algorithms are in high agreement with no outliers, as seen in the violin plot).
For simplicity, we will review only the best peptide (AERMGFTVV) of the six candidates mentioned above. This candidate has good binding affinity (the median IC50 is 76.11 nM, which is less than the 500 nM cut-off; the median percentile is 0.125, which is less than the recommended value of 2; the predictions from all algorithms are in high agreement with no outliers, as seen in the violin plot).

.. figure:: ../../images/screenshots/vignette/ADAR/TranscriptSet1/ADAR_7_IC50plot_TranscriptSet1.png
:width: 1000px
@@ -319,7 +332,7 @@ For simplicity, we will review only the best peptide (AERMGFTVV) of the six cand
:alt: pVACview Vignette
:figclass: align-left

The candidate also has good elution scores (elution scores close to 1). It's unclear whether the candidate is likely to trigger Tcell response, since immunogenicity %ile scores were not provided (two algorithms BigMHC_IM and DeepImmuno fail to predict immunogenicity %ile scores).
The candidate also has good elution scores (elution scores close to 1). It's unclear whether the candidate is likely to trigger a T cell response, since immunogenicity percentile scores were not provided (the two algorithms BigMHC_IM and DeepImmuno do not predict immunogenicity percentile scores).

.. figure:: ../../images/screenshots/vignette/ADAR/TranscriptSet1/ADAR_10_ElutionAndImmunogenicityData_TranscriptSet1.png
:width: 1000px