Package VEP #13

Open
Oodnadatta opened this issue Feb 17, 2018 · 5 comments

Comments

@Oodnadatta
Member

Oodnadatta commented Feb 17, 2018

Good luck!

Requirements

apt install \
    libdbi-perl \
    libdbd-mysql-perl \
    curl \
    zip \
    build-essential \
    zlib1g-dev \
    libmodule-build-perl \
    git
@Oodnadatta
Member Author

The VEP can either connect to remote or local databases, or use local cache files.
Using local cache files is the fastest and most efficient way to run the VEP
Cache files will be stored in /home/asdp/.vep
Do you want to install any cache files (y/n)? n
Skipping cache installation

The VEP can use FASTA files to retrieve sequence data for HGVS notations and reference sequence checks.
FASTA files will be stored in /home/asdp/.vep
Do you want to install any FASTA files (y/n)? n
Skipping FASTA installation - Exiting

The VEP can use plugins to add functionality and data.
Plugins will be installed in /home/asdp/.vep/Plugins
Do you want to install any plugins (y/n)? y
Cache directory /home/asdp/.vep does not exists - do you want to create it (y/n)? y
ERROR: Could not create directory /home/asdp/.vep

mkdir -p /home/asdp/.vep
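
The log above does not say why the installer could not create the directory, so the following is only a sketch of a pre-flight check to run before the installer. `ensure_vep_dir` is a hypothetical helper name, not part of VEP; it creates the directory if needed and confirms that the current user can write to it.

```shell
# Hypothetical helper (not part of VEP): make sure the cache directory
# exists and is writable before starting the installer.
ensure_vep_dir() {
    dir="${1:-$HOME/.vep}"
    # -p creates missing parent directories and succeeds if the directory exists
    if ! mkdir -p "$dir"; then
        echo "cannot create $dir" >&2
        return 1
    fi
    # The installer writes plugins and cache files here, so check write access
    if [ ! -w "$dir" ]; then
        echo "$dir is not writable by $(id -un)" >&2
        return 1
    fi
    echo "$dir"
}
```

For the setup in this thread the call would be `ensure_vep_dir /home/asdp/.vep`.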

@Oodnadatta
Member Author

Pathogenicity predictions

1: dbNSFP - dbNSFP provides pathogenicity predictions for missense variants from various algorithms
2: CADD - Combined Annotation Dependent Depletion (CADD) is a tool for scoring the deleteriousness of single nucleotide variants and insertion/deletion variants in the human genome. CADD integrates multiple annotations into one metric by contrasting variants that survived natural selection with simulated mutations.
3: FATHMM_MKL - FATHMM-MKL predicts functional consequences of variants, both coding and non-coding.
4: Gwava - Retrieves precomputed Genome Wide Annotation of VAriants (GWAVA) scores for any variant that overlaps a known variant from the Ensembl variation database
5: Carol - Calculates the Combined Annotation scoRing toOL (CAROL) score for a missense mutation based on the pre-calculated SIFT and PolyPhen scores
6: Condel - Calculates the Consensus Deleteriousness (Condel) score for a missense mutation based on the pre-calculated SIFT and PolyPhen scores
7: PolyPhen_SIFT - Retrieves PolyPhen and SIFT predictions from a locally constructed sqlite database
8: LoF - LOFTEE identifies LoF (loss-of-function) variation
9: LoFtool - Provides a per-gene rank of genic intolerance and consequent susceptibility to disease based on the ratio of Loss-of-function (LoF) to synonymous mutations in ExAC data
10: ExACpLI - Provides a per-gene probability of being loss-of-function intolerant (pLI) from ExAC data
11: MPC - MPC is a missense deleteriousness metric based on the analysis of genic regions depleted of missense mutations in ExAC
12: MTR - MTR scores quantify the amount of purifying selection acting specifically on missense variants in a given window of protein-coding sequence

Splicing predictions

13: dbscSNV - Retrieves data for splicing variants from a tabix-indexed dbscSNV file
14: GeneSplicer - Detects splice sites in genomic DNA
15: MaxEntScan - Sequence motif and maximum entropy based splice site consensus predictions
16: SpliceRegion - More granular predictions of splicing effects

Conservation

17: Blosum62 - BLOSUM62 amino acid conservation score
18: Conservation - Retrieves a conservation score from the Ensembl Compara databases for variant positions
19: AncestralAllele - Retrieves the ancestral allele for variants inferred from the Ensembl Compara Enredo-Pecan-Ortheus (EPO) pipeline

Identifiers

20: CSN - Reports Clinical Sequencing Nomenclature (CSN) for variants

Frequency data

21: ExAC - Reports allele frequencies from the Exome Aggregation Consortium

Variant data

22: LD - Finds variants in linkage disequilibrium with any overlapping existing variants from the Ensembl variation databases
23: SameCodon - Reports existing variants that fall in the same codon

Gene data

24: GO - Retrieves Gene Ontology terms associated with transcripts/translations via the Ensembl API
25: GXA - Reports data from the Gene Expression Atlas

Other plugins

26: miRNA - Determines where in the secondary structure of a miRNA a variant falls
27: UpDownDistance - Change the distance to transcript (default is 5000bp) for which VEP assigns upstream and downstream consequences
28: NearestGene - Finds the nearest gene to non-genic variants
29: Downstream - Predicts the downstream effects of a frameshift variant on the protein sequence of a transcript
30: ProteinSeqs - Prints out the reference and mutated protein sequences of any proteins found with non-synonymous mutations
31: TSSDistance - Calculates the distance from the transcription start site for upstream variants

@ikit
Member

ikit commented Mar 16, 2018

I think VEP should not be installed directly on the server.
All pipelines must run in containers to keep the server clean and to avoid conflicts with the versions and dependencies of other pipelines.

@Oodnadatta
Member Author

In progress!

@Oodnadatta
Member Author

ftp://ftp.ensembl.org/pub/release-91/variation/VEP/homo_sapiens_merged_vep_91_GRCh37.tar.gz
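
A minimal sketch of fetching and unpacking this cache by hand, since cache installation was skipped in the installer run earlier in the thread. The function names are hypothetical, and the target directory is assumed to be the `.vep` directory the installer reported; adjust it if the cache should live elsewhere.

```shell
VEP_CACHE_URL="ftp://ftp.ensembl.org/pub/release-91/variation/VEP/homo_sapiens_merged_vep_91_GRCh37.tar.gz"

# Print the file name part of a URL (everything after the last slash)
archive_name() {
    printf '%s\n' "${1##*/}"
}

# Extract a downloaded cache archive into the given directory
unpack_cache() {
    tar -xzf "$1" -C "$2"
}

# Download the archive into the cache directory, then unpack it there.
# Assumption: the cache directory defaults to $HOME/.vep.
fetch_cache() {
    url="$1"
    dir="${2:-$HOME/.vep}"
    mkdir -p "$dir" || return 1
    archive="$dir/$(archive_name "$url")"
    # --fail makes curl return non-zero on a server error
    curl --fail --output "$archive" "$url" || return 1
    unpack_cache "$archive" "$dir"
}
```

Usage would be `fetch_cache "$VEP_CACHE_URL" /home/asdp/.vep`. The archive is large, so the download can take a while.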
