Olfactory_receptor_genes

First, create a folder containing your genome file (fasta format). This folder should also contain the following files:

rename_fasta.pl
R_script_numero1.R
R_script_numero2.R
Classification_outgroups.prot
Classification_OR_multifasta.prot
The Interproscan launcher (interproscan.sh)
clearer_ambigous_nt.py
All_fishes_cdhit_80.fa
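Before launching anything, it can help to confirm that the folder is complete. The following is a minimal sketch, not part of the pipeline: the function name `missing_files` is hypothetical, and the file names are simply copied from the list above (with `interproscan.sh` standing for the Interproscan launcher).

```python
from pathlib import Path

# File names copied from the list above; interproscan.sh is the Interproscan launcher.
REQUIRED_FILES = [
    "rename_fasta.pl",
    "R_script_numero1.R",
    "R_script_numero2.R",
    "Classification_outgroups.prot",
    "Classification_OR_multifasta.prot",
    "interproscan.sh",
    "clearer_ambigous_nt.py",
    "All_fishes_cdhit_80.fa",
]


def missing_files(folder):
    """Return the required file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the working folder as the first argument.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_files(folder)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```

The check only looks at file names; it does not verify that the genome file itself is present or that the files are valid.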

Several dependencies are needed:

EMBOSS 6.6.0.0
IQ-TREE 1.6.12
samtools v1.9
BLAST 2.6.0+
MAFFT v7.310
CD-HIT 4.8.1
R with the packages "data.table", "dplyr", "plyranges", "GenomicRanges"
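A quick way to see whether the command-line tools are reachable is to look them up on the `PATH`. This is only a sketch: the executable names below (`getorf` for EMBOSS, `iqtree`, `tblastn` for BLAST, `cd-hit`, `Rscript`) are assumptions about how the tools are installed on your system, not names taken from the pipeline scripts. The sketch checks neither the versions nor the R packages.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
ASSUMED_TOOLS = ["getorf", "iqtree", "samtools", "tblastn", "mafft", "cd-hit", "Rscript"]


def missing_tools(tools):
    """Return the tool names that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(ASSUMED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All assumed tools were found on PATH.")
```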

How to run the pipeline

The first step consists of finding functional OR genes in the genome. Start by launching:

./OR_Finder_Step1.bash genome.fasta All_fishes_cdhit_80.fa path_to_uniprot_database

The result is a phylogenetic tree containing all the putative ORs found in the genome. You can visualise this tree on iTOL, root it using the non-OR outgroup sequences, and select the sequences that cluster well with OR genes. Put the labels of the good ORs (the sequence names, without the leading > of the fasta header) in a text file, with one sequence per line. Example:

File_good_seqs.txt:

Mysequence1
Mysequence2
Mysequence3
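If the selected labels were copied together with their fasta header symbol or with stray whitespace, a few lines of Python can normalise them into the expected format. This is a minimal sketch; the function name `write_label_file` is hypothetical, and the output file name follows the example above.

```python
def write_label_file(labels, path):
    """Write one sequence label per line, without the leading '>' of fasta headers."""
    cleaned = []
    for label in labels:
        label = label.strip()
        if label.startswith(">"):
            label = label[1:].strip()
        if label:  # skip empty entries
            cleaned.append(label)
    with open(path, "w") as handle:
        for label in cleaned:
            handle.write(label + "\n")
    return cleaned


if __name__ == "__main__":
    # Labels as they might be copied from the tree viewer.
    write_label_file([">Mysequence1", "Mysequence2 ", "", "Mysequence3"], "File_good_seqs.txt")
```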

The second part of the pipeline consists of finding OR pseudogenes and incomplete genes:

./OR_Finder_Step2.bash genome.fasta File_good_seqs.txt

This produces several output files:

RESULTS_Pseudogenes.fa containing pseudogenes
Functionnals_Edges_Truncated_cdhit.fa containing functional and incomplete genes
Classification_fasta.prot.aln.treefile, which will help you classify OR genes into families

One can also classify pseudogenes by performing a blastx against known fish ORs and assigning each pseudogene based on its best blastx match.
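The best-match assignment can be sketched as follows, assuming the blastx search was saved in BLAST tabular format (`-outfmt 6`) with the default 12 columns, where column 1 is the query, column 2 the subject, and column 12 the bit score. The function name `best_hits` and the input file name are hypothetical; taking the highest bit score as "best" is also an assumption, since the original text does not say which criterion to use.

```python
import csv


def best_hits(blast_tab_path):
    """Map each query ID to (subject ID, bit score) of its highest-scoring hit."""
    best = {}
    with open(blast_tab_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment lines
            query, subject, bitscore = row[0], row[1], float(row[11])
            # Keep the first hit seen with the highest bit score for each query.
            if query not in best or bitscore > best[query][1]:
                best[query] = (subject, bitscore)
    return best


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python best_hits.py pseudogenes_vs_known_ORs.tsv
    if len(sys.argv) > 1:
        for query, (subject, score) in best_hits(sys.argv[1]).items():
            print(query, subject, score, sep="\t")
```

Each pseudogene can then be given the family of the known OR it matched best, provided the family of each known OR is available from its name or from a separate table.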

To get results faster, one can compute near-ML trees with FastTree instead of the ML trees produced by IQ-TREE, but the results may differ slightly.

Do not hesitate to send me an e-mail ([email protected]) to suggest modifications or to report any problem!

PS: The manual verification can be omitted by using the R package "ape" to exclude sequences that are not grouped with OR genes in the phylogeny. This is currently being integrated into the pipeline along with some other modifications. The new pipeline will be released soon.

MaximePolicarpo/Olfactory_receptor_genes

Folders and files

Latest commit

History

Repository files navigation

Olfactory_receptor_genes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages