Snakemake

How to run the pipeline

1.) Run the script "pipeline_setting.pl".

Running ./pipeline_setting.pl without any arguments prints the following usage information (options marked [COMPULSORY] are required; --refblast is optional):

 This script must be run before the pipeline (snakemake). The arguments are as follows:

 --ref <file>		 The reference protein FASTA file [COMPULSORY]
 --trans <file1> <file2>	 One or more transcriptome(s) nucleotide fasta file(s) [COMPULSORY]
 --project <name>		 A name for your project [COMPULSORY]
 --gff <file>		 The GFF3 file of your reference genome (from Phytozome) [COMPULSORY]
 --script <directory>	 The directory that contains all the secondary scripts [COMPULSORY]
 --refblast <file>	 A "ref versus ref" blastp tabular output

Example: ./pipeline_setting.pl --project My_Project --ref my_reference.protein.fasta --trans my_new_transcriptome.fasta my_other_new_transcriptome.fasta --gff my_ref_proteome.gff --script my_script_folder

2.) Go into the newly created project directory.

3.) Run snakemake from inside that directory.
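If you want to script these three steps, the sketch below builds the pipeline_setting.pl command from the options listed above and then launches snakemake. The helper names (build_setup_command, run_pipeline) are hypothetical, and it assumes that the project directory is created in the current directory and named after the --project value; check this against what pipeline_setting.pl actually does.

```python
import subprocess
from pathlib import Path


def build_setup_command(project, ref, transcriptomes, gff, script_dir, refblast=None):
    """Build the argument list for pipeline_setting.pl (hypothetical helper)."""
    if not transcriptomes:
        raise ValueError("--trans needs at least one transcriptome FASTA file")
    cmd = [
        "./pipeline_setting.pl",
        "--project", project,
        "--ref", ref,
        "--trans", *transcriptomes,  # one or more nucleotide FASTA files
        "--gff", gff,
        "--script", script_dir,
    ]
    if refblast is not None:
        # Optional: reuse an existing "ref versus ref" blastp tabular output.
        cmd += ["--refblast", refblast]
    return cmd


def run_pipeline(project, ref, transcriptomes, gff, script_dir, refblast=None):
    """Run the setup script, then snakemake inside the project directory."""
    subprocess.run(
        build_setup_command(project, ref, transcriptomes, gff, script_dir, refblast),
        check=True,
    )
    # Assumption: the new project directory is ./<project>.
    subprocess.run(["snakemake"], cwd=Path(project), check=True)
```

Note that recent Snakemake releases (5.11 and later) will not run locally without a core count, so you may need to use ["snakemake", "--cores", "1"] instead.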

What does the pipeline do?

  1. Run the "ref proteome versus ref proteome" blastp search (skipped if it has already been done, e.g. when a result is supplied with --refblast) -> tabular blastp output file

  2. Run the "get_low_copy.py" script on this blastp output file -> list of protein names.

  3. Run the "parsGff3.py" script on the reference GFF3 file -> list of protein names.

  4. Run the "extract_fasta_from_list.pl" script, which uses the two lists of protein names -> FASTA file of the selected protein sequences

  5. If two "de novo" transcriptomes are used, run a blastn search of transcriptome 1 against transcriptome 2 -> list of transcriptome 1 sequences that have a good blast hit to a sequence in transcriptome 2

  6. Run a blastx search of the transcriptome (or of the transcriptome 1 sequences selected in step 5) against the selected reference protein sequences -> tabular blastx output file

  7. Run the "parseBLASTtable.py" script.
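The BLAST steps above (1, 5 and 6) all produce tabular output. As a rough illustration, here is how the corresponding NCBI BLAST+ command lines could be built. The e-value and thread defaults and the helper names are assumptions for this sketch, not the settings the pipeline uses.

```python
import subprocess


def makeblastdb_cmd(fasta, dbtype, db_name):
    """Command to index a FASTA file as a BLAST database ("prot" or "nucl")."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    return ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", db_name]


def blast_cmd(program, query, db_name, out_tsv, evalue=1e-5, threads=1):
    """Command for blastp/blastn/blastx with tabular output (-outfmt 6).

    The evalue and threads defaults are illustrative assumptions.
    """
    return [
        program, "-query", query, "-db", db_name,
        "-outfmt", "6",  # 12-column tabular format
        "-evalue", str(evalue),
        "-num_threads", str(threads),
        "-out", out_tsv,
    ]


def run(cmd):
    """Execute a command and raise if it fails."""
    subprocess.run(cmd, check=True)
```

For step 1, for example, you would call run(makeblastdb_cmd("ref.fa", "prot", "ref_db")) and then run(blast_cmd("blastp", "ref.fa", "ref_db", "ref_vs_ref.tsv")). Step 6 follows the same pattern, with blastx, a nucleotide query and the selected proteins as a "prot" database.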
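This README does not describe the selection rule used by get_low_copy.py in step 2. One plausible way to pick "low-copy" proteins from a ref-versus-ref blastp table is to count, for each query, its distinct non-self hits below an e-value cutoff and to keep the queries with few such hits. The function name, the max_paralogs and max_evalue parameters, and the rule itself are all hypothetical.

```python
import csv
from collections import defaultdict


def low_copy_proteins(blast_lines, max_paralogs=1, max_evalue=1e-10):
    """Hypothetical low-copy filter on blastp -outfmt 6 lines.

    A query is kept if it has at most `max_paralogs` distinct non-self
    subjects with e-value <= `max_evalue` (column 11 of the table).
    """
    paralogs = defaultdict(set)
    queries = set()
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        queries.add(query)
        if subject != query and evalue <= max_evalue:
            paralogs[query].add(subject)
    return sorted(q for q in queries if len(paralogs[q]) <= max_paralogs)
```

Because self-hits are ignored, a protein with max_paralogs=0 is one that only hits itself, which means it is effectively single-copy in the reference proteome.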
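For step 4, the README says only that extract_fasta_from_list.pl uses the two lists of protein names. The sketch below assumes that the lists are intersected, so only proteins present in both are kept, and that a FASTA record's identifier is the first word of its header line. Both are assumptions, and the function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def extract_selected(fasta_lines, low_copy_ids, gff_ids):
    """Return FASTA text for proteins present in BOTH name lists (assumed rule)."""
    wanted = set(low_copy_ids) & set(gff_ids)
    out = []
    for header, seq in read_fasta(fasta_lines):
        fields = header.split()
        if fields and fields[0] in wanted:  # ID = first word of the header
            out.append(f">{header}\n{seq}\n")
    return "".join(out)
```

Sequences are written on a single line, which keeps the sketch short. The original script may wrap them differently.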
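Step 5 keeps the transcriptome 1 sequences that have a "good blast hit" in transcriptome 2, but the README does not define "good". The sketch below uses percent identity, alignment length and e-value cutoffs on the blastn tabular output. The threshold values and the function name are illustrative assumptions only.

```python
def queries_with_good_hit(blast_lines, min_pident=90.0, min_length=100, max_evalue=1e-20):
    """Return sorted query IDs with at least one hit passing all cutoffs.

    Expects blastn -outfmt 6 lines: qseqid, sseqid, pident, length, ..., evalue, bitscore.
    The cutoff defaults are hypothetical.
    """
    hits = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        qseqid = cols[0]
        pident, length, evalue = float(cols[2]), int(cols[3]), float(cols[10])
        if pident >= min_pident and length >= min_length and evalue <= max_evalue:
            hits.add(qseqid)
    return sorted(hits)
```

A query can have several hits, so IDs are collected in a set and reported once.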