Installation
System Memory Requirements:
- Because of the requirements of some of this program's dependencies, it is highly recommended that METABOLIC-C is run on a system with at least 100 GB of memory.
- METABOLIC-G is not as demanding as METABOLIC-C and requires significantly less memory to run.
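The memory recommendation above can be checked before installing anything. The following is a minimal sketch, assuming a Linux system that exposes `/proc/meminfo`; the helper name `mem_total_gb` is hypothetical and not part of METABOLIC.

```shell
#!/usr/bin/env bash
# Sketch: report total system memory in whole GB (Linux only).
# Assumption: /proc/meminfo exists and lists "MemTotal" in kB.
# mem_total_gb is a hypothetical helper name, not part of METABOLIC.
mem_total_gb() {
    local meminfo="${1:-/proc/meminfo}"
    # MemTotal is reported in kB; divide by 1024^2 and truncate to whole GB.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "$meminfo"
}

if [ -r /proc/meminfo ]; then
    total="$(mem_total_gb)"
    if [ "$total" -ge 100 ]; then
        echo "Memory: ${total} GB (meets the METABOLIC-C recommendation)"
    else
        echo "Memory: ${total} GB (below the 100 GB recommended for METABOLIC-C)"
    fi
fi
```

On systems without `/proc/meminfo` (for example macOS) the script prints nothing, and memory has to be checked another way.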
System Storage Requirements:
If you plan to use only METABOLIC-G, you do not need to install GTDB-Tk.
Necessary Databases | Approximate System Storage Required |
---|---|
METABOLIC program with unzipped files | 7.69 GB (including HMM database) |
GTDB-Tk Reference Data | 28 GB |
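To see whether the target filesystem has room for both entries in the table, a check along these lines may help. It is a sketch only: `free_gb` and `has_free_gb` are hypothetical helper names, and the 36 GB threshold is simply the two table values added together and rounded up.

```shell
#!/usr/bin/env bash
# free_gb DIR: hypothetical helper; prints the free space (whole GB) on the
# filesystem that holds DIR.
free_gb() {
    local dir="${1:-.}"
    # df -P gives one line per filesystem; -k reports 1024-byte blocks.
    df -Pk "$dir" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# has_free_gb DIR REQUIRED_GB: succeeds if at least REQUIRED_GB are free.
has_free_gb() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

# Assumption: 36 GB is roughly 7.69 GB + 28 GB from the table above.
if has_free_gb . 36; then
    echo "Enough free space for METABOLIC plus the GTDB-Tk reference data"
else
    echo "Less than 36 GB free in $(pwd)"
fi
```

If only METABOLIC-G is needed, the threshold can be lowered to cover the METABOLIC files alone.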
Programs required:
- Perl (>= v5.010)
- HMMER (>= v3.1b2)
- Prodigal (>= v2.6.3)
- Sambamba (>= v0.7.0) (only for METABOLIC-C)
- BAMtools (>= v2.4.0) (only for METABOLIC-C)
- CoverM (only for METABOLIC-C)
- R (>= 3.6.0)
- Diamond
- Samtools (only for METABOLIC-C)
- Bowtie 2 (only for METABOLIC-C)
- GTDB-Tk (only for METABOLIC-C)
- gdown (for downloading METABOLIC_test_files.tgz)
Each of these programs should be in the PATH so that they can be accessed regardless of location.
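One way to confirm that the programs are reachable through the PATH is to look each one up with `command -v`. This is a rough sketch: `missing_tools` is a hypothetical helper, and the executable names below (for example `hmmsearch` for HMMER and `gtdbtk` for GTDB-Tk) are assumptions based on each project's usual command name, so adjust them to match your installation.

```shell
#!/usr/bin/env bash
# missing_tools NAME...: hypothetical helper that prints every NAME that
# cannot be found on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# Executable names are assumptions, not taken from the METABOLIC docs.
echo "Missing programs used by METABOLIC-G and METABOLIC-C:"
missing_tools perl hmmsearch prodigal R diamond gdown

echo "Missing programs used only by METABOLIC-C:"
missing_tools sambamba bamtools coverm samtools bowtie2 gtdbtk
```

An empty list under each heading means every name was found. This check does not verify version numbers.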
Perl and R Dependencies Detailed Instructions:
Perl Modules:
To install, use the cpan shell by entering "perl -MCPAN -e shell cpan" and then entering
"install [Module Name]", or install by using "cpan -i [Module Name]", or by entering
"cpanm [Module Name]".
Example 1:
perl -MCPAN -e shell cpan
install Data::Dumper
Example 2:
cpan -i Data::Dumper
Example 3:
cpanm Data::Dumper
1. Data::Dumper
2. POSIX
3. Getopt::Long
4. Statistics::Descriptive
5. Array::Split
6. Bio::SeqIO
7. Bio::Perl
8. Bio::Tools::CodonTable
9. Carp
10. File::Spec
11. File::Basename
12. Parallel::ForkManager
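After installing the modules listed above, a quick loop can show which ones Perl still cannot load. This is a sketch rather than part of METABOLIC; `missing_perl_modules` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# missing_perl_modules MODULE...: hypothetical helper; prints each module
# that perl cannot load (or every module if perl itself is not installed).
missing_perl_modules() {
    local module
    for module in "$@"; do
        # "perl -MModule -e 1" exits non-zero when the module fails to load.
        perl "-M${module}" -e 1 >/dev/null 2>&1 || echo "$module"
    done
    return 0
}

missing_perl_modules \
    Data::Dumper POSIX Getopt::Long Statistics::Descriptive Array::Split \
    Bio::SeqIO Bio::Perl Bio::Tools::CodonTable Carp File::Spec \
    File::Basename Parallel::ForkManager
```

Any module name printed by the script still needs to be installed with one of the methods shown in the examples above.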
R Packages:
To install, open the R command line interface by entering "R" into the command line, and then enter
"install.packages("[Package Name]")".
Example:
R
install.packages("diagram")
q()
1. diagram (v1.6.4)
2. forcats (v0.5.0)
3. digest (v0.6.25)
4. htmltools (v0.4.0)
5. rmarkdown (v2.1)
6. reprex (v0.3.0)
7. tidyverse (v1.3.0)
8. ggthemes (v4.2.0)
9. ggalluvial (v0.11.3)
10. reshape2 (v1.4.3)
11. ggraph (v2.0.2)
12. pdftools (v2.3)
13. igraph (v1.2.5)
15. tidygraph (v1.1.2)
16. stringr (v1.4.0)
17. plyr (v1.8.6)
18. dplyr (v0.8.5)
19. openxlsx (v4.1.4)
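The R packages can be checked in a similar way from the shell. The sketch below assumes that `Rscript` is installed alongside R; `missing_r_packages` is a hypothetical helper name. It only tests whether each package can be loaded and does not compare installed versions against the ones listed above.

```shell
#!/usr/bin/env bash
# missing_r_packages PKG...: hypothetical helper; prints each package that R
# cannot load (or every package if Rscript is not installed).
missing_r_packages() {
    local pkg
    for pkg in "$@"; do
        # The package name is passed as a trailing argument to the R expression.
        Rscript -e 'p <- commandArgs(trailingOnly = TRUE)[1]; if (!requireNamespace(p, quietly = TRUE)) quit(status = 1)' \
            "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
    return 0
}

missing_r_packages \
    diagram forcats digest htmltools rmarkdown reprex tidyverse ggthemes \
    ggalluvial reshape2 ggraph pdftools igraph tidygraph stringr plyr \
    dplyr openxlsx
```

Starting one `Rscript` process per package is slow but keeps the result easy to read: each printed name is a package that still needs `install.packages()`.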
To ensure an efficient and successful installation of METABOLIC, make sure that all dependencies are properly installed before downloading the METABOLIC software.
- Go to the directory where you want to install the program and clone the GitHub repository with the following command:
git clone https://github.com/AnantharamanLab/METABOLIC.git
or click the green "download ZIP" button at the top of the GitHub page and unzip the downloaded file.
The Perl and R scripts and the dependent databases should be kept in the same directory.
NOTE: Before following the next step, make sure your working directory is the directory that was created by the METABOLIC download, that is, the directory containing the main scripts for METABOLIC (METABOLIC-G.pl, METABOLIC-C.pl, etc.).
NOTE: We provide a "run_to_setup.sh" script with the files downloaded from GitHub to make setting up the dependent databases easy (Steps 2-8). Run it with the following command:
bash run_to_setup.sh
Once you have run this script, you do not need to perform Steps 2-8 described below.
- METABOLIC requires the KofamKOALA hmm and METABOLIC hmm databases (see the KofamKOALA website).
2.1. Download the KofamKOALA hmm database files:
mkdir kofam_database
cd kofam_database
wget -c ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
wget -c ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
gzip -d ko_list.gz
tar xzf profiles.tar.gz; rm profiles.tar.gz
mv ../All_Module_KO_ids.txt profiles
cd profiles
cp ../../Accessory_scripts/batch_hmmpress.pl ./
perl batch_hmmpress.pl
2.2. The METABOLIC hmm database in "METABOLIC_hmm_db.tgz" contains custom hmm files and self-parsed Pfam and TIGRfam files. Decompress it to the folder "METABOLIC_hmm_db" and keep that folder in the same directory as the KofamKOALA hmm database and the scripts.
tar zxvf METABOLIC_hmm_db.tgz
- METABOLIC uses the "METABOLIC_template_and_database", which contains the hmm result table and KEGG database information. Decompress METABOLIC_template_and_database.tgz to the folder "METABOLIC_template_and_database" and keep it in the same directory as the KofamKOALA hmm database and the scripts.
tar zxvf METABOLIC_template_and_database.tgz
- This software also contains "Accessory_scripts.tgz", which needs to be decompressed before use.
tar zxvf Accessory_scripts.tgz
- This software also contains "Motif.tgz", which needs to be decompressed before use.
tar zxvf Motif.tgz
- Download the most recent dbCAN-fam-HMMs.txt into a new directory named "dbCAN2" that you create, then process it with "batch_hmmpress_for_dbCAN2_HMMdb.pl".
mkdir dbCAN2
cd dbCAN2
wget http://bcb.unl.edu/dbCAN2/download/Databases/dbCAN-old@UGA/dbCAN-fam-HMMs.txt
perl ../Accessory_scripts/batch_hmmpress_for_dbCAN2_HMMdb.pl
cd ../
- Download the MEROPS peptidase protein sequences (https://www.ebi.ac.uk/merops/download_list.shtml, option No. 3), then use DIAMOND to build a BLASTP database from pepunit.lib.
mkdir MEROPS
cd MEROPS
wget ftp://ftp.ebi.ac.uk/pub/databases/merops/current_release/pepunit.lib
perl ../Accessory_scripts/make_pepunit_db.pl
cd ../
- Finally, this software also contains "METABOLIC_test_files.tgz", which needs to be decompressed before use. It is a set of test genomes and reads that you can use for a test run to confirm that the program works correctly before you run your real samples.
tar zxvf METABOLIC_test_files.tgz
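Once the setup steps are finished (manually or through run_to_setup.sh), it can be useful to confirm that the expected folders sit next to the scripts. The sketch below is not part of METABOLIC. `missing_dirs` is a hypothetical helper, and the folder names "Motif" and "METABOLIC_test_files" are assumptions inferred from the archive names.

```shell
#!/usr/bin/env bash
# missing_dirs BASE NAME...: hypothetical helper; prints each NAME that is
# not a directory under BASE.
missing_dirs() {
    local base="$1" name
    shift
    for name in "$@"; do
        [ -d "$base/$name" ] || echo "$name"
    done
    return 0
}

# Run from the METABOLIC directory. "Motif" and "METABOLIC_test_files" are
# assumed folder names based on the .tgz file names.
missing_dirs . \
    kofam_database METABOLIC_hmm_db METABOLIC_template_and_database \
    Accessory_scripts Motif dbCAN2 MEROPS METABOLIC_test_files
```

Any folder name that is printed points to a setup step that has not been completed yet.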