Installation
System Memory Requirements:
- Due to the requirements of some of this program's dependencies, it is highly recommended that METABOLIC-C be run on a system with at least 100 GB of memory.
- METABOLIC-G is not as demanding as METABOLIC-C and requires significantly less memory to run.
System Storage Requirements:
If you plan to use only METABOLIC-G, you don't need to install GTDB-Tk.
Necessary Databases | Approximate System Storage Required |
---|---|
METABOLIC program with unzipped files | 7.69 GB (including HMM database) |
GTDB-Tk Reference Data | 28 GB |
Programs required:
- Perl (>= v5.010)
- HMMER (>= v3.1b2)
- Prodigal (>= v2.6.3)
- Sambamba (>= v0.7.0) (only for METABOLIC-C)
- BAMtools (>= v2.4.0) (only for METABOLIC-C)
- CoverM (only for METABOLIC-C)
- R (>= 3.6.0)
- Diamond
- Samtools (only for METABOLIC-C)
- Bowtie 2 (only for METABOLIC-C)
- GTDB-Tk (only for METABOLIC-C)
- gdown (for downloading METABOLIC_test_files.tgz)
Each of these programs should be in the PATH so that they can be accessed regardless of location.
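One way to confirm this before installing METABOLIC is a small shell check. The sketch below is not part of METABOLIC: the function name `missing_tools` is hypothetical, and the executable names in the usage comment (for example `hmmsearch` for HMMER and `Rscript` for R) are the usual command names of these programs, which may differ on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with METABOLIC): print every command
# name given as an argument that cannot be found on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        # `command -v` succeeds only if the shell can resolve the name.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Usage for tools needed by both METABOLIC-G and METABOLIC-C (assumed command names):
# missing_tools perl hmmsearch prodigal Rscript diamond gdown
#
# Usage for tools needed only by METABOLIC-C (assumed command names):
# missing_tools sambamba bamtools coverm samtools bowtie2 gtdbtk
```

An empty output means every listed command was found. The check does not compare versions against the minimums listed above.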
Perl and R Dependencies Detailed Instructions:
Perl Modules:
To install, use the cpan shell by entering "perl -MCPAN -e shell cpan" and then entering
"install [Module Name]", or install by using "cpan -i [Module Name]", or by entering
"cpanm [Module Name]".
Example 1:
perl -MCPAN -e shell cpan
install Data::Dumper
Example 2:
cpan -i Data::Dumper
Example 3:
cpanm Data::Dumper
1. Data::Dumper
2. POSIX
3. Getopt::Long
4. Statistics::Descriptive
5. Array::Split
6. Bio::SeqIO
7. Bio::Perl
8. Bio::Tools::CodonTable
9. Carp
10. File::Spec
11. File::Basename
12. Parallel::ForkManager
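To see which of the modules listed above still need to be installed, a loop along these lines may help. It is a minimal sketch, not part of METABOLIC; the function name `missing_perl_modules` is hypothetical, and it tests only the `perl` that is first on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with METABOLIC): print every Perl module
# given as an argument that the `perl` on the PATH cannot load.
missing_perl_modules() {
    local module
    for module in "$@"; do
        # `perl -MName -e 1` exits non-zero when the module fails to load.
        perl -M"$module" -e 1 >/dev/null 2>&1 || printf '%s\n' "$module"
    done
    return 0
}

# Usage with the modules listed above:
# missing_perl_modules Data::Dumper POSIX Getopt::Long Statistics::Descriptive \
#     Array::Split Bio::SeqIO Bio::Perl Bio::Tools::CodonTable Carp \
#     File::Spec File::Basename Parallel::ForkManager
```

Each name printed is a module to install with one of the three methods shown in the examples.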
R Packages:
To install, open the R command line interface by entering "R" into the command line, and then enter
"install.packages("[Package Name]")".
Example:
R
install.packages("diagram")
q()
1. diagram (v1.6.4)
2. forcats (v0.5.0)
3. digest (v0.6.25)
4. htmltools (v0.4.0)
5. rmarkdown (v2.1)
6. reprex (v0.3.0)
7. tidyverse (v1.3.0)
8. ggthemes (v4.2.0)
9. ggalluvial (v0.11.3)
10. reshape2 (v1.4.3)
11. ggraph (v2.0.2)
12. pdftools (v2.3)
13. igraph (v1.2.5)
14. tidygraph (v1.1.2)
15. stringr (v1.4.0)
16. plyr (v1.8.6)
17. dplyr (v0.8.5)
18. openxlsx (v4.1.4)
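The same kind of check can be sketched for the R packages. This is an illustration under assumptions, not part of METABOLIC: the function name `missing_r_packages` is hypothetical, it relies on `Rscript` being on the PATH, and it only tests whether a package can be loaded, not whether its version matches the one listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with METABOLIC): print every R package
# given as an argument that the `Rscript` on the PATH cannot load.
missing_r_packages() {
    local pkg
    for pkg in "$@"; do
        # The package name is passed as a trailing argument to the R expression;
        # R exits with status 1 when requireNamespace() cannot find the package.
        Rscript -e 'p <- commandArgs(trailingOnly = TRUE)[1]; if (!requireNamespace(p, quietly = TRUE)) quit(status = 1)' \
            "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
    return 0
}

# Usage with some of the packages listed above:
# missing_r_packages diagram tidyverse ggthemes ggalluvial ggraph pdftools openxlsx
```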
To ensure efficient and successful installation of METABOLIC, make sure that all dependencies are properly installed prior to download of the METABOLIC software.
1. Go to the directory where you want the program to be and clone the GitHub repository with the following command:
git clone https://github.com/AnantharamanLab/METABOLIC.git
or use the green button at the top of the GitHub page to download the ZIP file, then unzip it.
The Perl and R scripts and dependent databases should be kept in the same directory.
NOTE: Before following the next step, make sure your working directory is the directory that was created by the METABOLIC download, that is, the directory containing the main scripts for METABOLIC (METABOLIC-G.pl, METABOLIC-C.pl, etc.).
NOTE: The "run_to_setup.sh" script, provided along with the data downloaded from GitHub, sets up the dependent databases (Steps 2-8) for you. Run it with the following command:
bash run_to_setup.sh
Once you've run the bash script, you do not need to run Steps 2-8 described below.
2. METABOLIC requires the KofamKOALA hmm and METABOLIC hmm databases (see the KofamKOALA website).
2.1. Download KofamKOALA hmm database files:
mkdir kofam_database
cd kofam_database
wget -c ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
wget -c ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
gzip -d ko_list.gz
tar xzf profiles.tar.gz; rm profiles.tar.gz
mv ../All_Module_KO_ids.txt profiles
cd profiles
cp ../../Accessory_scripts/batch_hmmpress.pl ./
perl batch_hmmpress.pl
2.2. The METABOLIC hmm database in "METABOLIC_hmm_db.tgz" contains custom hmm files and self-parsed Pfam and TIGRfam files. It needs to be decompressed to the folder "METABOLIC_hmm_db" and kept in the same directory as the KofamKOALA hmm database and the scripts.
tar zxvf METABOLIC_hmm_db.tgz
3. METABOLIC uses the "METABOLIC_template_and_database", which contains the hmm result table and KEGG database information. Decompress METABOLIC_template_and_database.tgz to the folder "METABOLIC_template_and_database" and keep it in the same directory as the KofamKOALA hmm database and the scripts.
tar zxvf METABOLIC_template_and_database.tgz
4. This software also contains "Accessory_scripts.tgz", which needs to be decompressed before use.
tar zxvf Accessory_scripts.tgz
5. This software also contains "Motif.tgz", which needs to be decompressed before use.
tar zxvf Motif.tgz
6. Download the most recent dbCAN-fam-HMMs.txt into a new directory named "dbCAN2", then parse the downloaded file with "batch_hmmpress_for_dbCAN2_HMMdb.pl".
mkdir dbCAN2
cd dbCAN2
wget http://bcb.unl.edu/dbCAN2/download/Databases/dbCAN-old@UGA/dbCAN-fam-HMMs.txt
perl ../Accessory_scripts/batch_hmmpress_for_dbCAN2_HMMdb.pl
cd ../
7. Download the MEROPS Peptidase Protein Sequences (https://www.ebi.ac.uk/merops/download_list.shtml, option No. 3) into a new directory named "MEROPS", then process pepunit.lib with DIAMOND to make the BLASTP database.
mkdir MEROPS
cd MEROPS
wget ftp://ftp.ebi.ac.uk/pub/databases/merops/current_release/pepunit.lib
perl ../Accessory_scripts/make_pepunit_db.pl
cd ../
8. Finally, this software also contains "METABOLIC_test_files.tgz", which needs to be decompressed before use. This is a set of test genomes and reads that you can use for a test run, to check that the program works correctly before running your real samples.
tar zxvf METABOLIC_test_files.tgz
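After the database setup, whether done by run_to_setup.sh or by hand, it can be useful to confirm that the expected folders exist next to the scripts. The following is a minimal sketch and not part of METABOLIC. The function name `missing_dirs` is hypothetical, and the folder names "Motif" and "kofam_database/profiles" are assumptions based on the archive names and commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with METABOLIC): given a base directory
# followed by folder names, print every folder that does not exist under it.
missing_dirs() {
    local base=$1 dir
    shift
    for dir in "$@"; do
        [ -d "$base/$dir" ] || printf '%s\n' "$dir"
    done
    return 0
}

# Usage, run from the METABOLIC directory (folder names partly assumed):
# missing_dirs . kofam_database/profiles METABOLIC_hmm_db \
#     METABOLIC_template_and_database Accessory_scripts Motif dbCAN2 MEROPS
```

An empty output means all listed folders are present. It does not verify that the databases inside them are complete.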
(The authors suggest the Docker version of METABOLIC if you have trouble installing the standalone version.)
(Contributed by Mr. Tin Ho)
1. Starting the Metabolic container via Docker:
Interactive run (note that content in the container is ephemeral; saving sort of works if you use docker commit ...):
cd METABOLIC; mkdir temp; tar xfz ~/Downloads/5_genomes_test.tgz
docker run -it -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v "$PWD":/tmp/home --user=$(id -u):$(id -g) tin6150/metabolic
cd /opt/METABOLIC
perl METABOLIC-G.pl -help
perl /opt/METABOLIC/METABOLIC-G.pl -in-gn /tmp/home/5_genomes_test/Genome_files -o /tmp/home/metabolic_out
Non interactive, scriptable run:
docker pull tin6150/metabolic
docker run -v "$(pwd)":/tmp/home --entrypoint "perl" tin6150/metabolic /opt/METABOLIC/METABOLIC-G.pl -t 34 -in-gn /tmp/home/5_genomes_test/Genome_files -o /tmp/home/metabolic_out
# Output will be in ./metabolic_out
2. Debug runs/tests:
docker run -it -v $HOME:/home/tin tin6150/metabolic
docker exec -it pluto_amp bash # additional terminal into existing running container
# testing intermediary container use:
docker run -it -v $HOME:/home/tin tin6150/base4metabolic
docker run -it -v $HOME:/home/tin tin6150/perl4metabolic
# checking PERL5LIB @INC
env -i perl -V # ignores the PERL5LIB env var
env perl -V
# both should return the same output,
# but if root's env got inherited, clear it with something like export PERL5LIB=''
3. Database for GTDB-Tk
GTDB-Tk is needed when running METABOLIC-C. You will need to first set up the GTDB-Tk database as:
GTDBTK_DATA_PATH=/tmp/GTDBTK_DATA
mkdir -p "$GTDBTK_DATA_PATH" # create the data directory if it does not exist yet
cd "$GTDBTK_DATA_PATH"
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/gtdbtk_r89_data.tar.gz
tar xzf gtdbtk_r89_data.tar.gz
# See https://github.com/Ecogenomics/GTDBTk for links to newer db
docker run -v /tmp:/tmp --entrypoint "perl" tin6150/metabolic /opt/METABOLIC/METABOLIC-G.pl -t 34 -in-gn /tmp//5_genomes_test/Genome_files -o /tmp/metabolic_out
More information on installing and building the Docker version of METABOLIC can be found here: https://github.com/AnantharamanLab/METABOLIC/tree/master/container
(Contributed by Dr. Daan Speth)
The comment below highlights the steps that worked to get an anaconda environment set up with all the dependencies required to run METABOLIC in both C and G mode. Further down this thread I've posted a yaml file with the specifications for that environment. Note that this does not install METABOLIC itself; it still needs to be cloned from this repository, and the resulting METABOLIC directory needs to be put in $PATH for the scripts to run.
One additional step is required: the shebang line of the two main scripts (METABOLIC-C.pl and METABOLIC-G.pl) should be edited to match the perl installation in your conda environment (i.e., #! /path/to/conda/env/bin/perl).
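The shebang edit can be done by hand in a text editor, or scripted roughly as follows. This is a sketch under assumptions and not part of METABOLIC: the function name `set_shebang` is hypothetical, and it assumes the first line of each script is a shebang line, since that line is replaced unconditionally.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with METABOLIC): replace the first line of
# a script with a shebang that points at the given interpreter.
set_shebang() {
    local script=$1 interpreter=$2 tmp
    tmp=$(mktemp) || return 1
    # Write the new shebang, then everything after the original first line.
    { printf '#!%s\n' "$interpreter"; tail -n +2 "$script"; } > "$tmp" \
        || { rm -f "$tmp"; return 1; }
    # Copy the content back in place so the script keeps its permissions.
    cat "$tmp" > "$script"
    rm -f "$tmp"
}

# Usage, with the conda environment activated and run from the METABOLIC directory:
# for script in METABOLIC-C.pl METABOLIC-G.pl; do
#     set_shebang "$script" "$(command -v perl)"
# done
```

With the environment activated, `command -v perl` should resolve to the perl inside that environment, which is the path the shebang needs.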
############## Original comment:
I spent a bunch of time today getting METABOLIC installed in a conda environment, and I found that the order in which I installed the dependencies mattered for the success of the install. Finally got it working with the list of commands below.
The only actual issue is a dependency missing in the documentation: Parallel::ForkManager. Edit: ah, and ggraph is listed as a dependency twice.
Otherwise, I hope this is helpful for people trying to get the software installed.
conda channels I have added (r channel is not needed for the install):
https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
https://conda.anaconda.org/bioconda/linux-64
https://conda.anaconda.org/bioconda/noarch
https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
create env
conda create -n metabolic
conda activate metabolic
conda install the required tools
conda install sambamba
conda install bamtools
conda install coverm # installs perl 5.32
conda install gtdbtk
conda install diamond
conda install bowtie2
conda install R=3.6.0
conda install R dependencies
conda install r-tidyverse=1.3.0
conda install r-diagram
conda install r-ggthemes
conda install r-ggalluvial
conda install r-ggraph
conda install r-openxlsx
conda install r-pdftools
conda install perl dependencies
conda install perl-data-dumper # downgrades perl to 5.26.2
conda install perl-excel-writer-xlsx
conda install perl-posix
conda install perl-getopt-long
conda install perl-statistics-descriptive
conda install perl-bioperl
get the one pesky perl dependency not available through conda
conda install perl-app-cpanminus
env PERL5LIB="" PERL_LOCAL_LIB_ROOT="" PERL_MM_OPT="" PERL_MB_OPT="" cpanm Array::Split
conda install gdown
conda install the perl package to solve the first (and so far only) error
conda install perl-parallel-forkmanager
(The original information can be found in issue 27)
Notice: According to Michal Strejcek, conda has issues with Perl (quoted here: "Unfortunately, conda has issues with Perl. For example, installation of array::split previously exited with some compilation error."). In addition, if you are using R 4.x (we suggest using R 3.x), some small changes need to be made in the script. His solutions to these issues are here: https://github.com/AnantharamanLab/METABOLIC/issues/41.
Run by Snakemake (additional)
(Contributed by Dr. Susheel Bhanu Busi in issue 27)
The yaml file for setting up the environment: https://github.com/AnantharamanLab/METABOLIC/files/5962517/metabolic_environment.txt (the extension needs to be changed from ".txt" to ".yml")
METABOLIC can be run in a snakemake workflow as follows (as an example):
rule metabolic:
    input:
        fa="data/GL_R9_GL11_UP_2_O4.2.1.contigs.fasta",
        reads="data/metabolic_reads.txt"
    output:
        directory("metabolic_output")
    log:
        out="logs/metabolic.out.log",
        err="logs/metabolic.err.log"
    conda:
        os.path.join(ENV_DIR, "metabolic.yaml") # the same as metabolic_environment.yml
    params:
        gtdbtk=config["metabolic"]["db"],
        metabolic=config["metabolic"]["directory"]
    threads:
        config["metabolic"]["threads"]
    message:
        "Running metabolic for all ROCK bins"
    shell:
        "(date && "
        "export GTDBTK_DATA_PATH={params.gtdbtk} && "
        """env PERL5LIB="" PERL_LOCAL_LIB_ROOT="" PERL_MM_OPT="" PERL_MB_OPT="" cpanm Array::Split && """
        "perl {params.metabolic}/METABOLIC-C.pl -t {threads} -in-gn $(dirname {input.fa}) -r {input.reads} -o {output} && "
        "date) 2> {log.err} > {log.out}"
Notice: The Array::Split installation happens after the conda environment is built. Haven't found a way around this yet, but it seems to be working.