
Installation

Zhichao Zhou edited this page Jul 21, 2021 · 37 revisions

System requirements:

System Memory Requirements:

  • Because of the requirements of some of this program's dependencies, it is highly recommended that METABOLIC-C be run on a system with at least 100 GB of memory.
  • METABOLIC-G is not as demanding as METABOLIC-C and requires significantly less memory to run.

System Storage Requirements:

If you plan to use only METABOLIC-G, you do not need to install GTDB-Tk.

| Necessary databases | Approximate system storage required |
|---|---|
| METABOLIC program with unzipped files (including HMM database) | 7.69 GB |
| GTDB-Tk reference data | 28 GB |


Dependencies overview:

Programs required:

  1. Perl (>= v5.010)
  2. HMMER (>= v3.1b2)
  3. Prodigal (>= v2.6.3)
  4. Sambamba (>= v0.7.0) (only for METABOLIC-C)
  5. BAMtools (>= v2.4.0) (only for METABOLIC-C)
  6. CoverM (only for METABOLIC-C)
  7. R (>= 3.6.0)
  8. Diamond
  9. Samtools (only for METABOLIC-C)
  10. Bowtie 2 (only for METABOLIC-C)
  11. GTDB-Tk (only for METABOLIC-C)
  12. gdown (for downloading METABOLIC_test_files.tgz)

Each of these programs must be on your PATH so that it can be called from any directory.
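Before installing METABOLIC, a quick way to confirm that the programs above are reachable is a small shell check like the following. This is a sketch, not part of METABOLIC: the function name `missing_tools` is hypothetical, and the executable names (for example `hmmsearch`/`hmmpress` for HMMER, `bowtie2` for Bowtie 2, `gtdbtk` for GTDB-Tk) are assumptions about how each package installs its commands.

```shell
# Hypothetical helper: print each named program that is NOT found on $PATH.
# Returns 0 if all programs are found, 1 if at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Programs needed by both METABOLIC-G and METABOLIC-C (executable names assumed).
missing_tools perl hmmsearch hmmpress prodigal R Rscript diamond gdown \
    || echo "Install the missing program(s) above, or add them to PATH."

# Programs only needed by METABOLIC-C.
missing_tools sambamba bamtools coverm samtools bowtie2 gtdbtk \
    || echo "The program(s) above are only required for METABOLIC-C."
```

The check only looks up command names; it does not verify version numbers.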
Perl and R Dependencies Detailed Instructions:
Perl Modules:
  To install, either open the CPAN shell with "perl -MCPAN -e shell" and enter
  "install [Module Name]" at the cpan> prompt, run "cpan -i [Module Name]", or run
  "cpanm [Module Name]".

Example 1:
perl -MCPAN -e shell
install Data::Dumper

Example 2:
cpan -i Data::Dumper

Example 3:
cpanm Data::Dumper

    1. Data::Dumper
    2. POSIX
    3. Getopt::Long
    4. Statistics::Descriptive
    5. Array::Split
    6. Bio::SeqIO
    7. Bio::Perl
    8. Bio::Tools::CodonTable
    9. Carp
    10. File::Spec
    11. File::Basename
    12. Parallel::ForkManager
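To see which of these modules are still missing, you can try loading each one with `perl -M<Module> -e 1`, which exits with a non-zero status if the module cannot be loaded. A minimal sketch; the function name `missing_perl_modules` is hypothetical:

```shell
# Hypothetical helper: print each Perl module that fails to load.
# "perl -M<Module> -e 1" loads the module and does nothing else, so a
# non-zero exit status means the module is missing (or fails to compile).
missing_perl_modules() {
    status=0
    for module in "$@"; do
        if ! perl -M"$module" -e 1 >/dev/null 2>&1; then
            echo "$module"
            status=1
        fi
    done
    return "$status"
}

# Modules from the list above. POSIX, Carp, File::Spec, etc. normally ship
# with Perl itself; the Bio::* modules come from BioPerl.
missing=$(missing_perl_modules Data::Dumper POSIX Getopt::Long \
    Statistics::Descriptive Array::Split Bio::SeqIO Bio::Perl \
    Bio::Tools::CodonTable Carp File::Spec File::Basename \
    Parallel::ForkManager) || true

if [ -n "$missing" ]; then
    echo "Missing Perl modules:"
    echo "$missing"
    echo "Install them with, e.g.: cpanm $(echo "$missing" | tr '\n' ' ')"
fi
```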

R Packages:
  To install, open the R command line interface by entering "R" into the command line, and then enter
  "install.packages("[Package Name]")".

Example:
R
install.packages("diagram")
q()

    1. diagram (v1.6.4)
    2. forcats (v0.5.0)
    3. digest (v0.6.25)
    4. htmltools (v0.4.0)
    5. rmarkdown (v2.1)
    6. reprex (v0.3.0)
    7. tidyverse (v1.3.0)
    8. ggthemes (v4.2.0)
    9. ggalluvial (v0.11.3)
    10. reshape2 (v1.4.3)
    11. ggraph (v2.0.2)
    12. pdftools (v2.3)
    13. igraph (v1.2.5)
    14. tidygraph (v1.1.2)
    15. stringr (v1.4.0)
    16. plyr (v1.8.6)
    17. dplyr (v0.8.5)
    18. openxlsx (v4.1.4)
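Installing the packages one at a time can also be scripted with `Rscript`. The sketch below installs only the packages that are not yet present. The function name `install_missing_r_packages` and the CRAN mirror URL are assumptions, and it does not pin the versions listed above.

```shell
# Hypothetical helper: install whichever of the named R packages are missing.
# Assumes Rscript is on PATH and the CRAN mirror below is reachable.
# install.packages() fetches the current CRAN release, not the versions above.
install_missing_r_packages() {
    Rscript -e '
        pkgs <- commandArgs(trailingOnly = TRUE)
        todo <- setdiff(pkgs, rownames(installed.packages()))
        if (length(todo) > 0) {
            install.packages(todo, repos = "https://cloud.r-project.org")
        } else {
            message("All requested R packages are already installed.")
        }
    ' "$@"
}

# Example (package names from the list above):
# install_missing_r_packages diagram forcats digest htmltools rmarkdown reprex \
#     tidyverse ggthemes ggalluvial reshape2 ggraph pdftools igraph tidygraph \
#     stringr plyr dplyr openxlsx
```

If you need the exact versions listed, `remotes::install_version()` from the remotes package can install a specific package version instead.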

To ensure a smooth installation of METABOLIC, make sure that all dependencies are properly installed before downloading the METABOLIC software.


Installation instructions:

  1. Go to the directory where you want the program to live and clone the GitHub repository with the following command:
git clone https://github.com/AnantharamanLab/METABOLIC.git

  Alternatively, click the green "Code" button at the top of the GitHub repository page, select "Download ZIP", and unzip the downloaded file.
  Keep the Perl and R scripts and the dependent databases in the same directory.

NOTE: Before following the next step, make sure your working directory is the directory created by the METABOLIC download, that is, the directory containing the main METABOLIC scripts (METABOLIC-G.pl, METABOLIC-C.pl, etc.).

NOTE: We created a script for easily setting up the dependent databases (Steps 2-8).

We provide a "run_to_setup.sh" script along with the files downloaded from GitHub for easy setup of the dependent databases. It can be run with the following command:

bash run_to_setup.sh

Once you have run this script, you do not need to run Steps 2-8 described below.

  2. METABOLIC requires the KofamKOALA HMM database (see the KofamKOALA website) and the METABOLIC HMM database.

    2.1. Download KofamKOALA hmm database files:

    mkdir kofam_database
    cd kofam_database
    wget -c ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
    wget -c ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
    gzip -d ko_list.gz
    tar xzf profiles.tar.gz; rm profiles.tar.gz
    mv ../All_Module_KO_ids.txt profiles
    cd profiles
    cp ../../Accessory_scripts/batch_hmmpress.pl ./
    perl batch_hmmpress.pl
    cd ../..    # return to the METABOLIC directory for the next steps
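To confirm that step 2.1 finished, you can look for the index files that `hmmpress` writes next to each pressed HMM file (`<file>.h3f`, `<file>.h3i`, `<file>.h3m`, `<file>.h3p`). This sketch assumes that `batch_hmmpress.pl` presses each `.hmm` file in `profiles` individually; the function name `unpressed_hmms` is hypothetical.

```shell
# Hypothetical helper: list every *.hmm file in a directory that is missing
# at least one of the four index files hmmpress creates for it.
# Returns 0 if every HMM file is fully pressed, 1 otherwise.
unpressed_hmms() {
    dir=$1
    status=0
    for hmm in "$dir"/*.hmm; do
        [ -e "$hmm" ] || continue    # the glob matched nothing
        for ext in h3f h3i h3m h3p; do
            if [ ! -e "$hmm.$ext" ]; then
                echo "$hmm"
                status=1
                break
            fi
        done
    done
    return "$status"
}

# Example, run from the METABOLIC directory after step 2.1:
# unpressed_hmms kofam_database/profiles
```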

 2.2. The METABOLIC HMM database in "METABOLIC_hmm_db.tgz" contains custom HMM files and self-parsed Pfam and TIGRfam files. It needs to be decompressed into the folder "METABOLIC_hmm_db", which should stay in the same directory as the KofamKOALA HMM database and the scripts.

  tar zxvf METABOLIC_hmm_db.tgz
  3. METABOLIC uses "METABOLIC_template_and_database", which contains the HMM result table and KEGG database information. Decompress METABOLIC_template_and_database.tgz into the folder "METABOLIC_template_and_database" and keep it in the same directory as the KofamKOALA HMM database and the scripts.
  tar zxvf METABOLIC_template_and_database.tgz
  4. This software also contains "Accessory_scripts.tgz", which needs to be decompressed before use.
  tar zxvf Accessory_scripts.tgz
  5. This software also contains "Motif.tgz", which needs to be decompressed before use.
  tar zxvf Motif.tgz
  6. Download the most recent dbCAN-fam-HMMs.txt into a directory named "dbCAN2" (created by you), and then process it with "batch_hmmpress_for_dbCAN2_HMMdb.pl".
  mkdir dbCAN2
  cd dbCAN2
  wget http://bcb.unl.edu/dbCAN2/download/Databases/dbCAN-old@UGA/dbCAN-fam-HMMs.txt
  perl ../Accessory_scripts/batch_hmmpress_for_dbCAN2_HMMdb.pl
  cd ../
  7. Download the MEROPS peptidase protein sequences (pepunit.lib; option No. 3 at https://www.ebi.ac.uk/merops/download_list.shtml), and then use make_pepunit_db.pl to build a DIAMOND BLASTP database from pepunit.lib.
  mkdir MEROPS
  cd MEROPS
  wget ftp://ftp.ebi.ac.uk/pub/databases/merops/current_release/pepunit.lib
  perl ../Accessory_scripts/make_pepunit_db.pl
  cd ../
  8. Finally, this software also contains "METABOLIC_test_files.tgz", which needs to be decompressed before use. It contains a set of test genomes and reads that you can use for a test run to check that the program works correctly before running your real samples.
  tar zxvf METABOLIC_test_files.tgz
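After Steps 2-8 (or `run_to_setup.sh`), a quick sanity check is to confirm that the expected folders exist in the METABOLIC directory. The folder names below are assumed to match the archive and directory names used in the steps above (in particular, `METABOLIC_test_files` is assumed to be what `METABOLIC_test_files.tgz` unpacks to, and `Motif` what `Motif.tgz` unpacks to); `check_setup` is a hypothetical helper.

```shell
# Hypothetical helper: report any expected database/script folder that is
# missing from the current directory. Returns 1 if anything is missing.
check_setup() {
    status=0
    for d in kofam_database kofam_database/profiles METABOLIC_hmm_db \
             METABOLIC_template_and_database Accessory_scripts Motif \
             dbCAN2 MEROPS METABOLIC_test_files; do
        if [ ! -d "$d" ]; then
            echo "missing directory: $d"
            status=1
        fi
    done
    return "$status"
}

# Example, from the METABOLIC directory:
# check_setup && echo "Database setup looks complete."
```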


Quick installation:

Docker

(The authors suggest using the Docker version of METABOLIC if you have trouble installing the standalone version.)

(Contributed by Mr. Tin Ho)

1. Starting the METABOLIC container via Docker:

Interactive run (note that content in the container is ephemeral; saving your work partly works if you use docker commit ...):

cd METABOLIC; mkdir temp; tar xfz ~/Downloads/5_genomes_test.tgz
docker run -it -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v "$PWD":/tmp/home --user=$(id -u):$(id -g)  tin6150/metabolic
cd /opt/METABOLIC
perl METABOLIC-G.pl -help
perl /opt/METABOLIC/METABOLIC-G.pl -in-gn /tmp/home/5_genomes_test/Genome_files -o /tmp/home/metabolic_out

Non-interactive, scriptable run:

docker pull tin6150/metabolic
docker run  -v "$(pwd)":/tmp/home --entrypoint "perl" tin6150/metabolic /opt/METABOLIC/METABOLIC-G.pl -t 34 -in-gn /tmp/home/5_genomes_test/Genome_files -o /tmp/home/metabolic_out
# Output will be in ./metabolic_out

2. Debug runs/tests:

docker run  -it -v $HOME:/home/tin tin6150/metabolic
docker exec -it pluto_amp bash                 # open an additional terminal in an existing running container (pluto_amp is that container's name; use yours)

# testing intermediary container use:
docker run  -it -v $HOME:/home/tin tin6150/base4metabolic
docker run  -it -v $HOME:/home/tin tin6150/perl4metabolic

# checking PERL5LIB @INC
env -i perl -V    # ignores the PERL5LIB env var
env    perl -V
# both should return the same output,
# but if root's env got inherited, clear it with something like export PERL5LIB=''

3. Database for GTDB-Tk

GTDB-Tk is required to run METABOLIC-C, so you first need to set up the GTDB-Tk reference database:

export GTDBTK_DATA_PATH=/tmp/GTDBTK_DATA
mkdir -p "$GTDBTK_DATA_PATH"
cd "$GTDBTK_DATA_PATH"
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/gtdbtk_r89_data.tar.gz
tar xzf gtdbtk_r89_data.tar.gz
# See https://github.com/Ecogenomics/GTDBTk for links to newer db

docker run -v /tmp:/tmp --entrypoint "perl" tin6150/metabolic /opt/METABOLIC/METABOLIC-G.pl -t 34 -in-gn /tmp/5_genomes_test/Genome_files -o /tmp/metabolic_out

More information on installing and building the Docker version of METABOLIC can be found here: https://github.com/AnantharamanLab/METABOLIC/tree/master/container

Anaconda environment

(Contributed by Dr. Daan Speth)

The comment below highlights the steps that worked to set up an Anaconda environment with all the dependencies required to run METABOLIC in both C and G mode. Further down this thread I've posted a YAML file with the specifications for that environment. Note that this does not install METABOLIC itself: it still needs to be cloned from this repository, and the resulting METABOLIC directory needs to be added to $PATH for the scripts to run.

One additional step is required: the shebang line of the two main scripts (METABOLIC-C.pl and METABOLIC-G.pl) must be edited to match the Perl installation in your conda environment (i.e., #!/path/to/conda/env/bin/perl).
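The shebang edit can be done with `sed`. A minimal sketch, assuming GNU sed (as on Linux) and that line 1 of each script is its `#!` line. `set_perl_shebang` is a hypothetical helper, and the Perl path must not contain a `|` character because `|` is used as the sed delimiter.

```shell
# Hypothetical helper: replace a "#!" line on line 1 of each given script
# with "#!<perl path>". Lines other than line 1 are left untouched.
set_perl_shebang() {
    perl_path=$1
    shift
    for script in "$@"; do
        sed -i "1s|^#!.*|#!$perl_path|" "$script"
    done
}

# Example, from the METABOLIC directory with the conda environment activated:
# set_perl_shebang "$(command -v perl)" METABOLIC-C.pl METABOLIC-G.pl
```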

############## Original comment:

I spent a bunch of time today getting METABOLIC installed in a conda environment, and I found that the order in which I installed the dependencies mattered for the success of the install. Finally got it working with the list of commands below.

The only actual issue is a dependency missing from the documentation: Parallel::ForkManager. (Edit: ah, and ggraph is listed as a dependency twice.)

Otherwise, I hope this is helpful for people trying to get the software installed.

Conda channels I have added (the r channel is not needed for the install):

https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
https://conda.anaconda.org/bioconda/linux-64
https://conda.anaconda.org/bioconda/noarch
https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch

create env

conda create -n metabolic
conda activate metabolic

conda install the required tools

conda install sambamba
conda install bamtools
conda install coverm # installs perl 5.32
conda install gtdbtk
conda install diamond
conda install bowtie2
conda install R=3.6.0

conda install R dependencies

conda install r-tidyverse=1.3.0
conda install r-diagram
conda install r-ggthemes
conda install r-ggalluvial
conda install r-ggraph
conda install r-openxlsx
conda install r-pdftools

conda install perl dependencies

conda install perl-data-dumper # downgrades perl to 5.26.2
conda install perl-excel-writer-xlsx
conda install perl-posix
conda install perl-getopt-long
conda install perl-statistics-descriptive
conda install perl-bioperl

get the one pesky perl dependency not available through conda

conda install perl-app-cpanminus
env PERL5LIB="" PERL_LOCAL_LIB_ROOT="" PERL_MM_OPT="" PERL_MB_OPT="" cpanm Array::Split
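The `env PERL5LIB="" ...` prefix above empties Perl-related environment variables (such as those set by local::lib) for that one command, presumably so that cpanm builds and installs Array::Split for the conda environment's Perl rather than into another library path. If you need to run several such commands, a small wrapper keeps them consistent; `with_clean_perl_env` is a hypothetical name.

```shell
# Hypothetical wrapper: run a command with the same four Perl-related
# environment variables emptied, as in the cpanm command above.
with_clean_perl_env() {
    env PERL5LIB="" PERL_LOCAL_LIB_ROOT="" PERL_MM_OPT="" PERL_MB_OPT="" "$@"
}

# Example, equivalent to the cpanm command above:
# with_clean_perl_env cpanm Array::Split
```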

conda install gdown

conda install gdown

conda install the perl package to solve the first (and so far only) error

conda install perl-parallel-forkmanager

(The original information can be found in issue 27.)

Notice: According to Michal Strejcek, conda has issues with Perl (quoted here: "Unfortunately, conda has issues with Perl. For example, installation of array::split previously exited with some compilation error."). In addition, if you are using R 4.x (we suggest using R 3.x), some small changes need to be made to the script. His solutions to these issues are described here: https://github.com/AnantharamanLab/METABOLIC/issues/41.

Running with Snakemake (additional)
(Contributed by Dr. Susheel Bhanu Busi in issue 27)
The YAML file for setting up the environment: https://github.com/AnantharamanLab/METABOLIC/files/5962517/metabolic_environment.txt (the extension needs to be changed from ".txt" to ".yml")

METABOLIC can be run in a snakemake workflow as follows (as an example):

rule metabolic:
    input:
        fa="data/GL_R9_GL11_UP_2_O4.2.1.contigs.fasta",
        reads="data/metabolic_reads.txt"
    output:
        directory("metabolic_output")
    log:
        out="logs/metabolic.out.log",
        err="logs/metabolic.err.log"
    conda:
        os.path.join(ENV_DIR, "metabolic.yaml") # the same as metabolic_environment.yml
    params:
        gtdbtk=config["metabolic"]["db"],
        metabolic=config["metabolic"]["directory"]
    threads:
        config["metabolic"]["threads"]
    message:
        "Running metabolic for all ROCK bins"
    shell:
        "(date && "
        "export GTDBTK_DATA_PATH={params.gtdbtk} && "
        """env PERL5LIB="" PERL_LOCAL_LIB_ROOT="" PERL_MM_OPT="" PERL_MB_OPT="" cpanm Array::Split && """
        "perl {params.metabolic}/METABOLIC-C.pl -t {threads} -in-gn $(dirname {input.fa}) -r {input.reads} -o {output} && "
        "date) 2> {log.err} > {log.out}"

Notice: Array::Split is installed after the conda environment is built (within the rule's shell command). No way around this has been found yet, but it seems to work.


