Installation

Patricia Tran edited this page Jan 18, 2024 · 37 revisions

System requirements:

System Memory Requirements:

  • Because some of this program's dependencies are memory-intensive, we highly recommend running METABOLIC-C on a system with at least 100 GB of memory.
  • METABOLIC-G is less demanding than METABOLIC-C and requires significantly less memory to run.
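
The memory recommendation above can be checked before a long METABOLIC-C run. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo; the function names are hypothetical and not part of METABOLIC.

```shell
#!/usr/bin/env bash
# Print total memory in whole GiB, read from a /proc/meminfo-style file (Linux only).
mem_total_gib() {
    local meminfo="${1:-/proc/meminfo}"
    # MemTotal is reported in kB; 1 GiB = 1048576 kB.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "$meminfo"
}

# Compare total memory against a minimum (default 100, the figure recommended above).
check_memory() {
    local required="${1:-100}" meminfo="${2:-/proc/meminfo}" total
    total="$(mem_total_gib "$meminfo")"
    if [ "${total:-0}" -ge "$required" ]; then
        echo "OK: ${total} GiB of memory (>= ${required} GiB)"
    else
        echo "WARNING: ${total:-0} GiB of memory (< ${required} GiB recommended for METABOLIC-C)"
        return 1
    fi
}

# Only attempt the check where /proc/meminfo exists.
if [ -r /proc/meminfo ]; then check_memory 100 || true; fi
```

A warning here does not necessarily mean the run will fail; it only flags that the system is below the recommended size.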

System Storage Requirements:

If you are planning to use only METABOLIC-G, you don't need to install GTDB-Tk.

Necessary Databases                      Approximate System Storage Required
METABOLIC program with unzipped files    7.69 GB (including HMM database)
GTDB-Tk Reference Data                   28 GB
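
Free disk space can be compared against the figures above before downloading anything. This is a rough sketch using the POSIX output format of df; the function names are hypothetical, and 36 is simply 7.69 + 28 rounded up.

```shell
#!/usr/bin/env bash
# Print the free space (whole GiB) of the filesystem holding the given directory.
free_gib() {
    # -P forces one line per filesystem, -k reports 1024-byte blocks.
    df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Succeed if the directory's filesystem has at least the requested number of GiB free.
has_space_for() {
    local needed="$1" dir="${2:-.}"
    [ "$(free_gib "$dir")" -ge "$needed" ]
}

if has_space_for 36 .; then
    echo "Enough free space for METABOLIC plus the GTDB-Tk reference data"
else
    echo "Less than 36 GiB free in the current directory's filesystem"
fi
```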

Go Back to the homepage

Dependencies overview:

Programs required:

  1. Perl (>= v5.010)
  2. HMMER (>= v3.1b2)
  3. Prodigal (>= v2.6.3)
  4. Sambamba (>= v0.7.0) (only for METABOLIC-C)
  5. BAMtools (>= v2.4.0) (only for METABOLIC-C)
  6. CoverM (only for METABOLIC-C)
  7. R (>= 3.6.0, < 4.x) (R 4.x introduced some changes, so we suggest using R 3.x)
  8. DIAMOND
  9. Samtools (only for METABOLIC-C)
  10. Bowtie 2 (only for METABOLIC-C)
  11. GTDB-Tk (only for METABOLIC-C)
  12. minimap2 (>= v2.17) (only for METABOLIC-C long-read mapping)
  13. gdown (for downloading METABOLIC_test_files.tgz)

Each of these programs must be on your PATH so that it can be called from any working directory.
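
A quick way to see which of the programs above are missing from the PATH is sketched below. The executable names are assumptions (for example, HMMER is probed through its hmmsearch binary and R through Rscript); adjust them to match your installation.

```shell
#!/usr/bin/env bash
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# First line: needed by both programs; second line: only needed for METABOLIC-C.
missing="$(missing_tools perl hmmsearch prodigal Rscript diamond gdown \
    sambamba bamtools coverm samtools bowtie2 gtdbtk minimap2)"

if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All listed tools were found on PATH."
fi
```

This only confirms that each executable can be found; it does not check the minimum versions listed above.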


Perl and R Dependencies Detailed Instructions:
Perl Modules:
  To install, use the cpan shell by entering "perl -MCPAN -e shell cpan" and then entering
  "install [Module Name]", or install by using "cpan -i [Module Name]", or by entering
  "cpanm [Module Name]".

Example 1:
perl -MCPAN -e shell cpan
install Data::Dumper

Example 2:
cpan -i Data::Dumper

Example 3:
cpanm Data::Dumper

    1. Data::Dumper
    2. POSIX
    3. Getopt::Long
    4. Statistics::Descriptive
    5. Bio::SeqIO
    6. Bio::Perl
    7. Bio::Tools::CodonTable
    8. Carp
    9. File::Spec
    10. File::Basename
    11. Parallel::ForkManager

R Packages:
  To install, open the R command line interface by entering "R" into the command line, and then enter
  "install.packages("[Package Name]")".

Example:
R
install.packages("diagram")
q()

    1. diagram (v1.6.4)
    2. forcats (v0.5.0)
    3. digest (v0.6.25)
    4. htmltools (v0.4.0)
    5. rmarkdown (v2.1)
    6. reprex (v0.3.0)
    7. tidyverse (v1.3.0)
    8. ggthemes (v4.2.0)
    9. ggalluvial (v0.11.3)
    10. reshape2 (v1.4.3)
    11. ggraph (v2.0.2)
    12. pdftools (v2.3)
    13. igraph (v1.2.5)
    14. tidygraph (v1.1.2)
    15. stringr (v1.4.0)
    16. plyr (v1.8.6)
    17. dplyr (v0.8.5)
    18. openxlsx (v4.1.4)
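
The same kind of check can be done for the R packages from the shell. The sketch below assumes Rscript is on the PATH; it tests only whether each package is installed, not whether it matches the version listed above. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print each R package from the argument list that is not installed.
missing_r_packages() {
    local pkg
    for pkg in "$@"; do
        # requireNamespace() returns FALSE when the package cannot be loaded.
        Rscript -e "if (!requireNamespace('$pkg', quietly = TRUE)) quit(status = 1)" \
            >/dev/null 2>&1 || echo "$pkg"
    done
}

missing_r_packages diagram forcats digest htmltools rmarkdown reprex tidyverse \
    ggthemes ggalluvial reshape2 ggraph pdftools igraph tidygraph stringr plyr \
    dplyr openxlsx
```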

To ensure an efficient and successful installation of METABOLIC, make sure that all dependencies are properly installed before downloading the METABOLIC software.

Installation instructions:

Quick installation:

Anaconda environment

(Contributed by Dr. Daan Speth)
(The original instructions can be found in issue 27; a modified version for installation in a conda environment is provided here.)

The environment yaml file for setting up a conda environment for METABOLIC v4.0 is provided here:
https://github.com/AnantharamanLab/METABOLIC/blob/master/METABOLIC_v4.0_env.yml

1 Set up the METABOLIC_v4.0 conda environment

conda env create -f /path/to/METABOLIC_v4.0_env.yml
# When the installation of this conda env finishes, a message asks you to set the database path of GTDB-Tk.

# Set GTDBTK_DATA_PATH for the METABOLIC_v4.0 environment
# (-n names the target environment; without it, the variable is set for the currently active environment)
conda env config vars set -n METABOLIC_v4.0 GTDBTK_DATA_PATH="/path/to/your/databases/GTDBTK_DB"
# The latest GTDB-Tk database file can be found here: https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_data.tar.gz
# Download this tar.gz file first and then extract it
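
After the environment has been activated (or reactivated, since conda applies environment variables on activation), it may be worth confirming that GTDBTK_DATA_PATH points at the extracted reference data. A minimal sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Check that the GTDB-Tk data path (argument, or $GTDBTK_DATA_PATH) is a non-empty directory.
check_gtdbtk_data_path() {
    local path="${1:-${GTDBTK_DATA_PATH:-}}"
    if [ -z "$path" ]; then
        echo "GTDBTK_DATA_PATH is not set"
        return 1
    elif [ ! -d "$path" ]; then
        echo "GTDBTK_DATA_PATH points to a missing directory: $path"
        return 1
    elif [ -z "$(ls -A "$path")" ]; then
        echo "GTDBTK_DATA_PATH directory is empty: $path"
        return 1
    fi
    echo "GTDB-Tk reference data directory found: $path"
}

check_gtdbtk_data_path || true
```

This only checks that the directory exists and contains something; it does not validate the database contents.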

2 Activate conda environment

conda activate METABOLIC_v4.0

3 Git clone

# get to the path where you want to set up your METABOLIC_running folder
mkdir /path/to/METABOLIC_running_folder
cd /path/to/METABOLIC_running_folder
git clone https://github.com/AnantharamanLab/METABOLIC.git

4 Run the setup bash script

cd METABOLIC
bash run_to_setup.sh

Note that the scripts use the shebang "#!/usr/bin/env perl", which selects the first "perl" executable found in $PATH. Inside the activated conda environment this is the environment's default perl; you can run "which perl" to confirm it.
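
The perl lookup described above can also be checked programmatically. The sketch below assumes that conda has set CONDA_PREFIX (it does so on activation) and that the environment's perl lives in its bin directory; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if the first perl on PATH is the one inside the given conda prefix
# (defaults to $CONDA_PREFIX, which conda sets when an environment is activated).
perl_is_from_conda_env() {
    local prefix="${1:-${CONDA_PREFIX:-}}" perl_path
    perl_path="$(command -v perl)" || return 1
    [ -n "$prefix" ] && [ "$perl_path" = "$prefix/bin/perl" ]
}

if perl_is_from_conda_env; then
    echo "perl resolves to the active conda environment"
else
    echo "perl resolves to: $(command -v perl || echo 'not found')"
fi
```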

Note that an additional conda package is provided at https://anaconda.org/HCC/metabolic. It is not necessarily kept up to date with the latest version in our GitHub repo, so please check its version date.

Run by Snakemake

(Contributed by Dr. Susheel Bhanu Busi in issue 27)
The yaml file for setting up the environment: https://github.com/AnantharamanLab/METABOLIC/files/5962517/metabolic_environment.txt (the extension needs to be changed from ".txt" to ".yml")

METABOLIC can be run in a snakemake workflow as follows (as an example):

rule metabolic:
    input:
        fa="data/GL_R9_GL11_UP_2_O4.2.1.contigs.fasta",
        reads="data/metabolic_reads.txt"
    output:
        directory("metabolic_output")
    log:
        out="logs/metabolic.out.log",
        err="logs/metabolic.err.log"
    conda:
        os.path.join(ENV_DIR, "metabolic.yaml") # the same as metabolic_environment.yml
    params:
        gtdbtk=config["metabolic"]["db"],
        metabolic=config["metabolic"]["directory"]
    threads:
        config["metabolic"]["threads"]
    message:
        "Running metabolic for all ROCK bins"
    shell:
        "(date && "
        "export GTDBTK_DATA_PATH={params.gtdbtk} && "
        """env PERL5LIB="" PERL_LOCAL_LIB_ROOT="" PERL_MM_OPT="" PERL_MB_OPT="" cpanm Array::Split && """
        "perl {params.metabolic}/METABOLIC-C.pl -t {threads} -in-gn $(dirname {input.fa}) -r {input.reads} -o {output} && "
        "date) 2> {log.err} > {log.out}"

Notice: The Array::Split installation happens after the conda environment is built. No way around this has been found yet, but it seems to work.
(Notice by METABOLIC author: in the new version of METABOLIC-G and -C, Array::Split is no longer required, and this Perl module has also been excluded from the METABOLIC_v4.0 conda yaml file provided by METABOLIC author.)
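
The rule above reads three values from the Snakemake config: config["metabolic"]["db"], config["metabolic"]["directory"] and config["metabolic"]["threads"]. A matching config file could be generated as sketched below; the function name, file name and example paths are hypothetical.

```shell
#!/usr/bin/env bash
# Write a Snakemake config file containing the three keys used by the rule above.
# Arguments: output file, GTDB-Tk data path, METABOLIC directory, threads (default 8).
write_metabolic_config() {
    local out="$1" db="$2" dir="$3" threads="${4:-8}"
    cat > "$out" <<EOF
metabolic:
  db: "$db"
  directory: "$dir"
  threads: $threads
EOF
}

# Example call (placeholder paths):
# write_metabolic_config config.yaml /path/to/your/databases/GTDBTK_DB /path/to/METABOLIC 8
```

Assuming the Snakefile also defines ENV_DIR, the workflow could then be started with something like "snakemake --use-conda --cores 8 --configfile config.yaml metabolic_output".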

Full installation:

1 Go to where you want the program to be and clone the github repository by using the following command:

git clone https://github.com/AnantharamanLab/METABOLIC.git

  or use the "Download ZIP" option under the green button at the top of the GitHub repository page and unzip the downloaded file.
  The Perl and R scripts and the dependent databases should be kept in the same directory.

NOTE: Before following the next step, make sure your working directory is the directory that was created by the METABOLIC download, that is, the directory containing the main scripts for METABOLIC (METABOLIC-G.pl, METABOLIC-C.pl, etc.).

NOTE: We created a script for easily setting up the dependent databases (steps 2-8)

We provide a "run_to_setup.sh" script in the GitHub repository for easy setup of the dependent databases. Run it with the following command:

bash run_to_setup.sh

Once you have run the bash script, it is not necessary to run steps 2-8 described below. However, we suggest supervising the script while it runs, because minor problems might occur during the setup procedure; if they do, you can follow steps 2-8 below to resolve them.
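
One way to supervise the setup is to keep a full log of it, so that a failed download or decompression step can be traced afterwards. The helper below is a generic sketch, not part of METABOLIC; its name and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Run a command, save its stdout and stderr to a log file, and report failures.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    local log="$1" rc=0
    shift
    "$@" > "$log" 2>&1 || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "Command failed with exit status $rc; see $log" >&2
    fi
    return "$rc"
}

# Example:
# run_logged setup.log bash run_to_setup.sh
```

A zero exit status does not guarantee that every step inside the script succeeded, so it is still worth reading through the log for error messages.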

2 METABOLIC requires the KofamKOALA hmm and METABOLIC hmm databases
KofamKOALA website

 2.1 Download KofamKOALA hmm database files:

    mkdir kofam_database  
    cd kofam_database  
    wget -c ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz  
    wget -c ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz  
    gzip -d ko_list.gz  
    tar xzf profiles.tar.gz; rm profiles.tar.gz  
    mv ../All_Module_KO_ids.txt profiles  
    cd profiles  
    cp ../../Accessory_scripts/batch_hmmpress.pl ./  
    perl batch_hmmpress.pl  
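
After step 2.1, it can be useful to confirm that every profile was indexed. The sketch below assumes that batch_hmmpress.pl runs hmmpress on each ".hmm" file in the profiles directory; hmmpress itself writes four index files per profile (.h3f, .h3i, .h3m, .h3p). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print every .hmm file in a directory that lacks one of the four hmmpress index files.
unpressed_hmms() {
    local dir="${1:-.}" hmm ext
    for hmm in "$dir"/*.hmm; do
        [ -e "$hmm" ] || continue          # directory contains no .hmm files
        for ext in h3f h3i h3m h3p; do
            if [ ! -s "$hmm.$ext" ]; then  # index file missing or empty
                echo "$hmm"
                break
            fi
        done
    done
}

# Example, run from the METABOLIC directory:
# unpressed_hmms kofam_database/profiles
```

If the call prints nothing, every profile has a complete set of index files.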

 2.2 The METABOLIC hmm database in "METABOLIC_hmm_db.tgz" contains custom hmm files and self-parsed Pfam and TIGRfam files. It needs to be decompressed to the folder "METABOLIC_hmm_db" and kept in the same directory as the KofamKOALA hmm database and scripts.

  tar zxvf METABOLIC_hmm_db.tgz

3 METABOLIC uses the "METABOLIC_template_and_database", which contains the hmm result table and KEGG database information. Decompress METABOLIC_template_and_database.tgz to the folder "METABOLIC_template_and_database" and keep it in the same directory as the KofamKOALA hmm database and scripts.

  tar zxvf METABOLIC_template_and_database.tgz

4 This software also contains "Accessory_scripts.tgz", which needs to be decompressed before use.

  tar zxvf Accessory_scripts.tgz

5 This software also contains "Motif.tgz", which needs to be decompressed before use.

  tar zxvf Motif.tgz

6 Download the most recent dbCAN-fam-HMMs.txt into a directory that you create, "dbCAN2", and process it with "batch_hmmpress_for_dbCAN2_HMMdb.pl".

  mkdir dbCAN2
  cd dbCAN2
  wget http://bcb.unl.edu/dbCAN2/download/Databases/dbCAN-old@UGA/dbCAN-fam-HMMs.txt
  perl ../Accessory_scripts/batch_hmmpress_for_dbCAN2_HMMdb.pl
  cd ../

7 Download the MEROPS Peptidase Protein Sequences (https://www.ebi.ac.uk/merops/download_list.shtml, option No. 3), then process pepunit.lib with DIAMOND to make the BLASTP database.

  mkdir MEROPS
  cd MEROPS
  wget ftp://ftp.ebi.ac.uk/pub/databases/merops/current_release/pepunit.lib
  perl ../Accessory_scripts/make_pepunit_db.pl
  cd ../

8 Finally, download "METABOLIC_test_files.tgz" from Figshare and decompress it before use. This is a set of test genomes and reads that you can use for a test run to check that the program works correctly before running your real samples.

wget -c https://figshare.com/ndownloader/files/43500597 -O METABOLIC_test_files.tgz
tar zxvf METABOLIC_test_files.tgz
# Caution: this removes every .tgz archive in the current directory
rm *.tgz
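
Once the setup script or steps 2-8 have finished, the resulting layout can be checked in one go. The sketch below looks for the directories named in the steps above; "Accessory_scripts" and "Motif" are assumed to be the folder names produced by decompressing the corresponding archives, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print each expected database/script directory that is missing under the METABOLIC root.
missing_setup_dirs() {
    local root="${1:-.}" d
    for d in kofam_database/profiles METABOLIC_hmm_db METABOLIC_template_and_database \
             Accessory_scripts Motif dbCAN2 MEROPS; do
        [ -d "$root/$d" ] || echo "$d"
    done
}

# Example, run from the METABOLIC directory:
# missing_setup_dirs .
```

Any directory printed by this call points to the step that needs to be repeated.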
