Workflow

Below is an overview of the PICRUSt2 workflow, with example commands for processing 16S sequencing data to obtain E.C. number and KEGG ortholog (KO) abundances. The E.C. number abundances can then be used to calculate MetaCyc pathway abundances and coverages. Other gene family databases are also supported and may be more informative, but they cannot be collapsed to pathways by default. See the side-bar for more details on individual commands.

Note that you can pass the -h option to any of the scripts below to get a description of that script.

The entire pipeline can be run with this command (details):

picrust2_pipeline.py -s study_seqs.fna -i study_seqs.biom -o picrust2_out_pipeline -p 1

Each step can also be run individually with the commands below. This is useful when you run into problems with picrust2_pipeline.py and want to isolate the issue, or when you only want to re-run part of the PICRUSt2 pipeline.

Place amplicon sequence variants (or OTUs) into reference phylogeny (details)

place_seqs.py -s study_seqs.fna -o placed_seqs.tre -p 1 \
              --intermediate placement_working

Run hidden-state prediction to get 16S copy numbers, E.C. number, and KO abundances per predicted genome (details).

Note that NSTI (nearest-sequenced taxon index) values will be added to the 16S prediction table because the -n option is given in the first command below.

hsp.py -i 16S -t placed_seqs.tre -o marker_nsti_predicted.tsv.gz -p 1 -n

hsp.py -i EC -t placed_seqs.tre -o EC_predicted.tsv.gz -p 1

hsp.py -i KO -t placed_seqs.tre -o KO_predicted.tsv.gz -p 1
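
Because the NSTI values end up in the 16S prediction table, it can be handy to list the ASVs that sit far from any reference genome. The following is a minimal Python sketch and not part of PICRUSt2. It assumes the table is a gzipped TSV with a header row, that ASV ids are in the first column, and that the NSTI column is named metadata_NSTI (check the header of your own file). The cutoff of 2.0 is only an illustrative default.

```python
import csv
import gzip


def high_nsti_asvs(table_path, cutoff=2.0, nsti_column="metadata_NSTI"):
    """Return ids of ASVs whose NSTI exceeds the cutoff.

    Assumptions (not taken from the workflow text): the file is a gzipped
    TSV with a header row, ASV ids are in the first column, and the NSTI
    column is called `nsti_column`.
    """
    flagged = []
    with gzip.open(table_path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        # Raises ValueError if the assumed column name is not in the header.
        nsti_index = header.index(nsti_column)
        for row in reader:
            if float(row[nsti_index]) > cutoff:
                flagged.append(row[0])
    return flagged
```

Calling high_nsti_asvs("marker_nsti_predicted.tsv.gz") would then return the ids of any ASVs above the chosen cutoff, in file order.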

Predict E.C. and KO abundances in sequencing samples (adjusts gene family abundances by 16S sequence abundance) (details)

metagenome_pipeline.py -i study_seqs.biom \
                       -m marker_nsti_predicted.tsv.gz \
                       -f EC_predicted.tsv.gz \
                       -o EC_metagenome_out


metagenome_pipeline.py -i study_seqs.biom \
                       -m marker_nsti_predicted.tsv.gz \
                       -f KO_predicted.tsv.gz \
                       -o KO_metagenome_out
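
The adjustment described for this step can be illustrated with plain arithmetic. The Python sketch below is a simplified picture of the idea and not the code of metagenome_pipeline.py: each ASV's read count in a sample is divided by its predicted 16S copy number, and the result is multiplied by the predicted copy number of each gene family and summed over ASVs. The function name, ASV ids, and numbers are all made up for illustration.

```python
def predict_sample_functions(asv_counts, marker_copies, function_copies):
    """Weight per-genome gene family copies by copy-number-normalized ASV abundance.

    asv_counts:      {asv_id: read count in one sample}
    marker_copies:   {asv_id: predicted 16S copy number}
    function_copies: {asv_id: {function_id: predicted copies per genome}}
    """
    totals = {}
    for asv, count in asv_counts.items():
        # Correct the read count for the number of 16S copies in the genome.
        normalized = count / marker_copies[asv]
        for function, copies in function_copies[asv].items():
            totals[function] = totals.get(function, 0.0) + normalized * copies
    return totals


# Hypothetical toy data for one sample.
asv_counts = {"asv1": 100, "asv2": 30}
marker_copies = {"asv1": 2, "asv2": 3}
function_copies = {
    "asv1": {"EC:1.1.1.1": 1, "EC:2.7.7.7": 2},
    "asv2": {"EC:1.1.1.1": 4},
}

print(predict_sample_functions(asv_counts, marker_copies, function_copies))
# prints {'EC:1.1.1.1': 90.0, 'EC:2.7.7.7': 100.0}
```

Here asv1 contributes 100 / 2 = 50 normalized units and asv2 contributes 30 / 3 = 10, so EC:1.1.1.1 totals 50 × 1 + 10 × 4 = 90.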

Infer MetaCyc pathway abundances and coverages based on predicted E.C. number abundances (details)

pathway_pipeline.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv.gz \
                    -o pathways_out \
                    --intermediate pathways_working \
                    -p 1

Add descriptions as new column in gene family and pathway abundance tables (details)

add_descriptions.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv.gz -m EC \
                    -o EC_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz

add_descriptions.py -i KO_metagenome_out/pred_metagenome_unstrat.tsv.gz -m KO \
                    -o KO_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz

add_descriptions.py -i pathways_out/path_abun_unstrat.tsv.gz -m METACYC \
                    -o pathways_out/path_abun_unstrat_descrip.tsv.gz

Shuffling predictions

An optional additional step is to shuffle the ASV labels in the genome prediction tables (i.e. the outputs of hsp.py). Analyses based on these shuffled tables can then be compared with analyses based on the actual data to check whether the unshuffled data contain more signal. See here for more details.
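
As a rough illustration of what shuffling ASV labels means, the Python sketch below permutes the first column of a gzipped prediction table while leaving every row of predicted values untouched. It is not the PICRUSt2 implementation. The function name, the seed parameter, and the assumption that ASV ids sit in the first column of a tab-separated file with a header row are all illustrative.

```python
import csv
import gzip
import random


def shuffle_asv_labels(in_path, out_path, seed=None):
    """Write a copy of a prediction table with its ASV ids randomly permuted.

    Assumes a gzipped TSV with a header row and ASV ids in the first column
    (an assumption about the table layout; check your own files).
    """
    with gzip.open(in_path, "rt", newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    header, body = rows[0], rows[1:]

    labels = [row[0] for row in body]
    random.Random(seed).shuffle(labels)  # a fixed seed makes the shuffle reproducible

    with gzip.open(out_path, "wt", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        # Pair each shuffled label with the original row's predicted values.
        for label, row in zip(labels, body):
            writer.writerow([label] + row[1:])
```

Running the function several times with different seeds gives a set of shuffled tables against which results from the real predictions can be compared.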

Please first check our FAQ if you have any questions about PICRUSt2.

For other general questions and comments about PICRUSt2, please search the PICRUSt Google group. If your question has not been answered before, please start a new thread.

To report a bug or to make a feature request please make a new issue at the top of this page.
