HumaNn
Humann is a computational profiler allowing users to estimate the abundance of microbial metabolic pathways and gene families from metagenomic or metatranscriptomic sequencing data.
Complete documentation for Humann3 can be found in the bioBakery wiki.
Humann3 is installed on the QIB HPC system in several versions. You can list the currently available packages using the NBI-slurm utility:
source package nbi-slurm
shelf humann
If you don't find the specific version of the tool you want to use, you can install Humann for yourself using the following instructions.
The reference databases have been downloaded by the Core Bioinformatics team and are shared for everyone to use:
- MPA
- /qib/platforms/Informatics/databases/humann_db/mpa/mpa_vOct22_CHOCOPhlAnSGB_202212
- HUMANN nuc
- /qib/platforms/Informatics/databases/humann_db/2023/chocophlan
- HUMANN prot
- /qib/platforms/Informatics/databases/humann_db/uniref
If you can't find your database of interest, don't hesitate to contact us, and we'll download it for you!
Humann can be run after QC and human read removal (see this tutorial) on your fastq files as follows:
source package e59dcdcb-efe4-4b6c-90fc-f35899b7e1a2 # Humann3.8
MPA="/qib/platforms/Informatics/databases/humann_db/mpa/mpa_vOct22_CHOCOPhlAnSGB_202212"
HUMANN_NUC="/qib/platforms/Informatics/databases/humann_db/2023/chocophlan"
HUMANN_PROT="/qib/platforms/Informatics/databases/humann_db/uniref"
# YOURFILE is the path to your fastq file, YOUROUTDIR the directory for the results
humann --input "${YOURFILE}" --output "${YOUROUTDIR}" --metaphlan-options "--offline --bowtie2db $MPA" --nucleotide-database "$HUMANN_NUC" --protein-database "$HUMANN_PROT"
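To process several samples, the command above can be wrapped in a loop. The following is a minimal sketch rather than a tested pipeline: the function name `run_humann_batch`, the `DRY_RUN` switch and the assumption that inputs are named `*.fastq` are all illustrative and not part of Humann. With `DRY_RUN=1` (the default here) the commands are only printed, so they can be checked before anything is run.

```shell
# Load Humann first, as shown above (source package ... # Humann3.8).

# Database locations as listed on this page.
MPA="/qib/platforms/Informatics/databases/humann_db/mpa/mpa_vOct22_CHOCOPhlAnSGB_202212"
HUMANN_NUC="/qib/platforms/Informatics/databases/humann_db/2023/chocophlan"
HUMANN_PROT="/qib/platforms/Informatics/databases/humann_db/uniref"

# Hypothetical helper: build one humann command per FASTQ file in a directory.
# Usage: run_humann_batch INPUT_DIR OUTPUT_DIR
run_humann_batch() {
    indir="$1"
    outdir="$2"
    for fq in "$indir"/*.fastq; do
        [ -e "$fq" ] || continue   # nothing matched the pattern
        # Store the command in the positional parameters so that
        # arguments containing spaces stay intact when it is executed.
        set -- humann --input "$fq" --output "$outdir" \
            --metaphlan-options "--offline --bowtie2db $MPA" \
            --nucleotide-database "$HUMANN_NUC" \
            --protein-database "$HUMANN_PROT"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$*"              # print only (quoting is not shown)
        else
            "$@"                   # actually run humann
        fi
    done
}
```

Calling `run_humann_batch reads/ humann_out/` prints one command per file; `DRY_RUN=0 run_humann_batch reads/ humann_out/` runs them one after another. The directory names here are placeholders.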
Whatever the input type, HUMAnN creates three main output files:
- $SAMPLE_genefamilies.tsv : contains the gene family abundances (in RPK, reads per kilobase), both as community totals and stratified by contributing species
- $SAMPLE_pathabundance.tsv : contains the pathway abundances, both as community totals and stratified by contributing species
- $SAMPLE_pathcoverage.tsv : contains the pathway coverage
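In these tables, a stratified row carries the contributing taxon after a `|` in the first column (for example `FEATURE|g__Genus.s__Species`), while rows without `|` are community totals. The sketch below shows one way to separate the two with `awk`. The helper names are hypothetical, and it assumes a tab-separated file with a single header line. Humann also ships a `humann_split_stratified_table` utility for the same purpose.

```shell
# Hypothetical helpers: split a Humann table on the "|" in the first column.
# The header line (first line of the file) is kept in both outputs.

# Community totals only: rows whose feature name contains no "|".
unstratified_rows() {
    awk -F '\t' 'NR == 1 || index($1, "|") == 0' "$1"
}

# Per-taxon contributions only: rows whose feature name contains "|".
stratified_rows() {
    awk -F '\t' 'NR == 1 || index($1, "|") > 0' "$1"
}

# Example (SAMPLE is a placeholder for your sample name):
# unstratified_rows "${SAMPLE}_pathabundance.tsv" > "${SAMPLE}_pathabundance_unstratified.tsv"
```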
Humann provides a utility script, humann_renorm_table, to normalize the abundances to relative abundance or "copies per million" (CPM) units. It can be run on either the genefamilies or the pathabundance output file:
humann_renorm_table --input ${HUMANN_TABLE} --output cpm_${HUMANN_TABLE} --units cpm
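Note that `cpm_${HUMANN_TABLE}` only gives a sensible file name when `HUMANN_TABLE` has no directory part. The sketch below is one way to handle full paths and to normalize every genefamilies and pathabundance table in a directory. The helper names `cpm_path` and `renorm_all` and the `DRY_RUN` switch are assumptions made for illustration; only `humann_renorm_table` and its three options come from the command above.

```shell
# Hypothetical helper: prefix the file name (not the whole path) with "cpm_".
cpm_path() {
    printf '%s/cpm_%s\n' "$(dirname "$1")" "$(basename "$1")"
}

# Hypothetical helper: normalize all genefamilies/pathabundance tables in a directory.
# Usage: renorm_all HUMANN_DIR   (prints the commands unless DRY_RUN=0)
renorm_all() {
    dir="$1"
    for table in "$dir"/*_genefamilies.tsv "$dir"/*_pathabundance.tsv; do
        [ -e "$table" ] || continue                 # pattern matched nothing
        case "$(basename "$table")" in
            cpm_*) continue ;;                      # already normalized
        esac
        set -- humann_renorm_table --input "$table" \
            --output "$(cpm_path "$table")" --units cpm
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$*"                               # print only
        else
            "$@"                                    # run the normalization
        fi
    done
}
```

Skipping files that already start with `cpm_` keeps a second run from normalizing its own output.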
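You can merge the per-sample gene family or pathway abundance outputs into a single table using the script humann_join_tables. The -i argument is the directory containing the per-sample tables, -o is the merged table to write, and --file_name restricts the join to files whose names contain the given string:
humann_join_tables -i ${HUMANN_DIR} -o ${OUTPUT_TABLE} --file_name genefamilies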
An example of analysing the output of Humann3 using R: