A tool for representing genomic potential and transcriptomic expression into KEGG pathways.
- Features
- Installation
- Running KEGGCharter
- Testing KEGGCharter
- Outputs
- Arguments for KEGGCharter
- Referencing KEGGCharter
KEGGCharter is a user-friendly implementation of KEGG API and Pathway functionalities. It allows for:
- Conversion of KEGG IDs to KEGG Orthologs (KO) and of KO to EC numbers
- Representation of the metabolic potential of the main taxa in KEGG metabolic maps (up to the top 10, each distinguished by its own colour)
- Representation of differential expression between samples in KEGG metabolic maps (the collective sum of each function will be represented)
KEGGCharter can be easily installed with Bioconda.
conda install -c conda-forge -c bioconda keggcharter
To run KEGGCharter, an input file must be supplied - see Testing KEGGCharter
section - and columns
with genomic and/or transcriptomic information, as well as one column with either KEGG IDs, KOs or EC numbers, must be
present in the file and specified through the command line.
python kegg_charter.py -f input_file.xlsx -o output_folder -mgc mg_column1,mg_column2 -mtc mt_column1,mt_column2 ...
An example input file is available here. This is one output of MOSCA, which can be directly inputted to KEGGCharter to obtain metabolic representations by running:
kegg_charter.py -f MOSCA_Entry_Report.xlsx -gcol mg -tcol mt_0.01a_normalized,mt_1a_normalized,mt_100a_normalized,mt_0.01b_normalized,mt_1b_normalized,mt_100b_normalized,mt_0.01c_normalized,mt_1c_normalized,mt_100c_normalized -keggc "Cross-reference (KEGG)" -o test_keggcharter -tc "Taxonomic lineage (GENUS)"
Just make sure MOSCA_Entry_Report.xlsx
is in the present folder, or indicate the path to it. This command will create
representations for all 252 default maps of KEGGCharter. If you want to represent for less or more, run with the --metabolic-maps
parameter to indicate to KEGGCharter what maps to run (comma separated).
KEGGCharter needs KGMLs and EC numbers to boxes relations, which it will automatically retrieve for every map inputted. This might take some time, but you only need to run it once.
Default directory for storing these files is the folder containing the kegg_charter.py
script, but it can be customized
with the --resources-directory
parameter.
KEGGCharter produces a table from the inputed data with two new columns - KO (KEGG Charter) and EC number (KEGG Charter) - containing the results of conversion of KEGG IDs to KOs and KOs to EC numbers, respectively. This file is saved as KEGGCharter_results
in the output directory.
KEGGCharter then represents this information in KEGG metabolic maps. If information is available as result of (meta)genomics analysis, KEGGCharter will localize the boxes whose functions are present in the organisms' genomes, mapping their genomic potential. If (meta)transcriptomics data is available, KEGGCharter will consider the sample as a whole, measuring gene expression and performing a multi-sample comparison for each function in the metabolic maps.
- maps with genomic information are identified with the prefix "potential_" from genomic potential (figure 1).
Figure 1 - KEGG metabolic map of methane metabolism, with identified taxa for each function from a simulated dataset.
- maps with transcriptomic information are identified with the prefix "differential_" from differential expression (figure 2).
Figure 2 - KEGG metabolic map of methane metabolism, with differential analysis of quantified expression for each function from a simulated dataset.
KEGGCharter provides several options for customizing its workflow.
usage: kegg_charter.py [-h] [-o OUTPUT] [--tsv] [-t THREADS]
[-mm METABOLIC_MAPS] [-gcol GENOMIC_COLUMNS]
[-tcol TRANSCRIPTOMIC_COLUMNS] [-tls TAXA_LIST]
[-not NUMBER_OF_TAXA] [-keggc KEGG_COLUMN]
[-koc KO_COLUMN] [-ecc EC_COLUMN] [--resume] [-v] -f
FILE -tc TAXA_COLUMN [--show-available-maps]
KEGGCharter - A tool for representing genomic potential and transcriptomic
expression into KEGG pathways
optional arguments:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
Output directory
--tsv Results will be outputed in TSV format (and not
EXCEL).
-mm METABOLIC_MAPS, --metabolic-maps METABOLIC_MAPS
IDs of metabolic maps to output
-gcol GENOMIC_COLUMNS, --genomic-columns GENOMIC_COLUMNS
Names of columns with genomic identification
-tcol TRANSCRIPTOMIC_COLUMNS, --transcriptomic-columns TRANSCRIPTOMIC_COLUMNS
Names of columns with transcriptomics quantification
-tls TAXA_LIST, --taxa-list TAXA_LIST
List of taxa to represent in genomic potential charts
(comma separated)
-not NUMBER_OF_TAXA, --number-of-taxa NUMBER_OF_TAXA
Number of taxa to represent in genomic potential
charts (comma separated)
-keggc KEGG_COLUMN, --kegg-column KEGG_COLUMN
Column with KEGG IDs.
-koc KO_COLUMN, --ko-column KO_COLUMN
Column with KOs.
-ecc EC_COLUMN, --ec-column EC_COLUMN
Column with EC numbers.
--resume Data inputed has already been analyzed by KEGGCharter.
-v, --version show program's version number and exit
-tc TAXA_COLUMN, --taxa-column TAXA_COLUMN
Column with the taxa designations to represent with
KEGGChart
required named arguments:
-f FILE, --file FILE TSV or EXCEL table with information to chart
Special functions:
--show-available-maps
Outputs KEGG maps IDs and descriptions to the console
(so you may pick the ones you want!)
If you use KEGGCharter, please cite its publication.