Merge pull request #2157 from merenlab/reaction-network-updates
Reaction network updates
semiller10 authored Oct 28, 2023
2 parents 9227669 + 9aed9f5 commit 2cc9ff8
Showing 7 changed files with 779 additions and 301 deletions.
815 changes: 631 additions & 184 deletions anvio/biochemistry/reactionnetwork.py

2 changes: 1 addition & 1 deletion anvio/docs/artifacts/reaction-network-json.md
@@ -1,5 +1,5 @@
This artifact represents **a JSON-formatted file derived from a %(reaction-network)s**.

The program, %(anvi-get-metabolic-model-file)s, produces this file from the %(reaction-network)s stored in a %(contigs-db)s. The genes, reactions, and metabolites predicted to be involved in metabolism can be inspected in this file, which is formatted for compatibility with software used for flux balance analysis, such as [COBRApy](https://opencobra.github.io/cobrapy/).
The program, %(anvi-get-metabolic-model-file)s, produces this file from the %(reaction-network)s stored in a %(contigs-db)s or %(pan-db)s. The genes, reactions, and metabolites predicted to be involved in metabolism can be inspected in this file, which is formatted for compatibility with software used for flux balance analysis, such as [COBRApy](https://opencobra.github.io/cobrapy/).
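The JSON file can be inspected with any JSON reader before it is handed to flux balance analysis software (COBRApy, for instance, reads it with `cobra.io.load_json_model`). The sketch below uses only Python's standard library and assumes the COBRApy JSON layout, in which "metabolites", "reactions", and "genes" are top-level lists. The helper names `load_model` and `summarize_model` are hypothetical and not part of anvi'o.

```python
import json


def load_model(path):
    """Read a reaction-network JSON file into a plain dict."""
    with open(path) as handle:
        return json.load(handle)


def summarize_model(model):
    """Count the entries in the "metabolites", "reactions", and "genes" sections.

    Assumes the COBRApy JSON layout, where each of these sections is a list of
    entries; a missing section is counted as empty.
    """
    sections = ("metabolites", "reactions", "genes")
    return {section: len(model.get(section, [])) for section in sections}
```

For example, `summarize_model(load_model("/path/to/output.json"))` returns the number of entries in each section.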

%(anvi-get-metabolic-model-file)s includes an "objective function" as the first entry of the "reactions" section of the file, a prerequisite for flux balance analysis. The objective function represents the biomass composition of metabolites in the ["core metabolism" of *E. coli*](http://bigg.ucsd.edu/models/e_coli_core).
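Because the objective function is the first entry of "reactions", its composition can be read directly from the file. This is a minimal sketch, assuming the COBRApy convention that each reaction stores its stoichiometry as a mapping from metabolite ID to coefficient, with negative coefficients for consumed metabolites and positive ones for produced metabolites. `objective_terms` is a hypothetical helper.

```python
def objective_terms(model):
    """Split the objective function's metabolites into consumed and produced lists.

    Assumes the objective is the first entry of model["reactions"] and that each
    reaction has a "metabolites" mapping of {metabolite_id: coefficient}.
    """
    objective = model["reactions"][0]
    stoichiometry = objective["metabolites"]
    consumed = sorted(m for m, c in stoichiometry.items() if c < 0)  # reactants
    produced = sorted(m for m, c in stoichiometry.items() if c > 0)  # products
    return objective["id"], consumed, produced
```

The `model` argument is the dict obtained by parsing the file with `json.load`.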
6 changes: 3 additions & 3 deletions anvio/docs/artifacts/reaction-network.md
@@ -1,5 +1,5 @@
This artifact represents **the metabolic reaction network stored in a %(contigs-db)s by %(anvi-reaction-network)s.**
This artifact represents **the metabolic reaction network stored in a %(contigs-db)s or a %(pan-db)s by %(anvi-reaction-network)s.**

The program, %(anvi-reaction-network)s, generates a reaction network from genes encoding enzymes in the %(contigs-db)s. The reaction network represents biochemical reactions and the constituent metabolites predicted from the genome. The program relies upon [KEGG Orthology (KO)](https://www.genome.jp/kegg/ko.html) annotations of protein-coding genes and reference data in the [ModelSEED Biochemistry database](https://github.com/ModelSEED/ModelSEEDDatabase), and is therefore subject to all the limitations thereof, including incomplete annotation of genes with protein orthologs and imprecise knowledge of the reactions catalyzed by enzymes.
The program, %(anvi-reaction-network)s, generates a reaction network from genes encoding enzymes in the %(contigs-db)s or from gene clusters with consensus enzyme annotations in the %(pan-db)s. The reaction network represents biochemical reactions and the constituent metabolites predicted from the genome or pangenome. The program relies upon [KEGG Orthology (KO)](https://www.genome.jp/kegg/ko.html) annotations of protein-coding genes and reference data in the [ModelSEED Biochemistry database](https://github.com/ModelSEED/ModelSEEDDatabase), and is therefore subject to all the limitations thereof, including incomplete annotation of genes with protein orthologs and imprecise knowledge of the reactions catalyzed by enzymes.

The representation of the reaction network in two tables of the %(contigs-db)s, `gene_function_reactions` and `gene_function_metabolites`, is generalizable to other sources of metabolic data, linking genes to predicted functional orthologs and the associated reactions and metabolites. These data can be exported to a JSON-formatted file by %(anvi-get-metabolic-model-file)s for inspection and metabolic model analyses.
The representation of the reaction network in two tables of the %(contigs-db)s, `gene_function_reactions` and `gene_function_metabolites`, is generalizable to other sources of metabolic data, linking genes to predicted functional orthologs and the associated reactions and metabolites. Reaction and metabolite data are likewise stored in identically formatted tables, `gene_cluster_function_reactions` and `gene_cluster_function_metabolites`, in the %(pan-db)s. These data can be exported to a JSON-formatted file by %(anvi-get-metabolic-model-file)s for inspection and metabolic model analyses.
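anvi'o databases are SQLite files, so the reaction-network tables named above can also be previewed with Python's built-in `sqlite3` module. This sketch does not assume any column names; it reads them from the query result, and it only reads data without modifying the database. `preview_table` and `NETWORK_TABLES` are hypothetical names used for illustration.

```python
import sqlite3

# Table names taken from the text above; the column layout is not assumed.
NETWORK_TABLES = {
    "gene_function_reactions",
    "gene_function_metabolites",
    "gene_cluster_function_reactions",
    "gene_cluster_function_metabolites",
}


def preview_table(conn, table, limit=5):
    """Return (column_names, rows) for up to `limit` rows of a reaction-network table.

    `conn` is an open sqlite3 connection. The table name is checked against a
    fixed set because SQL identifiers cannot be passed as bound parameters.
    """
    if table not in NETWORK_TABLES:
        raise ValueError(f"not a reaction-network table: {table}")
    cursor = conn.execute(f"SELECT * FROM {table} LIMIT ?", (limit,))
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()
```

For example, open the database with `conn = sqlite3.connect("/path/to/contigs-db")`, call `preview_table(conn, "gene_function_reactions")`, and close the connection with `conn.close()` when done.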
50 changes: 40 additions & 10 deletions anvio/docs/programs/anvi-get-metabolic-model-file.md
@@ -1,30 +1,60 @@
This program **exports a metabolic %(reaction-network)s from a %(contigs-db)s to a %(reaction-network-json)s file** suitable for inspection and flux balance analysis.
This program **exports a metabolic %(reaction-network)s from a %(contigs-db)s OR a %(pan-db)s and %(genomes-storage-db)s to a %(reaction-network-json)s file** formatted for flux balance analysis.

The required input to this program is a %(contigs-db)s in which a %(reaction-network)s has been stored by %(anvi-reaction-network)s.
The required input to this program is a %(contigs-db)s OR a %(pan-db)s in which a %(reaction-network)s has been stored by %(anvi-reaction-network)s. The %(pan-db)s must be accompanied by a %(genomes-storage-db)s input.

The %(reaction-network-json)s file output contains sections on the metabolites, reactions, and genes constituting the %(reaction-network)s that had been predicted from the genome. An "objective function" representing the biomass composition of metabolites in the ["core metabolism" of *E. coli*](http://bigg.ucsd.edu/models/e_coli_core) is automatically added as the first entry in the "reactions" section of the file and can be deleted as needed. An objective function is needed for flux balance analysis.
The %(reaction-network-json)s file output contains sections on the metabolites, reactions, and genes (or gene clusters) constituting the %(reaction-network)s that had been predicted from the genome (or pangenome). An "objective function" representing the biomass composition of metabolites in the ["core metabolism" of *E. coli*](http://bigg.ucsd.edu/models/e_coli_core) is automatically added as the first entry in the "reactions" section of the file and can be deleted as needed. An objective function is needed for flux balance analysis.
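If the objective function is not wanted, it can be removed by deleting the first entry of "reactions". The following minimal sketch assumes the file has been parsed into a dict with `json.load`. `drop_objective` and `write_model` are hypothetical helpers, and metabolites that appear only in the objective are deliberately left in the "metabolites" section.

```python
import json


def drop_objective(model):
    """Return a copy of the model without the first entry of "reactions".

    Assumes the objective function is that first entry. The input dict is not
    modified, and the "metabolites" section is left unchanged.
    """
    trimmed = dict(model)
    trimmed["reactions"] = list(model["reactions"][1:])
    return trimmed


def write_model(model, path):
    """Write the model back out as indented JSON."""
    with open(path, "w") as handle:
        json.dump(model, handle, indent=2)
```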

## Usage

%(anvi-get-metabolic-model-file)s requires a %(contigs-db)s as input and the path to an output %(reaction-network-json)s file.
%(anvi-get-metabolic-model-file)s requires a %(contigs-db)s OR a %(pan-db)s and %(genomes-storage-db)s as input, plus the path to an output %(reaction-network-json)s file.

{{ codestart }}
anvi-get-metabolic-model-file -c %(contigs-db)s \
anvi-get-metabolic-model-file -c /path/to/contigs-db \
-o /path/to/output.json
{{ codestop }}

An existing file at the target output location must be explicitly overwritten with the `-W` flag.
{{ codestart }}
anvi-get-metabolic-model-file -p /path/to/pan-db \
-g /path/to/genomes-storage-db \
-o /path/to/output.json
{{ codestop }}

An existing file at the target output location must be explicitly overwritten with the flag, `--overwrite-output-destinations`.

{{ codestart }}
anvi-get-metabolic-model-file -c %(contigs-db)s \
anvi-get-metabolic-model-file -c /path/to/contigs-db \
-o /path/to/output.json \
-W
--overwrite-output-destinations
{{ codestop }}

The flag, `--remove-missing-objective-metabolites` must be used to remove metabolites in the *E. coli* core biomass objective function from the output file if the metabolites are not produced or consumed by the predicted %(reaction-network)s. [COBRApy](https://opencobra.github.io/cobrapy/), for instance, cannot load the JSON file if metabolites in the objective function are missing from the genomic model.
The flag `--remove-missing-objective-metabolites` must be used to remove metabolites of the *E. coli* core biomass objective function from the %(reaction-network-json)s file if they are neither produced nor consumed by the predicted %(reaction-network)s. [COBRApy](https://opencobra.github.io/cobrapy/), for instance, cannot load the JSON file if metabolites in the objective function are missing from the model.

{{ codestart }}
anvi-get-metabolic-model-file -c %(contigs-db)s \
anvi-get-metabolic-model-file -c /path/to/contigs-db \
-o /path/to/output.json \
--remove-missing-objective-metabolites
{{ codestop }}
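The condition that `--remove-missing-objective-metabolites` addresses can be illustrated on the parsed JSON: a metabolite of the objective function counts as missing if no other reaction in the file produces or consumes it. This is a sketch of that check, assuming the objective is the first reaction and that stoichiometry is stored as a {metabolite ID: coefficient} mapping. It is not anvi'o's implementation, and the helper names are hypothetical.

```python
def missing_objective_metabolites(model):
    """Return objective metabolites that no other reaction produces or consumes."""
    objective, *others = model["reactions"]
    in_network = {met for rxn in others for met in rxn["metabolites"]}
    return sorted(set(objective["metabolites"]) - in_network)


def prune_objective(model):
    """Return a copy of the model whose objective omits the missing metabolites.

    The input dict and its objective entry are not modified.
    """
    missing = set(missing_objective_metabolites(model))
    objective = dict(model["reactions"][0])
    objective["metabolites"] = {
        met: coef for met, coef in objective["metabolites"].items() if met not in missing
    }
    pruned = dict(model)
    pruned["reactions"] = [objective] + list(model["reactions"][1:])
    return pruned
```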

The gene KO annotations used to construct the stored %(reaction-network)s may have since been changed in the %(contigs-db)s or the %(genomes-storage-db)s. By default, this program checks that the set of gene KO annotations currently stored is the same set that was used to construct the %(reaction-network)s, and raises an error if it is not. The flag `--ignore-changed-gene-annotations` skips this check, permitting the gene annotations to have changed since the network was created.

{{ codestart }}
anvi-get-metabolic-model-file -c /path/to/contigs-db \
-o /path/to/output.json \
--ignore-changed-gene-annotations
{{ codestop }}
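The default check amounts to comparing two gene-to-KO assignments: those used when the network was built and those currently stored. Below is a minimal sketch of that comparison in which both inputs are hypothetical `{gene_id: set_of_KO_IDs}` mappings (how anvi'o records the annotations used to build the network is not covered here), and the `ignore_changes` parameter plays the role of `--ignore-changed-gene-annotations`.

```python
def changed_annotations(used, current):
    """Return the sorted gene IDs whose KO sets differ between the two mappings.

    A gene missing from one mapping is treated as having no KO annotations there.
    """
    genes = set(used) | set(current)
    return sorted(g for g in genes if used.get(g, set()) != current.get(g, set()))


def check_annotations(used, current, ignore_changes=False):
    """Raise if annotations changed, unless told to ignore changes (hypothetical helper)."""
    changed = changed_annotations(used, current)
    if changed and not ignore_changes:
        raise ValueError(
            f"KO annotations changed for {len(changed)} gene(s) since the network was built"
        )
    return changed
```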

For a pangenomic network, the option `--record-genomes` determines which additional information on genome membership is added to the output %(reaction-network-json)s file. By default, genome names are recorded for gene clusters and reactions, which is equivalent to `--record-genomes cluster reaction`. The argument `cluster` records which genomes are part of each cluster in the 'notes' section of each 'gene' (cluster) entry in the JSON file. The arguments `reaction` and `metabolite` respectively record the genomes predicted to encode enzymes associated with each reaction and metabolite entry. The valid arguments are `cluster`, `reaction`, and `metabolite`; all three are used in the following example.

{{ codestart }}
anvi-get-metabolic-model-file -p /path/to/pan-db \
-g /path/to/genomes-storage-db \
-o /path/to/output.json \
--record-genomes cluster reaction metabolite
{{ codestop }}

Using `--record-genomes` as a flag without any arguments prevents genome membership from being recorded in the %(reaction-network-json)s file at all.

{{ codestart }}
anvi-get-metabolic-model-file -p /path/to/pan-db \
-g /path/to/genomes-storage-db \
-o /path/to/output.json \
--record-genomes
{{ codestop }}
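With genome membership recorded, the 'notes' of each entry can be pulled out of the JSON file for inspection. COBRApy's JSON format allows an optional "notes" object on genes, reactions, and metabolites; this sketch collects those objects without assuming which keys anvi'o writes inside them. `notes_by_entry` is a hypothetical helper operating on the dict returned by `json.load`.

```python
def notes_by_entry(model, section):
    """Map each entry ID in a section ("genes", "reactions", or "metabolites") to its notes.

    Entries without a non-empty "notes" object are skipped; the structure of the
    notes themselves is returned as-is.
    """
    return {
        entry["id"]: entry["notes"]
        for entry in model.get(section, [])
        if entry.get("notes")
    }
```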
22 changes: 16 additions & 6 deletions anvio/docs/programs/anvi-reaction-network.md
@@ -1,12 +1,12 @@
This program **stores a metabolic %(reaction-network)s in a %(contigs-db)s.**
This program **stores a metabolic %(reaction-network)s in a %(contigs-db)s or %(pan-db)s.**

The network consists of data on biochemical reactions predicted to be encoded by the genome, referencing the [KEGG Orthology (KO)](https://www.genome.jp/kegg/ko.html) and [ModelSEED Biochemistry](https://github.com/ModelSEED/ModelSEEDDatabase) databases.
The network consists of data on biochemical reactions predicted to be encoded by the genome or pangenome, referencing the [KEGG Orthology (KO)](https://www.genome.jp/kegg/ko.html) and [ModelSEED Biochemistry](https://github.com/ModelSEED/ModelSEEDDatabase) databases.

Information on the predicted reactions and the involved metabolites is stored in two tables of the %(contigs-db)s. The program, %(anvi-get-metabolic-model-file)s, can be used to export the %(reaction-network)s from the database to a %(reaction-network-json)s file suitable for inspection and flux balance analysis.
Information on the predicted reactions and the involved metabolites is stored in two tables of the %(contigs-db)s or %(pan-db)s. The program, %(anvi-get-metabolic-model-file)s, can be used to export the %(reaction-network)s from the database to a %(reaction-network-json)s file formatted for flux balance analysis.

## Usage

%(anvi-reaction-network)s takes a %(contigs-db)s as required input. Genes stored within the database must have KO protein annotations, which can be assigned by %(anvi-run-kegg-kofams)s.
%(anvi-reaction-network)s takes either a %(contigs-db)s OR a %(pan-db)s and %(genomes-storage-db)s as required input. Genes stored within the %(contigs-db)s or %(genomes-storage-db)s must have KO protein annotations, which can be assigned by %(anvi-run-kegg-kofams)s.

The KO and ModelSEED Biochemistry databases must be set up and available to the program. By default, these are expected to be set up in default anvi'o data directories. %(anvi-setup-kegg-data)s and %(anvi-setup-modelseed-database)s must be run to set up these databases.

@@ -17,11 +17,21 @@ anvi-reaction-network -c /path/to/contigs-db
Custom locations for the reference databases can be provided with the flags, `--ko-dir` and `--modelseed-dir`.

{{ codestart }}
anvi-reaction-network -c /path/to/contigs-db --ko-dir /path/to/set-up/ko-dir --modelseed-dir /path/to/set-up/modelseed-dir
anvi-reaction-network -c /path/to/contigs-db \
--ko-dir /path/to/set-up/ko-dir \
--modelseed-dir /path/to/set-up/modelseed-dir
{{ codestop }}

If a %(contigs-db)s already contains a %(reaction-network)s from a previous run of this program, the flag `--overwrite-existing-network` can overwrite the existing network with a new one. For example, if %(anvi-run-kegg-kofams)s is run again on a database using a newer version of KEGG, then %(anvi-reaction-network)s should be rerun to update the %(reaction-network)s derived from the KO annotations.

{{ codestart }}
anvi-reaction-network -c /path/to/contigs-db --overwrite-existing-network
anvi-reaction-network -c /path/to/contigs-db \
--overwrite-existing-network
{{ codestop }}

A %(reaction-network)s can also be generated from consensus KO annotations of gene clusters in a %(pan-db)s. Such a network can be used to examine which parts of the metabolic network are conserved or divergent among the organisms in the pangenome.

{{ codestart }}
anvi-reaction-network -p /path/to/pan-db \
-g /path/to/genomes-storage-db
{{ codestop }}
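Once genome membership per reaction is known, reactions can be sorted into those encoded by every genome and those encoded by only some genomes, which is one simple way to look at conservation and divergence in a pangenomic network. This is a minimal sketch, assuming a hypothetical `{reaction_id: set_of_genome_names}` mapping has already been built (for instance from the genome names that %(anvi-get-metabolic-model-file)s records with `--record-genomes reaction`). `classify_by_membership` is not an anvi'o function.

```python
def classify_by_membership(membership, genomes):
    """Split reactions into those present in every genome and those in only some.

    `membership` maps reaction IDs to the genome names predicted to encode them;
    `genomes` lists every genome in the pangenome.
    """
    all_genomes = set(genomes)
    shared = sorted(r for r, members in membership.items() if all_genomes <= set(members))
    variable = sorted(r for r, members in membership.items() if not all_genomes <= set(members))
    return shared, variable
```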