Add HamronizationNormalizer #81

Closed
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,17 @@

## Unreleased

#### Added `HamronizationNormalizer`
- Removed the `is_hamronized` property from all normalizers and the `--hamronized` flag from the CLI.
- All hamronized results now go through the `HamronizationNormalizer` class.
- `HamronizationNormalizer` reads a hamronized file line by line, extracts the input genes, and loads all ARO mapping tables, so that it can handle hamronized results that combine outputs from multiple tools and databases.
- On the CLI, hAMRonization commands look like:
```bash
argnorm hamronization -i PATH_TO_INPUT -o PATH_TO_OUTPUT
```
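
The line-by-line flow described in this entry can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not argNorm's actual `HamronizationNormalizer`: the column names `analysis_software_name` and `reference_database_name`, the helpers `group_input_genes` and `map_to_aro`, and the toy mapping tables are all hypothetical. The idea shown is that grouping rows by (tool, database) lets each gene be looked up in the right ARO mapping table even when one hamronized file mixes several tools.

```python
import csv
import io
from collections import defaultdict

# Assumed hAMRonization column names; verify them against your own hamronized file.
TOOL_COL = "analysis_software_name"
DB_COL = "reference_database_name"
GENE_COL = "gene_symbol"


def group_input_genes(handle):
    """Read a hamronized TSV row by row and group input genes by (tool, database).

    Hypothetical helper: grouping lets each gene be looked up in the ARO
    mapping table that belongs to the tool/database that reported it.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row[TOOL_COL].lower(), row[DB_COL].lower())
        groups[key].append(row[GENE_COL])
    return dict(groups)


def map_to_aro(groups, mapping_tables):
    """Look up each gene in the mapping table for its (tool, database) pair."""
    results = []
    for key, genes in groups.items():
        table = mapping_tables.get(key, {})
        for gene in genes:
            # Genes absent from the table (or with no table loaded) stay unmapped.
            results.append((gene, table.get(gene)))
    return results


# Toy input combining two tools; gene names and ARO values are placeholders.
tsv = (
    "analysis_software_name\treference_database_name\tgene_symbol\n"
    "abricate\tncbi\tgeneA\n"
    "amrfinderplus\tncbi\tgeneB\n"
)
groups = group_input_genes(io.StringIO(tsv))
tables = {("abricate", "ncbi"): {"geneA": "ARO:placeholder-1"}}
print(map_to_aro(groups, tables))  # [('geneA', 'ARO:placeholder-1'), ('geneB', None)]
```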

> Note: Updated preprocessing of resfinder genes. Input genes for `HamronizationNormalizer` are now formed by concatenating the `gene_name` and `reference_accession` entries of hamronized results. This improves ARO mapping accuracy (previously only `gene_symbol` was used, and several genes can share the same `gene_symbol`) and also simplifies preprocessing of resfinder inputs (using `gene_symbol` would require two different preprocessing functions, one for `resfinder` and one for `abricate` with the resfinder database).
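
To illustrate why `gene_symbol` alone is ambiguous, the sketch below builds a ResFinder input gene from `gene_name` and `reference_accession`. The `_` separator, the helper name `resfinder_input_gene`, and the example values are assumptions for illustration only; argNorm's exact concatenation may differ.

```python
def resfinder_input_gene(row, sep="_"):
    """Build a ResFinder input gene ID from one hamronized row.

    Hypothetical helper: the separator is an assumption, not taken from argNorm.
    """
    return f"{row['gene_name']}{sep}{row['reference_accession']}"


# Two placeholder rows that share a gene_symbol but differ in name and accession.
rows = [
    {"gene_symbol": "geneX", "gene_name": "geneX_1", "reference_accession": "ACC001"},
    {"gene_symbol": "geneX", "gene_name": "geneX_2", "reference_accession": "ACC002"},
]
ids = [resfinder_input_gene(r) for r in rows]
# Keying on gene_symbol would collapse both rows; the concatenated IDs stay distinct.
print(ids)  # ['geneX_1_ACC001', 'geneX_2_ACC002']
```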

#### Update `confers_resistance_to()` to use `regulates`, `part_of`, and `participates_in` ARO relationships
Previously, argNorm used the `is_a` ARO relationship, along with `confers_resistance_to_drug_class` and `confers_resistance_to_antibiotic`, to map ARGs to the drugs they confer resistance to. While this worked well for most genes, some ARGs, such as those coding for efflux pumps/proteins (e.g. `ARO:3003548`, `ARO:3000826`, `ARO:3003066`), were not mapped to any drugs, because none of their superclasses mapped to drugs/antibiotics via `confers_resistance_to_antibiotic` or `confers_resistance_to_drug_class`. However, these genes were related, via the `regulates`, `part_of`, or `participates_in` ARO relationships, to other ARGs that did map to drugs. argNorm now also uses these three relationships, so an ARG can be assigned a drug mapping even if none of its superclasses (derived using `is_a`) map to a drug.
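
One way to picture the extended lookup is a breadth-first search over a small ARO-like graph that follows `is_a`, `regulates`, `part_of`, and `participates_in`, collecting the targets of the two drug relationships from every visited term. This is a sketch of the idea, not argNorm's implementation; the graph shape, the function signature, and the placeholder term names are assumptions. Restricting the traversal to `is_a` reproduces the earlier behaviour, where the efflux subunit gets no drug.

```python
from collections import deque

# Relationships followed when searching for drug mappings, per this changelog entry.
TRAVERSED = ("is_a", "regulates", "part_of", "participates_in")
# Relationships that link a term directly to drugs or drug classes.
DRUG_RELATIONS = ("confers_resistance_to_antibiotic", "confers_resistance_to_drug_class")


def confers_resistance_to(term, graph, traversed=TRAVERSED):
    """Collect drugs reachable from `term` in a toy ARO-like graph.

    `graph` maps a term to {relationship: [target terms]}. Breadth-first
    search over the `traversed` relationships; each visited term
    contributes its direct drug mappings.
    """
    drugs = set()
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        edges = graph.get(current, {})
        for rel in DRUG_RELATIONS:
            drugs.update(edges.get(rel, []))
        for rel in traversed:
            for target in edges.get(rel, []):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return drugs


# Placeholder terms: an efflux subunit whose superclass has no drug mapping,
# but which is `part_of` a pump that does.
graph = {
    "efflux_subunit": {"is_a": ["efflux_component"], "part_of": ["efflux_pump"]},
    "efflux_component": {},
    "efflux_pump": {"confers_resistance_to_drug_class": ["drug_class_A"]},
}
print(confers_resistance_to("efflux_subunit", graph, traversed=("is_a",)))  # set()
print(confers_resistance_to("efflux_subunit", graph))  # {'drug_class_A'}
```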

54 changes: 23 additions & 31 deletions README.md
@@ -51,18 +51,18 @@ The `resistance_to_drug_classes` column will contain ARO numbers of the broader
If you use argNorm in a publication, please cite the preprint:
> Ugarcina Perovic S, Ramji V et al. argNorm: Normalization of Antibiotic Resistance Gene Annotations to the Antibiotic Resistance Ontology (ARO). Queensland University of Technology ePrints, 2024. DOI: https://doi.org/10.5204/rep.eprints.252448 [Preprint] (Under review).

## Supported tools and databases
## Supported ARG annotation tools and databases

| ARG database | Tool for ARG annotation |
| ---------------------------------- | ------------------------------------------------------- |
| ARG-ANNOT v5.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) |
| DeepARG v2 | [DeepARG v1.0.2](https://bench.cs.vt.edu/deeparg) |
| Groot v1.1.2 | [GROOT v1.1.2](https://github.com/will-rowe/groot) |
| MEGARes v3.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) |
| NCBI Reference Gene Database v3.12 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [AMRFinderPlus v3.10.30](https://github.com/ncbi/amr) |
| ResFinder v4.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [ResFinder v4.0](https://bitbucket.org/genomicepidemiology/resfinder/src/master/) |
| ResFinderFG v2.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) |
| SARG (reads mode) v3.2.1 | [ARGs-OAP v2.3](https://galaxyproject.org/use/args-oap/) |
| ARG-ANNOT v5.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| DeepARG v2 | [DeepARG v1.0.2](https://bench.cs.vt.edu/deeparg) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| Groot v1.1.2 | [GROOT v1.1.2](https://github.com/will-rowe/groot) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| MEGARes v3.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| NCBI Reference Gene Database v3.12 | [ABRicate v1.0.1](https://github.com/tseemann/abricate), [AMRFinderPlus v3.10.30](https://github.com/ncbi/amr), & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| ResFinder v4.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate), [ResFinder v4.0](https://bitbucket.org/genomicepidemiology/resfinder/src/master/), & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| ResFinderFG v2.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| SARG (reads mode) v3.2.1 | [ARGs-OAP v2.3](https://galaxyproject.org/use/args-oap/) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |

- Note: ARG database and ARG annotation tool versions can change. argNorm is intended only for the supported versions listed above.
- Note: the argNorm tool will be periodically updated to support the latest versions of databases and annotation tools if they undergo significant changes.
@@ -98,7 +98,7 @@ argNorm is readily available in the funcscan pipeline which can be accessed (her
Here is a basic outline of calling argNorm.

```bash
argnorm [tool] [--db] -i [path to original_annotation.tsv] -o [path to annotation_result_with_aro.tsv] [--hamronized (if hAMRonization used)]
argnorm [tool] [--db] -i [path to original_annotation.tsv] -o [path to annotation_result_with_aro.tsv]
```

### `tool` (required)
@@ -109,6 +109,7 @@ The most important ***required positional*** argument is `tool` (see [here](#sup
- `resfinder`
- `amrfinderplus`
- `groot`
- `hamronization`

### I/O (required)
- `-i` or `--input`: path to the annotation result
@@ -135,31 +136,26 @@ ARG annotation tools can use several ARG databases for annotation. Hence, the `t
| `resfinder` | Not required |
| `amrfinderplus` | Not required |
| `groot` | Any from `groot-argannot`, `groot-resfinder`, `groot-db`, `groot-core-db`, or `groot-card` |

### `--hamronized` (optional)
Use this if the input is hamronized by [hAMRonization](https://github.com/pha4ge/hAMRonization)
| `hamronization` | Not required |

### `-h` or `--help`
Use `argnorm -h` or `argnorm --help` to see available options.

```bash
>argnorm -h
usage: argnorm [-h]
[--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}]
[--hamronized] [-i INPUT] [-o OUTPUT]
{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot}
usage: argnorm [-h] [--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}] [-i INPUT] [-o OUTPUT]
{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot,hamronization}

argNorm normalizes ARG annotation results from different tools and databases to the same ontology, namely ARO (Antibiotic Resistance Ontology).

positional arguments:
{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot}
{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot,hamronization}
The tool you used to do ARG annotation.

optional arguments:
options:
-h, --help show this help message and exit
--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}
The database you used to do ARG annotation.
--hamronized Use this if the input is hamronized (processed using the hAMRonization tool)
-i INPUT, --input INPUT
The annotation result you have
-o OUTPUT, --output OUTPUT
@@ -209,23 +205,19 @@ argnorm -h

```
> argnorm -h
usage: argnorm [-h]
[--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg}]
[--hamronized] [-i INPUT] [-o OUTPUT]
{argsoap,abricate,deeparg,resfinder,amrfinderplus}
usage: argnorm [-h] [--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}] [-i INPUT] [-o OUTPUT]
{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot,hamronization}

argNorm normalizes ARG annotation results from different tools and databases to the same ontology, namely ARO (Antibiotic Resistance Ontology).

positional arguments:
{argsoap,abricate,deeparg,resfinder,amrfinderplus}
{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot,hamronization}
The tool you used to do ARG annotation.

optional arguments:
options:
-h, --help show this help message and exit
--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg}
--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}
The database you used to do ARG annotation.
--hamronized Use this if the input is hamronized (processed using
the hAMRonization tool)
-i INPUT, --input INPUT
The annotation result you have
-o OUTPUT, --output OUTPUT
@@ -257,10 +249,10 @@ wget https://raw.githubusercontent.com/BigDataBiology/argNorm/main/examples/raw/
Here is a basic outline of most argNorm commands:

```bash
argnorm [tool] -i [original_annotation.tsv] -o [argnorm_result.tsv] [--hamronized]
argnorm [tool] -i [original_annotation.tsv] -o [argnorm_result.tsv] [--db]
```

Here, `tool` refers to the ARG annotation tool used (ResFinder in this case). `original_annotation.tsv` is the path to the input data and `argnorm_result.tsv` is the path to output file where the resulting table from argNorm will be stored. `--hamronized` is an option to indicate if the input data is a result of using the [hAMRonization package](https://github.com/pha4ge/hAMRonization). In our example, the input data is not a result of using the hAMRonization package, and so the `--hamronized` option can be omitted.
Here, `tool` refers to the ARG annotation tool used (ResFinder in this case). `original_annotation.tsv` is the path to the input data and `argnorm_result.tsv` is the path to the output file where argNorm will store the resulting table. `--db` specifies the ARG database used with `tool` to perform the annotation. ResFinder does not require `--db` (argNorm automatically loads the ResFinder database); however, `--db` is required for the ARG annotation tools `groot` and `abricate`.
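
The rule that `groot` and `abricate` need `--db` could be enforced at parse time. Below is a minimal sketch using the standard-library `argparse`, shaped like the argnorm CLI; the helper names `build_parser` and `parse` and the `TOOLS_REQUIRING_DB` set are hypothetical, the `--db` choices are omitted for brevity, and this is not argNorm's `cli.py`.

```python
import argparse

# Tools that need --db according to the README text; an assumption for this sketch.
TOOLS_REQUIRING_DB = {"groot", "abricate"}


def build_parser():
    """Sketch of a parser mirroring the argnorm CLI shape (not argNorm's code)."""
    parser = argparse.ArgumentParser(prog="argnorm")
    # type=str.lower runs before the choices check, so "GROOT" is accepted.
    parser.add_argument("tool", type=str.lower,
                        choices=["argsoap", "abricate", "deeparg", "resfinder",
                                 "amrfinderplus", "groot", "hamronization"])
    parser.add_argument("--db", type=str.lower)
    parser.add_argument("-i", "--input", type=str)
    parser.add_argument("-o", "--output", type=str)
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.tool in TOOLS_REQUIRING_DB and args.db is None:
        # parser.error prints the usage message and exits with status 2.
        parser.error(f"--db is required when tool is '{args.tool}'")
    return args


args = parse(["resfinder", "-i", "original_annotation.tsv", "-o", "argnorm_result.tsv"])
print(args.tool, args.db)  # resfinder None
```

Calling `parse(["abricate", "-i", "in.tsv", "-o", "out.tsv"])` would exit with a usage error instead of silently running without a database.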


To run argNorm on the input data, use this command in your terminal:
5 changes: 1 addition & 4 deletions argnorm/cli.py
@@ -12,7 +12,7 @@ def main():
'namely ARO (Antibiotic Resistance Ontology).'),
formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument('tool', type=str.lower,
choices=['argsoap', 'abricate', 'deeparg', 'resfinder', 'amrfinderplus', 'groot'],
choices=['argsoap', 'abricate', 'deeparg', 'resfinder', 'amrfinderplus', 'groot', 'hamronization'],
help='The tool you used to do ARG annotation.')
parser.add_argument('--db', type=str.lower,
choices=['sarg',
@@ -29,8 +29,6 @@ def main():
'groot-card'
],
help='The database you used to do ARG annotation.')
parser.add_argument('--hamronized', action='store_true',
help='Use this if the input is hamronized (processed using the hAMRonization tool)')
parser.add_argument('-i', '--input', type=str,
help='The annotation result you have')
parser.add_argument('-o', '--output', type=str,
@@ -43,7 +41,6 @@ def main():
result = normalize(args.input,
tool=args.tool,
database=args.db,
is_hamronized=args.hamronized
)

prop_unmapped = ((result.ARO == 'ARO:nan').sum() + result.ARO.isna().sum()) / result.shape[0]
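
The unmapped proportion in `cli.py` counts two kinds of missing ARO: the literal string `'ARO:nan'` and genuinely missing values. A pandas-free sketch of the same arithmetic follows; the function name `proportion_unmapped` and the placeholder values are hypothetical.

```python
import math


def proportion_unmapped(aro_values):
    """Fraction of rows whose ARO is missing.

    Mirrors the pandas expression in cli.py: a row counts as unmapped if its
    value is the string 'ARO:nan' or a real missing value (None or NaN).
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    unmapped = sum(1 for v in aro_values if v == "ARO:nan" or is_missing(v))
    return unmapped / len(aro_values)


aros = ["ARO:placeholder-1", "ARO:nan", float("nan"), None]
print(proportion_unmapped(aros))  # 0.75
```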
9 changes: 5 additions & 4 deletions argnorm/normalize.py
@@ -1,7 +1,7 @@
from .normalizers import ARGSOAPNormalizer, \
DeepARGNormalizer, AbricateNormalizer, ResFinderNormalizer, AMRFinderPlusNormalizer, GrootNormalizer
DeepARGNormalizer, AbricateNormalizer, ResFinderNormalizer, AMRFinderPlusNormalizer, GrootNormalizer, HamronizationNormalizer

def normalize(ifile, tool : str, database : str, is_hamronized : bool):
def normalize(ifile, tool : str, database : str):
'''Normalize ARG tables

Parameters
@@ -20,11 +20,12 @@ def normalize(ifile, tool : str, database : str, is_hamronized : bool):
'argsoap': ARGSOAPNormalizer,
'deeparg': DeepARGNormalizer,
'resfinder': ResFinderNormalizer,
'groot': GrootNormalizer
'groot': GrootNormalizer,
'hamronization': HamronizationNormalizer
}.get(tool)

if normalizer is None:
raise ValueError('Please specify a correct tool name.')
norm = normalizer(database=database, is_hamronized=is_hamronized)
norm = normalizer(database=database)
return norm.run(ifile)
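
The refactored `normalize()` is a dictionary dispatch: the tool name selects a normalizer class, which is then constructed with only `database` now that `is_hamronized` is gone. A self-contained sketch of that pattern with stub classes follows; `BaseNormalizer`, the stubs, and the string returned by `run` are hypothetical stand-ins, not argNorm's classes.

```python
class BaseNormalizer:
    """Stand-in for argNorm's normalizer classes (hypothetical stub)."""

    def __init__(self, database=None):
        self.database = database

    def run(self, ifile):
        # A real normalizer would return a table; the stub returns a description.
        return f"{type(self).__name__}({self.database}) on {ifile}"


class ResFinderNormalizer(BaseNormalizer):
    pass


class HamronizationNormalizer(BaseNormalizer):
    pass


def normalize(ifile, tool, database=None):
    # Look up the normalizer class by tool name; unknown names map to None.
    normalizer = {
        "resfinder": ResFinderNormalizer,
        "hamronization": HamronizationNormalizer,
    }.get(tool)
    if normalizer is None:
        raise ValueError("Please specify a correct tool name.")
    return normalizer(database=database).run(ifile)


print(normalize("hamronized.tsv", tool="hamronization"))
# HamronizationNormalizer(None) on hamronized.tsv
```

Adding support for a new tool in this pattern only requires one new dictionary entry, which matches how `'hamronization'` was added in the diff.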
