Updated manual curation for misannotated genes #55

Merged
merged 4 commits into from
Jun 24, 2024
Changes from 3 commits
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,12 @@

### Manual curation
- argannot_curation: (Tet)tetH:EF460464:6286-7839:1554 was incorrectly annotated as ARO:3004797 (a beta-lactamase) because of a loose RGI hit. It is now manually curated to ARO:3000175.
- deeparg, megares, resfinderfg & sarg curation: ARO:3004445 -> ARO:3005440. The ARO was updated and the ARO number for the RSA2 gene changed. **db_harmonisation must change to take this into account**

#### Incorrectly curated genes
- Previously, these were mapped directly to drug classes. The correct parent ARO term is now assigned.
- resfinder_curation: grdA_1_QJX10702 -> 3007380 & EstDL136_1_JN242251 -> 3000557
- megares_curation: MEG_2865|Drugs|Phenicol|Chloramphenicol_hydrolase|ESTD -> 3000557
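
The RSA2 entry in this changelog asks for db_harmonisation to account for the changed accession. One way this could be handled is a lookup of superseded accessions applied before any mapping is emitted. This is a minimal sketch only: `SUPERSEDED_ARO` and `update_aro` are hypothetical names, not part of argnorm, and the only pair taken from the changelog is ARO:3004445 -> ARO:3005440.

```python
# Hypothetical table of superseded ARO accessions.
# Only the RSA2 pair below comes from the changelog; nothing else is assumed.
SUPERSEDED_ARO = {
    "ARO:3004445": "ARO:3005440",  # RSA2 beta-lactamase
}


def update_aro(aro, superseded=SUPERSEDED_ARO):
    """Return the current accession for `aro`.

    Follows chained replacements (A -> B -> C) and stops if a cycle is
    detected, so a malformed table cannot cause an infinite loop.
    """
    seen = set()
    while aro in superseded and aro not in seen:
        seen.add(aro)
        aro = superseded[aro]
    return aro
```

Accessions that are not in the table pass through unchanged, so the helper can be applied to every mapped gene without special-casing.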

## 0.4.0 - 10 June

25 changes: 13 additions & 12 deletions argnorm/data/manual_curation/deeparg_curation.tsv
@@ -1,13 +1,14 @@
Original ID ARO Gene Name in CARD Description
-CP000034.1.gene2198.p01|FEATURES|ompF|multidrug|ompF 3000265 porin OmpF
-AAB08925|FEATURES|tetU|tetracycline|tetU 3004650 tet(U)
-CP000647.1.gene2517.p01|FEATURES|ompF|multidrug|ompF 3000265 porin OmpF
-gi:504720116:ref:WP_014907218.1:|FEATURES|ompF|multidrug|ompF 3000265 porin OmpF
-AM180355.1.gene2260.p01|FEATURES|ermC|MLS|ermC 3000250 ErmC
-gi:501976562:ref:WP_012681429.1:|FEATURES|cystathionine_beta-lyase_patB|unclassified|cystathionine_beta-lyase_patB 3000025 patB
-gi:447201629:ref:WP_001278885.1:|FEATURES|cob(I)alamin_adenolsyltransferase|unclassified|cob(I)alamin_adenolsyltransferase 0010004 RND type drug efflux https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2696896/
-gi:489831097:ref:WP_003734834.1:|FEATURES|cystathionine_beta-lyase_patB|unclassified|cystathionine_beta-lyase_patB 3000025 patB
-gi:685904080:ref:WP_031644138.1:|FEATURES|cystathionine_beta-lyase_patB|unclassified|cystathionine_beta-lyase_patB 3000025 patB
-NP_253005.1|FEATURES|MvaT|multidrug|MvaT 3004069 MvaT
-NP_417544.5|FEATURES|patA|fluoroquinolone|patA 3000024 patA
-NP_416340.1|FEATURES|mgrB|multidrug|mgrB 3003820 mgrB
+CP000034.1.gene2198.p01|FEATURES|ompF|multidrug|ompF 3000265 porin OmpF
+AAB08925|FEATURES|tetU|tetracycline|tetU 3004650 tet(U)
+CP000647.1.gene2517.p01|FEATURES|ompF|multidrug|ompF 3000265 porin OmpF
+gi:504720116:ref:WP_014907218.1:|FEATURES|ompF|multidrug|ompF 3000265 porin OmpF
+AM180355.1.gene2260.p01|FEATURES|ermC|MLS|ermC 3000250 ErmC
+gi:501976562:ref:WP_012681429.1:|FEATURES|cystathionine_beta-lyase_patB|unclassified|cystathionine_beta-lyase_patB 3000025 patB
+gi:447201629:ref:WP_001278885.1:|FEATURES|cob(I)alamin_adenolsyltransferase|unclassified|cob(I)alamin_adenolsyltransferase 10004 RND type drug efflux https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2696896/
+gi:489831097:ref:WP_003734834.1:|FEATURES|cystathionine_beta-lyase_patB|unclassified|cystathionine_beta-lyase_patB 3000025 patB
+gi:685904080:ref:WP_031644138.1:|FEATURES|cystathionine_beta-lyase_patB|unclassified|cystathionine_beta-lyase_patB 3000025 patB
+NP_253005.1|FEATURES|MvaT|multidrug|MvaT 3004069 MvaT
+NP_417544.5|FEATURES|patA|fluoroquinolone|patA 3000024 patA
+NP_416340.1|FEATURES|mgrB|multidrug|mgrB 3003820 mgrB
+AUW34359.1|FEATURES|RSA-2|beta-lactam|RSA-2 3005440 RSA2 beta-lactamase ARO number of RSA2 had been changed.
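
In this hunk the ARO column for the cob(I)alamin entry reads `0010004` on the old side and `10004` on the new side, which suggests leading zeros can be lost when the column is treated as a number. A small normaliser could make downstream code indifferent to that. This is a sketch under the assumption that ARO accessions are seven digits; `format_aro` is a hypothetical helper, not an argnorm function.

```python
def format_aro(value):
    """Normalise an ARO column value to the form 'ARO:NNNNNNN'.

    Accepts ints or strings, with or without an 'ARO:' prefix, and
    zero-pads to seven digits (assumed accession width).
    """
    text = str(value).strip()
    if text.upper().startswith("ARO:"):
        text = text[4:]
    if not text.isdigit():
        raise ValueError(f"not an ARO accession: {value!r}")
    return f"ARO:{int(text):07d}"
```

Reading the curation tables with the ARO column as text rather than as an integer would avoid the problem at the source; the helper only covers values that have already lost their padding.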
3 changes: 2 additions & 1 deletion argnorm/data/manual_curation/megares_curation.tsv
@@ -1,7 +1,7 @@
Original ID ARO Description
MEG_8443|Drugs|Pleuromutilin|Pleuromutilin-resistant_23S_rRNA_mutation|P23S|RequiresSNPConfirmation 3005083
MEG_4060|Metals|Multi-metal_resistance|Multi-metal_resistance_protein|MREA Metal resistance. No mapping
-MEG_2865|Drugs|Phenicol|Chloramphenicol_hydrolase|ESTD 3000387
+MEG_2865|Drugs|Phenicol|Chloramphenicol_hydrolase|ESTD 3000557
MEG_1732|Drugs|Lipopeptides|Daptomycin-resistant_mutant|CLS|RequiresSNPConfirmation 3003092
MEG_2933|Drugs|Mycobacterium_tuberculosis-specific_Drug|Para-aminosalicylic_acid_resistant_mutant|FOLC|RequiresSNPConfirmation 3004157
MEG_8700|Multi-compound|Biocide_and_metal_resistance|Biocide_and_metal_ABC_efflux_pumps|SITABCD Metal + Biocide resistance. No mapping
@@ -86,3 +86,4 @@ MEG_2429|Drugs|betalactams|Class_A_betalactamases|CTX 3001943
MEG_5148|Drugs|betalactams|Class_D_betalactamases|OXA 3001555
MEG_7604|Drugs|Glycopeptides|VanD-type_accessory_protein|VANYD 3002957
MEG_8253|Drugs|MLS|Macrolide-resistant_23S_rRNA_mutation|MLS23S|RequiresSNPConfirmation 3004836
+MEG_6148|Drugs|betalactams|Class_A_betalactamases|RSA 3005440 ARO number of RSA2 had been changed.
6 changes: 3 additions & 3 deletions argnorm/data/manual_curation/resfinder_curation.tsv
@@ -9,7 +9,7 @@ VanC2XY_1_EU151754 glycopeptide resistance gene cluster VanC 3000246 https://www
VanHAX_PT_1_DQ018710 glycopeptide resistance gene cluster VanA 3000236 https://www.ncbi.nlm.nih.gov/nuccore/DQ018710.1 5109-7715 Part of VanA cluster (ARO:3000236)
VanHAX_PA_1_DQ018711 glycopeptide resistance gene cluster VanA 3000236 https://www.ncbi.nlm.nih.gov/nuccore/DQ018711.1?report=fasta 3168-5750 Part of VanA cluster (ARO:3000236)
VanHAX_PT_2_AY926880 glycopeptide resistance gene cluster VanA 3000236 https://www.ncbi.nlm.nih.gov/nuccore/AY926880.2?report=fasta 2771-5377 Part of VanA cluster (ARO:3000236)
-dldHA2X_1_AL939117 D-Ala-D-Ala ligase 3003970 https://www.ncbi.nlm.nih.gov/nuccore/AL939117.1 53343-56013
+dldHA2X_1_AL939117 D-Ala-D-Ala ligase 3003970 https://www.ncbi.nlm.nih.gov/nuccore/AL939117.1 53343-56013
VanHBX_1_AF192329 glycopeptide resistance gene cluster VanB 3000238 https://www.ncbi.nlm.nih.gov/nuccore/AF192329 27871-30477 Part of VanB cluster (ARO:3000238)
VanHBX_2_U35369 glycopeptide resistance gene cluster VanB 3000238 https://www.ncbi.nlm.nih.gov/nuccore/U35369.1?report=fasta 4007-6613 "Part of VanB cluster (ARO:3000238). Contains ARO:3002943, ARO:3002950"
VanC4XY_1_EU151752 glycopeptide resistance gene cluster VanC 3000246 https://www.ncbi.nlm.nih.gov/nuccore/EU151752.1?report=fasta 29-1650 Part of VanC cluster (ARO:3000246)
@@ -45,6 +45,6 @@ qepA1_1_AB263754 QepA2 3004103 Reverse complement in resfinder db.
tet(43)_1_GQ244501 tet(43) 3000573 Reverse complement in resfinder db.
blaSPG-1_1_KP109680 SPG-1 3003720
blaBIM-1_1_CP016446 BlaB 3004201
-grdA_1_QJX10702 3007382 Parent ARO mapping
+grdA_1_QJX10702 3007380 Parent ARO mapping
aac(3)-I_1_AJ877225 AAC(3)-I 3007384
-EstDL136_1_JN242251 3000387 Parent ARO mapping
+EstDL136_1_JN242251 3000557 Parent ARO mapping
6 changes: 4 additions & 2 deletions argnorm/data/manual_curation/resfinderfg_curation.tsv
@@ -1,2 +1,4 @@
-Original ID ARO Gene Name in CARD
-UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KF629588.1|pediatric_fecal_sample|CYC 3003970 D-Ala-D-Ala ligase
+Original ID ARO Gene Name in CARD Description
+UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KF629588.1|pediatric_fecal_sample|CYC 3003970 D-Ala-D-Ala ligase
+Beta-lactamase OXA-1|KU544700.1|sewage|CAZ 3005440 RSA2 beta-lactamase ARO number of RSA2 had been changed.
+Beta-lactamase OXA-1|MG739504.1|river|AMP 3005440 RSA2 beta-lactamase ARO number of RSA2 had been changed.
7 changes: 4 additions & 3 deletions argnorm/data/manual_curation/sarg_curation.tsv
@@ -1,3 +1,4 @@
-Original ID ARO Gene Name in CARD
-gb|AAG57600.1|ARO:3000318|mphB 3000318 mphB
-AM180355.1.gene2260.p01 3000250 ErmC
+Original ID ARO Gene Name in CARD Description
+gb|AAG57600.1|ARO:3000318|mphB 3000318 mphB
+AM180355.1.gene2260.p01 3000250 ErmC
+gb|AUW34359.1|ARO:3004445|RSA-2 3005440 RSA2 beta-lactamase ARO number of RSA2 had been changed.
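
The curation tables in this PR pair an original database ID with a hand-assigned ARO number. A rough sketch of how such a table might take precedence over an automated mapping follows. It assumes the files are tab-separated with the column names shown in the sarg hunk; `load_curation`, `resolve_aro`, and the `automated` dictionary are hypothetical and are not taken from argnorm's code.

```python
import csv
import io

# The sarg_curation.tsv rows from this PR, written with explicit tabs
# (tab separation is an assumption based on the .tsv extension).
SARG_CURATION = (
    "Original ID\tARO\tGene Name in CARD\tDescription\n"
    "gb|AAG57600.1|ARO:3000318|mphB\t3000318\tmphB\t\n"
    "AM180355.1.gene2260.p01\t3000250\tErmC\t\n"
    "gb|AUW34359.1|ARO:3004445|RSA-2\t3005440\tRSA2 beta-lactamase\t"
    "ARO number of RSA2 had been changed.\n"
)


def load_curation(tsv_text):
    """Parse a curation table into {original ID: ARO number as text}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["Original ID"]: row["ARO"] for row in reader}


def resolve_aro(original_id, automated, curated):
    """Prefer the manually curated ARO; fall back to the automated one.

    Returns None when neither table knows the ID.
    """
    return curated.get(original_id, automated.get(original_id))
```

With the RSA-2 entry, an automated mapping that still carries the old number embedded in the ID (3004445) would be overridden by the curated value 3005440. ARO numbers are kept as strings here so that no digits are dropped.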