Medication ontology - graph restriction update #1331

Open
arschat opened this issue Nov 25, 2024 · 2 comments
Collaborator

arschat commented Nov 25, 2024

The Lung Tier 2 "medication in the last month" field should take its values from the Drug Ontology (DRON), as stated in the field description:

Please indicate the last known therapy, as drug categories from the Drug Ontology (DRON), administered to the patient within the last month prior to sample collection. If this information is not shareable due to data privacy restrictions, please indicate "not shareable".

For that reason we updated the medical_history schema (v7.0.0) to use DRON and restricted the graph to the children of material entity.

However, for projects like #1316 we had to add medications (e.g. Oral contraceptive progestin) that did not match any DRON term in the subgraph we added.

For that reason we decided to check the mapping of the medications that we've previously wrangled.

  1. Pull medication data from ingest
  2. Wrangle the data to clean the values
  3. Use ZOOMA to get matches from all OLS
  4. Decide which ontology is more comprehensive for us to include

Note: Since lung is asking for DRON, I believe we should keep it in the graph restriction anyway. However, we can also add another ontology to the graph and recommend using one of them.

from hca_ingest.api.ingestapi import IngestApi

token = "<token>"  # replace with a valid ingest token

# Match every biomaterial that has any value in the medication field
query = [{
    "field": "content.medical_history.medication",
    "operator": "REGEX",
    "value": ".*"
}]

api = IngestApi(url="https://api.ingest.archive.data.humancellatlas.org/")
api.set_token(f"Bearer {token}")
response = api.post('https://api.ingest.archive.data.humancellatlas.org/biomaterials/query?operator=AND&size=535', json=query)

# Collect the medication value recorded for each returned donor
med = []
for donor in response.json()['_embedded']['biomaterials']:
    med.append(donor['content']['medical_history']['medication'])

# Unique medication values currently in ingest
set(med)
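
Steps 2 and 3 of the list above could be sketched roughly as follows. This is only an illustration, not the code that was actually used: the cleaning rules (splitting on `;` and `,`, the placeholder list) and the helper names are hypothetical, and the ZOOMA endpoint and `filter` syntax are assumed from the public ZOOMA documentation and should be checked before use.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: ZOOMA's public annotate endpoint.
ZOOMA_URL = "https://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate"

# Hypothetical placeholder values that carry no drug information.
PLACEHOLDERS = {"", "none", "no", "unknown", "not shareable"}


def clean_medications(values):
    """Split multi-drug strings, normalise case and whitespace, drop placeholders."""
    cleaned = set()
    for value in values:
        for part in re.split(r"[;,]", value):
            name = " ".join(part.split()).lower()
            if name not in PLACEHOLDERS:
                cleaned.add(name)
    return sorted(cleaned)


def ontology_prefix(term_uri):
    """Return the ontology prefix of a term URI (.../obo/CHEBI_15365 -> 'CHEBI')."""
    local_id = term_uri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    return re.split(r"[_:]", local_id, maxsplit=1)[0].upper()


def summarise(annotations):
    """Reduce ZOOMA annotations to (ontology prefix, term URI, confidence) tuples."""
    return [
        (ontology_prefix(uri), uri, ann.get("confidence"))
        for ann in annotations
        for uri in ann.get("semanticTags", [])
    ]


def zooma_matches(term, ontologies=("dron", "chebi", "ncit")):
    """Query ZOOMA for one term, searching only the listed OLS ontologies."""
    params = urllib.parse.urlencode({
        "propertyValue": term,
        # Assumed filter syntax: skip curated datasources, search ontologies only.
        "filter": f"required:[none],ontologies:[{','.join(ontologies)}]",
    })
    with urllib.request.urlopen(f"{ZOOMA_URL}?{params}") as resp:
        return summarise(json.load(resp))
```

Calling `zooma_matches(name)` for each name returned by `clean_medications(med)` would give the candidate terms per ontology. Splitting on commas is a blunt rule and would break any drug name that itself contains a comma, so the cleaned list still needs a manual check.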
arschat changed the title from "Medication ontology" to "Medication ontology - graph restriction update" on Nov 25, 2024
arschat added the HCA label on Nov 25, 2024
Collaborator Author

arschat commented Nov 25, 2024

@idazucchi worked on this and generated the medication_ontology.xlsx spreadsheet.

The most important tab is new mapping.
I've checked 3 possible ontologies: DRON, CHEBI and NCIT.
I've highlighted in orange the drugs where I'm not sure of the match, and in yellow the matches that are good but don't fall under the Pharmacologic Substance branch of NCIT. Terms in purple could also be described under treatment.

  1. DRON - all the best matches are terms imported from CHEBI, so I dropped it after the first 30 drugs
  2. CHEBI - the matches are good, but it is structure-based, so it lacks broader terms like Antiretroviral Therapy
  3. NCIT - seems like the best match so far; the only exceptions are some terms that are not in the Pharmacologic Substance branch. A couple could also fall under treatment, like high-dose intravenous immunoglobulin

Before we switch ontology we should make sure that we can at least map all the terms in the sheet.
It's not the full list of drugs present in ingest, but at least it's a varied selection.
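
One way to check that every term in the sheet can be mapped is a small coverage tally per ontology. The sketch below assumes the new mapping tab has been loaded into a dict of drug name to per-ontology term (empty or `None` when there is no match). The function name, the data layout and the rows are made up for illustration and are not taken from the spreadsheet.

```python
from collections import Counter


def ontology_coverage(mapping, ontologies):
    """Count how many drugs each ontology can map and list the ones it cannot."""
    mapped = Counter()
    unmapped = {onto: [] for onto in ontologies}
    for drug, terms in mapping.items():
        for onto in ontologies:
            if terms.get(onto):
                mapped[onto] += 1
            else:
                unmapped[onto].append(drug)
    total = len(mapping)
    return {
        onto: {"mapped": mapped[onto], "total": total, "unmapped": unmapped[onto]}
        for onto in ontologies
    }


# Made-up rows with placeholder term IDs, for illustration only.
example = {
    "drug a": {"DRON": "DRON:placeholder-1", "CHEBI": "CHEBI:placeholder-1", "NCIT": "NCIT:placeholder-1"},
    "drug b": {"DRON": None, "CHEBI": None, "NCIT": "NCIT:placeholder-2"},
}
report = ontology_coverage(example, ["DRON", "CHEBI", "NCIT"])
```

An ontology is a safe switch only if its `unmapped` list is empty for the whole sheet, or if every entry left in it has an agreed fallback.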

Collaborator Author

arschat commented Feb 3, 2025

PR with NCIT update

arschat closed this as completed on Feb 3, 2025
arschat reopened this on Feb 11, 2025