Free text info from DrugBank #369

Open
dkoslicki opened this issue Feb 21, 2024 · 6 comments
@dkoslicki
Member

DrugBank has a bunch of extra information on their website; stuff like mechanisms of action:
image (7)

I'm wondering whether a) this information exists in some downloadable part of DrugBank, or b) there were particular barriers to including this info when DrugBank was ETL'd for KG2.

@chunyuma
Contributor

chunyuma commented Mar 9, 2024

Hi @dkoslicki, I think you can extract this information from this downloadable XML file, which requires a license to download.

@dkoslicki
Member Author

Thanks @chunyuma, Mohsen and I found this and are in the process of doing NER, identifier extraction, and alignment on all the fields in it. I'm optimistic this can be useful for future xDTD kinds of efforts.

@chunyuma
Contributor

chunyuma commented Mar 9, 2024

Sounds great! Happy to see this will be useful. One difficulty I ran into when using this information is mapping the text to nodes in KG2, and there are also fewer connections in KG2 between the mapped nodes. I would be happy to share my experience with Mohsen if needed.

@ecwood
Collaborator

ecwood commented Jun 26, 2024

It does look like this information is in the DrugBank XML download:

  <mechanism-of-action>Eliglustat is a glucosylceramide synthase inhibitor used for the treatment of type 1 Gaucher disease.[L41404] Gaucher disease is a rare genetic disorder characterized by the deficiency of acid β-glucosidase, an enzyme that converts glucosylceramide (also known as glucocerebroside) into glucose and ceramide. In patients with Gaucher disease, glucosylceramide is accumulated in the lysosomes of macrophages, leading to the formation of foam cells or Gaucher cells.[L41404] Gaucher cells infiltrate the liver, spleen, bone marrow and other organs, leading to complications such as anemia, thrombocytopenia and hepatosplenomegaly.[L41404,A246384]&#13;
&#13;
Eliglustat reduces the production of glucosylceramide by inhibiting glucosylceramide synthase, a rate-limiting enzyme in the production of glycosphingolipids.[L41404,A182192] This lowers the amount of glucosylceramide that is available in lysosomes, and balances the deficiency of acid β-glucosidase.[L41404,A246384]</mechanism-of-action>

I will have to brainstorm how to ETL it since the information inside of the free text is not linked to any other node IDs. I assume you want this information converted into edges?
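
For reference, a minimal sketch of pulling this field out of the XML download and stripping the inline citation markers. It assumes only what the excerpt above shows: an element named `mechanism-of-action` whose text contains markers like `[L41404,A246384]`. The function names (`local_name`, `mechanism_texts`) are made up for illustration, and tags are matched by local name so the sketch does not depend on the file's namespace URI.

```python
import re
import xml.etree.ElementTree as ET

# Citation markers as they appear in the excerpt, e.g. [L41404] or [L41404,A246384].
CITATION = re.compile(r"\[([A-Z]\d+(?:,\s*[A-Z]\d+)*)\]")


def local_name(tag):
    """Drop any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def mechanism_texts(xml_source):
    """Yield (plain_text, citation_ids) for each non-empty <mechanism-of-action>.

    xml_source is a file path or a binary file object. The file is streamed
    with iterparse because the full download can be large.
    """
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) == "mechanism-of-action" and elem.text:
            # Collect the distinct citation IDs before removing the markers.
            ids = sorted(
                {part.strip()
                 for group in CITATION.findall(elem.text)
                 for part in group.split(",")}
            )
            text = CITATION.sub("", elem.text)
            # Collapse whitespace, including the carriage returns from &#13;.
            text = re.sub(r"\s+", " ", text).strip()
            yield text, ids
        # Release the element's contents once its end tag has been handled.
        elem.clear()
```

This only produces cleaned free text plus the citation IDs per element; it does not link anything in the text to node IDs.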

@chunyuma
Contributor

Hi @ecwood, I agree. To convert this information into edges, we first need to resolve the mapping issue. The second issue we may need to address is extracting the correct relation logic from the free text. I know this is not easy, because SemMedDB fails in some cases. An LLM may be an option, since it may handle this better than the algorithm used in SemMedDB.
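
To make the mapping issue concrete, here is a toy sketch of dictionary-based mention mapping on one sentence from the excerpt. Everything in it is an assumption for illustration: the `SYNONYMS` table, the placeholder `EX:` IDs (not real KG2 identifiers), and the `map_mentions` helper. It is not how KG2 or SemMedDB resolve entities.

```python
import re

# Hypothetical synonym table: surface form -> placeholder node ID.
SYNONYMS = {
    "glucosylceramide synthase": "EX:0001",
    "glucosylceramide": "EX:0002",
    "glucocerebroside": "EX:0002",
    "acid β-glucosidase": "EX:0003",
}


def map_mentions(text, synonyms=SYNONYMS):
    """Return (start, end, surface, node_id) tuples sorted by position.

    Longer surface forms are tried first, so 'glucosylceramide synthase'
    is not also reported as a mention of 'glucosylceramide'.
    """
    spans = []
    for surface in sorted(synonyms, key=len, reverse=True):
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Keep the match only if it does not overlap an accepted span.
            if all(m.end() <= s or m.start() >= e for s, e, _, _ in spans):
                spans.append((m.start(), m.end(), m.group(0), synonyms[surface]))
    return sorted(spans)


sentence = ("Eliglustat reduces the production of glucosylceramide "
            "by inhibiting glucosylceramide synthase.")
for start, end, surface, node_id in map_mentions(sentence):
    print(f"{surface!r} at [{start}, {end}) -> {node_id}")
```

Even in this toy case, "Eliglustat" goes unmapped because it is not in the table, and nothing here says which entity inhibits which. That is the relation-extraction problem, which this sketch does not attempt.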

@dkoslicki
Member Author

Unless there is a large appetite to do this, I've done this on the side/locally (to use for Pathfinder training), so at least for my purposes, I have what I need. I'd want to hear from the KG2 team whether they think this is worth pursuing otherwise.

3 participants