Additional EFO xref context from axioms #19

Merged on Oct 5, 2023 (4 commits)

Conversation

@bfoltyn bfoltyn commented Sep 27, 2023

#18

bfoltyn commented Sep 27, 2023

@dhimmel For now, I've only added the SPARQL queries:

  • nxontology_data/efo/queries/mapping_properties.rq - Retrieves the mapping properties from classes.
  • nxontology_data/efo/queries/sources.rq - Retrieves the sources for xrefs.

Could you please review these queries? Let me know if there are any necessary changes.
Once the queries are done, what would be the next steps? How should this data be represented in the output?

@dhimmel dhimmel changed the title Classify xrefs Additional EFO xref context from axioms Sep 27, 2023

@dhimmel dhimmel left a comment

Awesome work! I leave you mostly with hard questions and decisions (:

)
BIND( REPLACE( STR(?mapping_property_uri), "^http://purl\\.obolibrary\\.org/obo/mondo#(.+)$", "mondo:$1" ) AS ?mapping_property_id )
BIND( REPLACE( STR(?mapping_property_id), "^http://www\\.w3\\.org/2004/02/skos/core#(.+)$", "skos:$1" ) AS ?mapping_property_id )
}
Member

Can you reply to this comment with the head of the output table?

Contributor Author

BIND( REPLACE( STR(?source), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
}

GROUP BY ?efo_id ?xref ?axiom_source
Member

Can you reply to this comment with the head of the output table?

Contributor Author

efo_id        xref               axiom_source
CHEBI:100241  Beilstein:3568352  Beilstein
CHEBI:100241  CAS:85721-33-1     ChemIDplus
CHEBI:100241  CAS:85721-33-1     KEGG COMPOUND
CHEBI:100241  Drug_Central:659   DrugCentral
CHEBI:100241  PMID:10397494      ChEMBL
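
For reference, the REPLACE call that derives efo_id can be reproduced outside SPARQL. Here is a minimal Python sketch, assuming class URIs of the form http://purl.obolibrary.org/obo/CHEBI_100241. The function name and the example URIs are illustrative and not taken from the PR.

```python
import re

# Same pattern as the SPARQL REPLACE: a greedy run ending in "/",
# then a prefix without ":" and a local identifier separated by "_".
EFO_URI_PATTERN = re.compile(r"^http.+/([^:]+)_(.+)$")


def uri_to_efo_id(uri: str) -> str:
    """Convert a class URI to a CURIE-style id; non-matching input is returned unchanged."""
    return EFO_URI_PATTERN.sub(r"\1:\2", uri)


print(uri_to_efo_id("http://purl.obolibrary.org/obo/CHEBI_100241"))
print(uri_to_efo_id("http://www.ebi.ac.uk/efo/EFO_0000094"))
```

As with SPARQL REPLACE, a string that does not match the pattern passes through unchanged, so unexpected URI shapes would show up as raw URIs in the efo_id column.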

BIND( REPLACE( STR(?source), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
}

GROUP BY ?efo_id ?xref ?axiom_source
Member

Might be nice to add ORDER BY here.

Contributor Author

Ok

PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

SELECT ?efo_id ?xref ?axiom_source
WHERE {
Member

I wonder if there are cases of xrefs that don't have any axioms, i.e. things matched by

?source_efo_uri rdf:type owl:Class.
?source_efo_uri oboInOwl:hasDbXref ?xref_raw.

We could do this match first and then make the axiom match OPTIONAL. Or we could decide this query is only for getting xref sources and we don't care about anything without a source.

Questions:

  1. Do all oboInOwl:hasDbXref triples have corresponding axioms?
  2. Do all axioms with owl:annotatedProperty oboInOwl:hasDbXref have corresponding oboInOwl:hasDbXref triples?

Contributor Author

I wonder if there are cases of xrefs that don't have any axioms, i.e. things matched by

There are such cases. For example, MONDO:0004947 in EFO:0000094 and ICD10:O35 in EFO:0009682 don't have axioms.

We could do this match first and then make the axiom match OPTIONAL. Or we could decide this query is only for getting xref sources and we don't care about anything without a source.

I think this query should only be for getting xref sources.
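
The two questions above amount to a set comparison between the plain oboInOwl:hasDbXref triples and the axiom-annotated ones. Here is a rough Python sketch, assuming both result sets have already been loaded as (efo_id, xref) pairs. The two xrefs without axioms are the examples from this thread; every other row is made-up illustration data.

```python
# (efo_id, xref) pairs from plain oboInOwl:hasDbXref triples.
xref_triples = {
    ("EFO:0000094", "MONDO:0004947"),  # reported above as lacking an axiom
    ("EFO:0009682", "ICD10:O35"),  # reported above as lacking an axiom
    ("EFO:0000094", "EXAMPLE:1"),  # hypothetical xref that has an axiom
}

# (efo_id, xref) pairs from axioms with owl:annotatedProperty oboInOwl:hasDbXref.
axiom_xrefs = {
    ("EFO:0000094", "EXAMPLE:1"),  # hypothetical
}

# Question 1: xref triples without a corresponding axiom.
triples_without_axiom = xref_triples - axiom_xrefs
# Question 2: axioms without a corresponding xref triple.
axioms_without_triple = axiom_xrefs - xref_triples

print(sorted(triples_without_axiom))
print(sorted(axioms_without_triple))
```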

Member

Let's rename to xref_sources.rq

Contributor Author

Ok

Comment on lines 17 to 22
IF( STRSTARTS( ?xref_id_dirty, "http://identifiers.org" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)/(.+)$", "$1:$2" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "http://linkedlifedata.com/resource/umls/id" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "UMLS:$1" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "http://purl.bioontology.org/ontology/ICD10CM" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "ICD10CM:$1" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "https://icd.who.int/browse10/2019/en#" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "ICD10:$1" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "https://omim.org/entry" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "OMIM:$1" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "https://omim.org/phenotypicSeries" ), REPLACE( ?xref_id_dirty, "^http.*/PS(.+)$", "OMIMPS:$1" ), ?error ),
Member

All these special cases are a bit annoying to maintain, but great work figuring them out. Was it just an iterative process of figuring out which URIs are not handled?

One option would be to save the URI to CURIE conversion for post-processing in python with bioregistry.curie_from_iri or curies.

Contributor Author

Was it just an iterative process of figuring out which URIs are not handled?

Yes, this was an iterative process.

One option would be to save the URI to CURIE conversion for post-processing in python with bioregistry.curie_from_iri or curies.

Thanks for the tip. That would be better if we can do it that way.

I used curies.get_bioregistry_converter to get the converter for URIs. However, there are some differences between how curies maps URIs and how I mapped them. Here are some examples:

  • orphanet:99022 vs obo:orphanet_99022. URI: http://purl.obolibrary.org/obo/Orphanet_99022. Should we replace obo:orphanet_ with orphanet:?
  • orphanet:98813 vs orphanet.ordo:98813. URI: http://www.orpha.net/ORDO/Orphanet_98813. I guess we can replace orphanet.ordo with orphanet, akin to:
        if collapse_orphanet and prefix.lower() == "orphanet.ordo":
            # In EFO, all orphanet.ordo terms existed in orphanet.
            # The consistency of using a single prefix will help with mapping.
            # https://github.com/biopragmatics/bioregistry/issues/187#issuecomment-1706308305
            prefix = "Orphanet"
  • omimps:203655 vs omim.ps:203655. URI: https://omim.org/phenotypicSeries/PS203655. Should we replace omim.ps with omimps?

There is also a missing uri_prefix, http://purl.bioontology.org/ontology/ICD10CM/, for ICD10CM. The add_prefix method for adding prefixes lacks a merge option in the version we use. Would it be safe to update the curies version?
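
To make the comparison concrete, here is a standard-library sketch of the kind of longest-prefix lookup a URI-to-CURIE converter performs. It is a stand-in for illustration, not how curies is implemented. The prefix map is hypothetical and only covers URI prefixes mentioned in this thread, and the lowercase icd10cm prefix is an assumption.

```python
from typing import Optional

# Hypothetical prefix map for illustration; a real converter such as curies
# loads a much larger map from the Bioregistry.
URI_PREFIX_TO_CURIE_PREFIX = {
    "http://www.orpha.net/ORDO/Orphanet_": "orphanet.ordo",
    "https://omim.org/entry/": "omim",
    "https://omim.org/phenotypicSeries/PS": "omim.ps",
    # The URI prefix reported as missing for ICD10CM, added by hand.
    "http://purl.bioontology.org/ontology/ICD10CM/": "icd10cm",
}


def compress_uri(uri: str) -> Optional[str]:
    """Return a CURIE for the URI using the longest matching URI prefix, else None."""
    matches = [p for p in URI_PREFIX_TO_CURIE_PREFIX if uri.startswith(p)]
    if not matches:
        return None
    longest = max(matches, key=len)
    return f"{URI_PREFIX_TO_CURIE_PREFIX[longest]}:{uri[len(longest):]}"


print(compress_uri("https://omim.org/phenotypicSeries/PS203655"))
print(compress_uri("http://www.orpha.net/ORDO/Orphanet_98813"))
```

Returning None for unknown URIs keeps unhandled cases visible, which replaces the iterative hunt for unhandled URIs with a simple filter on missing values.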

Member

  • Should we replace obo:orphanet_ with orphanet:?

Yes. It's nice that curies handles that.

For orphanet, we can replace orphanet.ordo after normalization.

omim.ps is the correct normalized prefix.

Ideally, you would call normalize_parsed_curie on the output of curies.get_bioregistry_converter so that we get consistently formatted CURIEs everywhere.
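
The post-normalization replacement for orphanet could look roughly like the sketch below. This is not the repository's normalize_parsed_curie, whose signature is not shown in this thread. collapse_orphanet_prefix is a hypothetical helper, and the lowercase orphanet output is an assumption.

```python
def collapse_orphanet_prefix(curie: str) -> str:
    """Rewrite orphanet.ordo CURIEs to the orphanet prefix; leave others untouched."""
    prefix, sep, identifier = curie.partition(":")
    if not sep:
        # Not a CURIE (no colon), so return the input unchanged.
        return curie
    if prefix.lower() == "orphanet.ordo":
        prefix = "orphanet"
    return f"{prefix}:{identifier}"


print(collapse_orphanet_prefix("orphanet.ordo:98813"))
print(collapse_orphanet_prefix("omim.ps:203655"))
```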

Member

Once the queries are done, what would be the next steps? How should this data be represented in the output?

We can save the tables as output and/or we can include them in the nxontology. We'll have to decide how we want to represent this information as node data in networkx. Doing so is tricky because we have to decide to what extent users will want access to rawer forms versus a more consolidated but opinionated format.
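
One way to picture the trade-off between rawer and more consolidated forms, using the xref sources rows shown earlier in this thread. This is only a sketch: the nested-dictionary layout is an assumption for discussion, not a decided format, and it uses plain dictionaries rather than networkx so that it runs without dependencies.

```python
from collections import defaultdict

# Raw form: one row per (efo_id, xref, axiom_source), as returned by the query.
rows = [
    ("CHEBI:100241", "Beilstein:3568352", "Beilstein"),
    ("CHEBI:100241", "CAS:85721-33-1", "ChemIDplus"),
    ("CHEBI:100241", "CAS:85721-33-1", "KEGG COMPOUND"),
    ("CHEBI:100241", "Drug_Central:659", "DrugCentral"),
    ("CHEBI:100241", "PMID:10397494", "ChEMBL"),
]

# Intermediate grouping: node id -> xref -> set of sources.
xref_sources: dict = defaultdict(lambda: defaultdict(set))
for efo_id, xref, source in rows:
    xref_sources[efo_id][xref].add(source)

# Consolidated form: node id -> xref -> sorted list of sources.
node_data = {
    efo_id: {xref: sorted(sources) for xref, sources in xrefs.items()}
    for efo_id, xrefs in xref_sources.items()
}

print(node_data["CHEBI:100241"]["CAS:85721-33-1"])
```

The consolidated form is easier to attach to a node, but it fixes one opinion about grouping, whereas the raw rows leave that choice to the user.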

Contributor Author

@dhimmel How about saving the table in the output for now, so that users can access this data? We could create a follow-up issue to discuss how we can represent this information as node data in networkx.

Member

How about saving the table in the output for now

Sounds good.
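
A minimal sketch of writing such a table as tab-separated output, with rows sorted so the file is stable between runs (in the spirit of the ORDER BY suggestion above). The function name is hypothetical and the repository's actual export code may differ.

```python
import csv
import io


def rows_to_tsv(rows: list, header: tuple) -> str:
    """Serialize rows as TSV text with a header line, sorting rows for stable output."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(sorted(rows))
    return buffer.getvalue()


tsv_text = rows_to_tsv(
    rows=[
        ("CHEBI:100241", "CAS:85721-33-1", "KEGG COMPOUND"),
        ("CHEBI:100241", "Beilstein:3568352", "Beilstein"),
    ],
    header=("efo_id", "xref", "axiom_source"),
)
print(tsv_text)
```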

bfoltyn commented Oct 4, 2023

@dhimmel I've added saving the xref sources and mapping properties tables as output. I've also removed the mapping from URI to CURIE in mapping_properties.rq and used curies instead.

@bfoltyn bfoltyn requested a review from dhimmel October 4, 2023 12:30

bfoltyn commented Oct 4, 2023

@dhimmel I've added the CURIE normalization using normalize_parsed_curie.

@bfoltyn bfoltyn marked this pull request as ready for review October 4, 2023 16:29
@dhimmel dhimmel merged commit f6a06d4 into related-sciences:main Oct 5, 2023
1 check passed

dhimmel commented Oct 5, 2023

Okay merged and exporting EFO in https://github.com/related-sciences/nxontology-data/actions/runs/6424757602!

Nice work navigating this @bfoltyn

@bfoltyn bfoltyn deleted the classify-xrefs branch November 14, 2023 14:14