Additional EFO xref context from axioms #19

Merged: 4 commits merged on Oct 5, 2023

28 changes: 28 additions & 0 deletions nxontology_data/efo/queries/mapping_properties.rq
Member

> Once the queries are done, what would be the next steps? How should this data be represented in the output?

We can save the tables as output and/or include them in the nxontology. We'll have to decide how to represent this information as node data in networkx. That is tricky because we have to decide to what extent users will want access to rawer forms versus a more consolidated but opinionated format.
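One possible consolidated shape, as a minimal Python sketch under assumptions: rows arrive with the same columns as mapping_properties.rq (efo_id, xref_id, mapping_property_id) and are grouped per node into {mapping property: [xrefs]}. The function name rows_to_node_data and the grouping scheme are hypothetical, not a decision made in this PR.

```python
from __future__ import annotations

from collections import defaultdict


def rows_to_node_data(rows: list[tuple[str, str, str]]) -> dict[str, dict[str, list[str]]]:
    """Consolidate long-format (efo_id, xref_id, mapping_property_id) rows.

    Returns {efo_id: {mapping_property_id: sorted unique xref_ids}}, i.e. a
    dict keyed by node that could be attached as a single node attribute.
    """
    grouped: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for efo_id, xref_id, mapping_property_id in rows:
        # Sets drop duplicate rows for the same (node, property, xref)
        grouped[efo_id][mapping_property_id].add(xref_id)
    return {
        node: {prop: sorted(xrefs) for prop, xrefs in props.items()}
        for node, props in grouped.items()
    }
```

A dict keyed by node like this is the form `networkx.set_node_attributes(graph, node_data, name="xrefs")` accepts (the attribute name here is hypothetical). Saving the raw table as a separate output would keep the rawer form available alongside it.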

Contributor Author

@dhimmel How about saving the table in the output for now, so that users can access this data? We could create a follow-up issue to discuss how to represent this information as node data in networkx.

Member

> How about saving the table in the output for now

Sounds good.

@@ -0,0 +1,28 @@
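PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX mondo: <http://purl.obolibrary.org/obo/mondo#>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>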

SELECT ?efo_id ?xref_id ?mapping_property_id ?efo_uri ?xref_uri ?mapping_property_uri
WHERE {
VALUES ?mapping_property_uri {mondo:closeMatch mondo:exactMatch skos:mappingRelation skos:closeMatch skos:exactMatch skos:broadMatch skos:narrowMatch skos:relatedMatch}

?efo_uri rdf:type owl:Class .
?efo_uri ?mapping_property_uri ?xref_uri


BIND( REPLACE( STR(?efo_uri), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
BIND( REPLACE( STR(?xref_uri), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?xref_id_dirty )
BIND(
COALESCE(
IF( STRSTARTS( ?xref_id_dirty, "http://identifiers.org" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)/(.+)$", "$1:$2" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "http://linkedlifedata.com/resource/umls/id" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "UMLS:$1" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "http://purl.bioontology.org/ontology/ICD10CM" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "ICD10CM:$1" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "https://icd.who.int/browse10/2019/en#" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "ICD10:$1" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "https://omim.org/entry" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "OMIM:$1" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "https://omim.org/phenotypicSeries" ), REPLACE( ?xref_id_dirty, "^http.*/PS(.+)$", "OMIMPS:$1" ), ?error ),
Member

All these special cases are a bit annoying to maintain, but great work figuring them out. Was it just an iterative process of figuring out which URIs are not handled?

One option would be to save the URI-to-CURIE conversion for post-processing in Python with bioregistry.curie_from_iri or curies.
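For reference, the core of IRI-to-CURIE compression is a longest-URI-prefix match against a prefix map. The stdlib sketch below is a hypothetical stand-in for a converter such as curies, not its implementation; the hand-written PREFIX_MAP entries are assumptions inferred from URIs quoted in this thread.

```python
from __future__ import annotations

# Hypothetical prefix map for illustration only; a real converter would be
# loaded from bioregistry rather than written by hand.
PREFIX_MAP: dict[str, str] = {
    "ICD10CM": "http://purl.bioontology.org/ontology/ICD10CM/",
    "omim.ps": "https://omim.org/phenotypicSeries/PS",
    "orphanet.ordo": "http://www.orpha.net/ORDO/Orphanet_",
    "obo": "http://purl.obolibrary.org/obo/",
}


def compress(uri: str, prefix_map: dict[str, str] = PREFIX_MAP) -> str | None:
    """Return "prefix:local_id" for the longest matching URI prefix, else None."""
    matches = [
        (len(uri_prefix), prefix, uri_prefix)
        for prefix, uri_prefix in prefix_map.items()
        if uri.startswith(uri_prefix)
    ]
    if not matches:
        return None
    # Longest URI prefix wins, so a specific namespace beats a generic one
    _, prefix, uri_prefix = max(matches)
    return f"{prefix}:{uri[len(uri_prefix):]}"
```

With this map, `http://purl.obolibrary.org/obo/Orphanet_99022` compresses to `obo:Orphanet_99022` because only the generic obo namespace matches; that is one plausible reason for the obo-prefixed results a bioregistry-based converter can return for such URIs.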

Contributor Author

> Was it just an iterative process of figuring out which URIs are not handled?

Yes, this was an iterative process.

> One option would be to save the URI to CURIE conversion for post-processing in python with bioregistry.curie_from_iri or curies.

Thanks for the tip; doing it that way would be better if we can.

I used curies.get_bioregistry_converter to get a converter for URIs. However, there are some differences between how curies maps URIs and how I mapped them. Here are some examples:

  • orphanet:99022 vs obo:orphanet_99022 (URI: http://purl.obolibrary.org/obo/Orphanet_99022). Should we replace obo:orphanet_ with orphanet:?
  • orphanet:98813 vs orphanet.ordo:98813 (URI: http://www.orpha.net/ORDO/Orphanet_98813). I guess we can replace orphanet.ordo with orphanet, akin to:
        if collapse_orphanet and prefix.lower() == "orphanet.ordo":
            # In EFO, all orphanet.ordo terms existed in orphanet.
            # The consistency of using a single prefix will help with mapping.
            # https://github.com/biopragmatics/bioregistry/issues/187#issuecomment-1706308305
            prefix = "Orphanet"
  • omimps:203655 vs omim.ps:203655 (URI: https://omim.org/phenotypicSeries/PS203655). Should we replace omim.ps with omimps?

There is also a missing uri_prefix, http://purl.bioontology.org/ontology/ICD10CM/, for ICD10CM. In the curies version we use, the add_prefix method lacks a merge option. Would it be safe to update curies?

Member

> Should we replace obo:orphanet_ with orphanet:?

Yes. It's nice that curies handles that.

For orphanet, we can replace orphanet.ordo after normalization.

omim.ps is the correct normalized prefix.

Ideally, you would call normalize_parsed_curie on the output of curies.get_bioregistry_converter so that we get consistently formatted CURIEs everywhere.
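A minimal sketch of the Orphanet decisions above as a post-processing step on already-compressed CURIEs. collapse_prefix is a hypothetical helper: it only covers the two Orphanet spellings discussed here, uses the "Orphanet" prefix from the collapse_orphanet snippet, and assumes any other normalization has already run upstream.

```python
def collapse_prefix(curie: str) -> str:
    """Map the Orphanet spellings discussed in this thread onto "Orphanet".

    Hypothetical post-processing helper; assumes the CURIE has already been
    compressed (and otherwise normalized) upstream.
    """
    prefix, _, local_id = curie.partition(":")
    # obo:Orphanet_99022 -> Orphanet:99022 (OBO PURL form of an Orphanet term)
    if prefix.lower() == "obo" and local_id.lower().startswith("orphanet_"):
        return "Orphanet:" + local_id.split("_", 1)[1]
    # orphanet.ordo:98813 -> Orphanet:98813, as in the collapse_orphanet snippet
    if prefix.lower() == "orphanet.ordo":
        return "Orphanet:" + local_id
    # Anything else, e.g. omim.ps:203655, is kept as normalized
    return curie
```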

?xref_id_dirty
) AS ?xref_id
)
# A variable cannot be re-bound by a second BIND, so both prefix rewrites are nested in one expression
BIND( REPLACE( REPLACE( STR(?mapping_property_uri), "^http://purl\\.obolibrary\\.org/obo/mondo#(.+)$", "mondo:$1" ), "^http://www\\.w3\\.org/2004/02/skos/core#(.+)$", "skos:$1" ) AS ?mapping_property_id )
}
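To test the ?xref_id cascade outside a triplestore, the same regexes can be replayed in Python. This is a sketch that assumes Python's `re` behaves like SPARQL's XPath regex for these simple patterns; note the replacement syntax changes from `$1` to `\1`. Apart from the OMIM phenotypic series URI quoted in this thread, the example URI shapes used to exercise it are assumptions.

```python
import re

# Step 1 (?xref_id_dirty): OBO-style URIs such as .../MONDO_0004947 -> MONDO:0004947
OBO_STYLE = re.compile(r"^http.+/([^:]+)_(.+)$")

# Step 2 (the COALESCE): the first matching special case wins, otherwise
# step 1's result is kept. Entries: (required start, pattern, replacement).
SPECIAL_CASES = [
    ("http://identifiers.org", re.compile(r"^http.*/(.+)/(.+)$"), r"\1:\2"),
    ("http://linkedlifedata.com/resource/umls/id", re.compile(r"^http.*/(.+)$"), r"UMLS:\1"),
    ("http://purl.bioontology.org/ontology/ICD10CM", re.compile(r"^http.*/(.+)$"), r"ICD10CM:\1"),
    ("https://icd.who.int/browse10/2019/en#", re.compile(r"^http.*/(.+)$"), r"ICD10:\1"),
    ("https://omim.org/entry", re.compile(r"^http.*/(.+)$"), r"OMIM:\1"),
    ("https://omim.org/phenotypicSeries", re.compile(r"^http.*/PS(.+)$"), r"OMIMPS:\1"),
]


def xref_uri_to_id(uri: str) -> str:
    """Replay the ?xref_id BIND/COALESCE logic from mapping_properties.rq."""
    dirty = OBO_STYLE.sub(r"\1:\2", uri)
    for start, pattern, replacement in SPECIAL_CASES:
        if dirty.startswith(start):
            return pattern.sub(replacement, dirty)
    return dirty
```

Keeping the special cases in a list like this also makes it easy to print every xref that falls through unchanged, which is the iterative discovery process described in this thread.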
Member

Can you reply to this comment with the head of the output table?

Contributor Author

17 changes: 17 additions & 0 deletions nxontology_data/efo/queries/sources.rq
Member

Let's rename to xref_sources.rq

Contributor Author

Ok

@@ -0,0 +1,17 @@
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

SELECT ?efo_id ?xref ?axiom_source
WHERE {
Member

I wonder if there are cases of xrefs that don't have any axioms, i.e., things matched by:

?source_efo_uri rdf:type owl:Class.
?source_efo_uri oboInOwl:hasDbXref ?xref_raw.

We could do this match first and then make the axiom match OPTIONAL. Or we could decide this query is only for getting xref sources and we don't care about anything without a source.

Questions:

  1. Do all oboInOwl:hasDbXref triples have corresponding axioms?
  2. Do all axioms with owl:annotatedProperty oboInOwl:hasDbXref have corresponding oboInOwl:hasDbXref triples?
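Both questions reduce to set differences over (class, xref) pairs. A sketch, assuming the pairs have already been exported by two simple queries (one over oboInOwl:hasDbXref triples, one over the owl:annotatedSource/owl:annotatedTarget of the xref axioms); compare_xrefs is a hypothetical name.

```python
from __future__ import annotations


def compare_xrefs(
    triple_pairs: set[tuple[str, str]],
    axiom_pairs: set[tuple[str, str]],
) -> dict[str, set[tuple[str, str]]]:
    """Compare (class, xref) pairs from hasDbXref triples and from xref axioms.

    triple_pairs: matches of `?source oboInOwl:hasDbXref ?xref`
    axiom_pairs:  (owl:annotatedSource, owl:annotatedTarget) of axioms whose
                  owl:annotatedProperty is oboInOwl:hasDbXref
    """
    return {
        # Question 1: hasDbXref triples with no corresponding axiom
        "xrefs_without_axioms": triple_pairs - axiom_pairs,
        # Question 2: axioms with no corresponding hasDbXref triple
        "axioms_without_xrefs": axiom_pairs - triple_pairs,
    }
```

An empty set for a key would answer "yes" to the corresponding question.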

Contributor Author

> I wonder if there are cases of xrefs that don't have any axioms, i.e. things matched by

There are such cases: for example, the MONDO:0004947 xref on EFO:0000094 and the ICD10:O35 xref on EFO:0009682 have no axioms.

> We could do this match first and then make the axiom match OPTIONAL. Or we could decide this query is only for getting xref sources and we don't care about anything without a source.

I think this query should only be for getting xref sources.

?axiom rdf:type owl:Axiom.
?axiom owl:annotatedSource ?source.
?axiom owl:annotatedProperty oboInOwl:hasDbXref.
?axiom owl:annotatedTarget ?xref.

OPTIONAL { ?axiom oboInOwl:source ?axiom_source }.

BIND( REPLACE( STR(?source), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
}

GROUP BY ?efo_id ?xref ?axiom_source
Member

Can you reply to this comment with the head of the output table?

Contributor Author

efo_id        xref               axiom_source
CHEBI:100241  Beilstein:3568352  Beilstein
CHEBI:100241  CAS:85721-33-1     ChemIDplus
CHEBI:100241  CAS:85721-33-1     KEGG COMPOUND
CHEBI:100241  Drug_Central:659   DrugCentral
CHEBI:100241  PMID:10397494      ChEMBL

Member

Might be nice to add ORDER BY here.

Contributor Author

Ok
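Because the sources.rq GROUP BY lists every selected variable and uses no aggregates, it behaves like SELECT DISTINCT. The Python sketch below emulates DISTINCT plus an ORDER BY over the same columns, which can be handy for checking an exported table. It assumes an unbound ?axiom_source arrives as None and sorts it first, following the SPARQL rule that unbound values order lowest; distinct_sorted is a hypothetical name, and Python's code-point string ordering is assumed to match the engine's for these values.

```python
from __future__ import annotations

from typing import Optional


def distinct_sorted(rows: list[tuple[Optional[str], ...]]) -> list[tuple[Optional[str], ...]]:
    """De-duplicate rows (like SELECT DISTINCT) and sort them column by column.

    None stands in for an unbound value (e.g. a missing ?axiom_source from the
    OPTIONAL) and sorts before any string, as unbound values do in ORDER BY.
    """

    def sort_key(row: tuple[Optional[str], ...]) -> tuple[tuple[bool, str], ...]:
        # (False, "") for None and (True, value) for strings, so None sorts first
        return tuple((value is not None, value or "") for value in row)

    return sorted(set(rows), key=sort_key)
```

Applied to the table above, it would return one row per distinct (efo_id, xref, axiom_source), ordered by efo_id, then xref, then axiom_source.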