Additional EFO xref context from axioms #19

bfoltyn · 2023-09-27T15:14:39Z

bfoltyn · 2023-09-27T15:24:06Z

@dhimmel For now, I've only added the SPARQL queries:

nxontology_data/efo/queries/mapping_properties.rq - Retrieves the mapping properties from classes
nxontology_data/efo/queries/sources.rq - Retrieves the sources for xrefs.

Could you please review these queries? Let me know if there are any necessary changes.
Once the queries are done, what would be the next steps? How should this data be represented in the output?

dhimmel

Awesome work! I leave you mostly with hard questions and decisions (:

dhimmel · 2023-09-27T17:23:01Z

nxontology_data/efo/queries/mapping_properties.rq

+  )
+  BIND( REPLACE( STR(?mapping_property_uri), "^http://purl\\.obolibrary\\.org/obo/mondo#(.+)$", "mondo:$1" ) AS ?mapping_property_id )
+  BIND( REPLACE( STR(?mapping_property_id), "^http://www\\.w3\\.org/2004/02/skos/core#(.+)$", "skos:$1" ) AS ?mapping_property_id )
+}


Can you reply to this comment with the head of the output table?

efo_id	xref_id	mapping_property_id	efo_uri	xref_uri	mapping_property_uri
MONDO:0000044	meddra:10060873	mondo:closeMatch	http://purl.obolibrary.org/obo/MONDO_0000044	http://identifiers.org/meddra/10060873	http://purl.obolibrary.org/obo/mondo#closeMatch
MONDO:0000050	meddra:10035083	mondo:closeMatch	http://purl.obolibrary.org/obo/MONDO_0000050	http://identifiers.org/meddra/10035083	http://purl.obolibrary.org/obo/mondo#closeMatch
MONDO:0000088	meddra:10044701	mondo:closeMatch	http://purl.obolibrary.org/obo/MONDO_0000088	http://identifiers.org/meddra/10044701	http://purl.obolibrary.org/obo/mondo#closeMatch
MONDO:0000088	meddra:10058084	mondo:closeMatch	http://purl.obolibrary.org/obo/MONDO_0000088	http://identifiers.org/meddra/10058084	http://purl.obolibrary.org/obo/mondo#closeMatch
MONDO:0000127	meddra:10063361	mondo:closeMatch	http://purl.obolibrary.org/obo/MONDO_0000127	http://identifiers.org/meddra/10063361	http://purl.obolibrary.org/obo/mondo#closeMatch

dhimmel · 2023-09-27T17:24:13Z

nxontology_data/efo/queries/sources.rq

+  BIND( REPLACE( STR(?source), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
+}
+
+GROUP BY ?efo_id ?xref ?axiom_source


Can you reply to this comment with the head of the output table?

efo_id	xref	axiom_source
CHEBI:100241	Beilstein:3568352	Beilstein
CHEBI:100241	CAS:85721-33-1	ChemIDplus
CHEBI:100241	CAS:85721-33-1	KEGG COMPOUND
CHEBI:100241	Drug_Central:659	DrugCentral
CHEBI:100241	PMID:10397494	ChEMBL

dhimmel · 2023-09-27T17:24:27Z

nxontology_data/efo/queries/sources.rq

+  BIND( REPLACE( STR(?source), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
+}
+
+GROUP BY ?efo_id ?xref ?axiom_source


Might be nice to add ORDER BY here.

dhimmel · 2023-09-27T17:26:52Z

nxontology_data/efo/queries/sources.rq

+PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
+
+SELECT ?efo_id ?xref  ?axiom_source
+WHERE {


I wonder if there are cases of xrefs that don't have any axioms, i.e. things matched by

nxontology-data/nxontology_data/efo/queries/xrefs.rq
Lines 28 to 29 in c7b1429

?source_efo_uri rdf:type owl:Class.
?source_efo_uri oboInOwl:hasDbXref ?xref_raw.

We could do this match first and then make the axiom match OPTIONAL. Or we could decide this query is only for getting xref sources and we don't care about anything without a source.

Questions:

Do all oboInOwl:hasDbXref triples have corresponding axioms?

Do all axioms with owl:annotatedProperty oboInOwl:hasDbXref have corresponding oboInOwl:hasDbXref triples?

> I wonder if there are cases of xrefs that don't have any axioms, i.e. things matched by

There are cases like that. For example, MONDO:0004947 in EFO:0000094 and ICD10:O35 in EFO:0009682 don't have axioms.

> We could do this match first and then make the axiom match OPTIONAL. Or we could decide this query is only for getting xref sources and we don't care about anything without a source.

I think this query should be only for getting xref sources.
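
For illustration, the two coverage questions in this thread reduce to a set comparison between the plain oboInOwl:hasDbXref pairs and the pairs annotated by axioms. The sketch below uses toy data built from the identifiers mentioned in this thread; `compare_coverage` and both input sets are hypothetical names, and a real check would load the pairs from the SPARQL results instead.

```python
# Toy stand-ins for query results; a real check would load these from SPARQL output.
# (class, xref) pairs taken from plain oboInOwl:hasDbXref triples.
xref_triples = {
    ("EFO:0000094", "MONDO:0004947"),
    ("EFO:0009682", "ICD10:O35"),
    ("CHEBI:100241", "CAS:85721-33-1"),
}
# (class, xref) pairs taken from axioms whose annotatedProperty is hasDbXref.
axiom_pairs = {
    ("CHEBI:100241", "CAS:85721-33-1"),
}


def compare_coverage(xref_triples, axiom_pairs):
    """Report pairs that appear on only one side of the comparison."""
    return {
        # Answers: do all hasDbXref triples have corresponding axioms?
        "xrefs_without_axiom": sorted(xref_triples - axiom_pairs),
        # Answers: do all hasDbXref axioms have corresponding triples?
        "axioms_without_xref": sorted(axiom_pairs - xref_triples),
    }


report = compare_coverage(xref_triples, axiom_pairs)
```

With this toy input, the two xrefs reported above as lacking axioms end up under `xrefs_without_axiom`, which is the set an OPTIONAL axiom match would keep and a sources-only query would drop.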

dhimmel · 2023-09-27T17:32:07Z

nxontology_data/efo/queries/sources.rq

Let's rename to xref_sources.rq

dhimmel · 2023-09-27T17:37:06Z

nxontology_data/efo/queries/mapping_properties.rq

+        IF( STRSTARTS( ?xref_id_dirty, "http://identifiers.org" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)/(.+)$", "$1:$2" ), ?error ),
+        IF( STRSTARTS( ?xref_id_dirty, "http://linkedlifedata.com/resource/umls/id" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "UMLS:$1" ), ?error ),
+        IF( STRSTARTS( ?xref_id_dirty, "http://purl.bioontology.org/ontology/ICD10CM" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "ICD10CM:$1" ), ?error ),
+        IF( STRSTARTS( ?xref_id_dirty, "https://icd.who.int/browse10/2019/en#" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "ICD10:$1" ), ?error ),
+        IF( STRSTARTS( ?xref_id_dirty, "https://omim.org/entry" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "OMIM:$1" ), ?error ),
+        IF( STRSTARTS( ?xref_id_dirty, "https://omim.org/phenotypicSeries" ), REPLACE( ?xref_id_dirty, "^http.*/PS(.+)$", "OMIMPS:$1" ), ?error ),


All these special cases are a bit annoying to maintain, but great work figuring them out. Was it just an iterative process of figuring out which URIs are not handled?

One option would be to save the URI to CURIE conversion for post-processing in Python with bioregistry.curie_from_iri or curies.

> Was it just an iterative process of figuring out which URIs are not handled?

Yes, this was an iterative process.

> One option would be to save the URI to CURIE conversion for post-processing in Python with bioregistry.curie_from_iri or curies.

Thanks for the tip. This would be better if we can do it like that.
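
As a rough illustration of why post-processing is easier to maintain than chained IF/REPLACE calls, URI to CURIE conversion can be driven by a single prefix map. The sketch below uses only the standard library and a hand-written map built from URIs quoted in this thread. It is not how curies or bioregistry implement conversion, and `URI_PREFIXES` and `compress_uri` are hypothetical names.

```python
from typing import Dict, Optional

# Hypothetical, hand-written prefix map for illustration only.
# A registry-backed converter would supply this instead of hard-coding it.
URI_PREFIXES: Dict[str, str] = {
    "http://purl.obolibrary.org/obo/": "obo",
    "http://purl.obolibrary.org/obo/MONDO_": "MONDO",
    "http://identifiers.org/meddra/": "meddra",
    "http://purl.bioontology.org/ontology/ICD10CM/": "ICD10CM",
    "https://omim.org/entry/": "OMIM",
    "https://omim.org/phenotypicSeries/PS": "omim.ps",
}


def compress_uri(uri: str, prefixes: Dict[str, str] = URI_PREFIXES) -> Optional[str]:
    """Return a CURIE for the URI, or None when no URI prefix matches."""
    # Try longer URI prefixes first so the specific MONDO_ entry wins over obo/.
    for uri_prefix in sorted(prefixes, key=len, reverse=True):
        if uri.startswith(uri_prefix):
            return f"{prefixes[uri_prefix]}:{uri[len(uri_prefix):]}"
    return None


curie = compress_uri("http://identifiers.org/meddra/10060873")
```

Adding a new source then means adding one map entry rather than another IF branch, and unmatched URIs surface as `None` instead of silently passing through.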

I used curies.get_bioregistry_converter to get the converter for URIs. However, there are some differences between how curies maps URIs and how I mapped them. Here are some examples:

orphanet:99022 vs obo:orphanet_99022 (URI: http://purl.obolibrary.org/obo/Orphanet_99022). Should we replace obo:orphanet_ with orphanet:?

orphanet:98813 vs orphanet.ordo:98813 (URI: http://www.orpha.net/ORDO/Orphanet_98813). I guess we can replace orphanet.ordo with orphanet, akin to

nxontology-data/nxontology_data/utils.py
Lines 84 to 88 in c7b1429

if collapse_orphanet and prefix.lower() == "orphanet.ordo":
    # In EFO, all orphanet.ordo terms existed in orphanet.
    # The consistency of using a single prefix will help with mapping.
    # https://github.com/biopragmatics/bioregistry/issues/187#issuecomment-1706308305
    prefix = "Orphanet"

omimps:203655 vs omim.ps:203655 (URI: https://omim.org/phenotypicSeries/PS203655). Should we replace omim.ps with omimps?

There is also a missing uri_prefix, http://purl.bioontology.org/ontology/ICD10CM/, for ICD10CM. The add_prefix method for adding prefixes lacks a merge option in the version we use. Would it be safe to update the curies version?

> Should we replace obo:orphanet_ with orphanet:?

Yes. It's nice that curies handles that.

For orphanet, we can replace orphanet.ordo after normalization.

omim.ps is the correct normalized prefix.

Ideally, you'd call normalize_parsed_curie on the output of the curies.get_bioregistry_converter converter so we get consistently formatted CURIEs everywhere.
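
A minimal sketch of the "replace orphanet.ordo after normalization" step, assuming CURIEs arrive as prefix:identifier strings. `collapse_orphanet_prefix` is a hypothetical helper written for this example. It is not the repository's normalize_parsed_curie, whose signature is not shown in this thread.

```python
def collapse_orphanet_prefix(curie: str) -> str:
    """Rewrite orphanet.ordo CURIEs to the Orphanet prefix; leave others unchanged."""
    prefix, separator, identifier = curie.partition(":")
    if not separator:
        # Not a CURIE (no colon), so return the input untouched.
        return curie
    if prefix.lower() == "orphanet.ordo":
        # Mirrors the utils.py excerpt quoted above: use one prefix for Orphanet terms.
        prefix = "Orphanet"
    return f"{prefix}:{identifier}"


collapsed = collapse_orphanet_prefix("orphanet.ordo:98813")
```

Other normalized prefixes, such as omim.ps, pass through unchanged, which matches the decision in this thread to keep omim.ps as is.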

dhimmel · 2023-09-27T17:40:31Z

nxontology_data/efo/queries/mapping_properties.rq

> Once the queries are done, what would be the next steps? How should this data be represented in the output?

We can save the tables as output and/or include them in the nxontology. We'll have to decide how we want to represent this information as node data in networkx. Doing so is tricky because we have to decide to what extent users will want access to rawer forms versus a more consolidated but opinionated format.

@dhimmel How about saving the table in the output for now, so that users can access this data? We could create a follow-up issue to discuss how to represent this information as node data in networkx.

> How about saving the table in the output for now

Sounds good.
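
One way to picture "saving the table in the output" is to write the query rows as a sorted tab-separated file. This is a sketch under assumptions: `rows_to_tsv` is a hypothetical helper using only the standard library, the rows are copied from the sample xref sources output in this thread, and the PR's actual export code is not shown here.

```python
import csv
import io


def rows_to_tsv(rows, fieldnames):
    """Serialize dict rows as tab-separated text, sorted so reruns diff cleanly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    # Sort by every column in order, similar in spirit to an ORDER BY clause.
    ordered = sorted(rows, key=lambda row: tuple(row[name] for name in fieldnames))
    writer.writerows(ordered)
    return buffer.getvalue()


rows = [
    {"efo_id": "CHEBI:100241", "xref": "CAS:85721-33-1", "axiom_source": "KEGG COMPOUND"},
    {"efo_id": "CHEBI:100241", "xref": "Beilstein:3568352", "axiom_source": "Beilstein"},
]
tsv_text = rows_to_tsv(rows, ["efo_id", "xref", "axiom_source"])
```

Keeping the table flat like this leaves the question of node data in networkx open for the follow-up issue.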

bfoltyn · 2023-10-04T12:30:42Z

@dhimmel I've added saving the xref sources and mapping properties tables as output. I've also removed the mapping from URI to CURIE in mapping_properties.rq and used curies instead.

bfoltyn · 2023-10-04T16:29:39Z

@dhimmel I've added CURIE normalization using normalize_parsed_curie.

dhimmel · 2023-10-05T21:38:11Z

Okay, merged, and EFO is now exporting in https://github.com/related-sciences/nxontology-data/actions/runs/6424757602!

Nice work navigating this @bfoltyn

Add mapping properties and sources SPARQL queries

e93019a

dhimmel changed the title ~~Classify xrefs~~ Additional EFO xref context from axioms Sep 27, 2023

dhimmel reviewed Sep 27, 2023

Bartek Foltyn added 2 commits October 2, 2023 18:44

rename sources to xref_sources

b80eef8

save xref sources and mapping properties as output, add order by in m…

3d7167e

…apping_properties.rq

bfoltyn requested a review from dhimmel October 4, 2023 12:30

add curie normalization

5cc274a

bfoltyn marked this pull request as ready for review October 4, 2023 16:29

dhimmel approved these changes Oct 5, 2023

dhimmel merged commit f6a06d4 into related-sciences:main Oct 5, 2023
1 check passed

bfoltyn deleted the classify-xrefs branch November 14, 2023 14:14
