EFO cross-references: classify as exact/close when possible #18
Here's a visualization by @ravwojdyla on why knowing close/exact (or equivalent/related, green/red in the visualization) could help refine mappings to be bijective in certain situations like:
Also noting how an axiom appears in the EFO OWL source:
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0000640"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
<owl:annotatedTarget>Orphanet:319298</owl:annotatedTarget>
<oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
</owl:Axiom> |
@dhimmel I managed to recreate the Do you know any examples for which it's more difficult to retrieve axioms? I also noticed that sometimes the axiom has multiple
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0000479"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
<owl:annotatedTarget>ICD9:238.71</owl:annotatedTarget>
<oboInOwl:source>DOID:2224</oboInOwl:source>
<oboInOwl:source>EFO:0000479</oboInOwl:source>
<oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
<oboInOwl:source>MONDO:i2s</oboInOwl:source>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0000479"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
<owl:annotatedTarget>ICD9:238.71</owl:annotatedTarget>
<oboInOwl:source>DOID:2224</oboInOwl:source>
<oboInOwl:source>EFO:0000479</oboInOwl:source>
<oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
<oboInOwl:source>i2s</oboInOwl:source>
</owl:Axiom>
It looks like on the website the last Here is a query I used to retrieve the axioms from the OWL file:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
SELECT ?efo_id ?xref (MAX(?source) AS ?axiom)
WHERE {
  ?axiom_element rdf:type owl:Axiom ;
    owl:annotatedSource ?annotatedSource ;
    owl:annotatedProperty ?annotatedProperty ;
    owl:annotatedTarget ?xref ;
    oboInOwl:source ?source .
  FILTER(?annotatedProperty = oboInOwl:hasDbXref)
  BIND( REPLACE( STR(?annotatedSource), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
}
GROUP BY ?axiom_element ?efo_id ?annotatedProperty ?xref
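The REPLACE call in the query above turns a class URI into a CURIE-style identifier. A minimal Python sketch of the same regular expression, handy for checking what it produces outside of SPARQL (the helper name `uri_to_curie` is hypothetical and not part of nxontology-data):

```python
import re

# Same pattern as the SPARQL REPLACE above; Python uses \1 and \2 where SPARQL uses $1 and $2.
URI_PATTERN = re.compile(r"^http.+/([^:]+)_(.+)$")


def uri_to_curie(uri: str) -> str:
    """Hypothetical helper: rewrite '<namespace>/<PREFIX>_<accession>' as 'PREFIX:accession'."""
    # Strings that do not match the pattern are returned unchanged, as with SPARQL REPLACE.
    return URI_PATTERN.sub(r"\1:\2", uri)


print(uri_to_curie("http://www.ebi.ac.uk/efo/EFO_0000640"))  # EFO:0000640
print(uri_to_curie("http://purl.obolibrary.org/obo/MONDO_0005029"))  # MONDO:0005029
```

Because the leading `.+` is greedy, the prefix is taken from the last path segment, which is what we want for EFO, MONDO and Orphanet style URIs.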
And here is also a code snippet that I used in a Jupyter notebook to retrieve and compare axioms:
Snippet
# type: ignore
%load_ext autoreload
%autoreload 2
import jupyter_black
jupyter_black.load()
import pandas as pd
from nxontology_data.efo.efo import EfoProcessor
pd.set_option("display.max_colwidth", None)
efo_processor = EfoProcessor(version="v3.57.0", name="efo_otar_profile")
# efo_processor.download_owl()
rdf = efo_processor.load_rdf()
xrefs = efo_processor.run_query("xrefs", cache=False)
xrefs
axioms = efo_processor.run_query("axioms", cache=False)
axioms
cross_reference_efo_0000479 = {
"MESH:D013920 (Orphanet:3318/e)",
"Orphanet:3318 (MONDO:equivalentTo)",
"EFO:0000479 (MONDO:equivalentTo)",
"Orphanet:71493 (MONDO:relatedTo)",
"ICDO:9962/3 (NCIT:C3407)",
"SCTID:109994006 (MONDO:equivalentTo)",
"UMLS:C0040028 (Orphanet:3318/e)",
"OMIM:614521",
"NCIT:C3407 (exact-label-match)",
"UMLS:C0040028 (Orphanet:3318)",
"OMIM:601977",
"MESH:D013920 (Orphanet:3318)",
"ONCOTREE:ET (MONDO:equivalentTo)",
"NCIT:C3407 (MONDO:exact-label-match)",
"MONDO:0005029",
"ICD10:D47.3 (Orphanet:3318)",
"GARD:0006594 (MONDO:equivalentTo)",
"ICD9:238.71 (i2s)",
"OMIM:187950",
"ICD9:238.71 (MONDO:i2s)",
"COHD:438383 (MONDO:equivalentTo)",
"DOID:2224 (MONDO:equivalentTo)",
"MedDRA:10015493 (Orphanet:3318)",
"MedDRA:10015493 (Orphanet:3318/e)",
}
cross_reference_efo_000640 = {
"UMLS:CN205129 (MONDO:equivalentTo)",
"GARD:0009575 (shared-umls-xref)",
"GARD:0009572 (MONDO:equivalentObsolete)",
"ICD10:C64 (Orphanet:47044)",
"Orphanet:47044 (OMIM:605074)",
"Orphanet:319298 (MONDO:equivalentTo)",
"NCIT:C6975 (MONDO:equivalentTo)",
"GARD:0009572 (MONDO:equivalentTo)",
"OMIM:605074 (Orphanet:47044)",
"GARD:0009575 (MONDO:shared-umls-xref)",
"UMLS:C1306837 (Orphanet:319298)",
"DOID:4465 (MONDO:equivalentTo)",
"ONCOTREE:PRCC (MONDO:equivalentTo)",
"EFO:0000640 (MONDO:equivalentTo)",
"SCTID:733608000 (MONDO:equivalentTo)",
"MONDO:0017884",
"UMLS:C1336078 (MONDO:equivalentTo)",
"UMLS:C1306837 (Orphanet:319298/e)",
}
axioms[axioms["efo_id"] == "EFO:0000479"]
xrefs_with_axiom_efo_479 = (
xrefs[xrefs["efo_id"] == "EFO:0000479"]
.merge(axioms, on=["efo_id", "xref"], how="left")
.sort_values("xref")
.assign(
desc=lambda df: df.apply(
lambda row: row["xref"]
if pd.isnull(row["axiom"])
else f"{row['xref']} ({row['axiom']})",
axis=1,
)
)
)
display(xrefs_with_axiom_efo_479)
display(set(xrefs_with_axiom_efo_479["desc"]) - cross_reference_efo_0000479)
display(cross_reference_efo_0000479 - set(xrefs_with_axiom_efo_479["desc"]))
display(set(xrefs_with_axiom_efo_479["desc"]) == cross_reference_efo_0000479)
xrefs_with_axiom_efo_640 = (
xrefs[xrefs["efo_id"] == "EFO:0000640"]
.merge(axioms, on=["efo_id", "xref"], how="left")
.sort_values("xref")
.assign(
desc=lambda df: df.apply(
lambda row: row["xref"]
if pd.isnull(row["axiom"])
else f"{row['xref']} ({row['axiom']})",
axis=1,
)
)[["efo_id", "xref", "xref_prefix", "xref_accession", "axiom", "desc"]]
)
display(set(xrefs_with_axiom_efo_640["desc"]) == cross_reference_efo_000640)
display(set(xrefs_with_axiom_efo_640["desc"]) - cross_reference_efo_000640)
display(cross_reference_efo_000640 - set(xrefs_with_axiom_efo_640["desc"]))
display(xrefs_with_axiom_efo_640) |
Nice work @bfoltyn. I think we'll want to preserve all sources provided by axioms rather than taking the max. So the output would be keyed on
Hmm, so it appears that an axiom can provide multiple sources for a cross-reference. I am not sure why in the ( |
Absolutely the order has no meaning at all.
This is due to the fact that the cross references have not been normalised.
Is allowed in the OWL data model (which is good in some cases, think of provenance!). We have a special method in mondo called "normalisation" that turns this into:
But this is not at all consistently applied to all ontologies. TLDR: There is no requirement for normalising axiom annotations, so you have to be able to deal with the unnormalised case! |
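One way to cope with unnormalised axiom annotations downstream is to ignore how the sources are split across owl:Axiom elements and simply collect the distinct sources per cross-reference. A minimal sketch, assuming the query result has already been flattened to (efo_id, xref, source) rows; the function name and row layout are assumptions for illustration:

```python
from collections import defaultdict


def collect_sources(rows):
    """Group (efo_id, xref, source) rows into {(efo_id, xref): sorted list of distinct sources}."""
    grouped = defaultdict(set)
    for efo_id, xref, source in rows:
        grouped[(efo_id, xref)].add(source)
    return {key: sorted(sources) for key, sources in grouped.items()}


# Rows based on the two EFO:0000479 / ICD9:238.71 axioms quoted earlier in this thread.
rows = [
    ("EFO:0000479", "ICD9:238.71", "DOID:2224"),
    ("EFO:0000479", "ICD9:238.71", "MONDO:i2s"),
    ("EFO:0000479", "ICD9:238.71", "DOID:2224"),
    ("EFO:0000479", "ICD9:238.71", "i2s"),
]
print(collect_sources(rows))
```

With this shape it no longer matters whether the sources arrive in one axiom or several, or in which order.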
Thanks @matentzn !
@dhimmel If we want to preserve all sources, I think that we can use the following query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
SELECT ?efo_id ?xref ?axiom_source
WHERE {
  ?axiom_element rdf:type owl:Axiom ;
    owl:annotatedSource ?source ;
    owl:annotatedProperty oboInOwl:hasDbXref ;
    owl:annotatedTarget ?xref ;
  OPTIONAL { ?axiom_element oboInOwl:source ?axiom_source }
  BIND( REPLACE( STR(?source), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
}
GROUP BY ?efo_id ?xref ?axiom_source
For
About the optional source, there are 46 rows with null Examples
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0007216"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
<owl:annotatedTarget>OMIMPS:142340</owl:annotatedTarget>
<oboInOwl:hasDbXref>MONDO:equivalentTo</oboInOwl:hasDbXref>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0008494"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
<owl:annotatedTarget>MESH:D065632</owl:annotatedTarget>
<skos:closeMatch></skos:closeMatch>
</owl:Axiom>
Should we keep the details of which predicates are used within axioms? Or is just using the
@dhimmel For
PREFIX mondo: <http://purl.obolibrary.org/obo/mondo#>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT ?efo_id ?efo_uri ?predicate_id ?match ?predicate_uri
WHERE {
  VALUES ?predicate_uri {mondo:closeMatch mondo:exactMatch}
  ?efo_uri ?predicate_uri ?match
  BIND( REPLACE( STR(?efo_uri), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
  BIND( REPLACE( STR(?predicate_uri), "^http://purl.obolibrary.org/obo/mondo#(.+)$", "$1" ) AS ?predicate_id )
}
I validated the results against the OLS API and they're correct. Here's a snippet that compares
Snippet
# type: ignore
%load_ext autoreload
%autoreload 2
import jupyter_black
jupyter_black.load()
import pandas as pd
from nxontology_data.efo.efo import EfoProcessor
efo_processor = EfoProcessor(version="v3.58.0", name="efo_otar_profile")
# efo_processor.download_owl()
rdf = efo_processor.load_rdf()
matches = efo_processor.run_query("matches", cache=False)
matches
import functools
import requests
import urllib.parse
@functools.lru_cache(maxsize=None)
def api_request(efo_uri: str):
    # The term URI is URL-encoded twice before being placed in the request path
    encoded = urllib.parse.quote_plus(urllib.parse.quote_plus(efo_uri))
    return requests.get(
        url=f"https://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/{encoded}"
    ).json()
api_request.cache_clear()
def get_api_matches(efo_uri: str):
    # Takes the class URI (not the CURIE), since that is what api_request encodes
    res = api_request(efo_uri)
    return {
        "close_match": set(res["annotation"].get("closeMatch", [])),
        "exact_match": set(res["annotation"].get("exactMatch", [])),
    }
pivot_matches = (
matches.groupby(["efo_id", "efo_uri", "predicate_id"])["match"]
.apply(list)
.reset_index()
.pivot(index=["efo_id", "efo_uri"], columns="predicate_id", values="match")
.reset_index()
.rename(columns={"exactMatch": "exact_match", "closeMatch": "close_match"})
)
pivot_matches
pd.isnull(pivot_matches["close_match"]).value_counts()
pd.isnull(pivot_matches["exact_match"]).value_counts()
sample_matches = pivot_matches.sample(200).fillna("")
sample_matches
def safe_call(x):
    try:
        return get_api_matches(x)
    except Exception as e:
        print(f"Error for {x}: {e}")
        # Fallback uses the same keys as get_api_matches so callers can index it
        return {"close_match": set(), "exact_match": set()}
compare_df = (
sample_matches.fillna("")
.assign(
api_close_match=lambda df: df["efo_uri"].apply(
lambda x: safe_call(x)["close_match"]
),
api_exact_match=lambda df: df["efo_uri"].apply(
lambda x: safe_call(x)["exact_match"]
),
exact_match=lambda df: df["exact_match"].apply(set),
close_match=lambda df: df["close_match"].apply(set),
)
.assign(
exact_match_equal=lambda df: df.apply(
lambda row: row["exact_match"] == row["api_exact_match"], axis=1
),
close_match_equal=lambda df: df.apply(
lambda row: row["close_match"] == row["api_close_match"], axis=1
),
extra_exact_match_in_api=lambda df: df.apply(
lambda row: row["api_exact_match"] - row["exact_match"], axis=1
),
extra_exact_match_in_efo=lambda df: df.apply(
lambda row: row["exact_match"] - row["api_exact_match"], axis=1
),
extra_close_match_in_api=lambda df: df.apply(
lambda row: row["api_close_match"] - row["close_match"], axis=1
),
extra_close_match_in_efo=lambda df: df.apply(
lambda row: row["close_match"] - row["api_close_match"], axis=1
),
)
)
compare_df
(compare_df.groupby(["exact_match_equal", "close_match_equal"]).size())
The current results return URLs like
@dhimmel Lastly, what format should the axioms, |
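For reference, the nested quote_plus call in api_request from the snippet above percent-encodes the term URI twice, so the percent signs from the first pass are themselves escaped. A small standard-library sketch showing the two stages (no network access involved):

```python
from urllib.parse import quote_plus

efo_uri = "http://www.ebi.ac.uk/efo/EFO_0000479"
once = quote_plus(efo_uri)   # ":" and "/" become %3A and %2F
twice = quote_plus(once)     # each "%" from the first pass becomes %25

print(once)   # http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0000479
print(twice)  # http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000479
```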
That URL is the class URI and we often assign it to a variable with a What we are after is for each
A tabular output from a SPARQL query is the ideal first output here. Not sure if you can fit everything in one query/table or you need multiple. I leave that up to your investigation. To complicate things further (hehe), we should consider whether the python Possibly best to transition to PRs at this point to enable easier review of the SPARQL queries. PR can be draft and incomplete. |
@dhimmel regarding including
Option 1:
Example
{
  "xref_properties": [
    {
      "xref_id": "orphanet:319298",
      "axiom_sources": ["MONDO:equivalentTo"],
      "mapping_properties": ["mondo:exactMatch"]
    }
  ]
}
Option 2: Separate
Example
{
  "axiom_sources": [
    { "xref_id": "orphanet:319298", "axiom_source": "MONDO:equivalentTo" }
  ],
  "mapping_properties": [
    { "xref_id": "orphanet:319298", "mapping_source": "mondo:exactMatch" }
  ]
}
Option 3: Second option inside
Example
{
  "xref_properties": {
    "axiom_sources": [
      { "xref_id": "orphanet:319298", "axiom_source": "MONDO:equivalentTo" }
    ],
    "mapping_properties": [
      { "xref_id": "orphanet:319298", "mapping_source": "mondo:exactMatch" }
    ]
  }
}
Please let me know your thoughts on these options, or if there are any other ideas you have. |
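To make Option 1 concrete, here is a minimal sketch of how the two flat tables (axiom sources and mapping properties) could be folded into a single xref_properties list for one class. The function name and the input row shapes are assumptions for illustration, not the nxontology-data implementation:

```python
from collections import defaultdict


def build_xref_properties(axiom_rows, mapping_rows):
    """Merge (xref_id, axiom_source) and (xref_id, mapping_property) rows, Option 1 style."""
    merged = defaultdict(lambda: {"axiom_sources": set(), "mapping_properties": set()})
    for xref_id, axiom_source in axiom_rows:
        merged[xref_id]["axiom_sources"].add(axiom_source)
    for xref_id, mapping_property in mapping_rows:
        merged[xref_id]["mapping_properties"].add(mapping_property)
    # Emit one record per xref, with sorted lists so the output is deterministic
    return [
        {
            "xref_id": xref_id,
            "axiom_sources": sorted(props["axiom_sources"]),
            "mapping_properties": sorted(props["mapping_properties"]),
        }
        for xref_id, props in sorted(merged.items())
    ]


xref_properties = build_xref_properties(
    axiom_rows=[("orphanet:319298", "MONDO:equivalentTo")],
    mapping_rows=[("orphanet:319298", "mondo:exactMatch")],
)
print(xref_properties)
```

Keeping both fields as lists means an xref that carries more than one mapping property is preserved as-is rather than collapsed.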
I like option 1. Will there be a slight imprecision where one source has one property and another source has a conflicting property? For example, an xref being classified as both an exactMatch and a closeMatch by different resources? |
Just FYI: what you are trying to do here is much much harder than you think right now - and not necessary. EFO is not a good source for mappings, because it mixes old (ancient) with new (harmonised) xrefs, and makes strange distinctions like "mondo:exactMatch" (which is not even a thing in Mondo). What you should do instead is:
Just my two cents as someone driving by :D |
@dhimmel There are cases where an xref is classified as both exactMatch and closeMatch. For example in
(
pd.read_json(
"https://github.com/related-sciences/nxontology-data/raw/output/efo/efo_otar_profile_mapping_properties.json.gz"
).pipe(
lambda df: df[
(df["efo_id"] == "EFO:0000095") & (df["xref_id"] == "meddra:10008958")
]
)
)
There are 102 cases like this:
All cases
(
pd.read_json(
"https://github.com/related-sciences/nxontology-data/raw/output/efo/efo_otar_profile_mapping_properties.json.gz"
)
.groupby(["efo_id", "xref_id"])["mapping_property_id"]
.apply(list)
.reset_index()
.assign(
mapping_property_id=lambda df: df["mapping_property_id"].apply(
lambda x: ",".join(x)
),
has_close_match=lambda df: df["mapping_property_id"].str.contains("closeMatch"),
has_exact_match=lambda df: df["mapping_property_id"].str.contains("exactMatch"),
)
.pipe(lambda df: df[df["has_close_match"] & df["has_exact_match"]])
)
|
Thanks @matentzn for these insights. I'm looking forward to exploring the SSSOM Mondo mappings combined with semra to convert them to EFO-keyed mappings. For now I think it makes sense to continue our current approach, since we're close to having it complete and ready to evaluate, at least as a good reference for the SSSOM/Mondo alternative.
@bfoltyn I think we could make |
@dhimmel What do you mean by higher priority? I thought we would include all mapping properties as list in the node data, as in option 1 in comment #18 (comment). Are you suggesting we include only one mapping property value |
I think it might be best if we simplify/aggregate the xref metadata that goes into the nxontology node attribute data to something like (written here in YAML for ease):
xrefs:
  - xref_id: meddra:10008958
    xref_uri: http://identifiers.org/meddra/10008958
    relation: skos:exactMatch  # converting mondo:exactMatch to skos:exactMatch if applicable
    sources: [MONDO:equivalentTo, DOID:2224]  # haven't cleaned this up yet
With this design, an |
@dhimmel Currently |
@dhimmel Should we use the following logic?
|
We could either replace it or create a new field like
That logic sounds good. If there are other interesting values in the otherwise set, we can support those later. |
I think we can add new field. |
Ideally all of them, such that a user only needs |
merges #21 refs #18 Co-authored-by: Bartek Foltyn <[email protected]>
@dhimmel I've noticed that sometimes
"xref_details": [
  {
    "xref_id": "DOID:0070374",
    "relation": "skos:exactMatch",
    "sources": [
      null
    ]
  },
Should we make
Another way would be to filter out nxontology-data/nxontology_data/efo/efo.py Lines 256 to 279 in fb93d9e
|
This is the solution I prefer unless you advocate for a different one. Potentially leave a comment in that query that OPTIONAL will include extra results where |
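A minimal sketch of the filtering idea, assuming the query output has been loaded as (xref_id, relation, source) rows in which source is None whenever the OPTIONAL pattern did not match. The function name and row layout are hypothetical, and the sample values are borrowed from the examples in this thread:

```python
def aggregate_sources(rows):
    """Collect sources per (xref_id, relation), dropping the None values produced by OPTIONAL."""
    details = {}
    for xref_id, relation, source in rows:
        entry = details.setdefault(
            (xref_id, relation),
            {"xref_id": xref_id, "relation": relation, "sources": []},
        )
        # Skip unbound sources and duplicates so "sources" never contains null
        if source is not None and source not in entry["sources"]:
            entry["sources"].append(source)
    return list(details.values())


rows = [
    ("DOID:0070374", "skos:exactMatch", None),
    ("meddra:10008958", "skos:exactMatch", "MONDO:equivalentTo"),
]
xref_details = aggregate_sources(rows)
print(xref_details)
```

Under this approach an xref without any axiom source keeps its entry and simply gets an empty sources list.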
Background in EBISPOT/efo#935
We currently extract database cross-references for EFO using the oboInOwl:hasDbXref predicate. However, MONDO is providing xrefs with greater specificity using the mondo:exactMatch and mondo:closeMatch predicates. Furthermore, there are axioms (with rdf:type owl:Axiom) that annotate oboInOwl:hasDbXref instances with values like MONDO:equivalentTo.
EFO:0000479 is a good example of a class that has all types of xrefs:
1. oboInOwl:hasDbXref without axioms
2. oboInOwl:hasDbXref with axioms
3. mondo:exactMatch and mondo:closeMatch
It would be nice to further understand the relation between 2 and 3.