EFO cross-references: classify as exact/close when possible #18

dhimmel · 2023-09-19T14:12:05Z

We currently extract database cross-references for EFO using the oboInOwl:hasDbXref predicate. However, MONDO is providing xrefs with greater specificity using the mondo:exactMatch and mondo:closeMatch predicates. Furthermore, there are axioms (with rdf:type owl:Axiom) that annotate oboInOwl:hasDbXref instances with values like MONDO:equivalentTo.

EFO:0000479 is a good example of a class that has all types of xrefs:

1. oboInOwl:hasDbXref without axioms
2. oboInOwl:hasDbXref with axioms
3. mondo:exactMatch and mondo:closeMatch

It would be nice to further understand the relation between 2 and 3.

dhimmel · 2023-09-19T14:15:28Z

Here's a visualization by @ravwojdyla on why knowing close/exact (or equivalent/related, green/red in visualization) could help refine mappings to be bijective in certain situations like:

Also noting how an axiom appears in the EFO OWL source:

<owl:Axiom>
    <owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0000640"/>
    <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
    <owl:annotatedTarget>Orphanet:319298</owl:annotatedTarget>
    <oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
</owl:Axiom>
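
As a rough illustration of what this axiom carries, here is a minimal sketch that reads it with Python's standard-library XML parser. The wrapper `rdf:RDF` element, the `read_xref_axioms` helper and the record field names are assumptions made for the example; the project itself queries the OWL file with SPARQL rather than parsing XML directly.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the EFO OWL (RDF/XML) source.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]
HAS_DB_XREF = NS["oboInOwl"] + "hasDbXref"

# The axiom from the comment, wrapped in an rdf:RDF root (assumption)
# so that the fragment parses on its own.
AXIOM_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Axiom>
    <owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0000640"/>
    <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
    <owl:annotatedTarget>Orphanet:319298</owl:annotatedTarget>
    <oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
  </owl:Axiom>
</rdf:RDF>
"""


def read_xref_axioms(xml_text):
    """Return one record per owl:Axiom that annotates an oboInOwl:hasDbXref."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for axiom in root.findall("owl:Axiom", NS):
        prop = axiom.find("owl:annotatedProperty", NS)
        if prop is None or prop.get(RDF_RESOURCE) != HAS_DB_XREF:
            continue  # skip axioms that annotate some other property
        subject = axiom.find("owl:annotatedSource", NS)
        records.append(
            {
                "subject_uri": subject.get(RDF_RESOURCE) if subject is not None else None,
                "xref": axiom.findtext("owl:annotatedTarget", namespaces=NS),
                # an axiom may carry zero, one or several oboInOwl:source values
                "sources": [el.text for el in axiom.findall("oboInOwl:source", NS)],
            }
        )
    return records


records = read_xref_axioms(AXIOM_XML)
print(records)
```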

bfoltyn · 2023-09-21T15:02:16Z

@dhimmel I managed to recreate the database cross-reference section that appears on the website by using axioms from the .owl file for EFO:0000479 and EFO:0000640. However, I noticed that for EFO:0000640 there are two extra xrefs, MeSH:C538614 and UMLS:C2931899, that are not displayed on the website but are present in the xrefs query.

Do you know any examples for which it's more difficult to retrieve axioms?

I also noticed that sometimes an axiom has multiple oboInOwl:source values, and sometimes a single cross-reference has multiple axioms. For example, for ICD9:238.71 in EFO:0000479:

<owl:Axiom>
    <owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0000479"/>
    <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
    <owl:annotatedTarget>ICD9:238.71</owl:annotatedTarget>
    <oboInOwl:source>DOID:2224</oboInOwl:source>
    <oboInOwl:source>EFO:0000479</oboInOwl:source>
    <oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
    <oboInOwl:source>MONDO:i2s</oboInOwl:source>
</owl:Axiom>
<owl:Axiom>
    <owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0000479"/>
    <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
    <owl:annotatedTarget>ICD9:238.71</owl:annotatedTarget>
    <oboInOwl:source>DOID:2224</oboInOwl:source>
    <oboInOwl:source>EFO:0000479</oboInOwl:source>
    <oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
    <oboInOwl:source>i2s</oboInOwl:source>
</owl:Axiom>

It looks like the website uses the last source value to describe the cross-reference. The sources seem to be ordered alphabetically, though. I'm not sure what approach we should use when there is more than one source. Do you have any suggestions?

Here is a query I used to retrieve the axioms from the owl file:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

SELECT ?efo_id ?xref (MAX(?source) AS ?axiom)
WHERE {
  ?axiom_element rdf:type owl:Axiom ;
         owl:annotatedSource ?annotatedSource ;
         owl:annotatedProperty ?annotatedProperty ;
         owl:annotatedTarget ?xref ;
         oboInOwl:source ?source .

  FILTER(?annotatedProperty = oboInOwl:hasDbXref)

  BIND( REPLACE( STR(?annotatedSource), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
}
GROUP BY ?axiom_element ?efo_id ?annotatedProperty ?xref
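
To see what MAX(?source) selects for the two ICD9:238.71 axioms above, here is a small Python sketch. It assumes the SPARQL engine orders plain string literals by code point, as Python's `max` does for `str`; actual ordering may differ between engines.

```python
# The oboInOwl:source values on each of the two ICD9:238.71 axioms.
axiom_1 = ["DOID:2224", "EFO:0000479", "MONDO:equivalentTo", "MONDO:i2s"]
axiom_2 = ["DOID:2224", "EFO:0000479", "MONDO:equivalentTo", "i2s"]

# Python compares strings by code point, so uppercase letters sort
# before lowercase ones and "i2s" ends up last.
picked = [max(axiom_1), max(axiom_2)]
print(picked)  # ['MONDO:i2s', 'i2s']
```

Under that assumption, taking the maximum keeps one label per axiom and silently drops the other three sources.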

And here is also a code snippet that I used in a jupyter notebook to retrieve and compare axioms:

Snippet

# type: ignore
%load_ext autoreload
%autoreload 2


import jupyter_black

jupyter_black.load()
import pandas as pd
from nxontology_data.efo.efo import EfoProcessor

pd.set_option("display.max_colwidth", None)
efo_processor = EfoProcessor(version="v3.57.0", name="efo_otar_profile")
# efo_processor.download_owl()

rdf = efo_processor.load_rdf()
xrefs = efo_processor.run_query("xrefs", cache=False)

xrefs
axioms = efo_processor.run_query("axioms", cache=False)

axioms
cross_reference_efo_0000479 = {
    "MESH:D013920 (Orphanet:3318/e)",
    "Orphanet:3318 (MONDO:equivalentTo)",
    "EFO:0000479 (MONDO:equivalentTo)",
    "Orphanet:71493 (MONDO:relatedTo)",
    "ICDO:9962/3 (NCIT:C3407)",
    "SCTID:109994006 (MONDO:equivalentTo)",
    "UMLS:C0040028 (Orphanet:3318/e)",
    "OMIM:614521",
    "NCIT:C3407 (exact-label-match)",
    "UMLS:C0040028 (Orphanet:3318)",
    "OMIM:601977",
    "MESH:D013920 (Orphanet:3318)",
    "ONCOTREE:ET (MONDO:equivalentTo)",
    "NCIT:C3407 (MONDO:exact-label-match)",
    "MONDO:0005029",
    "ICD10:D47.3 (Orphanet:3318)",
    "GARD:0006594 (MONDO:equivalentTo)",
    "ICD9:238.71 (i2s)",
    "OMIM:187950",
    "ICD9:238.71 (MONDO:i2s)",
    "COHD:438383 (MONDO:equivalentTo)",
    "DOID:2224 (MONDO:equivalentTo)",
    "MedDRA:10015493 (Orphanet:3318)",
    "MedDRA:10015493 (Orphanet:3318/e)",
}

cross_reference_efo_000640 = {
    "UMLS:CN205129 (MONDO:equivalentTo)",
    "GARD:0009575 (shared-umls-xref)",
    "GARD:0009572 (MONDO:equivalentObsolete)",
    "ICD10:C64 (Orphanet:47044)",
    "Orphanet:47044 (OMIM:605074)",
    "Orphanet:319298 (MONDO:equivalentTo)",
    "NCIT:C6975 (MONDO:equivalentTo)",
    "GARD:0009572 (MONDO:equivalentTo)",
    "OMIM:605074 (Orphanet:47044)",
    "GARD:0009575 (MONDO:shared-umls-xref)",
    "UMLS:C1306837 (Orphanet:319298)",
    "DOID:4465 (MONDO:equivalentTo)",
    "ONCOTREE:PRCC (MONDO:equivalentTo)",
    "EFO:0000640 (MONDO:equivalentTo)",
    "SCTID:733608000 (MONDO:equivalentTo)",
    "MONDO:0017884",
    "UMLS:C1336078 (MONDO:equivalentTo)",
    "UMLS:C1306837 (Orphanet:319298/e)",
}


axioms[axioms["efo_id"] == "EFO:0000479"]
xrefs_with_axiom_efo_479 = (
    xrefs[xrefs["efo_id"] == "EFO:0000479"]
    .merge(axioms, on=["efo_id", "xref"], how="left")
    .sort_values("xref")
    .assign(
        desc=lambda df: df.apply(
            lambda row: row["xref"]
            if pd.isnull(row["axiom"])
            else f"{row['xref']} ({row['axiom']})",
            axis=1,
        )
    )
)

display(xrefs_with_axiom_efo_479)

display(set(xrefs_with_axiom_efo_479["desc"]) - cross_reference_efo_0000479)
display(cross_reference_efo_0000479 - set(xrefs_with_axiom_efo_479["desc"]))
display(set(xrefs_with_axiom_efo_479["desc"]) == cross_reference_efo_0000479)
xrefs_with_axiom_efo_640 = (
    xrefs[xrefs["efo_id"] == "EFO:0000640"]
    .merge(axioms, on=["efo_id", "xref"], how="left")
    .sort_values("xref")
    .assign(
        desc=lambda df: df.apply(
            lambda row: row["xref"]
            if pd.isnull(row["axiom"])
            else f"{row['xref']} ({row['axiom']})",
            axis=1,
        )
    )[["efo_id", "xref", "xref_prefix", "xref_accession", "axiom", "desc"]]
)

display(set(xrefs_with_axiom_efo_640["desc"]) == cross_reference_efo_000640)
display(set(xrefs_with_axiom_efo_640["desc"]) - cross_reference_efo_000640)
display(cross_reference_efo_000640 - set(xrefs_with_axiom_efo_640["desc"]))

display(xrefs_with_axiom_efo_640)

dhimmel · 2023-09-25T15:49:42Z

Nice work @bfoltyn.

I think we'll want to preserve all sources provided by axioms rather than taking the max. So the output would be keyed on ?efo_id ?xref ?axiom_source. Could also consider making ?axiom_source optional such that we still match xrefs without axioms.

sometimes a single cross reference has multiple axioms

Hmm, so it appears that an axiom can provide multiple sources for a cross-reference. I am not sure why the (EFO:0000479 subject, ICD9:238.71 object, oboInOwl:hasDbXref predicate) triple has multiple axioms that duplicate all but one of their sources. Perhaps @zoependlington or @matentzn would know?

matentzn · 2023-09-26T13:24:11Z

I think we'll want to preserve all sources provided by axioms rather than taking the max.

Absolutely, the order has no meaning at all.

sometimes a single cross-reference has multiple axioms

This is due to the fact that the cross references have not been normalised.

:a :hasDbXref :b {source: "X"}
:a :hasDbXref :b {source: "Y"}

Is allowed in the OWL data model (which is good in some cases, think of provenance!).

We have a special method in mondo called "normalisation" that turns this into:

:a :hasDbXref :b {source: "X", source: "Y"}

But this is not at all consistently applied to all ontologies.

TLDR: There is no requirement for normalising axiom annotations, so you have to be able to deal with the unnormalised case!
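
A minimal sketch of handling the unnormalised case on the consumer side, assuming the axioms have already been flattened into (subject, predicate, object, source) rows. The `normalise` function is a hypothetical helper for illustration, not Mondo's actual normalisation method.

```python
from collections import defaultdict

# One row per oboInOwl:source value, taken from the two ICD9:238.71 axioms.
rows = [
    ("EFO:0000479", "oboInOwl:hasDbXref", "ICD9:238.71", "DOID:2224"),
    ("EFO:0000479", "oboInOwl:hasDbXref", "ICD9:238.71", "EFO:0000479"),
    ("EFO:0000479", "oboInOwl:hasDbXref", "ICD9:238.71", "MONDO:equivalentTo"),
    ("EFO:0000479", "oboInOwl:hasDbXref", "ICD9:238.71", "MONDO:i2s"),
    ("EFO:0000479", "oboInOwl:hasDbXref", "ICD9:238.71", "DOID:2224"),
    ("EFO:0000479", "oboInOwl:hasDbXref", "ICD9:238.71", "EFO:0000479"),
    ("EFO:0000479", "oboInOwl:hasDbXref", "ICD9:238.71", "MONDO:equivalentTo"),
    ("EFO:0000479", "oboInOwl:hasDbXref", "ICD9:238.71", "i2s"),
]


def normalise(rows):
    """Merge all sources that annotate the same (subject, predicate, object)."""
    merged = defaultdict(set)
    for subject, predicate, obj, source in rows:
        merged[(subject, predicate, obj)].add(source)
    # sort only for stable output; the order itself carries no meaning
    return {triple: sorted(sources) for triple, sources in merged.items()}


normalised = normalise(rows)
print(normalised)
```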

bfoltyn · 2023-09-26T14:27:06Z

TLDR: There is no requirement for normalising axiom annotations, so you have to be able to deal with the unnormalised case!

Thanks @matentzn !

I think we'll want to preserve all sources provided by axioms rather than taking the max. So the output would be keyed on ?efo_id ?xref ?axiom_source. Could also consider making ?axiom_source optional such that we still match xrefs without axioms.

@dhimmel If we want to preserve all sources, I think that we can use the following query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

SELECT ?efo_id ?xref ?axiom_source
WHERE {
  ?axiom_element rdf:type owl:Axiom ;
    owl:annotatedSource ?source ;
    owl:annotatedProperty oboInOwl:hasDbXref ;
    owl:annotatedTarget ?xref .

  OPTIONAL { ?axiom_element oboInOwl:source ?axiom_source }

  BIND( REPLACE( STR(?source), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
}

GROUP BY ?efo_id ?xref ?axiom_source

For efo_id=EFO:0000479 and xref=ICD9:238.71, the results are:

efo_id	xref	axiom_source
EFO:0000479	ICD9:238.71	DOID:2224
EFO:0000479	ICD9:238.71	EFO:0000479
EFO:0000479	ICD9:238.71	MONDO:equivalentTo
EFO:0000479	ICD9:238.71	MONDO:i2s
EFO:0000479	ICD9:238.71	i2s

Regarding the optional source: 46 rows have a null axiom_source, compared to 117675 rows where it has a value. Sometimes axioms carry annotation properties other than oboInOwl:source, such as oboInOwl:hasDbXref or skos:closeMatch.

Examples

this Axiom has oboInOwl:hasDbXref with MONDO:equivalentTo value

<owl:Axiom>
        <owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0007216"/>
        <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
        <owl:annotatedTarget>OMIMPS:142340</owl:annotatedTarget>
        <oboInOwl:hasDbXref>MONDO:equivalentTo</oboInOwl:hasDbXref>
</owl:Axiom>

this Axiom has skos:closeMatch without value:

<owl:Axiom>
        <owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0008494"/>
        <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
        <owl:annotatedTarget>MESH:D065632</owl:annotatedTarget>
        <skos:closeMatch></skos:closeMatch>
</owl:Axiom>

Should we keep the details of which predicates are used within axioms? Or is just using oboInOwl:source OK?

@dhimmel For mondo:exactMatch and mondo:closeMatch, a slightly modified version of the query you used in EBISPOT/efo#935 should work:

PREFIX mondo: <http://purl.obolibrary.org/obo/mondo#>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT ?efo_id ?efo_uri ?predicate_id ?match ?predicate_uri
WHERE {
  VALUES ?predicate_uri {mondo:closeMatch mondo:exactMatch}
  ?efo_uri ?predicate_uri ?match


  BIND( REPLACE( STR(?efo_uri), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
  BIND( REPLACE( STR(?predicate_uri), "^http://purl.obolibrary.org/obo/mondo#(.+)$", "$1" ) AS ?predicate_id )
}

I validated the results against the OLS API and they're correct. Here's a snippet that compares mondo:exactMatch and mondo:closeMatch:

Snippet

# type: ignore
%load_ext autoreload
%autoreload 2


import jupyter_black

jupyter_black.load()
import pandas as pd
from nxontology_data.efo.efo import EfoProcessor
efo_processor = EfoProcessor(version="v3.58.0", name="efo_otar_profile")
# efo_processor.download_owl()

rdf = efo_processor.load_rdf()
matches = efo_processor.run_query("matches", cache=False)

matches
import functools
import requests
import urllib.parse


@functools.lru_cache(maxsize=None)
def api_request(efo_uri: str):
    encoded = urllib.parse.quote_plus(urllib.parse.quote_plus(efo_uri))
    return requests.get(
        url=f"https://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/{encoded}"
    ).json()


api_request.cache_clear()
def get_api_matches(efo_uri: str):
    # the OLS endpoint is keyed on the term URI, not the CURIE
    res = api_request(efo_uri)
    return {
        "close_match": set(res["annotation"].get("closeMatch", [])),
        "exact_match": set(res["annotation"].get("exactMatch", [])),
    }
pivot_matches = (
    matches.groupby(["efo_id", "efo_uri", "predicate_id"])["match"]
    .apply(list)
    .reset_index()
    .pivot(index=["efo_id", "efo_uri"], columns="predicate_id", values="match")
    .reset_index()
    .rename(columns={"exactMatch": "exact_match", "closeMatch": "close_match"})
)

pivot_matches
pd.isnull(pivot_matches["close_match"]).value_counts()
pd.isnull(pivot_matches["exact_match"]).value_counts()
sample_matches = pivot_matches.sample(200).fillna("")

sample_matches
def safe_call(x):
    try:
        return get_api_matches(x)
    except Exception as e:
        print(f"Error for {x}: {e}")
        # keys must match those returned by get_api_matches
        return {"close_match": set(), "exact_match": set()}
compare_df = (
    sample_matches.fillna("")
    .assign(
        api_close_match=lambda df: df["efo_uri"].apply(
            lambda x: safe_call(x)["close_match"]
        ),
        api_exact_match=lambda df: df["efo_uri"].apply(
            lambda x: safe_call(x)["exact_match"]
        ),
        exact_match=lambda df: df["exact_match"].apply(set),
        close_match=lambda df: df["close_match"].apply(set),
    )
    .assign(
        exact_match_equal=lambda df: df.apply(
            lambda row: row["exact_match"] == row["api_exact_match"], axis=1
        ),
        close_match_equal=lambda df: df.apply(
            lambda row: row["close_match"] == row["api_close_match"], axis=1
        ),
        extra_exact_match_in_api=lambda df: df.apply(
            lambda row: row["api_exact_match"] - row["exact_match"], axis=1
        ),
        extra_exact_match_in_efo=lambda df: df.apply(
            lambda row: row["exact_match"] - row["api_exact_match"], axis=1
        ),
        extra_close_match_in_api=lambda df: df.apply(
            lambda row: row["api_close_match"] - row["close_match"], axis=1
        ),
        extra_close_match_in_efo=lambda df: df.apply(
            lambda row: row["close_match"] - row["api_close_match"], axis=1
        ),
    )
)

compare_df
(compare_df.groupby(["exact_match_equal", "close_match_equal"]).size())

The current results return URLs like http://purl.obolibrary.org/obo/Orphanet_98576. Should we keep this format or transform them?

@dhimmel Lastly, what format should the axioms, mondo:exactMatch and mondo:closeMatch have in the output json file?

dhimmel · 2023-09-26T17:18:36Z

The current results return URLs like http://purl.obolibrary.org/obo/Orphanet_98576. Should we keep this format or transform them?

That URL is the class URI, and we often assign it to a variable with a _uri suffix. The corresponding CURIE (compact URI) is Orphanet:98576, and we often use an _id suffix for this. The SPARQL query can include both the URI and the CURIE as separate output fields.
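
For reference, the URI-to-CURIE conversion done by the REPLACE(...) expression in the queries above can be sketched in Python with the same pattern. The function name `uri_to_curie` is made up for this example.

```python
import re

# Same pattern as in the SPARQL BIND(REPLACE(...)) expressions.
URI_PATTERN = re.compile(r"^http.+/([^:]+)_(.+)$")


def uri_to_curie(uri: str) -> str:
    """Turn e.g. .../obo/Orphanet_98576 into Orphanet:98576.

    Like SPARQL REPLACE, the input is returned unchanged when the
    pattern does not match.
    """
    return URI_PATTERN.sub(r"\1:\2", uri)


print(uri_to_curie("http://purl.obolibrary.org/obo/Orphanet_98576"))
print(uri_to_curie("http://www.ebi.ac.uk/efo/EFO_0000479"))
```

Note that URIs without an underscore after the last slash, such as http://identifiers.org/meddra/10008958, pass through unchanged and would need separate handling.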

What we are after is for each oboInOwl:hasDbXref:

what are all the sources providing that xref
what mapping property applies to the xref, e.g. exactMatch or closeMatch

A tabular output from a SPARQL query is the ideal first output here. Not sure if you can fit everything in one query/table or you need multiple. I leave that up to your investigation.

To complicate things further (hehe), we should consider whether the Python oaklib, which can extract mappings to the SSSOM format, is a better approach here than writing our own SPARQL queries. SSSOM stands for Simple Standard for Sharing Ontological Mappings (publication).

Possibly best to transition to PRs at this point to enable easier review of the SPARQL queries. PR can be draft and incomplete.

bfoltyn · 2023-10-13T14:25:49Z

@dhimmel regarding including xref_sources and mapping_properties in node data, I have a couple of ideas:

Option 1: xref_properties field with a list with the following schema:

xref_id: str
axiom_sources: list[str]
mapping_properties: list[str]

Example

{
  "xref_properties": [
    {
      "xref_id": "orphanet:319298",
      "axiom_sources": ["MONDO:equivalentTo"],
      "mapping_properties": ["mondo:exactMatch"]
    }
  ]
}

Option 2: Separate xref_sources and mapping_properties

xref_sources schema:

xref_id: str
axiom_source: str

mapping_properties schema:

xref_id: str
mapping_source: str

Example

{
  "axiom_sources": [
    { "xref_id": "orphanet:319298", "axiom_source": "MONDO:equivalentTo" }
  ],
  "mapping_properties": [
    { "xref_id": "orphanet:319298", "mapping_source": "mondo:exactMatch" }
  ]
}

Option 3: Second option inside xref_properties field:

Example

{
  "xref_properties": {
    "axiom_sources": [
      { "xref_id": "orphanet:319298", "axiom_source": "MONDO:equivalentTo" }
    ],
    "mapping_properties": [
      { "xref_id": "orphanet:319298", "mapping_source": "mondo:exactMatch" }
    ]
  }
}

Please let me know your thoughts on these options, or if there are any other ideas you have.

dhimmel · 2023-10-13T14:31:03Z

I like option 1. Will there be a slight imprecision where one source has one property and another source could have a conflicting property? For example, an xref being classified as both an exactMatch and closeMatch from different resources?

matentzn · 2023-10-13T15:50:37Z

Just FYI: what you are trying to do here is much much harder than you think right now - and not necessary.

EFO is not a good source for mappings, because it mixes old (ancient) with new (harmonised) xrefs, and makes strange distinctions like "mondo:exactMatch" (which is not even a thing in Mondo). What you should do instead is:

ETL the primary SSSOM file for mondo mappings: https://github.com/monarch-initiative/mondo/blob/master/src/ontology/mappings/mondo.sssom.tsv
use semra (see also https://github.com/biopragmatics/semra/blob/main/notebooks/umls-inference-analysis.ipynb), cc @cthoyt to chain the mappings together in a way to get the correct EFO to X mappings
export the mappings to sssom and feed that into your system
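
To make the chaining idea concrete, here is a very rough sketch in plain Python. It does not use semra's API or the real mondo.sssom.tsv contents; the `chain_exact` function and both input lists are illustrative assumptions only, and real mapping inference involves many more considerations (confidence, provenance, non-exact predicates).

```python
EXACT = "skos:exactMatch"


def chain_exact(efo_to_mondo, mondo_to_other):
    """Infer EFO -> X exact matches by composing two exact matches via Mondo."""
    by_mondo = {}
    for mondo_id, predicate, other_id in mondo_to_other:
        if predicate == EXACT:
            by_mondo.setdefault(mondo_id, []).append(other_id)

    inferred = []
    for efo_id, predicate, mondo_id in efo_to_mondo:
        if predicate != EXACT:
            continue  # only exact matches are chained in this sketch
        for other_id in by_mondo.get(mondo_id, []):
            inferred.append((efo_id, EXACT, other_id))
    return inferred


# Illustrative rows only (assumption), not taken from the SSSOM file.
efo_to_mondo = [("EFO:0000640", EXACT, "MONDO:0017884")]
mondo_to_other = [
    ("MONDO:0017884", EXACT, "Orphanet:319298"),
    ("MONDO:0017884", "skos:closeMatch", "GARD:0009575"),
]

inferred = chain_exact(efo_to_mondo, mondo_to_other)
print(inferred)
```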

Just my two cents as someone driving by :D

bfoltyn · 2023-11-07T14:10:00Z

I like option 1. Will there be a slight imprecision where one source has one property and another source could have a conflicting property? For example, an xref being classified as both an exactMatch and closeMatch from different resources?

@dhimmel There are cases where an xref is classified as both exactMatch and closeMatch. For example, in EFO:0000095 the xref meddra:10008958 has both mondo:closeMatch and skos:exactMatch:

(
    pd.read_json(
        "https://github.com/related-sciences/nxontology-data/raw/output/efo/efo_otar_profile_mapping_properties.json.gz"
    ).pipe(
        lambda df: df[
            (df["efo_id"] == "EFO:0000095") & (df["xref_id"] == "meddra:10008958")
        ]
    )
)

efo_id	xref_id	mapping_property_id	efo_uri	xref_uri	mapping_property_uri
EFO:0000095	meddra:10008958	mondo:closeMatch	http://www.ebi.ac.uk/efo/EFO_0000095	http://identifiers.org/meddra/10008958	http://purl.obolibrary.org/obo/mondo#closeMatch
EFO:0000095	meddra:10008958	skos:exactMatch	http://www.ebi.ac.uk/efo/EFO_0000095	http://identifiers.org/meddra/10008958	http://www.w3.org/2004/02/skos/core#exactMatch

There are 102 cases like this:

All cases

(
    pd.read_json(
        "https://github.com/related-sciences/nxontology-data/raw/output/efo/efo_otar_profile_mapping_properties.json.gz"
    )
    .groupby(["efo_id", "xref_id"])["mapping_property_id"]
    .apply(list)
    .reset_index()
    .assign(
        mapping_property_id=lambda df: df["mapping_property_id"].apply(
            lambda x: ",".join(x)
        ),
        has_close_match=lambda df: df["mapping_property_id"].str.contains("closeMatch"),
        has_exact_match=lambda df: df["mapping_property_id"].str.contains("exactMatch"),
    )
    .pipe(lambda df: df[df["has_close_match"] & df["has_exact_match"]])
)

efo_id	xref_id	mapping_property_id	has_close_match	has_exact_match
EFO:0000095	meddra:10008958	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000174	meddra:10015560	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000181	meddra:10060121	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000182	meddra:10049010	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000191	meddra:10060707	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000198	meddra:10028532	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000221	meddra:10000871	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000221	meddra:10059439	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000222	meddra:10000880	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000223	meddra:10000890	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000224	meddra:10001019	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000248	meddra:10065867	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000255	meddra:10002449	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000309	meddra:10006595	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000309	meddra:10053518	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000309	meddra:10067184	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000333	meddra:10008734	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000339	meddra:10009013	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000403	meddra:10012818	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000437	meddra:10065868	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000479	meddra:10015493	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000500	meddra:10017709	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000502	meddra:10017708	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000502	mesh:D018305	mondo:exactMatch,skos:closeMatch	True	True
EFO:0000519	meddra:10018336	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000519	meddra:10018337	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000558	meddra:10023284	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000564	meddra:10024189	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000569	meddra:10024627	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000574	meddra:10025310	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000630	meddra:10027744	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000632	meddra:10030286	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000681	meddra:10067946	mondo:closeMatch,skos:exactMatch	True	True
EFO:0000762	meddra:10019827	mondo:closeMatch,skos:exactMatch	True	True
EFO:0001376	meddra:10042863	mondo:closeMatch,skos:exactMatch	True	True
EFO:0001378	meddra:10028228	mondo:closeMatch,skos:exactMatch	True	True
EFO:0002087	meddra:10016632	mondo:closeMatch,skos:exactMatch	True	True
EFO:0002429	meddra:10036057	mondo:closeMatch,skos:exactMatch	True	True
EFO:0002499	meddra:10002224	mondo:closeMatch,skos:exactMatch	True	True
EFO:0002499	meddra:10060971	mondo:closeMatch,skos:exactMatch	True	True
EFO:0002501	meddra:10026659	mondo:closeMatch,skos:exactMatch	True	True
EFO:0002892	meddra:10007476	mondo:closeMatch,skos:exactMatch	True	True
EFO:0002914	meddra:10039497	mondo:closeMatch,skos:exactMatch	True	True
EFO:0002916	meddra:10030155	mondo:closeMatch,skos:exactMatch	True	True
EFO:0002918	meddra:10039022	mondo:closeMatch,skos:exactMatch	True	True
EFO:0002939	meddra:10027107	mondo:closeMatch,skos:exactMatch	True	True
EFO:0003094	meddra:10017701	mondo:closeMatch,skos:exactMatch	True	True
EFO:0003802	meddra:10038269	mondo:closeMatch,skos:exactMatch	True	True
EFO:0003811	meddra:10038270	mondo:closeMatch,skos:exactMatch	True	True
EFO:0003968	meddra:10002476	mondo:closeMatch,skos:exactMatch	True	True
EFO:0005221	meddra:10004593	mondo:closeMatch,skos:exactMatch	True	True
EFO:0005221	meddra:10008593	mondo:closeMatch,skos:exactMatch	True	True
EFO:0005287	meddra:10038804	mondo:closeMatch,skos:exactMatch	True	True
EFO:0005567	meddra:10056558	mondo:closeMatch,skos:exactMatch	True	True
EFO:0005952	meddra:10029547	mondo:closeMatch,skos:exactMatch	True	True
EFO:0006460	meddra:10051938	mondo:closeMatch,skos:exactMatch	True	True
EFO:0006738	meddra:10035484	mondo:closeMatch,skos:exactMatch	True	True
EFO:0007143	meddra:10001882	mondo:closeMatch,skos:exactMatch	True	True
EFO:0007252	meddra:10048251	mondo:closeMatch,skos:exactMatch	True	True
EFO:0007359	meddra:10056450	mondo:closeMatch,skos:exactMatch	True	True
EFO:0007549	meddra:10017852	mondo:closeMatch,skos:exactMatch	True	True
EFO:0009001	meddra:10026891	mondo:closeMatch,skos:exactMatch	True	True
EFO:0009441	meddra:10047801	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000028	meddra:10014967	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000079	meddra:10048853	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000157	meddra:10036685	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000209	meddra:10011318	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000249	meddra:10033366	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000249	meddra:10068223	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000292	meddra:10062001	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000309	meddra:10023249	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000314	meddra:10064886	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000318	meddra:10069698	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000334	meddra:10049459	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000355	meddra:10027406	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000396	meddra:10051713	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000475	meddra:10050487	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000476	meddra:10035059	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000478	meddra:10035079	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000491	meddra:10065857	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000496	meddra:10036832	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000550	meddra:10062113	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000558	meddra:10042926	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000560	meddra:10042985	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000576	meddra:10061031	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000581	meddra:10043670	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000595	meddra:10002240	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000616	meddra:10061252	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000785	meddra:10040493	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000796	meddra:10001388	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000828	meddra:10067399	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000895	DOID:6785	mondo:exactMatch,skos:closeMatch	True	True
EFO:1000895	meddra:10064581	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000919	meddra:10057649	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000926	meddra:10060801	mondo:closeMatch,skos:exactMatch	True	True
EFO:1000956	meddra:10019053	mondo:closeMatch,skos:exactMatch	True	True
EFO:1001187	meddra:10041329	mondo:closeMatch,skos:exactMatch	True	True
EFO:1001229	meddra:10046752	mondo:closeMatch,skos:exactMatch	True	True
EFO:1001465	meddra:10018340	mondo:closeMatch,skos:exactMatch	True	True
EFO:1001469	meddra:10061275	mondo:closeMatch,skos:exactMatch	True	True
EFO:1001779	meddra:10009018	mondo:closeMatch,skos:exactMatch	True	True
EFO:1001972	meddra:10025552	mondo:closeMatch,skos:exactMatch	True	True

dhimmel · 2023-11-07T20:55:28Z

what you are trying to do here is much much harder than you think right now - and not necessary

Thanks @matentzn for these insights. I'm looking forward to exploring the SSSOM Mondo mappings combined with semra to convert them to EFO-keyed mappings. For now I think it makes sense to continue our current approach, since we're close to having it complete and being evaluable, at least as a good reference for the SSSOM/Mondo alternative.

There are cases where xref is classified as both exactMatch and closeMatch

@bfoltyn I think we could make exactMatch higher priority than closeMatch as an easy way to label an xref as either exact or close.

bfoltyn · 2023-11-08T11:15:51Z

@bfoltyn I think we could make exactMatch higher priority than closeMatch as an easy way to label an xref as either exact or close.

@dhimmel What do you mean by higher priority? I thought we would include all mapping properties as a list in the node data, as in option 1 in comment #18 (comment). Are you suggesting we include only one mapping property value, exactMatch or closeMatch? Should we also include the mondo: or skos: prefix?

dhimmel · 2023-11-08T13:55:01Z

What do you mean by higher priority?

I think it might be best if we simplify/aggregate the xref metadata that goes into the nxontology node attribute data to something like (written here in YAML for ease):

xrefs:
  - xref_id: meddra:10008958
    xref_uri: http://identifiers.org/meddra/10008958
    relation: skos:exactMatch  # converting mondo:exactMatch to skos:exactMatch if applicable
    sources: [MONDO:equivalentTo, DOID:2224]  # haven't cleaned this up yet

With this design, an xref_id would only appear once per node and all other metadata would be aggregated.
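
A minimal sketch of the aggregation step, assuming the query output has been flattened into (xref_id, xref_uri, source) rows for a single node. The `aggregate_sources` helper and the row layout are assumptions for illustration, and the relation field is left out here.

```python
def aggregate_sources(rows):
    """Collapse (xref_id, xref_uri, source) rows into one entry per xref_id."""
    by_xref = {}
    for xref_id, xref_uri, source in rows:
        entry = by_xref.setdefault(
            xref_id, {"xref_id": xref_id, "xref_uri": xref_uri, "sources": set()}
        )
        entry["sources"].add(source)  # the set removes duplicate sources
    return [
        {**entry, "sources": sorted(entry["sources"])} for entry in by_xref.values()
    ]


# Rows mirroring the YAML example; the duplicate row is deliberate.
rows = [
    ("meddra:10008958", "http://identifiers.org/meddra/10008958", "MONDO:equivalentTo"),
    ("meddra:10008958", "http://identifiers.org/meddra/10008958", "DOID:2224"),
    ("meddra:10008958", "http://identifiers.org/meddra/10008958", "MONDO:equivalentTo"),
]

xref_details = aggregate_sources(rows)
print(xref_details)
```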

bfoltyn · 2023-11-08T14:15:22Z

xref metadata that goes into the nxontology node attribute data to something like (written here in YAML for ease)

@dhimmel Currently the xrefs field in the node data is a list of strings. Do we want to replace it with the example you suggested? The reason I suggested introducing a new field with these properties was to avoid a breaking change.

bfoltyn · 2023-11-08T14:23:20Z

    relation: skos:exactMatch  # converting mondo:exactMatch to skos:exactMatch if applicable

@dhimmel Should we use the following logic?

If there is skos:exactMatch in mapping properties we set the value to => skos:exactMatch
If there is mondo:exactMatch in mapping properties we set the value to => skos:exactMatch
If there is skos:closeMatch in mapping properties we set the value to => skos:closeMatch
If there is mondo:closeMatch in mapping properties we set the value to => skos:closeMatch
otherwise we set the value to null
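
The rules above can be sketched as a small function. The name `pick_relation` is hypothetical, and the sketch assumes the mapping properties arrive as CURIE strings.

```python
from typing import Iterable, Optional

EXACT = {"skos:exactMatch", "mondo:exactMatch"}
CLOSE = {"skos:closeMatch", "mondo:closeMatch"}


def pick_relation(mapping_properties: Iterable[str]) -> Optional[str]:
    """Reduce the mapping properties of one xref to a single SKOS relation.

    Any exactMatch wins over any closeMatch; mondo: predicates are
    reported under their skos: equivalent; anything else yields None.
    """
    props = set(mapping_properties)
    if props & EXACT:
        return "skos:exactMatch"
    if props & CLOSE:
        return "skos:closeMatch"
    return None


print(pick_relation(["mondo:closeMatch", "skos:exactMatch"]))
```

For the conflicting meddra:10008958 case (mondo:closeMatch plus skos:exactMatch), this resolves to skos:exactMatch.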

dhimmel · 2023-11-08T14:56:05Z

Currently the xrefs field in the node data is a list of strings. Do we want to replace it with the example you suggested?

We could either replace it or create a new field like xref_details. Slightly leaning towards a new field.

Should we use the following logic?

That logic sounds good. If there are other interesting values in the otherwise set, we can support those later.

bfoltyn · 2023-11-08T17:51:00Z

We could either replace it or create a new field like xref_details. Slightly leaning towards a new field.

I think we can add a new field. xref_details sounds good. Should this field also include xrefs from the xrefs query, or just those from mapping_properties and xref_sources?

dhimmel · 2023-11-08T17:53:39Z

Should this field also include xrefs from xrefs query or just from mapping_properties and xref_sources

Ideally all of them, such that a user only needs xref_details.

bfoltyn · 2023-11-14T14:35:09Z

@dhimmel I've noticed that sometimes xref_sources in xref_details contains null. For example in MONDO:0020507

"xref_details": [
  {
    "xref_id": "DOID:0070374",
    "relation": "skos:exactMatch",
    "sources": [
      null
    ]
  },

Should we make axiom_source required in the xref_sources query?

nxontology-data/nxontology_data/efo/queries/xref_sources.rq

Line 12 in fb93d9e

OPTIONAL { ?axiom oboInOwl:source ?axiom_source }.

Another way would be to filter out null values after the aggregation in EfoProcessor.get_xref_details method.

nxontology-data/nxontology_data/efo/efo.py

Lines 256 to 279 in fb93d9e

    
    def get_xref_details(self) -> dict[str, dict[str, str | list[str] | None]]:
        xrefs = self.get_xrefs_df()[["efo_id", "xref_bioregistry"]].rename(
            columns={"xref_bioregistry": "xref_id"}
        )
        xref_sources = (
            self.get_xref_sources_df()
            .assign(
                xref_id=lambda df: df["xref"]
                .str.split(":", expand=True)
                .apply(
                    lambda row: normalize_parsed_curie(
                        xref_prefix=row[0],
                        xref_accession=row[1],
                        collapse_orphanet=True,
                    ),
                    axis="columns",
                )
            )
            .groupby(["efo_id", "xref_id"])["axiom_source"]
            .apply(list)
            .reset_index()
            .rename(columns={"axiom_source": "sources"})
        )
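
For comparison, the second option (filtering nulls after aggregation) could look roughly like the sketch below. It works on plain dictionaries rather than the DataFrame used in get_xref_details, and `drop_missing_sources` is a hypothetical helper, so treat it as an outline of the idea only.

```python
def drop_missing_sources(xref_details):
    """Return copies of the entries with None removed from each sources list."""
    cleaned = []
    for entry in xref_details:
        sources = [s for s in entry.get("sources", []) if s is not None]
        cleaned.append({**entry, "sources": sources})
    return cleaned


# The MONDO:0020507 entry from above, whose only source is null.
details = [
    {"xref_id": "DOID:0070374", "relation": "skos:exactMatch", "sources": [None]},
]

cleaned = drop_missing_sources(details)
print(cleaned)
```

In a pandas pipeline the missing values may appear as NaN rather than None, in which case the check would need pd.isnull instead.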

dhimmel · 2023-11-14T17:34:56Z

Should we make axiom_source required in the xref_sources query?

This is the solution I prefer unless you advocate for a different one. Potentially leave a comment in that query that OPTIONAL will include extra results where axiom_source is missing.

bfoltyn mentioned this issue Sep 27, 2023

Additional EFO xref context from axioms #19

Merged

bfoltyn mentioned this issue Nov 9, 2023

EFO: add xref details to node data #21

Merged

dhimmel pushed a commit that referenced this issue Nov 9, 2023

EFO: add xref details to node data

c8e0502

merges #21 refs #18 Co-authored-by: Bartek Foltyn <[email protected]>

bfoltyn mentioned this issue Nov 14, 2023

EFO: make axiom_source required in xref_sources query #24

Merged
