EFO cross-references: classify as exact/close when possible #18

Open
dhimmel opened this issue Sep 19, 2023 · 20 comments
Comments

@dhimmel
Member

dhimmel commented Sep 19, 2023

background in EBISPOT/efo#935

We currently extract database cross-references for EFO using the oboInOwl:hasDbXref predicate. However, MONDO is providing xrefs with greater specificity using the mondo:exactMatch and mondo:closeMatch predicates. Furthermore, there are axioms (with rdf:type owl:Axiom) that annotate oboInOwl:hasDbXref instances with values like MONDO:equivalentTo.

EFO:0000479 is a good example of a class that has all types of xrefs:

  1. oboInOwl:hasDbXref without axioms
  2. oboInOwl:hasDbXref with axioms
  3. mondo:exactMatch and mondo:closeMatch

It would be nice to further understand the relation between 2 and 3.

@dhimmel
Member Author

dhimmel commented Sep 19, 2023

Here's a visualization by @ravwojdyla on why knowing close/exact (or equivalent/related, green/red in visualization) could help refine mappings to be bijective in certain situations like:

image

Also noting how an axiom appears in the EFO OWL source:

<owl:Axiom>
    <owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0000640"/>
    <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
    <owl:annotatedTarget>Orphanet:319298</owl:annotatedTarget>
    <oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
</owl:Axiom>
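
As a rough illustration of the structure above, the following sketch reads such an axiom with Python's standard library and pulls out the annotated class, the xref, and every oboInOwl:source value. The wrapping rdf:RDF element, the function name `parse_xref_axioms`, and the output field names are assumptions made for this example only; the full EFO file would more likely be queried with SPARQL, as elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
}
HAS_DB_XREF = NS["oboInOwl"] + "hasDbXref"
RDF_RESOURCE = "{" + NS["rdf"] + "}resource"

# The axiom quoted above, wrapped in an rdf:RDF root (an assumption) so that
# the namespace prefixes are declared and the snippet parses on its own.
OWL_TEXT = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Axiom>
    <owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0000640"/>
    <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
    <owl:annotatedTarget>Orphanet:319298</owl:annotatedTarget>
    <oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
  </owl:Axiom>
</rdf:RDF>
"""


def parse_xref_axioms(owl_text):
    """Return one record per owl:Axiom that annotates an oboInOwl:hasDbXref."""
    root = ET.fromstring(owl_text.strip())
    records = []
    for axiom in root.findall("owl:Axiom", NS):
        prop = axiom.find("owl:annotatedProperty", NS).get(RDF_RESOURCE)
        if prop != HAS_DB_XREF:
            continue  # skip axioms that annotate other properties
        records.append(
            {
                "efo_uri": axiom.find("owl:annotatedSource", NS).get(RDF_RESOURCE),
                "xref": axiom.find("owl:annotatedTarget", NS).text,
                # An axiom may carry zero, one, or several sources.
                "sources": [el.text for el in axiom.findall("oboInOwl:source", NS)],
            }
        )
    return records


print(parse_xref_axioms(OWL_TEXT))
```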

@bfoltyn
Contributor

bfoltyn commented Sep 21, 2023

@dhimmel I managed to recreate the database cross reference section that appears on the website by using axioms from the .owl file for EFO:0000479 and EFO:0000640. However, I noticed that for EFO:0000640, there are two extra xrefs, MeSH:C538614 and UMLS:C2931899, that are not displayed on the website but are present in the xrefs query.

Do you know any examples for which it's more difficult to retrieve axioms?

I also noticed that sometimes an axiom has multiple oboInOwl:source values and sometimes a single cross reference has multiple axioms. For example, for ICD9:238.71 in EFO:0000479:

<owl:Axiom>
    <owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0000479"/>
    <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
    <owl:annotatedTarget>ICD9:238.71</owl:annotatedTarget>
    <oboInOwl:source>DOID:2224</oboInOwl:source>
    <oboInOwl:source>EFO:0000479</oboInOwl:source>
    <oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
    <oboInOwl:source>MONDO:i2s</oboInOwl:source>
</owl:Axiom>
<owl:Axiom>
    <owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0000479"/>
    <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
    <owl:annotatedTarget>ICD9:238.71</owl:annotatedTarget>
    <oboInOwl:source>DOID:2224</oboInOwl:source>
    <oboInOwl:source>EFO:0000479</oboInOwl:source>
    <oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
    <oboInOwl:source>i2s</oboInOwl:source>
</owl:Axiom>

It looks like the website uses the last source value to describe the cross reference. The sources seem to be ordered alphabetically, though. I'm not sure what approach we should use if there is more than one source. Do you have any suggestions?

Here is a query I used to retrieve the axioms from the owl file:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

SELECT ?efo_id ?xref (MAX(?source) AS ?axiom)
WHERE {
  ?axiom_element rdf:type owl:Axiom ;
         owl:annotatedSource ?annotatedSource ;
         owl:annotatedProperty ?annotatedProperty ;
         owl:annotatedTarget ?xref ;
         oboInOwl:source ?source .

  FILTER(?annotatedProperty = oboInOwl:hasDbXref)

  BIND( REPLACE( STR(?annotatedSource), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
}
GROUP BY ?axiom_element ?efo_id ?annotatedProperty ?xref
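
To make the effect of MAX(?source) concrete, here is a small Python sketch of what it would select for the two ICD9:238.71 axioms shown earlier. It assumes the SPARQL engine orders these plain string literals by code point, as Python's `max` does; the helper name `last_source` is hypothetical.

```python
# oboInOwl:source values of the two ICD9:238.71 axioms on EFO:0000479.
axiom_1_sources = ["DOID:2224", "EFO:0000479", "MONDO:equivalentTo", "MONDO:i2s"]
axiom_2_sources = ["DOID:2224", "EFO:0000479", "MONDO:equivalentTo", "i2s"]


def last_source(sources: list[str]) -> str:
    """Return the source that sorts last by code point, mimicking MAX(?source)."""
    return max(sources)


print(last_source(axiom_1_sources))  # MONDO:i2s
print(last_source(axiom_2_sources))  # i2s
```

Lowercase letters sort after uppercase ones, which is why the second axiom yields i2s rather than MONDO:equivalentTo. Every other source on the axiom is discarded by this aggregation.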

And here is also a code snippet that I used in a jupyter notebook to retrieve and compare axioms:

Snippet
# type: ignore
%load_ext autoreload
%autoreload 2


import jupyter_black

jupyter_black.load()
import pandas as pd
from nxontology_data.efo.efo import EfoProcessor

pd.set_option("display.max_colwidth", None)
efo_processor = EfoProcessor(version="v3.57.0", name="efo_otar_profile")
# efo_processor.download_owl()

rdf = efo_processor.load_rdf()
xrefs = efo_processor.run_query("xrefs", cache=False)

xrefs
axioms = efo_processor.run_query("axioms", cache=False)

axioms
cross_reference_efo_0000479 = {
    "MESH:D013920 (Orphanet:3318/e)",
    "Orphanet:3318 (MONDO:equivalentTo)",
    "EFO:0000479 (MONDO:equivalentTo)",
    "Orphanet:71493 (MONDO:relatedTo)",
    "ICDO:9962/3 (NCIT:C3407)",
    "SCTID:109994006 (MONDO:equivalentTo)",
    "UMLS:C0040028 (Orphanet:3318/e)",
    "OMIM:614521",
    "NCIT:C3407 (exact-label-match)",
    "UMLS:C0040028 (Orphanet:3318)",
    "OMIM:601977",
    "MESH:D013920 (Orphanet:3318)",
    "ONCOTREE:ET (MONDO:equivalentTo)",
    "NCIT:C3407 (MONDO:exact-label-match)",
    "MONDO:0005029",
    "ICD10:D47.3 (Orphanet:3318)",
    "GARD:0006594 (MONDO:equivalentTo)",
    "ICD9:238.71 (i2s)",
    "OMIM:187950",
    "ICD9:238.71 (MONDO:i2s)",
    "COHD:438383 (MONDO:equivalentTo)",
    "DOID:2224 (MONDO:equivalentTo)",
    "MedDRA:10015493 (Orphanet:3318)",
    "MedDRA:10015493 (Orphanet:3318/e)",
}

cross_reference_efo_000640 = {
    "UMLS:CN205129 (MONDO:equivalentTo)",
    "GARD:0009575 (shared-umls-xref)",
    "GARD:0009572 (MONDO:equivalentObsolete)",
    "ICD10:C64 (Orphanet:47044)",
    "Orphanet:47044 (OMIM:605074)",
    "Orphanet:319298 (MONDO:equivalentTo)",
    "NCIT:C6975 (MONDO:equivalentTo)",
    "GARD:0009572 (MONDO:equivalentTo)",
    "OMIM:605074 (Orphanet:47044)",
    "GARD:0009575 (MONDO:shared-umls-xref)",
    "UMLS:C1306837 (Orphanet:319298)",
    "DOID:4465 (MONDO:equivalentTo)",
    "ONCOTREE:PRCC (MONDO:equivalentTo)",
    "EFO:0000640 (MONDO:equivalentTo)",
    "SCTID:733608000 (MONDO:equivalentTo)",
    "MONDO:0017884",
    "UMLS:C1336078 (MONDO:equivalentTo)",
    "UMLS:C1306837 (Orphanet:319298/e)",
}


axioms[axioms["efo_id"] == "EFO:0000479"]
xrefs_with_axiom_efo_479 = (
    xrefs[xrefs["efo_id"] == "EFO:0000479"]
    .merge(axioms, on=["efo_id", "xref"], how="left")
    .sort_values("xref")
    .assign(
        desc=lambda df: df.apply(
            lambda row: row["xref"]
            if pd.isnull(row["axiom"])
            else f"{row['xref']} ({row['axiom']})",
            axis=1,
        )
    )
)

display(xrefs_with_axiom_efo_479)

display(set(xrefs_with_axiom_efo_479["desc"]) - cross_reference_efo_0000479)
display(cross_reference_efo_0000479 - set(xrefs_with_axiom_efo_479["desc"]))
display(set(xrefs_with_axiom_efo_479["desc"]) == cross_reference_efo_0000479)
xrefs_with_axiom_efo_640 = (
    xrefs[xrefs["efo_id"] == "EFO:0000640"]
    .merge(axioms, on=["efo_id", "xref"], how="left")
    .sort_values("xref")
    .assign(
        desc=lambda df: df.apply(
            lambda row: row["xref"]
            if pd.isnull(row["axiom"])
            else f"{row['xref']} ({row['axiom']})",
            axis=1,
        )
    )[["efo_id", "xref", "xref_prefix", "xref_accession", "axiom", "desc"]]
)

display(set(xrefs_with_axiom_efo_640["desc"]) == cross_reference_efo_000640)
display(set(xrefs_with_axiom_efo_640["desc"]) - cross_reference_efo_000640)
display(cross_reference_efo_000640 - set(xrefs_with_axiom_efo_640["desc"]))

display(xrefs_with_axiom_efo_640)

@dhimmel
Member Author

dhimmel commented Sep 25, 2023

Nice work @bfoltyn.

I think we'll want to preserve all sources provided by axioms rather than taking the max. So the output would be keyed on ?efo_id ?xref ?axiom_source. Could also consider making ?axiom_source optional such that we still match xrefs without axioms.

sometimes a single cross reference has multiple axioms

Hmm, so it appears that an axiom can provide multiple sources for a cross-reference. I am not sure why the (EFO:0000479 subject, ICD9:238.71 object, oboInOwl:hasDbXref predicate) triple has multiple axioms that duplicate all but one of their sources. Perhaps @zoependlington or @matentzn would know?

@matentzn

I think we'll want to preserve all sources provided by axioms rather than taking the max.

Absolutely, the order has no meaning at all.

sometimes a single cross-reference has multiple axioms

This is due to the fact that the cross references have not been normalised.

:a :hasDbXref :b {source: "X"}
:a :hasDbXref :b {source: "Y"}

Is allowed in the OWL data model (which is good in some cases, think of provenance!).

We have a special method in mondo called "normalisation" that turns this into:

:a :hasDbXref :b {source: "X", source: "Y"}

But this is not at all consistently applied to all ontologies.

TLDR: There is no requirement for normalising axiom annotations, so you have to be able to deal with the unnormalised case!
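
A minimal sketch of handling the unnormalised case on the consumer side: group axioms by their (subject, predicate, target) triple and take the union of the sources. This is not Mondo's actual normalisation method, only an illustration of the idea; the tuple layout and the name `normalise_axioms` are assumptions.

```python
from collections import defaultdict


def normalise_axioms(axioms):
    """Merge axioms that annotate the same (subject, predicate, target) triple."""
    merged = defaultdict(set)
    for subject, predicate, target, sources in axioms:
        merged[(subject, predicate, target)].update(sources)
    # Sort the sources so the output is deterministic; the order carries no meaning.
    return {triple: sorted(sources) for triple, sources in merged.items()}


unnormalised = [
    (":a", ":hasDbXref", ":b", ["X"]),
    (":a", ":hasDbXref", ":b", ["Y"]),
]
print(normalise_axioms(unnormalised))
# {(':a', ':hasDbXref', ':b'): ['X', 'Y']}
```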

@bfoltyn
Contributor

bfoltyn commented Sep 26, 2023

TLDR: There is no requirement for normalising axiom annotations, so you have to be able to deal with the unnormalised case!

Thanks @matentzn !


I think we'll want to preserve all sources provided by axioms rather than taking the max. So the output would be keyed on ?efo_id ?xref ?axiom_source. Could also consider making ?axiom_source optional such that we still match xrefs without axioms.

@dhimmel If we want to preserve all sources, I think that we can use the following query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

SELECT ?efo_id ?xref  ?axiom_source
WHERE {
  ?axiom_element rdf:type owl:Axiom ;
  owl:annotatedSource ?source ;
  owl:annotatedProperty oboInOwl:hasDbXref ;
  owl:annotatedTarget ?xref .

  OPTIONAL { ?axiom_element oboInOwl:source ?axiom_source }

  BIND( REPLACE( STR(?source), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
}

GROUP BY ?efo_id ?xref ?axiom_source

For efo_id=EFO:0000479 and xref=ICD9:238.71, the results are:

efo_id xref axiom_source
EFO:0000479 ICD9:238.71 DOID:2224
EFO:0000479 ICD9:238.71 EFO:0000479
EFO:0000479 ICD9:238.71 MONDO:equivalentTo
EFO:0000479 ICD9:238.71 MONDO:i2s
EFO:0000479 ICD9:238.71 i2s

Regarding the optional source: there are 46 rows with a null axiom_source compared to 117675 where it has a value. Some axioms carry annotation properties other than oboInOwl:source, such as oboInOwl:hasDbXref or skos:closeMatch.

Examples
  • this Axiom has oboInOwl:hasDbXref with MONDO:equivalentTo value
<owl:Axiom>
        <owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0007216"/>
        <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
        <owl:annotatedTarget>OMIMPS:142340</owl:annotatedTarget>
        <oboInOwl:hasDbXref>MONDO:equivalentTo</oboInOwl:hasDbXref>
</owl:Axiom>
  • this Axiom has skos:closeMatch without value:
<owl:Axiom>
        <owl:annotatedSource rdf:resource="http://www.ebi.ac.uk/efo/EFO_0008494"/>
        <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
        <owl:annotatedTarget>MESH:D065632</owl:annotatedTarget>
        <skos:closeMatch></skos:closeMatch>
</owl:Axiom>

Should we keep the details of which predicates are used within axioms? Or is just using oboInOwl:source OK?


@dhimmel For mondo:exactMatch and mondo:closeMatch, a slightly modified version of the query you used in EBISPOT/efo#935 should work:

PREFIX mondo: <http://purl.obolibrary.org/obo/mondo#>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT ?efo_id ?efo_uri ?predicate_id ?match ?predicate_uri
WHERE {
  VALUES ?predicate_uri {mondo:closeMatch mondo:exactMatch}
  ?efo_uri ?predicate_uri ?match


  BIND( REPLACE( STR(?efo_uri), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
  BIND( REPLACE( STR(?predicate_uri), "^http://purl.obolibrary.org/obo/mondo#(.+)$", "$1" ) AS ?predicate_id )
}

I validated the results against the OLS API and they're correct. Here's a snippet that compares mondo:exactMatch and mondo:closeMatch:

Snippet
# type: ignore
%load_ext autoreload
%autoreload 2


import jupyter_black

jupyter_black.load()
import pandas as pd
from nxontology_data.efo.efo import EfoProcessor
efo_processor = EfoProcessor(version="v3.58.0", name="efo_otar_profile")
# efo_processor.download_owl()

rdf = efo_processor.load_rdf()
matches = efo_processor.run_query("matches", cache=False)

matches
import functools
import requests
import urllib.parse


@functools.lru_cache(maxsize=None)
def api_request(efo_uri: str):
    encoded = urllib.parse.quote_plus(urllib.parse.quote_plus(efo_uri))
    return requests.get(
        url=f"https://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/{encoded}"
    ).json()


api_request.cache_clear()
def get_api_matches(efo_id: str):
    res = api_request(efo_id)
    return {
        "close_match": set(res["annotation"].get("closeMatch", [])),
        "exact_match": set(res["annotation"].get("exactMatch", [])),
    }
pivot_matches = (
    matches.groupby(["efo_id", "efo_uri", "predicate_id"])["match"]
    .apply(list)
    .reset_index()
    .pivot(index=["efo_id", "efo_uri"], columns="predicate_id", values="match")
    .reset_index()
    .rename(columns={"exactMatch": "exact_match", "closeMatch": "close_match"})
)

pivot_matches
pd.isnull(pivot_matches["close_match"]).value_counts()
pd.isnull(pivot_matches["exact_match"]).value_counts()
sample_matches = pivot_matches.sample(200).fillna("")

sample_matches
def safe_call(x):
    try:
        return get_api_matches(x)
    except Exception as e:
        print(f"Error for {x}: {e}")
        # Keys must match those returned by get_api_matches, since callers index by them.
        return {"close_match": set(), "exact_match": set()}
compare_df = (
    sample_matches.fillna("")
    .assign(
        api_close_match=lambda df: df["efo_uri"].apply(
            lambda x: safe_call(x)["close_match"]
        ),
        api_exact_match=lambda df: df["efo_uri"].apply(
            lambda x: safe_call(x)["exact_match"]
        ),
        exact_match=lambda df: df["exact_match"].apply(set),
        close_match=lambda df: df["close_match"].apply(set),
    )
    .assign(
        exact_match_equal=lambda df: df.apply(
            lambda row: row["exact_match"] == row["api_exact_match"], axis=1
        ),
        close_match_equal=lambda df: df.apply(
            lambda row: row["close_match"] == row["api_close_match"], axis=1
        ),
        extra_exact_match_in_api=lambda df: df.apply(
            lambda row: row["api_exact_match"] - row["exact_match"], axis=1
        ),
        extra_exact_match_in_efo=lambda df: df.apply(
            lambda row: row["exact_match"] - row["api_exact_match"], axis=1
        ),
        extra_close_match_in_api=lambda df: df.apply(
            lambda row: row["api_close_match"] - row["close_match"], axis=1
        ),
        extra_close_match_in_efo=lambda df: df.apply(
            lambda row: row["close_match"] - row["api_close_match"], axis=1
        ),
    )
)

compare_df
(compare_df.groupby(["exact_match_equal", "close_match_equal"]).size())

The current results return URLs like http://purl.obolibrary.org/obo/Orphanet_98576. Should we keep this format or transform them?


@dhimmel Lastly, what format should the axioms, mondo:exactMatch and mondo:closeMatch have in the output json file?

@dhimmel
Member Author

dhimmel commented Sep 26, 2023

The current results return URLs like http://purl.obolibrary.org/obo/Orphanet_98576. Should we keep this format or transform them?

That URL is the class URI and we often assign it to a variable with a _uri suffix. The corresponding CURIE (compact URI) version is Orphanet:98576 and we often use an _id suffix for this. The SPARQL query can include both the URI and the CURIE as separate output fields.
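
For reference, the URI-to-CURIE conversion can be sketched in Python with the same regular expression the SPARQL queries in this thread pass to REPLACE. The function name `uri_to_curie` is hypothetical, and returning the input unchanged when the pattern does not match is an assumption that mirrors how REPLACE behaves on a non-matching string.

```python
import re

# Same pattern as the REPLACE(...) calls in the SPARQL queries above.
URI_PATTERN = re.compile(r"^http.+/([^:]+)_(.+)$")


def uri_to_curie(uri: str) -> str:
    """Convert a class URI like .../Orphanet_98576 into the CURIE Orphanet:98576."""
    match = URI_PATTERN.match(uri)
    if match is None:
        return uri  # no PREFIX_ACCESSION suffix found; leave the URI untouched
    prefix, accession = match.groups()
    return f"{prefix}:{accession}"


print(uri_to_curie("http://purl.obolibrary.org/obo/Orphanet_98576"))  # Orphanet:98576
print(uri_to_curie("http://www.ebi.ac.uk/efo/EFO_0000479"))  # EFO:0000479
```

URIs without an underscore in the final path segment, such as http://identifiers.org/meddra/10008958, do not match this pattern and would need separate handling.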

What we are after is for each oboInOwl:hasDbXref:

  • what are all the sources providing that xref
  • what mapping property applies to the xref, e.g. exactMatch or closeMatch

A tabular output from a SPARQL query is the ideal first output here. Not sure if you can fit everything in one query/table or you need multiple. I leave that up to your investigation.

To complicate things further (hehe), we should consider whether the Python oaklib, which can extract mappings to the SSSOM format, is a better approach here than writing our own SPARQL queries. SSSOM stands for Simple Standard for Sharing Ontological Mappings (publication).

Possibly best to transition to PRs at this point to enable easier review of the SPARQL queries. PR can be draft and incomplete.

@bfoltyn
Contributor

bfoltyn commented Oct 13, 2023

@dhimmel regarding including xref_sources and mapping_properties in node data, I have a couple of ideas:

Option 1: xref_properties field with a list with the following schema:

xref_id: str
axiom_sources: list[str]
mapping_properties: list[str]
Example
{
  "xref_properties": [
    {
      "xref_id": "orphanet:319298",
      "axiom_sources": ["MONDO:equivalentTo"],
      "mapping_properties": ["mondo:exactMatch"]
    }
  ]
}

Option 2: Separate xref_sources and mapping_properties

xref_sources schema:

xref_id: str
axiom_source: str

mapping_properties schema:

xref_id: str
mapping_source: str
Example
{
  "axiom_sources": [
    { "xref_id": "orphanet:319298", "axiom_source": "MONDO:equivalentTo" }
  ],
  "mapping_properties": [
    { "xref_id": "orphanet:319298", "mapping_source": "mondo:exactMatch" }
  ]
}

Option 3: Second option inside xref_properties field:

Example
{
  "xref_properties": {
    "axiom_sources": [
      { "xref_id": "orphanet:319298", "axiom_source": "MONDO:equivalentTo" }
    ],
    "mapping_properties": [
      { "xref_id": "orphanet:319298", "mapping_source": "mondo:exactMatch" }
    ]
  }
}

Please let me know your thoughts on these options, or if there are any other ideas you have.

@dhimmel
Member Author

dhimmel commented Oct 13, 2023

I like option 1. Will there be a slight imprecision where one source has one property and another source has a conflicting property? For example, an xref being classified as both an exactMatch and closeMatch by different resources?

@matentzn

Just FYI: what you are trying to do here is much much harder than you think right now - and not necessary.

EFO is not a good source for mappings, because it mixes old (ancient) with new (harmonised) xrefs, and makes strange distinctions like "mondo:exactMatch" (which is not even a thing in Mondo). What you should do instead is:

  1. ETL the primary SSSOM file for mondo mappings: https://github.com/monarch-initiative/mondo/blob/master/src/ontology/mappings/mondo.sssom.tsv
  2. use semra (see also https://github.com/biopragmatics/semra/blob/main/notebooks/umls-inference-analysis.ipynb), cc @cthoyt to chain the mappings together in a way to get the correct EFO to X mappings
  3. export the mappings to sssom and feed that into your system

Just my two cents as someone driving by :D

@bfoltyn
Contributor

bfoltyn commented Nov 7, 2023

I like option 1. Will there be a slight imprecision where one source has one property and another source has a conflicting property? For example, an xref being classified as both an exactMatch and closeMatch by different resources?

@dhimmel There are cases where an xref is classified as both exactMatch and closeMatch. For example, in EFO:0000095 the xref meddra:10008958 has both mondo:closeMatch and skos:exactMatch:

(
    pd.read_json(
        "https://github.com/related-sciences/nxontology-data/raw/output/efo/efo_otar_profile_mapping_properties.json.gz"
    ).pipe(
        lambda df: df[
            (df["efo_id"] == "EFO:0000095") & (df["xref_id"] == "meddra:10008958")
        ]
    )
)
efo_id xref_id mapping_property_id efo_uri xref_uri mapping_property_uri
EFO:0000095 meddra:10008958 mondo:closeMatch http://www.ebi.ac.uk/efo/EFO_0000095 http://identifiers.org/meddra/10008958 http://purl.obolibrary.org/obo/mondo#closeMatch
EFO:0000095 meddra:10008958 skos:exactMatch http://www.ebi.ac.uk/efo/EFO_0000095 http://identifiers.org/meddra/10008958 http://www.w3.org/2004/02/skos/core#exactMatch

There are 102 cases like this:

All cases
(
    pd.read_json(
        "https://github.com/related-sciences/nxontology-data/raw/output/efo/efo_otar_profile_mapping_properties.json.gz"
    )
    .groupby(["efo_id", "xref_id"])["mapping_property_id"]
    .apply(list)
    .reset_index()
    .assign(
        mapping_property_id=lambda df: df["mapping_property_id"].apply(
            lambda x: ",".join(x)
        ),
        has_close_match=lambda df: df["mapping_property_id"].str.contains("closeMatch"),
        has_exact_match=lambda df: df["mapping_property_id"].str.contains("exactMatch"),
    )
    .pipe(lambda df: df[df["has_close_match"] & df["has_exact_match"]])
)
  efo_id xref_id mapping_property_id has_close_match has_exact_match
EFO:0000095 meddra:10008958 mondo:closeMatch,skos:exactMatch True True
EFO:0000174 meddra:10015560 mondo:closeMatch,skos:exactMatch True True
EFO:0000181 meddra:10060121 mondo:closeMatch,skos:exactMatch True True
EFO:0000182 meddra:10049010 mondo:closeMatch,skos:exactMatch True True
EFO:0000191 meddra:10060707 mondo:closeMatch,skos:exactMatch True True
EFO:0000198 meddra:10028532 mondo:closeMatch,skos:exactMatch True True
EFO:0000221 meddra:10000871 mondo:closeMatch,skos:exactMatch True True
EFO:0000221 meddra:10059439 mondo:closeMatch,skos:exactMatch True True
EFO:0000222 meddra:10000880 mondo:closeMatch,skos:exactMatch True True
EFO:0000223 meddra:10000890 mondo:closeMatch,skos:exactMatch True True
EFO:0000224 meddra:10001019 mondo:closeMatch,skos:exactMatch True True
EFO:0000248 meddra:10065867 mondo:closeMatch,skos:exactMatch True True
EFO:0000255 meddra:10002449 mondo:closeMatch,skos:exactMatch True True
EFO:0000309 meddra:10006595 mondo:closeMatch,skos:exactMatch True True
EFO:0000309 meddra:10053518 mondo:closeMatch,skos:exactMatch True True
EFO:0000309 meddra:10067184 mondo:closeMatch,skos:exactMatch True True
EFO:0000333 meddra:10008734 mondo:closeMatch,skos:exactMatch True True
EFO:0000339 meddra:10009013 mondo:closeMatch,skos:exactMatch True True
EFO:0000403 meddra:10012818 mondo:closeMatch,skos:exactMatch True True
EFO:0000437 meddra:10065868 mondo:closeMatch,skos:exactMatch True True
EFO:0000479 meddra:10015493 mondo:closeMatch,skos:exactMatch True True
EFO:0000500 meddra:10017709 mondo:closeMatch,skos:exactMatch True True
EFO:0000502 meddra:10017708 mondo:closeMatch,skos:exactMatch True True
EFO:0000502 mesh:D018305 mondo:exactMatch,skos:closeMatch True True
EFO:0000519 meddra:10018336 mondo:closeMatch,skos:exactMatch True True
EFO:0000519 meddra:10018337 mondo:closeMatch,skos:exactMatch True True
EFO:0000558 meddra:10023284 mondo:closeMatch,skos:exactMatch True True
EFO:0000564 meddra:10024189 mondo:closeMatch,skos:exactMatch True True
EFO:0000569 meddra:10024627 mondo:closeMatch,skos:exactMatch True True
EFO:0000574 meddra:10025310 mondo:closeMatch,skos:exactMatch True True
EFO:0000630 meddra:10027744 mondo:closeMatch,skos:exactMatch True True
EFO:0000632 meddra:10030286 mondo:closeMatch,skos:exactMatch True True
EFO:0000681 meddra:10067946 mondo:closeMatch,skos:exactMatch True True
EFO:0000762 meddra:10019827 mondo:closeMatch,skos:exactMatch True True
EFO:0001376 meddra:10042863 mondo:closeMatch,skos:exactMatch True True
EFO:0001378 meddra:10028228 mondo:closeMatch,skos:exactMatch True True
EFO:0002087 meddra:10016632 mondo:closeMatch,skos:exactMatch True True
EFO:0002429 meddra:10036057 mondo:closeMatch,skos:exactMatch True True
EFO:0002499 meddra:10002224 mondo:closeMatch,skos:exactMatch True True
EFO:0002499 meddra:10060971 mondo:closeMatch,skos:exactMatch True True
EFO:0002501 meddra:10026659 mondo:closeMatch,skos:exactMatch True True
EFO:0002892 meddra:10007476 mondo:closeMatch,skos:exactMatch True True
EFO:0002914 meddra:10039497 mondo:closeMatch,skos:exactMatch True True
EFO:0002916 meddra:10030155 mondo:closeMatch,skos:exactMatch True True
EFO:0002918 meddra:10039022 mondo:closeMatch,skos:exactMatch True True
EFO:0002939 meddra:10027107 mondo:closeMatch,skos:exactMatch True True
EFO:0003094 meddra:10017701 mondo:closeMatch,skos:exactMatch True True
EFO:0003802 meddra:10038269 mondo:closeMatch,skos:exactMatch True True
EFO:0003811 meddra:10038270 mondo:closeMatch,skos:exactMatch True True
EFO:0003968 meddra:10002476 mondo:closeMatch,skos:exactMatch True True
EFO:0005221 meddra:10004593 mondo:closeMatch,skos:exactMatch True True
EFO:0005221 meddra:10008593 mondo:closeMatch,skos:exactMatch True True
EFO:0005287 meddra:10038804 mondo:closeMatch,skos:exactMatch True True
EFO:0005567 meddra:10056558 mondo:closeMatch,skos:exactMatch True True
EFO:0005952 meddra:10029547 mondo:closeMatch,skos:exactMatch True True
EFO:0006460 meddra:10051938 mondo:closeMatch,skos:exactMatch True True
EFO:0006738 meddra:10035484 mondo:closeMatch,skos:exactMatch True True
EFO:0007143 meddra:10001882 mondo:closeMatch,skos:exactMatch True True
EFO:0007252 meddra:10048251 mondo:closeMatch,skos:exactMatch True True
EFO:0007359 meddra:10056450 mondo:closeMatch,skos:exactMatch True True
EFO:0007549 meddra:10017852 mondo:closeMatch,skos:exactMatch True True
EFO:0009001 meddra:10026891 mondo:closeMatch,skos:exactMatch True True
EFO:0009441 meddra:10047801 mondo:closeMatch,skos:exactMatch True True
EFO:1000028 meddra:10014967 mondo:closeMatch,skos:exactMatch True True
EFO:1000079 meddra:10048853 mondo:closeMatch,skos:exactMatch True True
EFO:1000157 meddra:10036685 mondo:closeMatch,skos:exactMatch True True
EFO:1000209 meddra:10011318 mondo:closeMatch,skos:exactMatch True True
EFO:1000249 meddra:10033366 mondo:closeMatch,skos:exactMatch True True
EFO:1000249 meddra:10068223 mondo:closeMatch,skos:exactMatch True True
EFO:1000292 meddra:10062001 mondo:closeMatch,skos:exactMatch True True
EFO:1000309 meddra:10023249 mondo:closeMatch,skos:exactMatch True True
EFO:1000314 meddra:10064886 mondo:closeMatch,skos:exactMatch True True
EFO:1000318 meddra:10069698 mondo:closeMatch,skos:exactMatch True True
EFO:1000334 meddra:10049459 mondo:closeMatch,skos:exactMatch True True
EFO:1000355 meddra:10027406 mondo:closeMatch,skos:exactMatch True True
EFO:1000396 meddra:10051713 mondo:closeMatch,skos:exactMatch True True
EFO:1000475 meddra:10050487 mondo:closeMatch,skos:exactMatch True True
EFO:1000476 meddra:10035059 mondo:closeMatch,skos:exactMatch True True
EFO:1000478 meddra:10035079 mondo:closeMatch,skos:exactMatch True True
EFO:1000491 meddra:10065857 mondo:closeMatch,skos:exactMatch True True
EFO:1000496 meddra:10036832 mondo:closeMatch,skos:exactMatch True True
EFO:1000550 meddra:10062113 mondo:closeMatch,skos:exactMatch True True
EFO:1000558 meddra:10042926 mondo:closeMatch,skos:exactMatch True True
EFO:1000560 meddra:10042985 mondo:closeMatch,skos:exactMatch True True
EFO:1000576 meddra:10061031 mondo:closeMatch,skos:exactMatch True True
EFO:1000581 meddra:10043670 mondo:closeMatch,skos:exactMatch True True
EFO:1000595 meddra:10002240 mondo:closeMatch,skos:exactMatch True True
EFO:1000616 meddra:10061252 mondo:closeMatch,skos:exactMatch True True
EFO:1000785 meddra:10040493 mondo:closeMatch,skos:exactMatch True True
EFO:1000796 meddra:10001388 mondo:closeMatch,skos:exactMatch True True
EFO:1000828 meddra:10067399 mondo:closeMatch,skos:exactMatch True True
EFO:1000895 DOID:6785 mondo:exactMatch,skos:closeMatch True True
EFO:1000895 meddra:10064581 mondo:closeMatch,skos:exactMatch True True
EFO:1000919 meddra:10057649 mondo:closeMatch,skos:exactMatch True True
EFO:1000926 meddra:10060801 mondo:closeMatch,skos:exactMatch True True
EFO:1000956 meddra:10019053 mondo:closeMatch,skos:exactMatch True True
EFO:1001187 meddra:10041329 mondo:closeMatch,skos:exactMatch True True
EFO:1001229 meddra:10046752 mondo:closeMatch,skos:exactMatch True True
EFO:1001465 meddra:10018340 mondo:closeMatch,skos:exactMatch True True
EFO:1001469 meddra:10061275 mondo:closeMatch,skos:exactMatch True True
EFO:1001779 meddra:10009018 mondo:closeMatch,skos:exactMatch True True
EFO:1001972 meddra:10025552 mondo:closeMatch,skos:exactMatch True True

@dhimmel
Member Author

dhimmel commented Nov 7, 2023

what you are trying to do here is much much harder than you think right now - and not necessary

Thanks @matentzn for these insights. I'm looking forward to exploring the SSSOM Mondo mappings combined with semra to convert them to EFO-keyed mappings. For now I think it makes sense to continue our current approach, since we're close to having it complete and being evaluable, at least as a good reference for the SSSOM/Mondo alternative.

There are cases where xref is classified as both exactMatch and closeMatch

@bfoltyn I think we could make exactMatch higher priority than closeMatch as an easy way to label an xref as either exact or close.

@bfoltyn
Contributor

bfoltyn commented Nov 8, 2023

@bfoltyn I think we could make exactMatch higher priority than closeMatch as an easy way to label an xref as either exact or close.

@dhimmel What do you mean by higher priority? I thought we would include all mapping properties as a list in the node data, as in option 1 in comment #18 (comment). Are you suggesting we include only one mapping property value, exactMatch or closeMatch? Should we also include the mondo: or skos: prefix?

@dhimmel
Member Author

dhimmel commented Nov 8, 2023

What do you mean by higher priority?

I think it might be best if we simplify/aggregate the xref metadata that goes into the nxontology node attribute data to something like (written here in YAML for ease):

xrefs:
  - xref_id: meddra:10008958
    xref_uri: http://identifiers.org/meddra/10008958
    relation: skos:exactMatch  # converting mondo:exactMatch to skos:exactMatch if applicable
    sources: [MONDO:equivalentTo, DOID:2224]  # haven't cleaned this up yet

With this design, an xref_id would only appear once per node and all other metadata would be aggregated.
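
A sketch of that aggregation in plain Python, assuming the query results arrive as one row per (xref, source) pair with the relation already resolved. The row layout and the name `aggregate_xrefs` are hypothetical, and the example values are the ones from the YAML above.

```python
def aggregate_xrefs(rows):
    """Collapse per-source rows into one record per xref_id with merged sources."""
    details = {}
    for row in rows:
        # The first row seen for an xref_id fixes its URI and relation (an assumption).
        record = details.setdefault(
            row["xref_id"],
            {
                "xref_id": row["xref_id"],
                "xref_uri": row["xref_uri"],
                "relation": row["relation"],
                "sources": [],
            },
        )
        source = row["source"]
        if source is not None and source not in record["sources"]:
            record["sources"].append(source)
    return list(details.values())


rows = [
    {
        "xref_id": "meddra:10008958",
        "xref_uri": "http://identifiers.org/meddra/10008958",
        "relation": "skos:exactMatch",
        "source": source,
    }
    for source in ["MONDO:equivalentTo", "DOID:2224", "MONDO:equivalentTo"]
]
print(aggregate_xrefs(rows))
```

The duplicate MONDO:equivalentTo row is dropped, so the xref appears once with two sources.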

@bfoltyn
Contributor

bfoltyn commented Nov 8, 2023

xref metadata that goes into the nxontology node attribute data to something like (written here in YAML for ease)

@dhimmel Currently the xrefs field in the node data is a list of strings. Do we want to replace it with the example you suggested? The reason I suggested introducing a new field with these properties was to avoid a breaking change.

@bfoltyn
Contributor

bfoltyn commented Nov 8, 2023

    relation: skos:exactMatch  # converting mondo:exactMatch to skos:exactMatch if applicable

@dhimmel Should we use the following logic?

  • If there is skos:exactMatch in mapping properties we set the value to => skos:exactMatch
  • If there is mondo:exactMatch in mapping properties we set the value to => skos:exactMatch
  • If there is skos:closeMatch in mapping properties we set the value to => skos:closeMatch
  • If there is mondo:closeMatch in mapping properties we set the value to => skos:closeMatch
  • otherwise we set the value to null
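
The rules above could be sketched as an ordered lookup, where the first property found decides the relation. The names `RELATION_PRIORITY` and `resolve_relation` are hypothetical; this is one possible reading of the proposed logic, not the merged implementation.

```python
from typing import Iterable, Optional

# Checked in order: exact matches win over close matches, and mondo:
# properties are reported under the corresponding skos: name.
RELATION_PRIORITY = [
    ("skos:exactMatch", "skos:exactMatch"),
    ("mondo:exactMatch", "skos:exactMatch"),
    ("skos:closeMatch", "skos:closeMatch"),
    ("mondo:closeMatch", "skos:closeMatch"),
]


def resolve_relation(mapping_properties: Iterable[str]) -> Optional[str]:
    """Pick a single relation for an xref from all of its mapping properties."""
    present = set(mapping_properties)
    for candidate, relation in RELATION_PRIORITY:
        if candidate in present:
            return relation
    return None  # no recognised mapping property


print(resolve_relation(["mondo:closeMatch", "skos:exactMatch"]))  # skos:exactMatch
print(resolve_relation(["mondo:closeMatch"]))  # skos:closeMatch
print(resolve_relation([]))  # None
```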

@dhimmel
Member Author

dhimmel commented Nov 8, 2023

Currently the xrefs field in the node data is a list of strings. Do we want to replace it with the example you suggested

We could either replace it or create a new field like xref_details. Slightly leaning towards a new field.

Should we use the following logic?

That logic sounds good. If there are other interesting values in the otherwise set, we can support those later.

@bfoltyn
Contributor

bfoltyn commented Nov 8, 2023

We could either replace it or create a new field like xref_details. Slightly leaning towards a new field.

I think we can add a new field. xref_details sounds good. Should this field also include xrefs from the xrefs query or just those from mapping_properties and xref_sources?

@dhimmel
Member Author

dhimmel commented Nov 8, 2023

Should this field also include xrefs from xrefs query or just from mapping_properties and xref_sources

Ideally all of them, such that a user only needs xref_details.

dhimmel pushed a commit that referenced this issue Nov 9, 2023
merges #21
refs #18

Co-authored-by: Bartek Foltyn <[email protected]>
@bfoltyn
Contributor

bfoltyn commented Nov 14, 2023

@dhimmel I've noticed that sometimes xref_sources in xref_details contains null. For example, in MONDO:0020507:

"xref_details": [
  {
    "xref_id": "DOID:0070374",
    "relation": "skos:exactMatch",
    "sources": [
      null
    ]
  },

Should we make axiom_source required in the axiom_sources query?

OPTIONAL { ?axiom oboInOwl:source ?axiom_source }.

Another way would be to filter out null values after the aggregation in EfoProcessor.get_xref_details method.

def get_xref_details(self) -> dict[str, dict[str, str | list[str] | None]]:
    xrefs = self.get_xrefs_df()[["efo_id", "xref_bioregistry"]].rename(
        columns={"xref_bioregistry": "xref_id"}
    )
    xref_sources = (
        self.get_xref_sources_df()
        .assign(
            xref_id=lambda df: df["xref"]
            .str.split(":", expand=True)
            .apply(
                lambda row: normalize_parsed_curie(
                    xref_prefix=row[0],
                    xref_accession=row[1],
                    collapse_orphanet=True,
                ),
                axis="columns",
            )
        )
        .groupby(["efo_id", "xref_id"])["axiom_source"]
        .apply(list)
        .reset_index()
        .rename(columns={"axiom_source": "sources"})
    )
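
For comparison, the second alternative (filtering nulls after aggregation) might look roughly like this on the already-built records. The helper `drop_null_sources` is hypothetical and works on plain dicts rather than on the DataFrame used in EfoProcessor.

```python
def drop_null_sources(xref_details):
    """Return copies of the records with None removed from each sources list."""
    return [
        {**record, "sources": [s for s in record["sources"] if s is not None]}
        for record in xref_details
    ]


xref_details = [
    {"xref_id": "DOID:0070374", "relation": "skos:exactMatch", "sources": [None]},
]
print(drop_null_sources(xref_details))
# [{'xref_id': 'DOID:0070374', 'relation': 'skos:exactMatch', 'sources': []}]
```

This keeps xrefs whose axioms have no oboInOwl:source, at the cost of an empty sources list, whereas making the source required in the query drops those rows from the sources table entirely.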

@dhimmel
Member Author

dhimmel commented Nov 14, 2023

Should we make axiom_source required in the axiom_sources query?

This is the solution I prefer unless you advocate for a different one. Potentially leave a comment in that query that OPTIONAL will include extra results where axiom_source is missing.
