Additional EFO xref context from axioms #19

Merged
merged 4 commits into from
Oct 5, 2023
45 changes: 45 additions & 0 deletions nxontology_data/efo/efo.py
@@ -7,6 +7,7 @@
from typing import Any

import bioversions
import curies
import fsspec
import networkx as nx
import pandas as pd
@@ -142,6 +143,42 @@ def get_obsolete_df(self) -> pd.DataFrame:
    def get_alt_id_df(self) -> pd.DataFrame:
        return self.run_query("alt_id", cache=True)

    def get_xref_sources_df(self) -> pd.DataFrame:
        return self.run_query("xref_sources", cache=True)

    def get_mapping_properties_df(self) -> pd.DataFrame:
        converter = curies.get_bioregistry_converter()

        converter.add_prefix(
            "icd10cm-missing-prefix", "http://purl.bioontology.org/ontology/ICD10CM/"
        )

        df = (
            self.run_query("mapping_properties", cache=True)
            .assign(
                xref_id=lambda df: df["xref_id"].apply(
                    lambda xref: converter.compress(xref)
                )
            )
            .dropna()
            .assign(
                xref_id=lambda df: df["xref_id"]
                .str.replace("icd10cm-missing-prefix:", "icd10cm:")
                .str.replace("obo:Orphanet_", "Orphanet:")
                .str.split(":", expand=True)
                .apply(
                    lambda row: normalize_parsed_curie(
                        xref_prefix=row[0],
                        xref_accession=row[1],
                        collapse_orphanet=True,
                    ),
                    axis="columns",
                )
            )
        )

        return df

    def get_synonyms(self) -> dict[str, dict[str, str]]:
        synonym_scopes = {
            "hasExactSynonym": "exact",
@@ -272,6 +309,14 @@ def write_outputs(self) -> None:
        write_dataframe(
            self.get_obsolete_df(), output_dir.joinpath(f"{self.name}_obsolete.json.gz")
        )
        write_dataframe(
            self.get_mapping_properties_df(),
            output_dir.joinpath(f"{self.name}_mapping_properties.json.gz"),
        )
        write_dataframe(
            self.get_xref_sources_df(),
            output_dir.joinpath(f"{self.name}_xref_sources.json.gz"),
        )
        if nxo.name == "efo_otar_profile":
            nxo_slim = self.create_slim_nxo(nxo)
            write_ontology(nxo_slim, output_dir, compression_threshold_mb=30.0)
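The xref cleanup in `get_mapping_properties_df` can be followed more easily on a single value. The sketch below is a standalone approximation, not the code from this PR: it replaces the Bioregistry converter with a small hand-written prefix map (the `obo` entry is an assumption made for illustration), and `compress`, `clean_curie`, and `to_prefix_and_accession` are hypothetical helper names. It also splits only on the first colon, whereas the diff splits on every colon.

```python
from __future__ import annotations

# Hypothetical prefix map standing in for curies.get_bioregistry_converter().
# Only the ICD10CM entry mirrors the prefix added in the diff; "obo" is assumed.
PREFIX_MAP = {
    "icd10cm-missing-prefix": "http://purl.bioontology.org/ontology/ICD10CM/",
    "obo": "http://purl.obolibrary.org/obo/",
}


def compress(uri: str) -> str | None:
    """Return prefix:local_id for the longest matching URI prefix, else None."""
    matches = [
        (uri_prefix, prefix)
        for prefix, uri_prefix in PREFIX_MAP.items()
        if uri.startswith(uri_prefix)
    ]
    if not matches:
        return None
    uri_prefix, prefix = max(matches, key=lambda match: len(match[0]))
    return f"{prefix}:{uri[len(uri_prefix):]}"


def clean_curie(curie: str) -> tuple[str, str]:
    """Apply the two string replacements from the diff, then split on the first colon."""
    curie = curie.replace("icd10cm-missing-prefix:", "icd10cm:")
    curie = curie.replace("obo:Orphanet_", "Orphanet:")
    prefix, _, accession = curie.partition(":")
    return prefix, accession


def to_prefix_and_accession(uri: str) -> tuple[str, str] | None:
    """Compress a URI and clean it; URIs with no known prefix yield None."""
    curie = compress(uri)
    return None if curie is None else clean_curie(curie)
```

URIs that no prefix matches come back as `None`, which corresponds to the rows that `.dropna()` discards in the diff.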
17 changes: 17 additions & 0 deletions nxontology_data/efo/queries/mapping_properties.rq
Member

> Once the queries are done, what would be the next steps? How should this data be represented in the output?

We can save the tables as output and/or include them in the nxontology. We'll have to decide how to represent this information as node data in networkx. That is tricky because we have to decide to what extent users will want access to rawer forms versus a more consolidated but opinionated format.

Contributor Author

@dhimmel How about saving the table in the output for now, so that users can access this data? We could create a follow-up issue to discuss how to represent this information as node data in networkx.

Member

> How about saving the table in the output for now

Sounds good.
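To make the trade-off raised in this thread concrete, here is a rough sketch of two possible node-data shapes built from rows like those the mapping query returns. Nothing here was decided in this PR: the rows are made up, and `raw_node_data` and `consolidated_node_data` are hypothetical names. The "raw" shape keeps every row as-is, while the "consolidated" shape groups mapping properties under each xref.

```python
from collections import defaultdict

# Made-up rows shaped like three of the mapping_properties query columns.
rows = [
    {"efo_id": "EFO:0000001", "xref_id": "MONDO:0000001", "mapping_property_id": "skos:exactMatch"},
    {"efo_id": "EFO:0000001", "xref_id": "MONDO:0000001", "mapping_property_id": "mondo:exactMatch"},
    {"efo_id": "EFO:0000001", "xref_id": "DOID:0000002", "mapping_property_id": "skos:closeMatch"},
]


def raw_node_data(rows):
    """Rawer form: each node keeps the full list of its mapping rows."""
    by_node = defaultdict(list)
    for row in rows:
        by_node[row["efo_id"]].append(
            {"xref_id": row["xref_id"], "mapping_property_id": row["mapping_property_id"]}
        )
    return dict(by_node)


def consolidated_node_data(rows):
    """Consolidated form: per node, each xref maps to its sorted mapping properties."""
    by_node = defaultdict(lambda: defaultdict(set))
    for row in rows:
        by_node[row["efo_id"]][row["xref_id"]].add(row["mapping_property_id"])
    return {
        efo_id: {xref: sorted(props) for xref, props in xrefs.items()}
        for efo_id, xrefs in by_node.items()
    }
```

Either dictionary is keyed by node identifier, so it could be attached to a graph with `networkx.set_node_attributes`. Which shape suits users better is the open question left for the follow-up issue.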

@@ -0,0 +1,17 @@
PREFIX mondo: <http://purl.obolibrary.org/obo/mondo#>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?efo_id (?xref_uri as ?xref_id) ?mapping_property_id ?efo_uri ?xref_uri ?mapping_property_uri
WHERE {
  VALUES ?mapping_property_uri {mondo:closeMatch mondo:exactMatch skos:mappingRelation skos:closeMatch skos:exactMatch skos:broadMatch skos:narrowMatch skos:relatedMatch}

  ?efo_uri rdf:type owl:Class .
  ?efo_uri ?mapping_property_uri ?xref_uri


  BIND( REPLACE( STR(?efo_uri), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
  BIND( REPLACE( STR(?mapping_property_uri), "^http://purl\\.obolibrary\\.org/obo/mondo#(.+)$", "mondo:$1" ) AS ?mapping_property_id )
  BIND( REPLACE( STR(?mapping_property_id), "^http://www\\.w3\\.org/2004/02/skos/core#(.+)$", "skos:$1" ) AS ?mapping_property_id )
}
Member

Can you reply to this comment with the head of the output table?

Contributor Author

ORDER BY ?efo_id ?xref_id ?mapping_property_id
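The three `BIND( REPLACE(...) )` expressions in this query can be tried outside SPARQL. The sketch below re-expresses them with Python's `re` module, on the assumption that Python and SPARQL regex semantics agree for these simple patterns; `uri_to_curie` and `mapping_property_curie` are hypothetical names. The second function chains the mondo and skos replacements, which appears to be the intent of the two `?mapping_property_id` bindings. Note that the SPARQL 1.1 specification does not allow `BIND` to assign a variable already used earlier in the same group, so some engines may reject the second of those bindings.

```python
import re

# Same pattern as the ?efo_id BIND: everything up to the last usable "/",
# then PREFIX_ACCESSION rewritten as PREFIX:ACCESSION.
EFO_ID_PATTERN = re.compile(r"^http.+/([^:]+)_(.+)$")


def uri_to_curie(uri: str) -> str:
    """Mirror REPLACE(STR(?efo_uri), pattern, "$1:$2"); unmatched input is returned unchanged."""
    return EFO_ID_PATTERN.sub(r"\1:\2", uri)


def mapping_property_curie(uri: str) -> str:
    """Chain the mondo and skos replacements; at most one of them matches a given URI."""
    uri = re.sub(r"^http://purl\.obolibrary\.org/obo/mondo#(.+)$", r"mondo:\1", uri)
    return re.sub(r"^http://www\.w3\.org/2004/02/skos/core#(.+)$", r"skos:\1", uri)


print(uri_to_curie("http://www.ebi.ac.uk/efo/EFO_0000400"))  # EFO:0000400
print(mapping_property_curie("http://www.w3.org/2004/02/skos/core#exactMatch"))  # skos:exactMatch
```

Because `REPLACE` leaves non-matching strings untouched, a URI without an underscore in its last path segment would pass through as a full URI rather than a CURIE.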
18 changes: 18 additions & 0 deletions nxontology_data/efo/queries/xref_sources.rq
@@ -0,0 +1,18 @@
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

SELECT ?efo_id ?xref ?axiom_source
WHERE {
  ?axiom rdf:type owl:Axiom.
  ?axiom owl:annotatedSource ?source.
  ?axiom owl:annotatedProperty oboInOwl:hasDbXref.
  ?axiom owl:annotatedTarget ?xref.

  OPTIONAL { ?axiom oboInOwl:source ?axiom_source }.

  BIND( REPLACE( STR(?source), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
}

GROUP BY ?efo_id ?xref ?axiom_source
ORDER BY ?efo_id ?xref ?axiom_source
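As a rough illustration of what this query returns, the sketch below applies the same logic to a few made-up reified axioms held in plain dictionaries. It is not how the PR runs the query: the dictionary keys are abbreviated stand-ins for the `owl:annotated*` properties, and `xref_source_rows` is a hypothetical name. It shows three behaviours: axioms on other properties are skipped, a missing `oboInOwl:source` leaves `axiom_source` empty (the `OPTIONAL`), and identical rows collapse (the `GROUP BY` over all selected variables).

```python
import re

HAS_DB_XREF = "oboInOwl:hasDbXref"

# Made-up reified axioms; identifiers and source values are illustrative only.
axioms = [
    {"source": "http://www.ebi.ac.uk/efo/EFO_0000001", "property": HAS_DB_XREF,
     "target": "DOID:0000002", "axiom_source": "MONDO:equivalentTo"},
    # Same annotation asserted twice: collapses to one row.
    {"source": "http://www.ebi.ac.uk/efo/EFO_0000001", "property": HAS_DB_XREF,
     "target": "DOID:0000002", "axiom_source": "MONDO:equivalentTo"},
    # No source annotation: the row is kept with axiom_source set to None.
    {"source": "http://www.ebi.ac.uk/efo/EFO_0000001", "property": HAS_DB_XREF,
     "target": "MESH:D000003"},
    # Different annotated property: filtered out.
    {"source": "http://www.ebi.ac.uk/efo/EFO_0000001", "property": "rdfs:label",
     "target": "example label"},
]


def xref_source_rows(axioms):
    """Return sorted, de-duplicated (efo_id, xref, axiom_source) tuples."""
    rows = set()
    for axiom in axioms:
        if axiom["property"] != HAS_DB_XREF:
            continue
        efo_id = re.sub(r"^http.+/([^:]+)_(.+)$", r"\1:\2", axiom["source"])
        rows.add((efo_id, axiom["target"], axiom.get("axiom_source")))
    # Treat a missing source as an empty string so it sorts first.
    return sorted(rows, key=lambda row: (row[0], row[1], row[2] or ""))
```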