Handling of ontology version mismatches #51
HPO does provide stable monthly releases... well, possibly stable... http://compbio.charite.de/hudson/job/hpo.annotations.monthly/
Doesn't look very monthly to me, more like when they get to it. What I was trying to push for in Miami was some sort of disciplined version and release process (for example, the UMLS releases in the spring and in the fall, with AA and AB releases for the year of release). The issue is that in the space of about 18 months after PhenoDB was mapped to HPO, there were about 20 changes to the HPO IDs we linked to (about 1800 in total, I think). Some were easy, where one ID was folded into another; others were nastier, where the definition itself changed. When I talked to Peter about this in Dublin he did not seem to grasp the issue he was creating. I suppose we could use the build number as the version. But how do I reconcile across builds? Say I am using build 83 and I get a request in build 79? Or the reverse?
I would propose:
This is obviously not completely "correct", and can present challenges. However, I don't see any way we can reasonably reconcile, and I don't think we should even try. The number of terms affected is small, the changes are likely small, and we really can't do any better. For example, a patient might be classified as having microcephaly. If the term definition changes from < -3SD to < -4SD, there's no way we can properly reconcile this across builds, because we don't know the actual measurement. If a new term is created instead, we lose all of the historical information about the term and its frequency. There's no right solution here, but I think the one of least resistance and least "badness" is to assume the definitions are close enough that we map any terms we can on whatever HPO version we have.
Alternatively, if it's obsoleted, we could bump it up the ontology until we find a non-obsolete term. :)
"Alternatively, if it's obsoleted, we could bump it up the ontology until we find a non-obsolete term. :)" I was thinking this could be a solution; however, it would require the requester to do the bumping, as the receiver wouldn't know how to bump the term. I met someone at the GA4GH UK meeting on Friday (24th April) who claimed to have solved this problem, and is looking at making the solution open source. I will open up further discussions with him to assess whether it's a viable solution.
Fair point. I was thinking that the node was flagged as obsoleted but the links were still there. It turns out this is not the case. These obsolete terms always seem to be annotated with 'replaced_by', where it's clear what we should do, or with 'consider', where it isn't, and we may not be able to use such terms for now, until we have a better solution.
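For reference, the 'replaced_by'/'consider' distinction can be detected mechanically when parsing the obo file. A minimal sketch in Python, stdlib only; the OBO sample and term IDs below are invented for illustration, not taken from a real HPO release:

```python
# Classify each obsolete term by whether it carries a 'replaced_by' tag
# (safe to remap automatically) or only a 'consider' tag (needs review).
# The sample below is a hand-made OBO fragment; IDs are made up.

OBO_SAMPLE = """\
[Term]
id: HP:0000001
name: All

[Term]
id: HP:0001234
name: Old term A
is_obsolete: true
replaced_by: HP:0005678

[Term]
id: HP:0009999
name: Old term B
is_obsolete: true
consider: HP:0000001
"""

def classify_obsolete_terms(obo_text):
    """Return {term_id: ('replaced_by'|'consider'|'unresolvable', targets)}."""
    result = {}
    for stanza in obo_text.split("[Term]"):
        fields = {}
        for line in stanza.strip().splitlines():
            if ": " in line:
                key, value = line.strip().split(": ", 1)
                fields.setdefault(key, []).append(value)
        if "id" not in fields or "is_obsolete" not in fields:
            continue  # not a term, or not obsolete
        term_id = fields["id"][0]
        if "replaced_by" in fields:
            result[term_id] = ("replaced_by", fields["replaced_by"])
        elif "consider" in fields:
            result[term_id] = ("consider", fields["consider"])
        else:
            result[term_id] = ("unresolvable", [])
    return result

print(classify_obsolete_terms(OBO_SAMPLE))
```

A real implementation would parse the full hpo.obo the same way (or use an OBO parsing library) and treat the 'unresolvable' and 'consider' buckets as the ones requiring human attention.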
Oh gosh... that's not especially helpful. It seems HPO is not really designed to be used programmatically without human interaction... =/ A potential solution, which I've considered but dislike, is that each system exposes an endpoint that returns the ontologies it supports, and at what version. The system could then check whether its version is older or newer, fetch a copy of the other version, digest it, and then, where terms are obsoleted or not present in the searching system, ask the user to make adjustments. I don't really think this is a viable option, but it may spur some further thought. I have emailed the person I met at the GA4GH UK day, so am waiting for him to get back to me. Ontologies are feeling more and more like they cause as many problems as they solve.
GeneMatcher is in a strange place with this because we use PhenoDB IDs internally. There are 3,646 features, of which 2,857 are mapped to HPO. This mapping is done manually and is checked/updated every 12-18 months. So when we get an MME request we have to map the feature HPO IDs to PhenoDB IDs. To do that we take the HPO ID and walk up the HPO tree until we hit a mapping to PhenoDB. This is made a little tricky because HPO features can have multiple parents, so we pick the 'closest' one. Obviously this mapping is done ahead of time for efficiency (one lookup to get the corresponding ICHPT ID, if any), and we update it every week with a current copy of the hpo.obo file. There are going to be HPO IDs that cannot be resolved, and we just drop them; we also map alternative/obsolete HPO IDs. I am not sure if this helps with this issue, but it is a practical approach that offers a pretty strong guarantee that an MME request won't contain an HPO ID we have not yet seen.
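The walk-up-the-tree remapping described above can be sketched roughly as follows. The HPO fragment and PhenoDB IDs here are toy data, and a breadth-first walk is one way to make 'closest' mean 'fewest hops up' when a term has multiple parents; GeneMatcher's actual tie-breaking rule may differ:

```python
from collections import deque

# Toy ontology fragment (IDs and the PhenoDB mapping are invented).
PARENTS = {
    "HP:C": ["HP:A", "HP:B"],   # multiple parents, as HPO allows
    "HP:A": ["HP:ROOT"],
    "HP:B": ["HP:ROOT"],
    "HP:ROOT": [],
}
PHENODB_MAP = {"HP:B": "PDB:42", "HP:ROOT": "PDB:1"}

def remap(hpo_id, parents=PARENTS, mapping=PHENODB_MAP):
    """Breadth-first walk up the ontology; the first mapped ancestor
    reached is the 'closest' one. Returns None if no ancestor on any
    path is mapped (such IDs would simply be dropped)."""
    seen = set()
    queue = deque([hpo_id])
    while queue:
        term = queue.popleft()
        if term in seen:
            continue
        seen.add(term)
        if term in mapping:
            return mapping[term]
        queue.extend(parents.get(term, []))
    return None

print(remap("HP:C"))  # HP:B is one hop away, HP:ROOT is two
```

In practice this walk would be precomputed for every HPO ID (including alt_ids and obsoleted IDs) each time the weekly hpo.obo copy is refreshed, so request-time handling is a single dictionary lookup.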
It may be worth resurrecting the proposal of passing (mandatory?) ICHPT term ancestor(s) for each feature, once it's available.
I am not sure that helps, and I think just adds another layer of complexity to the protocol. I described what GeneMatcher was doing as one possible approach which will accept any version of HPO and remap 'old' IDs to 'new' IDs. It does put the onus on GeneMatcher to stay current but that is not onerous. The one issue is that I can't do anything if Peter changes the definition for an ID, but when that happens everyone is screwed. As I said it is not perfect but it works without having to deal with HPO versions.
Fair points, François. I think we should come up with a solution that avoids the chaos of trying to recreate older versions of the HPO. I think the only gap in the HPO right now is that only some obsolete terms have 'replaced_by' annotations. For those that don't, we'll have to ignore them, when in reality we could map them to more general terms that the obsolete term implies. This is probably the best we can do. I've sent an email to some Monarch people to see if they have any suggestions.
I have some good news on this! I spoke to someone from the EBI yesterday. They are working on a fresh, updated version of the EBI Ontology Lookup Service (OLS). There will be a publicly usable API which will allow terms to be looked up. Say, for example, there's a new term that I don't have in my version of HPO. I would be able to query the OLS and get all of the ancestor terms back. This would allow me to bump the term up. If you don't want to rely on an API, tools are currently provided that let you create a Solr or Mongo instance containing an ontology. The tools can take OWL files as input and save directly to Solr. I'll be investigating this soon. Our plan would be to update the ontologies nightly. Pushing the ontology data outside of our database would make it not reliant on our release cycle.
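For what it's worth, the OLS addresses a term by its IRI, which has to be URL-encoded twice because the IRI itself sits inside the request path. A small sketch of building such an ancestor-lookup URL; the endpoint shape here is an assumption based on the OLS REST API and should be checked against the OLS documentation before use (the sketch deliberately stops short of firing the request):

```python
from urllib.parse import quote

# Assumed OLS REST base URL and endpoint layout; verify against OLS docs.
OLS_BASE = "https://www.ebi.ac.uk/ols/api/ontologies"

def ancestors_url(ontology, term_iri):
    """Build an OLS URL for a term's ancestors. The term IRI is
    URL-encoded twice because it is embedded in the URL path."""
    encoded = quote(quote(term_iri, safe=""), safe="")
    return f"{OLS_BASE}/{ontology}/terms/{encoded}/ancestors"

# HP_0001250 (Seizures) used purely as an example identifier.
url = ancestors_url("hp", "http://purl.obolibrary.org/obo/HP_0001250")
print(url)
```

A system that finds an unknown term in an incoming request could call such an endpoint, take the returned ancestors, and then apply the same walk-up remapping discussed earlier against its local ontology copy.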
Taking a step back here, is this an issue we need to solve in MME, or is this something we should leave up to individual sites to deal with, much like we are doing with feature matching and ranking? I think taking a similar approach would be a good option, namely publishing our different approaches here for transparency and pushing new members to do likewise.
Agreed. I think we're no longer looking to solve this as part of the MME API itself, but rather to provide a documented solution that we can suggest or recommend but isn't mandatory. It's obviously an issue we, and others involved, should be aware of. I suggest we rename this issue to something like "How could we handle ontology version mismatches?"
I agree with both of you. We can mention the OLS as one possible solution for handling ontology version mismatches, but leave each site to handle it in their own way. From the MME API side, I think we just need to:
I'm not sure point 1 is relevant any longer. I mean, even if the receiving system knows it has an older version, it won't know which items from the ontology it doesn't know about until it looks them up, making sending the version number redundant, doesn't it? I can see your reasoning for point 2, but I'm not sure how this could be displayed to the end user. Any thoughts?
Re point 2, this expands to the other fields in the match request: how to represent what matched, how that match was made (including any remapping), etc... This turns into a bit of a rat's nest of complexity. I am not sure we want to address that.
I guess it's possible that there could be "additional notes" containing human-readable text which explains any transcoding or changes made to the incoming request data. I think we should push this to the back burner, though, at least until we can handle different ontology versions reliably.
It could be useful to create a summary of this issue and add that to the first or second post.
Discussed and approved at the Miami meeting day 1.