Allow endpoint to support multiple matching algorithm(s) #115
I don’t think that requiring a service to support a specific ranking algorithm is useful at all, and it presents a barrier to entry. On the other hand, I think it is a good idea to include the name of the ranking algorithm and the score in the results, either normalized (for example from 1000 to 1) or as a raw value together with the max value. The client can then decide how to handle things on their own.

I say a barrier because it will be hard to come up with a ranking algorithm that will be useful given the variations across the various services. Either you come up with something really complicated to cover all variations (and people will balk), or you come up with something so simplistic that it will be useless. Finally, we want people to experiment and try out new things; mandating things may cause the opposite effect. For example, GeneMatcher has just tried out 4 different algorithms; we probably would not have done that if there was a mandated one.

The usefulness I mentioned is because I have a lot of experience dealing with ranking algorithms in my Information Retrieval past. If the goal is to merge the results from various services into a single list for the user, I can tell you from experience that there be monsters down that route, unless you have (very) homogeneous data sources. If I have two document collections (news items, for example), the math to say that a search scores 0.5 for a document in one and 0.7 for a document in another, and that the latter is therefore more relevant than the former, is well understood. But for highly structured data it is much more complicated: how do I compare a record from one MME with a score of 0.5 on the gene section against another from another MME with a score of 0.7 on the phenotype section? It's edge cases all the way down.
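To make the score-reporting idea concrete, here is a minimal sketch of what a client could do with results that carry an algorithm name plus a raw score and its maximum. The field names (`algorithm`, `rawScore`, `maxScore`) are hypothetical illustrations, not part of the current MME API:

```python
# Hypothetical sketch: the field names below are illustrative assumptions,
# not part of the MME specification.
def normalize_score(result: dict) -> float:
    """Map a raw score and its reported maximum onto a 0.0-1.0 scale,
    so a client can rank results from a single service."""
    return result["rawScore"] / result["maxScore"]

results = [
    {"patientId": "P1", "algorithm": "phenodigm", "rawScore": 712, "maxScore": 1000},
    {"patientId": "P2", "algorithm": "phenodigm", "rawScore": 433, "maxScore": 1000},
]

# Rank within one service only; as argued above, comparing normalized
# scores *across* services running different algorithms is not meaningful.
for r in sorted(results, key=normalize_score, reverse=True):
    print(r["patientId"], round(normalize_score(r), 2))
```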
I agree with @fschiettecatte's assessment. Saying which algorithm is used is fine; mandating one could create a barrier to entry.
Okay, I updated the proposal to completely drop the compliance aspect and any mention of required baseline algorithms. I agree those agendas are not consistent with the spirit of the MME and would cause problems down the road. Instead, we just enable each service to support multiple algorithms by:
I think this best aligns with 2.0, based on the discussions we've had this year about modularity and the 2.0 plans.
Currently, each MME service implements its own pheno-/genotypic scoring metric. This is great: part of the goal of the project is to have these metrics compete and evolve. We have also discussed allowing the querier to specify the form of the matching (or bias it in certain hypothesis-driven ways). One such parameter is which scoring algorithm(s) are used to determine the best matches.
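For concreteness, a minimal sketch of what such a request might look like. The `matchingAlgorithm` field, its value, and the endpoint URL are hypothetical assumptions for illustration; the patient structure and content type follow the existing MME request format:

```python
import requests

# Hypothetical sketch: the "matchingAlgorithm" field and the endpoint URL
# are illustrative assumptions, not part of the current MME specification.
query = {
    "patient": {
        "id": "example-patient-1",
        "features": [{"id": "HP:0001156"}],  # brachydactyly
    },
    "matchingAlgorithm": "phenodigm",  # could also be a list of acceptable algorithms
}

response = requests.post(
    "https://matchmaker.example.org/match",  # placeholder endpoint
    json=query,
    headers={"Content-Type": "application/vnd.ga4gh.matchmaker.v1.0+json"},
)
for result in response.json().get("results", []):
    print(result.get("score"))
```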
I propose:
This provides several benefits as I see it:
Important: It provides the ability to automatically measure the compliance of new MME API endpoints. Specifically, if we decide that some simple-to-implement baseline measures (such as the UI score for phenotypic similarity) are mandatory, we can automatically verify that new endpoints return the correct scores on the test data. This is important if we want: 1) a reference implementation, 2) a compliance test, and 3) new services to be part of the main MME API.
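Such a compliance check could be as simple as replaying known cases and comparing scores. A minimal sketch, where the test fixture, the expected values, and the tolerance are all illustrative assumptions rather than any agreed specification:

```python
import math

# Hypothetical fixture: (query patient, expected score) pairs for a
# mandated baseline algorithm. Values are made up for illustration.
TEST_CASES = [
    ({"features": [{"id": "HP:0001156"}]}, 0.83),
    ({"features": [{"id": "HP:0000118"}]}, 0.41),
]

def check_compliance(score_fn, tolerance: float = 1e-3) -> bool:
    """Return True if the endpoint's scoring function reproduces the
    expected baseline scores on every test case."""
    return all(
        math.isclose(score_fn(patient), expected, abs_tol=tolerance)
        for patient, expected in TEST_CASES
    )
```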