Allow endpoint to support multiple matching algorithm(s) #115
I don’t think that requiring a service to support a specific ranking algorithm is useful at all, and it presents a barrier to entry. On the other hand, I think it is a good idea to include the name of the ranking algorithm and the score in the results, either normalized (for example from 1000 to 1) or as a raw value together with the max value. The client can then decide how to handle things on their own.

I say a barrier because it will be hard to come up with a ranking algorithm that will be useful given the variations across the various services. Either you come up with something really complicated to cover all variations (and people will balk), or you come up with something so simplistic that it will be useless. Finally, we want people to experiment and try out new things; mandating things may cause the opposite effect. For example, GeneMatcher has just tried out 4 different algorithms; we probably would not have done that if there was a mandated one.

The usefulness I mentioned is because I have a lot of experience dealing with ranking algorithms in my Information Retrieval past. If the goal is to merge the results from various services into a single list for the user, I can tell you from experience that there be monsters down that route, unless you have (very) homogeneous data sources. If I have two document collections (news items, for example), the math to say that a search scores 0.5 for a document in one and 0.7 for a document in another, and that the latter is therefore more relevant than the former, is well understood. But for highly structured data it is much more complicated: how do I compare a record from one MME with a score of 0.5 on the gene section against another from another MME with a score of 0.7 on the phenotype section? It's edge cases all the way down.
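To make the score-reporting idea concrete, here is a minimal sketch of what a client could do with results that carry an algorithm name plus a raw score and its maximum. The field names (`algorithm`, `rawScore`, `maxScore`) are hypothetical illustrations, not part of the current MME API:

```python
# Hypothetical sketch: the field names below are illustrative assumptions,
# not part of the MME specification.
def normalize_score(result: dict) -> float:
    """Map a raw score and its reported maximum onto a 0.0-1.0 scale,
    so a client can rank results from a single service."""
    return result["rawScore"] / result["maxScore"]

results = [
    {"patientId": "P1", "algorithm": "phenodigm", "rawScore": 712, "maxScore": 1000},
    {"patientId": "P2", "algorithm": "phenodigm", "rawScore": 433, "maxScore": 1000},
]

# Rank within one service only; as argued above, comparing normalized
# scores *across* services running different algorithms is not meaningful.
for r in sorted(results, key=normalize_score, reverse=True):
    print(r["patientId"], round(normalize_score(r), 2))
```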
I agree with @fschiettecatte's assessment. Saying which algorithm is used is fine; mandating one could create a barrier to entry.
Okay, I updated the proposal to completely drop the compliance aspect and any mention of required baseline algorithms. I agree those agendas are not consistent with the spirit of the MME and would cause problems down the road. Instead, we just enable each service to support multiple algorithms by:
I think this best aligns with 2.0, based on the discussions we've had this year about modularity and the 2.0 plans.
Currently, each MME service implements its own pheno-/genotypic scoring metric. This is great: part of the goal of the project is to have these metrics compete and evolve. We have also discussed allowing the querier to specify the form of the matching (or bias it in certain hypothesis-driven ways). One such parameter is which scoring algorithm(s) are used to determine the best matches.
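For concreteness, a minimal sketch of what such a request might look like. The `matchingAlgorithm` field, its value, and the endpoint URL are hypothetical assumptions for illustration; the patient structure and content type follow the existing MME request format:

```python
import requests

# Hypothetical sketch: the "matchingAlgorithm" field and the endpoint URL
# are illustrative assumptions, not part of the current MME specification.
query = {
    "patient": {
        "id": "example-patient-1",
        "features": [{"id": "HP:0001156"}],  # brachydactyly
    },
    "matchingAlgorithm": "phenodigm",  # could also be a list of acceptable algorithms
}

response = requests.post(
    "https://matchmaker.example.org/match",  # placeholder endpoint
    json=query,
    headers={"Content-Type": "application/vnd.ga4gh.matchmaker.v1.0+json"},
)
for result in response.json().get("results", []):
    print(result.get("score"))
```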
I propose:
This provides several benefits as I see it:
Important: It provides the ability to automatically measure the compliance of new MME API endpoints. Specifically, if we decide that some simple-to-implement baseline measures (such as the UI score for phenotypic similarity) are mandatory, we can automatically verify that new endpoints return the correct scores on the test data. This is important if we want: 1) a reference implementation, 2) a compliance test, and 3) new services to be part of the main MME API.
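Such a compliance check could be as simple as replaying known cases and comparing scores. A minimal sketch, where the test fixture, the expected values, and the tolerance are all illustrative assumptions rather than any agreed specification:

```python
import math

# Hypothetical fixture: (query patient, expected score) pairs for a
# mandated baseline algorithm. Values are made up for illustration.
TEST_CASES = [
    ({"features": [{"id": "HP:0001156"}]}, 0.83),
    ({"features": [{"id": "HP:0000118"}]}, 0.41),
]

def check_compliance(score_fn, tolerance: float = 1e-3) -> bool:
    """Return True if the endpoint's scoring function reproduces the
    expected baseline scores on every test case."""
    return all(
        math.isclose(score_fn(patient), expected, abs_tol=tolerance)
        for patient, expected in TEST_CASES
    )
```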