[FEATURE] Hybrid Search should provide scores of sub queries for understanding/debugging the results. #658

Open
vamshin opened this issue Mar 31, 2024 · 5 comments

Comments

@vamshin
Member

vamshin commented Mar 31, 2024

Is your feature request related to a problem?

Hybrid search doesn't return the score of each individual subquery, which makes it difficult to debug why documents were included in or excluded from the results.

What solution would you like?

As part of the _explain API, we should provide the scores of the subqueries so users can understand and debug the results.
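For illustration, here is a minimal sketch of the per-document breakdown such an explanation could expose: the raw score from each subquery, its normalized value, and the final combined score. It assumes min-max normalization and a weighted arithmetic mean, modeled loosely on the `min_max` and `arithmetic_mean` options of the normalization processor, and it treats a document missing from a subquery as contributing 0.0 (an assumption). `explain_hybrid` and `HybridExplanation` are hypothetical names, not an OpenSearch API.

```python
from __future__ import annotations

from dataclasses import dataclass


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize one subquery's scores into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: when all scores are equal, every document maps to 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


@dataclass
class HybridExplanation:
    """Hypothetical debug record: one entry per document in the merged results."""
    doc_id: str
    raw: list[float | None]   # raw score per subquery; None if the subquery missed the doc
    normalized: list[float]   # normalized score per subquery (0.0 if missing, an assumption)
    combined: float           # final hybrid score after weighted combination


def explain_hybrid(subquery_scores: list[dict[str, float]],
                   weights: list[float]) -> list[HybridExplanation]:
    """Normalize each subquery, combine with a weighted arithmetic mean,
    and keep the per-subquery scores instead of discarding them."""
    normalized = [min_max(s) for s in subquery_scores]
    doc_ids = sorted(set().union(*subquery_scores))
    results = []
    for doc in doc_ids:
        raw = [s.get(doc) for s in subquery_scores]
        norm = [n.get(doc, 0.0) for n in normalized]
        combined = sum(w * x for w, x in zip(weights, norm)) / sum(weights)
        results.append(HybridExplanation(doc, raw, norm, combined))
    # Highest combined score first, like a ranked result list.
    return sorted(results, key=lambda e: e.combined, reverse=True)


if __name__ == "__main__":
    bm25 = {"d1": 12.0, "d2": 8.0, "d3": 4.0}   # e.g. a lexical subquery
    knn = {"d2": 0.9, "d3": 0.7, "d4": 0.5}     # e.g. a vector subquery
    for e in explain_hybrid([bm25, knn], weights=[0.5, 0.5]):
        print(e)
```

With a record like this, a user can see, for example, that `d4` ranks last because it was matched only by the vector subquery and received that subquery's lowest normalized score.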

@vamshin vamshin moved this from Backlog to Backlog (Hot) in Vector Search RoadMap Mar 31, 2024
@vamshin vamshin moved this from Backlog (Hot) to 2.15.0 in Vector Search RoadMap Apr 1, 2024
@github-project-automation github-project-automation bot moved this to Planned work items in Test roadmap format Apr 9, 2024
@vamshin vamshin removed the v2.15.0 label May 31, 2024
@vamshin vamshin moved this from 2.15.0 to 2.16.0 in Vector Search RoadMap May 31, 2024
@smacrakis

Yes, customers would like to see both scores from hybrid search, both for debugging and for training LTR models.

@smacrakis

We were also hoping for this feature in 2.16 for our own work (with OSC) in tuning hybrid search using LTR.

@yuye-aws
Member

Are we also going to support the explain API for k-NN queries, as in opensearch-project/k-NN#875?

@smacrakis
Copy link

The documentation on _explain says "The explain API is an expensive operation in terms of both resources and time. On production clusters, we recommend using it sparingly for the purpose of troubleshooting."
If this is true, then returning the subquery scores via _explain is not going to be viable for LTR in production if the subquery scores are being used as features. Do we have a path to returning the scores more efficiently?

@zhichao-aws
Member

> The documentation on _explain says "The explain API is an expensive operation in terms of both resources and time. On production clusters, we recommend using it sparingly for the purpose of troubleshooting." If this is true, then returning the subquery scores via _explain is not going to be viable for LTR in production if the subquery scores are being used as features. Do we have a path to returning the scores more efficiently?

I have the same question.
Customers may need the absolute scores as input features for downstream systems. However, the current hybrid query only returns normalized scores, so that information is lost.
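A small sketch of why normalization loses this information, assuming min-max normalization (one of the techniques the normalization processor offers): two result lists whose raw scores differ by a factor of ten produce identical normalized scores, so a downstream model that only sees normalized values cannot tell a strong match from a weak one. The `min_max` helper is a hypothetical illustration, not OpenSearch code.

```python
def min_max(scores: list[float]) -> list[float]:
    """Min-max normalize a list of raw scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: when all scores are equal, every document maps to 1.0.
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


strong = [30.0, 20.0, 10.0]  # raw scores for a query with strong lexical matches
weak = [3.0, 2.0, 1.0]       # raw scores for a query with weak lexical matches

print(min_max(strong))  # → [1.0, 0.5, 0.0]
print(min_max(weak))    # → [1.0, 0.5, 0.0]
```

Only the relative order within one result list survives; the absolute scale that an LTR model could use as a feature is gone, which is why returning the raw subquery scores matters here.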

Projects
Status: Now(This Quarter)
Development

No branches or pull requests

6 participants