[FEATURE] Support knn Queries in _msearch for Multi-Collection Vector Search #2570

Open
genandre opened this issue Feb 28, 2025 · 0 comments

Is your feature request related to a problem?

Currently, OpenSearch does not support knn queries inside _msearch, which prevents performing approximate nearest neighbor (ANN) search across multiple indices in a single request.

This limitation makes it difficult to retrieve vector search results from multiple collections efficiently: each index needs its own request, which adds query complexity, latency, and response-merging overhead in applications that need multi-collection vector search.


What solution would you like?

I would like OpenSearch to support knn queries inside _msearch, so users can send multiple knn searches across different indices in a single batch request, similar to how _msearch works for traditional queries.

This feature should allow:

  • Executing multiple knn searches in one request (just like _msearch does for match, term, and script_score queries).
  • Efficiently retrieving results from multiple indices without requiring separate requests.
  • Merging results from multiple indices with minimal performance overhead.
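To make the request concrete, here is a minimal sketch of the batch body this feature would accept. It assumes the standard _msearch NDJSON layout (a header line naming the index, followed by a search body line) and the k-NN plugin's knn query shape. The index names, the field name `embedding`, and the helper `build_knn_msearch_body` are hypothetical and only for illustration.

```python
import json


def build_knn_msearch_body(searches):
    """Build an NDJSON _msearch body from (index, field, vector, k) tuples.

    Each search contributes two lines: a header selecting the index and
    a body holding a knn query against the given vector field.
    """
    lines = []
    for index, field, vector, k in searches:
        # Header line: which index this sub-search targets.
        lines.append(json.dumps({"index": index}))
        # Body line: the knn query itself (the shape this issue asks
        # _msearch to accept).
        lines.append(json.dumps({
            "size": k,
            "query": {"knn": {field: {"vector": vector, "k": k}}},
        }))
    # _msearch requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical indices and vectors.
    body = build_knn_msearch_body([
        ("docs", "embedding", [0.1, 0.2, 0.3], 5),
        ("images", "embedding", [0.4, 0.5, 0.6], 5),
    ])
    print(body, end="")
```

The resulting string would be sent as the body of a POST to _msearch with an NDJSON content type; how the server handles the knn sub-queries is exactly what this issue asks for.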

What alternatives have you considered?

Since knn is not currently supported in _msearch, the following workarounds have been explored:

  1. Sending a separate knn _search query per index and merging results client-side.

    • 🚀 Fast, but requires extra logic in the application to merge responses manually.
    • Increases request overhead due to multiple network calls.
  2. Reindexing all collections into a single index and filtering with a type field.

    • Works well for some cases, but does not scale well if collections are large and frequently updated.
    • Requires additional storage and maintenance overhead.
  3. Using _msearch with script_score instead of knn for vector similarity.

    • Works in _msearch, but significantly slower than knn (brute-force computation instead of ANN indexing).

None of these alternatives fully solve the problem in an efficient and scalable way.
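For reference, the client-side merge in workaround 1 looks roughly like the sketch below. It assumes each response follows the usual search response layout (`hits.hits` entries carrying `_index`, `_id`, and `_score`) and that scores are comparable across indices, which only holds if every index uses the same embedding model and space type. The function name `merge_top_k` and the sample data are hypothetical.

```python
import heapq


def merge_top_k(responses, k):
    """Merge hits from several per-index search responses into one top-k list.

    Assumes scores are comparable across indices (same embedding model
    and space type); otherwise they would need normalizing first.
    """
    all_hits = (
        hit
        for response in responses
        for hit in response["hits"]["hits"]
    )
    # nlargest returns the k best hits sorted by descending score.
    return heapq.nlargest(k, all_hits, key=lambda hit: hit["_score"])


if __name__ == "__main__":
    # Hypothetical responses from two separate knn _search calls.
    docs_response = {"hits": {"hits": [
        {"_index": "docs", "_id": "a", "_score": 0.9},
        {"_index": "docs", "_id": "b", "_score": 0.5},
    ]}}
    images_response = {"hits": {"hits": [
        {"_index": "images", "_id": "c", "_score": 0.8},
        {"_index": "images", "_id": "d", "_score": 0.7},
    ]}}
    for hit in merge_top_k([docs_response, images_response], k=3):
        print(hit["_index"], hit["_id"], hit["_score"])
```

This is the extra application logic the workaround requires; with knn supported in _msearch, the same merge would still apply but over a single round trip.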


Do you have any additional context?

  • Elasticsearch also does not support knn in _msearch, and OpenSearch could introduce this as a unique advantage.
  • The ability to batch vector searches would be highly beneficial for multi-collection vector search use cases, such as:
    • Searching across multiple document repositories in a single request.
    • Performing multi-modal search (e.g., combining text, image, or embeddings from different sources).
    • Improving efficiency in real-time recommendation systems that rely on fast ANN lookups.
