
Split segment by search type #2273

Draft
wants to merge 2 commits into base: main

Conversation

VijayanB
Member

Description

For exact search, it is not required to perform
quantization during rescoring with oversampling.
However, to avoid having to normalize scores between segments served by
approximate search and segments served by exact search, we first identify
the segments that need approximate search and perform oversampling on them;
then, at the end, after rescoring, we add the scores from the segments that
perform exact search.
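
Roughly, the flow could look like the following sketch (illustrative only; `willUseApproxSearch` is a hypothetical predicate, not the PR's actual code):

```java
// Hypothetical sketch of the segment split described above.
List<LeafReaderContext> approxLeaves = new ArrayList<>();
List<LeafReaderContext> exactLeaves = new ArrayList<>();
for (LeafReaderContext leaf : leafReaderContexts) {
    if (willUseApproxSearch(leaf)) {   // e.g. engine files exist and enough docs remain after filtering
        approxLeaves.add(leaf);
    } else {
        exactLeaves.add(leaf);
    }
}
// 1. Approx-search segments: search with an oversampled firstPassK, then rescore down to finalK.
// 2. Exact-search segments: score finalK docs directly on full precision, no quantization, no rescore.
// 3. Merge both per-leaf result sets before building top docs.
```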

Related Issues

#2215

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and signing off your commits, please check here.

Currently, we map the order of results to the order of segments and rely on
that order to build top docs. Refactor the method to use Map.Entry to pair
each LeafReaderContext with the results from that leaf.
This is required when we want to split segments based on
approximate search or exact search, to avoid rescoring
exact-search segments twice.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
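
A rough sketch of what that pairing could look like (assumed names such as `searchLeaf` and `rescoreLeaf` are hypothetical, not the PR's actual implementation):

```java
// Each entry pairs a LeafReaderContext with the doc-id-to-score map collected from that leaf,
// so ordering is no longer the implicit link between leaves and their results.
List<Map.Entry<LeafReaderContext, Map<Integer, Float>>> perLeafResults = new ArrayList<>();
for (LeafReaderContext leaf : leafReaderContexts) {
    perLeafResults.add(Map.entry(leaf, searchLeaf(leaf)));      // hypothetical per-leaf search
}
// Rescoring can then iterate the entries and always knows which leaf produced which scores.
for (Map.Entry<LeafReaderContext, Map<Integer, Float>> entry : perLeafResults) {
    rescoreLeaf(entry.getKey(), entry.getValue());              // hypothetical per-leaf rescore
}
```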
if (isShardLevelRescoringEnabled == true) {
    ResultUtil.reduceToTopK(perLeafResults, firstPassK);
}

StopWatch stopWatch = new StopWatch().start();
- perLeafResults = doRescore(indexSearcher, leafReaderContexts, knnWeight, perLeafResults, finalK);
+ perLeafResults = doRescore(indexSearcher, knnWeight, perLeafResults, finalK);
Collaborator
Previously, we did exact search on finalK. After this change, we still do exact search on finalK. Could you tell me how this will improve the latency?

Member Author
@VijayanB VijayanB Nov 14, 2024

doSearch can call either approximate search or exact search based on conditions like whether engine files exist, or whether the number of docs after filtering is less than k. In those cases, we quantize the query vector and every vector from the segments, and then perform the distance computation using Hamming distance for firstPassK. With this approach, we only call doSearch for those segments that we know will always use approximate search, and for the other segments we call exact search with finalK and without quantization. The optimization is at https://github.com/opensearch-project/k-NN/pull/2273/files#diff-9cfe412357ba56b3ef216427d491fc653535686a760e8ba19ea1aa00fc0e0338R68-R78
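
For illustration, the quantized first pass for an approx-search segment might look roughly like this (hypothetical helper names, not the plugin's actual API):

```java
// Quantize the query once, then compare against quantized stored vectors with Hamming distance.
byte[] quantizedQuery = binaryQuantize(queryVector);                     // hypothetical
Map<Integer, Integer> hammingByDoc = new HashMap<>();
for (int docId : candidateDocIds(leaf)) {                                // hypothetical iterator
    byte[] quantizedDoc = binaryQuantize(readVector(leaf, docId));       // hypothetical
    hammingByDoc.put(docId, hammingDistance(quantizedQuery, quantizedDoc));
}
// Keep the best firstPassK candidates and rescore only those with full-precision distances.
// Exact-search segments skip all of this and score finalK docs directly on full precision.
```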

Collaborator

Are you assuming that an exact search on full precision vectors will be faster than an exact search with quantized vectors due to the slower quantization process? It would be interesting to see the benchmark results for this.

If that’s the case, an alternative could be to retrieve quantized values directly from the Faiss file instead of performing on-the-fly quantization.

Member Author
@VijayanB VijayanB Nov 14, 2024

Yes, exact search on full precision vectors for k costs less than exact search on quantized vectors for firstPassK plus a full-precision rescore of the matched docs. The linked GitHub issue actually shows how performance degraded 10x when there are segments with no Faiss engine files. In my POC I saw improvements, but recall was poor because of using order as the link between results and leaf reader contexts. I am rerunning the experiments with my change to collect latency and recall metrics.
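
In other words, the assumption being tested is roughly: cost(exact full-precision search for k) < cost(exact quantized search for firstPassK) + cost(full-precision rescore of those firstPassK matches), where firstPassK > k because of oversampling.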

Collaborator

There is one case where we are running exact search: when the number of returned results is less than k. Are we going to handle that case as well?

For exact search, it is not required to perform
quantization during rescoring with oversampling.
However, to avoid having to normalize scores between segments served by
approximate search and segments served by exact search, we first identify
the segments that need approximate search and perform oversampling on them;
then, at the end, after rescoring, we add the scores from the segments that
perform exact search.

Signed-off-by: Vijayan Balasubramanian <[email protected]>