
Implement ACORN-1 search for HNSW #14085

Draft · wants to merge 1 commit into base: main
Conversation

@benchaplin (Contributor) commented Dec 20, 2024

Description

Playing around with some ideas from ACORN-1 to improve filtered HNSW search. The ideas are:

  • Predicate subgraph traversal (only consider/score candidates that pass the filter)
  • Two-hop neighbor expansion (I read up on Weaviate's implementation and used their idea to consider two-hop neighbors only when the first hop doesn't pass the filter)
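A minimal sketch of these two ideas together (the graph representation, class and method names, and the `IntPredicate` filter are illustrative, not the actual patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical sketch of predicate-subgraph traversal with two-hop expansion:
// neighbors that pass the filter are kept as candidates directly; a neighbor
// that fails the filter is not kept, but its own neighbors are considered in
// its place.
final class TwoHopExpansion {
  static List<Integer> expand(int node, Map<Integer, int[]> graph, IntPredicate filter) {
    List<Integer> candidates = new ArrayList<>();
    for (int neighbor : graph.getOrDefault(node, new int[0])) {
      if (filter.test(neighbor)) {
        candidates.add(neighbor); // first hop passes the predicate
      } else {
        // first hop fails the filter: expand through it to its own neighbors
        for (int twoHop : graph.getOrDefault(neighbor, new int[0])) {
          if (filter.test(twoHop)) {
            candidates.add(twoHop);
          }
        }
      }
    }
    return candidates;
  }
}
```

A real implementation would also skip already-visited nodes and score each candidate against the query vector; this only shows the traversal rule.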

I benchmarked using Cohere/wikipedia-22-12-en-embeddings with params:

  • nDoc = 200000
  • topK = 100
  • fanout = 50
  • maxConn = 32
  • beamWidth = 100
  • filterSelectivity = [0.05, 0.25, 0.5, 0.75, 0.95]

Here are some results:

Baseline:

| filterSelectivity | recall | latency (ms) |
|------------------:|-------:|-------------:|
| 0.05 | 0.037 | 17.182 |
| 0.25 | 0.166 | 7.348 |
| 0.5 | 0.332 | 4.376 |
| 0.75 | 0.489 | 3.165 |
| 0.95 | 0.608 | 2.441 |

Candidate (this code):

| filterSelectivity | recall | latency (ms) |
|------------------:|-------:|-------------:|
| 0.05 | 0.028 | 2.744 |
| 0.25 | 0.157 | 4.614 |
| 0.5 | 0.308 | 4.833 |
| 0.75 | 0.449 | 4.622 |
| 0.95 | 0.563 | 3.244 |

Pros: significantly faster for selective filters.
Cons: slightly worse recall across the board, slightly slower for inclusive filters.

There's a lot to play around with here; this code represents the best results I got in this testing. One thing that must be tested is the correlation between the filter and the query vector (this is discussed and tested in the paper). luceneutil only offers zero correlation at the moment, so I'm working on adding a knob there for future benchmarks.

Code should also be cleaned up, but for me, keeping everything in one method makes it easier to read the changes.

@benwtrent (Member)

Thank you for taking a stab at this @benchaplin! I wonder if we can adjust the algorithm to switch between the two approaches more intelligently. Something like:

  • Fan out one layer (only accepting the filtered docs) and add candidates.
  • If we reach an acceptable "saturation" (e.g. some number <= m*2 that we consider adequate connectedness), we stick to those candidates and explore.
  • If we do not reach appropriate saturation, fan out a second layer and add candidates.
  • If we still fail saturation (again, some number <= m*2), do we fan out a third layer? Do we "jump up" a layer in the graph to gather a better entry point, since the current one is garbage?
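The saturation-based switching above could be sketched like this (names, the threshold, and the per-hop bookkeeping are illustrative, not an actual implementation):

```java
// Hypothetical sketch of saturation-based fan-out: "saturation" is some
// candidate count <= m * 2 that we consider adequate connectedness.
final class SaturationFanout {
  /**
   * Returns how many hops of fan-out are needed before the accumulated number
   * of filter-passing candidates reaches the saturation threshold, capped at
   * maxHops (past which the caller might instead jump up a layer to gather a
   * better entry point).
   */
  static int hopsUntilSaturated(int[] passingPerHop, int threshold, int maxHops) {
    int accumulated = 0;
    for (int hop = 0; hop < Math.min(passingPerHop.length, maxHops); hop++) {
      accumulated += passingPerHop[hop];
      if (accumulated >= threshold) {
        return hop + 1; // saturated: stick with these candidates and explore
      }
    }
    return maxHops; // never saturated within the hop budget
  }
}
```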

The initial algorithm makes sense: we are trying to recover the graph's connectedness for exploration. The bottom layer entry point is the initial "exploration zone". Another idea is to allow multiple "exploration zones" from which we fan out to find the filtered values.

These are just 🧠 ⚡ ideas. The initial numbers are promising.

@benwtrent (Member)

Hey @benchaplin, there are a number of things broken with luceneutil right now. Your recall numbers surprised me, and I think they don't reflect actual performance.

I am working on getting real numbers with some local patches & providing patches to Lucene util as I can.

see:

@benwtrent (Member)

This is the branch I am using for testing recall/latency for filter cases for right now: https://github.com/mikemccand/luceneutil/compare/main...benwtrent:luceneutil:filter-testing?expand=1

@benwtrent (Member)

Here are some benchmarks (100k float32[1024] vectors).

Baseline:

```
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  visited  selectivity
 0.915         0.950  100000   100       0       16        100     2054         0.95
 0.918         0.950  100000   100       0       16        100     2128         0.90
 0.924         1.090  100000   100       0       16        100     2417         0.75
 0.935         1.430  100000   100       0       16        100     3357         0.50
 0.962         2.740  100000   100       0       16        100     5846         0.25
 1.000         9.530  100000   100       0       16        100     9882         0.10
 1.000         4.750  100000   100       0       16        100     4913         0.05
 1.000         2.100  100000   100       0       16        100     2507         0.03
 1.000         0.970  100000   100       0       16        100     1023         0.01
 0.975         1.660  100000   100     100       16        100     3545         0.95
 0.977         1.680  100000   100     100       16        100     3682         0.90
 0.977         2.120  100000   100     100       16        100     4218         0.75
 0.984         2.720  100000   100     100       16        100     5789         0.50
 0.990         5.190  100000   100     100       16        100     9889         0.25
 1.000         9.900  100000   100     100       16        100     9883         0.10
 1.000         4.740  100000   100     100       16        100     4913         0.05
 1.000         2.150  100000   100     100       16        100     2507         0.03
 1.000         0.970  100000   100     100       16        100     1023         0.01
```

Candidate:

```
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  visited  selectivity
 0.852         1.330  100000   100       0       16        100     2723         0.95
 0.821         1.510  100000   100       0       16        100     3011         0.90
 0.793         1.680  100000   100       0       16        100     3329         0.75
 0.795         1.730  100000   100       0       16        100     3357         0.50
 0.880         1.820  100000   100       0       16        100     2912         0.25
 0.891         1.430  100000   100       0       16        100     1692         0.10
 0.823         1.320  100000   100       0       16        100     1070         0.05
 0.765         1.170  100000   100       0       16        100      620         0.03
 0.630         1.190  100000   100       0       16        100      341         0.01
 0.942         2.210  100000   100     100       16        100     4786         0.95
 0.934         2.500  100000   100     100       16        100     5360         0.90
 0.927         2.950  100000   100     100       16        100     5983         0.75
 0.925         3.370  100000   100     100       16        100     6005         0.50
 0.956         3.160  100000   100     100       16        100     4711         0.25
 0.947         2.380  100000   100     100       16        100     2560         0.10
 0.895         2.160  100000   100     100       16        100     1569         0.05
 0.842         1.910  100000   100     100       16        100      888         0.03
 0.744         1.940  100000   100     100       16        100      482         0.01
```

You can see that until about 50% selectivity, latency and recall are worse in the candidate. However, as we select even fewer than 50%, visited gets better, but recall suffers.

This is likely because in the more restrictive cases the baseline is actually dropping to brute force due to excessive exploration (note the 1.0 recall in the baseline).
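The fallback decision can be sketched as a simple cost comparison (a simplification of Lucene's actual heuristic, which derives a visit limit from the filter's cost; names are illustrative):

```java
// Simplified sketch, not the actual Lucene logic: if approximate graph search
// is expected to visit at least as many vectors as pass the filter, then
// exhaustively scoring the filtered vectors costs no more and yields perfect
// recall, so brute force wins.
final class BruteForceFallback {
  static boolean fallBackToExact(long expectedVisited, long filterCardinality) {
    return expectedVisited >= filterCardinality;
  }
}
```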

@benwtrent (Member)

https://github.com/apache/lucene/compare/main...benwtrent:lucene:acorn_search?expand=1

Here are two of my ideas:

  • We only go to 2-hop if a percentage of the current candidate's neighbors are filtered out.
  • We oversample the total candidates considered by a percentage.

Tweaking the settings might take some steps.
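Those two knobs might look something like this (class and method names and the thresholds are illustrative, not the linked branch's code):

```java
import java.util.function.IntPredicate;

// Hypothetical sketch of the two knobs: expand to two-hop only when the
// fraction of a candidate's neighbors that fail the filter exceeds a
// threshold, and oversample the candidate queue by a fixed percentage.
final class AcornKnobs {
  /** Go two-hop only if more than failFraction of the neighbors are filtered out. */
  static boolean shouldExpandTwoHop(int[] neighbors, IntPredicate filter, double failFraction) {
    if (neighbors.length == 0) {
      return false;
    }
    int failed = 0;
    for (int n : neighbors) {
      if (!filter.test(n)) {
        failed++;
      }
    }
    return (double) failed / neighbors.length > failFraction;
  }

  /** Grow the number of candidates considered by an oversampling percentage. */
  static int oversampledQueueSize(int beamWidth, double oversamplePercent) {
    return (int) Math.ceil(beamWidth * (1.0 + oversamplePercent));
  }
}
```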

I also think there are things to do around:

  • Considering jumping up a layer and going to a different entry point if we get "far away" from the current entry point
  • More intelligently choosing the candidates for two-hop expansion
  • Considering three-hop on very restrictive filters (e.g. if we don't saturate our expanded set, we should look one layer further out)
