
Implement ACORN-1 search for HNSW #14085

Draft · wants to merge 1 commit into base: main
Conversation

@benchaplin (Contributor) commented Dec 20, 2024

Description

Playing around with some ideas from ACORN-1 to improve filtered HNSW search. The ideas are:

  • Predicate subgraph traversal (only consider/score candidates that pass the filter)
  • Two-hop neighbor expansion (I read up on Weaviate's implementation and used their idea to consider two-hop neighbors only when the first hop doesn't pass the filter)
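A minimal sketch of these two ideas together (the graph representation, class and method names, and the `IntPredicate` filter are illustrative, not the actual patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical sketch of predicate-subgraph traversal with two-hop expansion:
// neighbors that pass the filter are kept as candidates directly; a neighbor
// that fails the filter is not kept, but its own neighbors are considered in
// its place.
final class TwoHopExpansion {
  static List<Integer> expand(int node, Map<Integer, int[]> graph, IntPredicate filter) {
    List<Integer> candidates = new ArrayList<>();
    for (int neighbor : graph.getOrDefault(node, new int[0])) {
      if (filter.test(neighbor)) {
        candidates.add(neighbor); // first hop passes the predicate
      } else {
        // first hop fails the filter: expand through it to its own neighbors
        for (int twoHop : graph.getOrDefault(neighbor, new int[0])) {
          if (filter.test(twoHop)) {
            candidates.add(twoHop);
          }
        }
      }
    }
    return candidates;
  }
}
```

A real implementation would also skip already-visited nodes and score each candidate against the query vector; this only shows the traversal rule.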

I benchmarked using Cohere/wikipedia-22-12-en-embeddings with params:

  • nDoc = 200000
  • topK = 100
  • fanout = 50
  • maxConn = 32
  • beamWidth = 100
  • filterSelectivity = [0.05, 0.25, 0.5, 0.75, 0.95]

Here are some results:

Baseline:

| filterSelectivity | recall | latency (ms) |
|------------------:|-------:|-------------:|
| 0.05 | 0.037 | 17.182 |
| 0.25 | 0.166 | 7.348 |
| 0.5 | 0.332 | 4.376 |
| 0.75 | 0.489 | 3.165 |
| 0.95 | 0.608 | 2.441 |

Candidate (this code):

| filterSelectivity | recall | latency (ms) |
|------------------:|-------:|-------------:|
| 0.05 | 0.028 | 2.744 |
| 0.25 | 0.157 | 4.614 |
| 0.5 | 0.308 | 4.833 |
| 0.75 | 0.449 | 4.622 |
| 0.95 | 0.563 | 3.244 |

Pros: significantly faster for selective filters.
Cons: slightly worse recall across the board, slightly slower for inclusive filters.

There's a lot to play around with here; this code represents the best results I got in this testing. One thing that must be tested is the correlation between the filter and the query vector (this is discussed and tested in the paper). luceneutil only offers zero correlation at the moment, so I'm working on adding a knob there for future benchmarks.

Code should also be cleaned up, but for me, keeping everything in one method makes it easier to read the changes.

@benwtrent (Member)

Thank you for taking a stab at this @benchaplin! I wonder if we can adjust the algorithm to switch between the two approaches more intelligently. Something like:

  • Fan out one layer (only accepting the filtered docs) and add candidates.
  • If we reach an acceptable "saturation" (e.g. some number <= m*2 that we consider adequate connectedness), we stick to those candidates and explore.
  • If we do not reach appropriate saturation, fan out a second layer and add candidates.
  • If we still fail saturation (again, some number <= m*2), do we fan out a third layer? Do we "jump up" a layer in the graph to gather a better entry point, since the current one is garbage?
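The saturation-based switching above could be sketched like this (names, the threshold, and the per-hop bookkeeping are illustrative, not an actual implementation):

```java
// Hypothetical sketch of saturation-based fan-out: "saturation" is some
// candidate count <= m * 2 that we consider adequate connectedness.
final class SaturationFanout {
  /**
   * Returns how many hops of fan-out are needed before the accumulated number
   * of filter-passing candidates reaches the saturation threshold, capped at
   * maxHops (past which the caller might instead jump up a layer to gather a
   * better entry point).
   */
  static int hopsUntilSaturated(int[] passingPerHop, int threshold, int maxHops) {
    int accumulated = 0;
    for (int hop = 0; hop < Math.min(passingPerHop.length, maxHops); hop++) {
      accumulated += passingPerHop[hop];
      if (accumulated >= threshold) {
        return hop + 1; // saturated: stick with these candidates and explore
      }
    }
    return maxHops; // never saturated within the hop budget
  }
}
```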

The initial algorithm makes sense: we are trying to recover the graph's connectedness for exploration. The bottom layer entry point is the initial "exploration zone". Another idea is to allow multiple "exploration zones" from which we fan out to find the filtered values.

These are just 🧠 ⚡ ideas. The initial numbers are promising.

@benwtrent (Member)

Hey @benchaplin, there are a number of things broken with luceneutil right now. Your recall numbers surprised me, and I think they don't reflect actual performance.

I am working on getting real numbers with some local patches & providing patches to Lucene util as I can.

see:

@benwtrent (Member)

This is the branch I am using for testing recall/latency for filter cases for right now: https://github.com/mikemccand/luceneutil/compare/main...benwtrent:luceneutil:filter-testing?expand=1

@benwtrent (Member)

Here are some benchmarks (100k float32[1024] vectors).

Baseline:

```
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  visited  selectivity
 0.915         0.950  100000   100       0       16        100     2054         0.95
 0.918         0.950  100000   100       0       16        100     2128         0.90
 0.924         1.090  100000   100       0       16        100     2417         0.75
 0.935         1.430  100000   100       0       16        100     3357         0.50
 0.962         2.740  100000   100       0       16        100     5846         0.25
 1.000         9.530  100000   100       0       16        100     9882         0.10
 1.000         4.750  100000   100       0       16        100     4913         0.05
 1.000         2.100  100000   100       0       16        100     2507         0.03
 1.000         0.970  100000   100       0       16        100     1023         0.01
 0.975         1.660  100000   100     100       16        100     3545         0.95
 0.977         1.680  100000   100     100       16        100     3682         0.90
 0.977         2.120  100000   100     100       16        100     4218         0.75
 0.984         2.720  100000   100     100       16        100     5789         0.50
 0.990         5.190  100000   100     100       16        100     9889         0.25
 1.000         9.900  100000   100     100       16        100     9883         0.10
 1.000         4.740  100000   100     100       16        100     4913         0.05
 1.000         2.150  100000   100     100       16        100     2507         0.03
 1.000         0.970  100000   100     100       16        100     1023         0.01
```

Candidate:

```
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  visited  selectivity
 0.852         1.330  100000   100       0       16        100     2723         0.95
 0.821         1.510  100000   100       0       16        100     3011         0.90
 0.793         1.680  100000   100       0       16        100     3329         0.75
 0.795         1.730  100000   100       0       16        100     3357         0.50
 0.880         1.820  100000   100       0       16        100     2912         0.25
 0.891         1.430  100000   100       0       16        100     1692         0.10
 0.823         1.320  100000   100       0       16        100     1070         0.05
 0.765         1.170  100000   100       0       16        100      620         0.03
 0.630         1.190  100000   100       0       16        100      341         0.01
 0.942         2.210  100000   100     100       16        100     4786         0.95
 0.934         2.500  100000   100     100       16        100     5360         0.90
 0.927         2.950  100000   100     100       16        100     5983         0.75
 0.925         3.370  100000   100     100       16        100     6005         0.50
 0.956         3.160  100000   100     100       16        100     4711         0.25
 0.947         2.380  100000   100     100       16        100     2560         0.10
 0.895         2.160  100000   100     100       16        100     1569         0.05
 0.842         1.910  100000   100     100       16        100      888         0.03
 0.744         1.940  100000   100     100       16        100      482         0.01
```

You can see that until about 50% selectivity, latency and recall are worse in the candidate. However, as we select even fewer than 50%, visited gets better, but recall suffers.

This is likely because in the more restrictive cases the baseline is actually dropping to brute force due to excessive exploration (note the 1.0 recall in the baseline).
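The fallback decision can be sketched as a simple cost comparison (a simplification of Lucene's actual heuristic, which derives a visit limit from the filter's cost; names are illustrative):

```java
// Simplified sketch, not the actual Lucene logic: if approximate graph search
// is expected to visit at least as many vectors as pass the filter, then
// exhaustively scoring the filtered vectors costs no more and yields perfect
// recall, so brute force wins.
final class BruteForceFallback {
  static boolean fallBackToExact(long expectedVisited, long filterCardinality) {
    return expectedVisited >= filterCardinality;
  }
}
```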

@benwtrent (Member)

https://github.com/apache/lucene/compare/main...benwtrent:lucene:acorn_search?expand=1

Here are two of my ideas:

  • We only go to 2-hop if a percentage of the current candidate's neighbors are filtered out.
  • We oversample the total candidates considered by a percentage.

Tweaking the settings might take some steps.
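Those two knobs might look something like this (class and method names and the thresholds are illustrative, not the linked branch's code):

```java
import java.util.function.IntPredicate;

// Hypothetical sketch of the two knobs: expand to two-hop only when the
// fraction of a candidate's neighbors that fail the filter exceeds a
// threshold, and oversample the candidate queue by a fixed percentage.
final class AcornKnobs {
  /** Go two-hop only if more than failFraction of the neighbors are filtered out. */
  static boolean shouldExpandTwoHop(int[] neighbors, IntPredicate filter, double failFraction) {
    if (neighbors.length == 0) {
      return false;
    }
    int failed = 0;
    for (int n : neighbors) {
      if (!filter.test(n)) {
        failed++;
      }
    }
    return (double) failed / neighbors.length > failFraction;
  }

  /** Grow the number of candidates considered by an oversampling percentage. */
  static int oversampledQueueSize(int beamWidth, double oversamplePercent) {
    return (int) Math.ceil(beamWidth * (1.0 + oversamplePercent));
  }
}
```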

I also think there are things to do around:

  • Considering jumping up a layer and going to a different entry point if we get "far away" from the current entry point
  • More intelligently choosing the candidates for two-hop expansion
  • Considering three-hop on very restrictive filters (e.g. if we don't saturate our expanded set, we should look one layer further out)
