Explain how sample vectors should be selected in README #2

henrywallace · 2018-06-20T03:10:55Z

I see that the sample vectors are initialized from Twitter images in the demo. But in the scratch directory, I see random projections being used: https://github.com/alexklibisz/elastik-nearest-neighbors/blob/025f7291512a5f5d56fd34d1cf1c10efc7c83df7/scratch/es-lsh-glove/glove_lsh_es_index.py.

How can I learn more?

henrywallace · 2018-06-20T03:15:09Z

Upon further inspection, it looks like there are some good explanatory bits in https://github.com/alexklibisz/elastik-nearest-neighbors/blob/025f7291512a5f5d56fd34d1cf1c10efc7c83df7/scratch/lsh-experiments/lsh-explore.ipynb.

alexklibisz · 2018-06-20T17:19:03Z

Hi @henrywallace

For all of my demos/experiments I used a random sample, somewhat biased toward the first vectors. For example, the benchmarks have a parameter that defines the probability of selecting a vector as a sample while iterating over the GloVe vectors. If you set it to 0.3, it picks roughly 3 of every 10 vectors as sample vectors until it has picked enough for the sample. Because iteration stops at that point, vectors near the start of the file are more likely to end up in the sample.
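A minimal sketch of the sampling strategy described above, assuming the vectors arrive as a stream and each one is kept independently with a fixed probability until the sample is full. The function and parameter names (`sample_vectors`, `n_samples`, `p_sample`, `seed`) are hypothetical and not taken from `aknn.py`; see the linked benchmark script for the actual code.

```python
import random
from typing import Iterable, List, Sequence


def sample_vectors(
    vectors: Iterable[Sequence[float]],
    n_samples: int,
    p_sample: float,
    seed: int = 0,
) -> List[Sequence[float]]:
    """Hypothetical helper: keep each streamed vector with probability
    p_sample, stopping as soon as n_samples have been collected.

    Stopping early is what biases the sample toward vectors that
    appear near the start of the stream.
    """
    if not 0.0 < p_sample <= 1.0:
        raise ValueError("p_sample must be in (0, 1]")
    rng = random.Random(seed)
    samples: List[Sequence[float]] = []
    for vec in vectors:
        # Independent coin flip per vector, e.g. ~3 of every 10 at 0.3.
        if rng.random() < p_sample:
            samples.append(vec)
            if len(samples) == n_samples:
                break
    # May return fewer than n_samples if the stream runs out first.
    return samples
```

With `p_sample=1.0` the function simply returns the first `n_samples` vectors, which is the extreme case of the start-of-stream bias; smaller values spread the sample further into the file.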

alexklibisz · 2018-06-20T20:09:23Z

In general, the best place to see full usage of the plugin in its current state is the benchmarking script here: https://github.com/alexklibisz/elastik-nearest-neighbors/blob/master/elasticsearch-aknn/benchmark/aknn.py

It uses GloVe vectors, which are just text files: https://nlp.stanford.edu/projects/glove/

Here is where the sampling happens: https://github.com/alexklibisz/elastik-nearest-neighbors/blob/master/elasticsearch-aknn/benchmark/aknn.py#L79-L84

For what it's worth, I never saw a meaningful difference between sampling strategies or sample seeds. There would, however, be a meaningful difference if you start populating many vectors that are very different from the sample vectors used to build the LSH model. For example, that would happen if you sample exclusively from images of cats and then start populating vectors for images of fine artwork or some other completely different domain.
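To illustrate why out-of-domain vectors hurt, here is a toy sketch. It assumes that each LSH hyperplane is the perpendicular bisector of two randomly chosen sample vectors. That construction is an assumption made for illustration and is not stated in this thread, so check the plugin source for the actual model. The names `build_hyperplanes`, `hash_vector`, and `random_domain` are hypothetical, and the "cats" and "artwork" data are synthetic Gaussian blobs rather than real image embeddings.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Plane = Tuple[Vector, float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def build_hyperplanes(samples: Sequence[Vector], n_bits: int,
                      rng: random.Random) -> List[Plane]:
    """Assumed model: each hyperplane bisects two random sample vectors.
    Returns (normal, offset) pairs; a vector's bit is dot(normal, x) > offset."""
    planes = []
    for _ in range(n_bits):
        a, b = rng.sample(list(samples), 2)
        normal = [ai - bi for ai, bi in zip(a, b)]
        midpoint = [(ai + bi) / 2 for ai, bi in zip(a, b)]
        planes.append((normal, dot(normal, midpoint)))
    return planes


def hash_vector(x: Sequence[float], planes: Sequence[Plane]) -> Tuple[int, ...]:
    """One bit per hyperplane: which side of the plane x falls on."""
    return tuple(int(dot(normal, x) > offset) for normal, offset in planes)


def random_domain(center: float, n: int, dim: int,
                  rng: random.Random) -> List[Vector]:
    """Synthetic stand-in for a domain: a Gaussian blob around (center, ...)."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(42)
DIM, N_BITS = 8, 10

# Build the model only from "cat" vectors (blob centered at 0).
cats = random_domain(0.0, 200, DIM, rng)
planes = build_hyperplanes(cats[:50], N_BITS, rng)

# Populate more cats, plus vectors from an unrelated domain (centered at 50).
more_cats = random_domain(0.0, 200, DIM, rng)
artwork = random_domain(50.0, 200, DIM, rng)

cat_buckets = {hash_vector(v, planes) for v in more_cats}
art_buckets = {hash_vector(v, planes) for v in artwork}
print(len(cat_buckets), len(art_buckets))
```

The second number should be much smaller than the first: the artwork blob sits far from every hyperplane, so nearly all of its vectors land on the same side of each one and collapse into a handful of buckets. Those buckets then no longer discriminate between artwork vectors, which is the kind of degradation described above.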
