This repository has been archived by the owner on Apr 20, 2020. It is now read-only.
For all of my demos and experiments I took a random sample, somewhat biased toward the first vectors. For example, the benchmarks have a parameter that sets the probability of selecting each vector as a sample while iterating over the GloVe vectors. If you set this to 0.3, it picks roughly 3 of every 10 vectors until it has collected enough for the sample.
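A minimal sketch of that selection loop, in case it helps. This is not the code from the repository; the function name `sample_vectors` and its parameters `sample_size`, `sample_prob`, and `seed` are hypothetical names chosen for illustration.

```python
import random
from typing import Iterable, List, Sequence


def sample_vectors(
    vectors: Iterable[Sequence[float]],
    sample_size: int,
    sample_prob: float,
    seed: int = 0,
) -> List[Sequence[float]]:
    """Pick up to `sample_size` vectors, keeping each one with probability `sample_prob`.

    Hypothetical helper: names and signature are assumptions, not the repository's API.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample: List[Sequence[float]] = []
    for vec in vectors:
        if len(sample) >= sample_size:
            break  # stop as soon as the sample is full
        if rng.random() < sample_prob:
            sample.append(vec)
    return sample


# Toy stand-in for the GloVe vectors: 1000 two-dimensional vectors.
toy_vectors = [(float(i), float(i) * 0.5) for i in range(1000)]
picked = sample_vectors(toy_vectors, sample_size=10, sample_prob=0.3)
```

Because the loop stops as soon as the sample is full, every selected vector comes from the early part of the iteration, which is where the bias toward the first vectors comes from. A lower `sample_prob` spreads the sample over more of the data.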
For what it's worth, I never saw a meaningful difference between sampling strategies or sample seeds. There would, however, be a meaningful difference if you started populating many vectors that are very different from the sampled vectors used to build the LSH model, e.g. if you sample exclusively from images of cats and then populate vectors for images of fine artwork or some other completely different domain.
I see that they're initialized from Twitter images in the demo. But in scratch, I see random projections used: https://github.com/alexklibisz/elastik-nearest-neighbors/blob/025f7291512a5f5d56fd34d1cf1c10efc7c83df7/scratch/es-lsh-glove/glove_lsh_es_index.py
Where can I learn more?