
Empty index inconsistency #3830

Open
ego-thales opened this issue Sep 4, 2024 · 1 comment

Comments

@ego-thales

Hello,

When searching for more neighbours than there are vectors in the index, one expects the missing results to come back with index -1 and an infinite distance (shown as 3.4028235e+38, the float32 maximum, which is what `np.nan_to_num` maps infinity to). I tested this on empty indexes: it works as expected with a few query vectors, but not once the number of queries reaches a certain size:

```python
>>> import faiss
>>> import numpy as np
>>> faiss.IndexFlatL2(1).search(np.random.random((19, 1)), 1)  # Works fine
(array([[3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38],
       [3.4028235e+38]], dtype=float32), array([[-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1],
       [-1]]))
>>> faiss.IndexFlatL2(1).search(np.random.random((20, 1)), 1)  # First inconsistency at 20 queries
(array([[0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.]], dtype=float32), array([[4606307992086394560],
       [4604998639883125282],
       [4600655086560625350],
       [4598294196564123122],
       [4604232847489403516],
       [4606679014202959133],
       [4602634729945304042],
       [4606974893601702480],
       [4598449504068727218],
       [4599660145183339918],
       [4589041727904493456],
       [4586442628466231040],
       [4606335022048302453],
       [4597148622011556900],
       [4606505784769942931],
       [4592230883026519384],
       [4603841237871291188],
       [4601734269562976314],
       [4595473575588231508],
       [4606528106805786929]]))
```

With `torch.Tensor` inputs, the bug first appears at n_samples=21 instead of 20.

Possibly related to #2135.
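The expectation above can be stated as a check on the `(D, I)` pair that a search returns. The helper below is a minimal sketch; `padding_is_consistent` is a hypothetical name, not part of faiss, and the sentinels (id -1, float32 maximum) are taken from the 19-query transcript. It only needs NumPy.

```python
import numpy as np

# Sentinels the report expects for "no neighbour": id -1 and the float32
# maximum, which is what np.nan_to_num maps +inf to.
FLT_MAX = np.finfo(np.float32).max


def padding_is_consistent(D, I, ntotal):
    """Hypothetical helper (not part of faiss): check a k-NN result.

    With `ntotal` vectors in the index, at most ntotal of the k slots per
    query can hold a real neighbour. Those must be valid row ids in
    [0, ntotal); every remaining slot must be (FLT_MAX, -1).
    """
    D = np.asarray(D, dtype=np.float32)
    I = np.asarray(I, dtype=np.int64)
    real_ids = I[:, :ntotal]
    ids_valid = bool(np.all((real_ids >= 0) & (real_ids < ntotal)))
    ids_padded = bool(np.all(I[:, ntotal:] == -1))
    dists_padded = bool(np.all(D[:, ntotal:] == FLT_MAX))
    return ids_valid and ids_padded and dists_padded
```

With faiss installed, one would call it as `padding_is_consistent(D, I, index.ntotal)` on the pair returned by `index.search(xq, k)`; by that criterion the 19-query result above passes and the 20-query result does not.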

@mdouze
Contributor

mdouze commented Sep 6, 2024

Presumably this is due to the threshold on the number of queries that switches the L2 search to a BLAS matrix multiplication:

https://github.com/facebookresearch/faiss/wiki/Implementation-notes#matrix-multiplication-to-do-many-l2-distance-computations

Probably the ids are not filled in for the BLAS case.
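This hypothesis can be illustrated with a toy model. The sketch below is not faiss's implementation: `toy_search_l2`, `BLAS_THRESHOLD` and `init_blas_outputs` are hypothetical names, and the threshold of 20 is chosen only to match the transcript, where 19 queries behave and 20 do not (the real threshold is a faiss global which, assuming the Python bindings expose it, can be read as `faiss.cvar.distance_compute_blas_threshold`). Below the threshold, each query's result slots are initialised to (FLT_MAX, -1) before searching; at or above it, distances come from the matrix-multiplication identity, and unless padding is requested, slots that no database vector fills are never written.

```python
import numpy as np

# Toy stand-ins (assumptions, not read from faiss): the switch-over point on
# the number of queries and the "no result" distance sentinel.
BLAS_THRESHOLD = 20
FLT_MAX = np.finfo(np.float32).max


def toy_search_l2(xb, xq, k, init_blas_outputs=False):
    """Toy two-path exact L2 k-NN search (hypothetical, not faiss's code).

    Fewer than BLAS_THRESHOLD queries: a per-query loop whose k result slots
    start at (FLT_MAX, -1). Otherwise: all distances at once via
    ||q - b||^2 = ||q||^2 + ||b||^2 - 2 q.b, writing only the slots that a
    database vector fills. Unless init_blas_outputs is True, the remaining
    slots keep whatever np.empty left there, modelling the suspected bug.
    """
    nq, nb = xq.shape[0], xb.shape[0]
    D = np.empty((nq, k), dtype=np.float32)
    I = np.empty((nq, k), dtype=np.int64)
    m = min(k, nb)  # number of slots a real neighbour can occupy

    if nq < BLAS_THRESHOLD:
        for i in range(nq):
            D[i], I[i] = FLT_MAX, -1          # initialise every slot
            d = ((xb - xq[i]) ** 2).sum(axis=1)
            order = np.argsort(d)[:m]
            D[i, :m], I[i, :m] = d[order], order
        return D, I

    if init_blas_outputs:
        D[:], I[:] = FLT_MAX, -1              # pad before writing results
    if nb == 0:
        return D, I                           # nothing to write
    d2 = ((xq ** 2).sum(axis=1)[:, None]
          + (xb ** 2).sum(axis=1)[None, :]
          - 2.0 * xq @ xb.T)
    d2 = np.maximum(d2, 0.0)                  # clamp rounding noise below 0
    order = np.argsort(d2, axis=1)[:, :m]
    D[:, :m] = np.take_along_axis(d2, order, axis=1)
    I[:, :m] = order
    return D, I
```

With an empty database `xb = np.empty((0, 1))`, 19 queries come back as (FLT_MAX, -1) everywhere, while 20 queries return whatever `np.empty` happened to allocate unless `init_blas_outputs=True`. The thread does not say whether an actual fix would take this form.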
