Add distance option to get 2nd and 3rd (or kth) nearest record #23

Open
MaxGhenis opened this issue Mar 1, 2019 · 3 comments

Comments

@MaxGhenis
Collaborator

MaxGhenis commented Mar 1, 2019

Return the distance to the 2nd and 3rd (or kth) nearest record, rather than only the current minimum.

From https://www.irs.gov/pub/irs-soi/07rppsweber.pdf:

With the distance-based algorithm, protection against reidentification is measured in terms of the number of PUF records that lie at least as close to a record from the population as the true match. The minimum protection that is sought is having at least two records that are at least as close to a record from the population as the true match, if the true match is in the PUF.

@MaxGhenis
Collaborator Author

np.argpartition can do this: https://stackoverflow.com/a/34226816/1840471
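A minimal sketch of the idea, assuming the distances from one record to every candidate record are already in a 1-D array. The function name `k_smallest_unsorted` and the sample values are hypothetical and not taken from this repository.

```python
import numpy as np


def k_smallest_unsorted(distances, k):
    """Return the indices of the k smallest distances.

    np.argpartition only guarantees that the element at position k - 1 is in
    its sorted position; the k indices returned here are the k smallest, but
    they are not guaranteed to be in ascending order among themselves.
    """
    distances = np.asarray(distances)
    return np.argpartition(distances, k - 1)[:k]


# Hypothetical distances from one record to five candidate records.
distances = np.array([0.9, 0.1, 0.5, 0.3, 0.7])
idx = k_smallest_unsorted(distances, 3)
nearest_set = set(idx.tolist())  # compare as a set because the order is undefined
```

This avoids a full sort of every distance vector, which is the main appeal when k is small relative to the number of candidate records.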

@MaxGhenis
Collaborator Author

Largely added but the distances aren't coming out in the right order:

print(nearest[nearest.dist1 > nearest.dist3].shape[0])  # 129
print(nearest[nearest.dist1 < nearest.dist3].shape[0])  # 756

If the distances were consistently ordered, one of these counts would be zero.

@MaxGhenis
Collaborator Author

From numpy.argpartition documentation:

Element index to partition by. The k-th element will be in its final sorted position and all smaller elements will be moved before it and all larger elements behind it. The order of all elements in the partitions is undefined. If provided with a sequence of k-th it will partition all of them into their sorted position at once.

So the k nearest distances need to be re-sorted, either within nearest_record_single or once at the end (probably faster).
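Both options can be sketched as follows. This is only an illustration under assumptions: `k_nearest_sorted` is a hypothetical helper rather than the actual `nearest_record_single`, and the `dists`/`ids` arrays stand in for the unsorted dist1..dist3 columns and their matching record ids.

```python
import numpy as np


def k_nearest_sorted(distances, k):
    """Option 1: partition, then sort only the k selected entries."""
    distances = np.asarray(distances)
    idx = np.argpartition(distances, k - 1)[:k]  # k smallest, arbitrary order
    idx = idx[np.argsort(distances[idx])]        # order those k by distance
    return idx, distances[idx]


# Per-record version on hypothetical distances.
distances = np.array([0.9, 0.1, 0.5, 0.3, 0.7])
idx, dist = k_nearest_sorted(distances, 3)

# Option 2: sort once at the end, row by row, over all records.
# Each row holds one record's k nearest distances in arbitrary order,
# with the ids of the matched records in the same positions.
dists = np.array([[0.5, 0.1, 0.3],
                  [0.2, 0.9, 0.4]])
ids = np.array([[2, 1, 3],
                [7, 5, 6]])
order = np.argsort(dists, axis=1)
dists_sorted = np.take_along_axis(dists, order, axis=1)
ids_sorted = np.take_along_axis(ids, order, axis=1)

# After either option, no row should have dist1 > dist3.
n_out_of_order = int((dists_sorted[:, 0] > dists_sorted[:, -1]).sum())
```

Sorting the ids with the same `order` array keeps each distance paired with its record, which a plain `np.sort` on the distance columns alone would not do.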
