Generalization vs Randomization #18

Open
robemery opened this issue Jan 13, 2021 · 2 comments

Comments

@robemery

Hi,

This is the first time I've used GitHub to comment so forgive me if I've done something wrong.

The GeoReference Guides are fantastic tools. I have had to deal with so much messy data it is really useful to have guides to refer people to.

I was surprised that randomisation of sensitive locations was vehemently opposed.

Previously I have collected a lot of biosecurity reports from citizen scientists. We felt obliged to obfuscate locations because sometimes people included photographs of their homes.

The problem with generalisation by removing decimal places is that many of the dots end up on top of each other, so there is no indication of how much data has been contributed. Google Maps has that point expansion/rose effect, which works nicely, but that's no help in a static PDF or printout.
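As a rough illustration of the overlap problem, here is a minimal Python sketch. The coordinates are made-up values and `generalise` is a hypothetical helper, not something taken from the guides. It truncates each coordinate to one decimal place and counts how many reports collapse onto each generalised point.

```python
import math
from collections import Counter


def generalise(lat, lon, places=1):
    """Generalise a coordinate by dropping decimal places (truncation)."""
    factor = 10 ** places
    return (math.trunc(lat * factor) / factor,
            math.trunc(lon * factor) / factor)


# Hypothetical report locations, invented for this example.
reports = [
    (-31.9523, 115.8613),
    (-31.9581, 115.8052),
    (-31.9012, 115.8977),
    (-32.0569, 115.7439),
]

# Several distinct reports end up on the same generalised point.
counts = Counter(generalise(lat, lon) for lat, lon in reports)

for point, n in counts.items():
    print(point, n)
```

On a static map the three reports near the first point would be drawn as a single dot. A per-point count like this is one possible way to show how much data sits behind each dot, assuming that publishing the count is itself acceptable for the data in question.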

Anyway, thanks for the excellent work.

Regards,

Rob

@ArthurChapman
Collaborator

Thanks, Rob, for your comment.

There are always advantages and disadvantages to whichever obfuscation method one chooses. After the various workshops, an online survey and open forums, it was agreed overwhelmingly that randomisation introduced far more problems than generalisation did.

An advantage of generalization (where, as you say, all the points overlap) is that the overlap itself signals that the data have been generalized and that the points may not be the actual locations - or at least that they lie within the range of coordinate_Uncertainty and precision. You are not changing the data - just reporting it at a smaller scale. Randomisation changes the data and, unless extra information is made available, it is misleading, because the natural interpretation is that the randomised location is the true location. Of course, it all boils down to fitness for use and what the user wants to do with the data.
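The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinate, the 0.05 degree offset and the helper names `generalise` and `randomise` are all invented for the example, and truncation is used to stand in for "removing decimal places".

```python
import math
import random


def generalise(lat, lon, places=1):
    """Report the same location at coarser precision (truncation)."""
    factor = 10 ** places
    return (math.trunc(lat * factor) / factor,
            math.trunc(lon * factor) / factor)


def randomise(lat, lon, max_offset_deg, rng):
    """Move the point by a random offset of up to max_offset_deg per axis."""
    return (lat + rng.uniform(-max_offset_deg, max_offset_deg),
            lon + rng.uniform(-max_offset_deg, max_offset_deg))


# Hypothetical true location, invented for this example.
true_lat, true_lon = -31.9523, 115.8613
rng = random.Random(42)  # seeded only so the example is repeatable

gen = generalise(true_lat, true_lon)
rnd = randomise(true_lat, true_lon, 0.05, rng)

# The generalised value visibly carries fewer digits and is the same
# every time it is computed from the same input.
print("generalised:", gen)

# The randomised value still looks like a precise coordinate, but it is
# a different point, and nothing in the value itself says so.
print("randomised: ", rnd)
```

The generalised coordinate is reproducible and its reduced precision is visible in the value itself, whereas the randomised coordinate needs accompanying documentation before a reader can tell it is not the true location.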

Whatever the method, it is important that the original data be retained - any obfuscation should apply only to the published data, and documenting what has been done is essential.
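A minimal sketch of that workflow in Python follows, assuming records are held as plain dictionaries. The field names `decimalLatitude`, `decimalLongitude` and `dataGeneralizations` are borrowed from Darwin Core for illustration; the record itself, the `recordID` key and the helper names are hypothetical.

```python
import copy
import math


def truncate(value, places):
    """Drop decimal places beyond the given number (truncation)."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


def make_public_copy(record, places=1):
    """Return a generalised copy for publication; the input is not modified."""
    public = copy.deepcopy(record)
    public["decimalLatitude"] = truncate(record["decimalLatitude"], places)
    public["decimalLongitude"] = truncate(record["decimalLongitude"], places)
    # Document what was done so users can judge fitness for use.
    public["dataGeneralizations"] = (
        f"Coordinates truncated to {places} decimal place(s) for publication"
    )
    return public


# Hypothetical original record, kept unchanged in the source database.
original = {
    "recordID": "example-001",
    "decimalLatitude": -31.9523,
    "decimalLongitude": 115.8613,
}

public = make_public_copy(original)
print(original)
print(public)
```

Only the copy is altered, so the original coordinates remain available to authorised users, and the published record states what was done to it.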

Thanks again for your comments. They are valuable.

Arthur

@robemery
Author

robemery commented Jan 14, 2021 via email
