Generalization vs Randomization #18

robemery · 2021-01-13T11:16:30Z

Hi,

This is the first time I've used GitHub to comment so forgive me if I've done something wrong.

The GeoReference Guides are fantastic tools. I have had to deal with so much messy data it is really useful to have guides to refer people to.

I was surprised that randomisation of sensitive locations was vehemently opposed.

Previously I have collected a lot of biosecurity reports from citizen scientists. We felt obliged to obfuscate locations because sometimes people included photographs of their homes.

The problem with generalisation by removing decimal places is that many of the dots end up on top of each other so there is no indication of how much data had been contributed. Google Maps has that point expansion/rose effect which works nicely but that's no help in a static PDF or printout.
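One possible way to keep an indication of how much data was contributed is to count the reports that fall on each generalised point and show that count on the static map, for example as a label or a scaled symbol. The sketch below is only an illustration: the coordinates are made up, the function name `generalise` is hypothetical, and it rounds to a fixed number of decimal places rather than strictly truncating them.

```python
from collections import Counter

def generalise(lat, lon, decimals=2):
    """Generalise a coordinate by rounding to a fixed number of decimal places."""
    return (round(lat, decimals), round(lon, decimals))

# Hypothetical report locations, invented for illustration only.
reports = [
    (-31.95234, 115.86048),
    (-31.95411, 115.85873),
    (-31.94990, 115.86310),
    (-32.05561, 115.74520),
]

# Count how many reports collapse onto each generalised point.
cell_counts = Counter(generalise(lat, lon) for lat, lon in reports)

for (lat, lon), n in sorted(cell_counts.items()):
    print(f"{lat:.2f}, {lon:.2f}: {n} report(s)")
```

Each generalised point can then be drawn once, with the count shown next to it, so overlapping dots no longer hide the volume of data.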

Anyway, thanks for the excellent work.

Regards,

Rob

ArthurChapman · 2021-01-13T21:00:13Z

Thanks Rob for your comment

There are always advantages and disadvantages to whatever method one chooses for obfuscation. After the various workshops, an online survey and open forums it was agreed overwhelmingly that randomisation introduced far more problems than did generalisation.

An advantage of generalization (where, as you say, all the points overlap) is that it signals that the data have been generalized and that the points may not be the actual locations - or at least that they are within the range of coordinate_Uncertainty and precision. You are not changing the data - just reporting it at a smaller scale. Randomisation does change the data, and unless extra information is made available it is misleading, because the randomised location will be interpreted as the true location. Of course it all boils down to fitness for use and what the user wants to do with the data.

Whatever the method, it is important that the original data be retained - any obfuscation should only be for publication of the data, and documenting what has been done is essential.
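As a rough sketch of the difference, assuming records are plain dictionaries with Darwin Core-style field names (`decimalLatitude`, `decimalLongitude`, `dataGeneralizations`), the two approaches could look like the following. The function names, the sample record and the offset size are hypothetical. In both cases the original record is left untouched and the published copy states what was done to it.

```python
import random

def publish_generalised(record, decimals=1):
    """Return a copy for publication with coordinates reported at lower precision."""
    published = dict(record)  # the original record is not modified
    published["decimalLatitude"] = round(record["decimalLatitude"], decimals)
    published["decimalLongitude"] = round(record["decimalLongitude"], decimals)
    published["dataGeneralizations"] = (
        f"Coordinates rounded to {decimals} decimal place(s) for publication"
    )
    return published

def publish_randomised(record, max_offset_deg=0.05, rng=None):
    """Return a copy for publication with coordinates moved by a random offset."""
    rng = rng or random.Random()
    published = dict(record)  # the original record is not modified
    published["decimalLatitude"] = (
        record["decimalLatitude"] + rng.uniform(-max_offset_deg, max_offset_deg)
    )
    published["decimalLongitude"] = (
        record["decimalLongitude"] + rng.uniform(-max_offset_deg, max_offset_deg)
    )
    # Without this note the displaced point would read as a true location.
    published["dataGeneralizations"] = (
        f"Coordinates randomly displaced by up to {max_offset_deg} degrees"
    )
    return published

# Hypothetical record, invented for illustration only.
original = {"decimalLatitude": -31.95234, "decimalLongitude": 115.86048}

generalised = publish_generalised(original)
randomised = publish_randomised(original, rng=random.Random(0))
```

The generalised copy is visibly coarse, whereas the randomised copy still looks like a precise coordinate, which is why the accompanying note matters even more in that case.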

Thanks again for your comments. They are valuable.

Arthur

robemery · 2021-01-14T05:51:51Z

Thanks for getting back to me, Arthur.

This has been an area of interest to me for many years, even before GPS, when we digitized the hand-written labels of our insect reference collection. More recently we have used smartphone apps to record research data as well as for community engagement. I attended one of Debra's workshops in Santa Barbara a few years ago, and I don't know how many times I have handed out copies of your guide to people, pleading with them to collect georeference data properly.

I thought you might be interested to see how we used the MyPestGuide reporting app to do a quick survey to demonstrate freedom from a citrus disease. https://www.google.com/maps/d/edit?mid=1cUsT5B3cv8cs6Me0ClwMUB1njUK9pE5i&usp=sharing

We hold the original coordinates in our database, but sometimes obfuscate only as the points are sent to a map, especially if we are dealing with sensitive species or with pest insects and weeds on private properties. We try to use large placemarks with blurred edges to reinforce that the point is not accurate. Google Maps resizes the points as you zoom in, so by moving a point to protect privacy we end up putting it on someone else's property, which is even worse!

Regards,

Rob

