Replies: 6 comments 11 replies
-
I agree that we should not treat our trusted analysts as babies.
One way to present this is as the exact error. Another is to present it as a noise magnitude (as we can do now with Insights). So the alternative would be to export the noise magnitude rather than the actual error. The noise magnitude could be exported as the standard deviation, as we have been doing, or as the 95% probability upper and lower bounds, which might be more intuitive for some. This is something that would be safe to pass downstream to an untrusted party who needs to do further analysis and needs these noise magnitude numbers.

Even if we choose to export the exact error, literally exporting the combined view might also mean exporting suppressed values. The analyst may wish not to do this, which then leads to a design decision: do we simply not report suppressed rows, or do we give the analyst the choice? Keep in mind as well that we already present some information (number of suppressed rows, average and max distortion), which could also be made available for export (though not neatly in a csv per se).

I'm sympathetic to your opinion here. However, I think we should wait until we get some feedback from our users on this feature.
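Just to make the two representations concrete (a minimal sketch with made-up numbers, assuming the noise is roughly Gaussian): the 95% bounds are derivable from the SD and vice versa, so either form carries the same information for the downstream party.

```python
# Minimal sketch (not the tool's actual API): converting an exported noise SD
# into 95% probability bounds, assuming zero-mean Gaussian noise.
reported_count = 120   # hypothetical anonymized value
noise_sd = 2.5         # hypothetical exported noise magnitude (standard deviation)

Z_95 = 1.96            # ~95% two-sided bound for a Gaussian
lower = reported_count - Z_95 * noise_sd
upper = reported_count + Z_95 * noise_sd
print(f"true count within [{lower:.1f}, {upper:.1f}] with ~95% probability")
```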
-
The reason we aren't at the moment presenting noise magnitude is that in the combined view we present the actual noise, so we don't need the magnitude. The current purpose, in my mind, is to let the analyst make a simple decision as to whether the distortion is acceptable or not, and adjust generalization / column selection in response. The need for downstream understanding of released data is a different use case, so to speak, and not what our current design really addresses. So it may be the case that simply releasing the combined view is not helpful for this (especially if the downstream analyst shouldn't have access to the exact numbers).

Note, by the way, that as long as we are only doing counts, the downstream analyst knows exactly how much noise was added to each row, because it is the same for every row. Supplying a noise magnitude number is only useful when the amount of noise is different for different rows. (And indeed an analyst deciding that everything at, say, count 10 is bad would be the right thing to do...) In other words, we have a little time before this becomes a critical issue, and indeed it would be good to get feedback on this to see what our users will want.
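To illustrate the point about counts (again a sketch with assumed numbers, not the tool's output): if every row gets noise with the same standard deviation, the downstream analyst can work out the per-row relative error themselves, so an exported magnitude column adds nothing.

```python
# Assumed constant noise SD for counting queries, known to the downstream analyst.
NOISE_SD = 2.0

anonymized_counts = [250, 48, 10, 7]   # made-up row counts

for count in anonymized_counts:
    relative_error = NOISE_SD / count
    verdict = "probably too noisy" if relative_error > 0.15 else "fine"
    print(f"count={count:4d}  relative error ~{relative_error:.0%}  -> {verdict}")

# With SD 2, a count of 10 already carries ~20% error, which is why a rule like
# "treat everything at count 10 or below as bad" is a reasonable default.
```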
-
Ha ha, I just realized a good use for exporting the combined view: it lets the trusted analyst "save their work". In other words, the analyst might run a query that takes two minutes, and might later come back to the same query, but doesn't want to wait two minutes again, and also doesn't want to go through the whole process of making another notebook, loading up the same csv file, etc. Since we have no "save notebook" or caching feature, the analyst may prefer just to save off the combined view to peruse the result later. In other words, it is not for a downstream analyst (whose needs I don't think we fully understand), but just a time-saving device for the primary analyst...
-
Well, all I'm saying here is that we don't currently support that "continuing work" user story at all, so saving the combined view might be a quick-and-dirty hack to partially support it. But only partially, which to me still argues for not doing it in the initial version. I'd rather support the continuing-work story properly (which will be some effort, for sure) if we really need it, and likewise support the downstream analyst properly once we better understand what that means.
-
Yes, certainly there will be post-processing. I think we should avoid providing any support in the tool for things that can easily be done outside the tool. The tool should support generating anonymous data and metadata, and little else. Exporting the combined view is a nice helper for the analyst: the same information could be derived outside our tool, but not easily, so we should see if there is demand for it. Exporting the noise amounts is something that cannot be derived outside our tool (at least, not after we go beyond counting users).
-
Yes, I meant the latter (the noise SD).
-
I have a pretty strong opinion that we should allow the user to export the "Combined View", and not only the anonymised data.
Exporting the combined view is the only way the analyst can currently export information about the data quality in each of the rows. The .csv they get out isn't the end of their workflow – they have to get this data somewhere else and present or analyse it further. For this, knowing that a row has a 30% error is vital. Also, if they want to present an analysis of the tool itself, they might want to get both the real and the anonymised data out.
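For what it's worth, here is roughly the downstream step I mean, sketched with pandas and made-up file and column names (real_count / anon_count are not what the export is necessarily called):

```python
import pandas as pd

# Read the exported combined view (hypothetical file and column names).
combined = pd.read_csv("combined_view.csv")

# Per-row distortion, so rows can be annotated or dropped in whatever
# report or dashboard comes next.
combined["abs_error"] = (combined["anon_count"] - combined["real_count"]).abs()
combined["rel_error"] = combined["abs_error"] / combined["real_count"]

# e.g. only carry rows with less than 30% distortion into the presentation.
presentable = combined[combined["rel_error"] < 0.30]
print(presentable.head())
```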
As far as I can tell, the only reason we don't want them to is that we don't trust them to understand that the combined view contains personal information (or allows it to be inferred). I think we should just show a big fat warning window when they export the combined view, and be done with it. Our users aren't babies, and keeping a very useful function from them seems wrong.
@yoid2000 mostly for you.