Replies: 6 comments 11 replies
-
I agree that we should not treat our trusted analysts as babies.
One way to present this is as the exact error. Another is to present it as a noise magnitude (as we can do now with Insights). So the alternative would be to export the noise magnitude rather than the actual error. The noise magnitude could be exported as the standard deviation, as we have been doing, or as the 95% probability upper and lower bounds, which might be more intuitive for some. This is something that would be safe to pass downstream to an untrusted party who needs to do further analysis and needs these noise magnitude numbers.

Even if we choose to export the exact error, literally exporting the combined view might also mean exporting suppressed values. The analyst may wish not to do this, which then leads to a design decision: do we simply not report suppressed rows, or do we give the analyst the choice? Keep in mind as well that we already present some information (number of suppressed rows, average and max distortion), which could also be made available for export (though not neatly in a csv per se).

I'm sympathetic to your opinion here. However, I think we should wait until we get some feedback from our users on this feature.
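Just to make the two representations concrete (a minimal sketch with made-up numbers, assuming the noise is roughly Gaussian): the 95% bounds are derivable from the SD and vice versa, so either form carries the same information for the downstream party.

```python
# Minimal sketch (not the tool's actual API): converting an exported noise SD
# into 95% probability bounds, assuming zero-mean Gaussian noise.
reported_count = 120   # hypothetical anonymized value
noise_sd = 2.5         # hypothetical exported noise magnitude (standard deviation)

Z_95 = 1.96            # ~95% two-sided bound for a Gaussian
lower = reported_count - Z_95 * noise_sd
upper = reported_count + Z_95 * noise_sd
print(f"true count within [{lower:.1f}, {upper:.1f}] with ~95% probability")
```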
-
The reason we aren't at the moment presenting noise magnitude is that in the combined view we present the actual noise, so we don't need the magnitude. The current purpose, in my mind, is to let the analyst make a simple decision as to whether the distortion is acceptable or not, and adjust generalization / column selection in response. The need for downstream understanding of released data is a different use case, so to speak, and not what our current design really addresses. So it may be the case that simply releasing the combined view is not helpful for this (especially if the downstream analyst shouldn't have access to the exact numbers).

Note, by the way, that as long as we are only doing counts, the downstream analyst knows exactly how much noise was added to each row, because it is the same for every row. Supplying a noise magnitude number is only useful when the amount of noise is different for different rows. (And indeed an analyst deciding that everything at, say, count 10 is bad would be the right thing to do...) In other words, we have a little time before this becomes a critical issue, and indeed it would be good to get feedback on this to see what our users will want.
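To illustrate the point about counts (again a sketch with assumed numbers, not the tool's output): if every row gets noise with the same standard deviation, the downstream analyst can work out the per-row relative error themselves, so an exported magnitude column adds nothing.

```python
# Assumed constant noise SD for counting queries, known to the downstream analyst.
NOISE_SD = 2.0

anonymized_counts = [250, 48, 10, 7]   # made-up row counts

for count in anonymized_counts:
    relative_error = NOISE_SD / count
    verdict = "probably too noisy" if relative_error > 0.15 else "fine"
    print(f"count={count:4d}  relative error ~{relative_error:.0%}  -> {verdict}")

# With SD 2, a count of 10 already carries ~20% error, which is why a rule like
# "treat everything at count 10 or below as bad" is a reasonable default.
```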
-
Ha ha, I just realized a good use for exporting the combined view: it lets the trusted analyst "save their work". In other words, the analyst might run a query that takes two minutes, and might later come back to the same query, but doesn't want to wait two minutes again, and also doesn't want to go through the whole process of making another notebook, loading up the same csv file, etc. Since we have no "save notebook" or caching feature, the analyst may prefer just to save off the combined view to peruse the result later. In other words, it is not for a downstream analyst (whose needs I don't think we fully understand), but just a time-saving device for the primary analyst...
-
Well, all I'm saying here is that we don't currently support that "continuing work" user story at all, so saving the combined view might be a quick-and-dirty hack to partially support it. But only partially, which to me still argues for not doing it in the initial version. I'd rather support the continuing-work story properly (which will be some effort, for sure) if we really need it, and likewise support the downstream analyst properly once we better understand what that means.
-
Yes, certainly there will be post-processing. I think we should avoid providing any support in the tool for things that can easily be done outside the tool. The tool should support generating anonymous data and metadata, and little else. Exporting the combined view is a nice helper for the analyst: the same information could be derived outside our tool, but not easily, so we should see if there is demand for it. Exporting the noise amounts is something that cannot be derived outside our tool (at least, not after we go beyond counting users).
-
Yes, I meant the latter (the noise SD).
-
I have a pretty strong opinion that we should allow the user to export the "Combined View", and not only the anonymised data.
Exporting the combined view is the only way the analyst can currently export information about the data quality in each of the rows. The .csv they get out isn't the end of their workflow – they have to get this data somewhere else and present or analyse it further. For this, knowing that a row has a 30% error is vital. Also, if they want to present an analysis of the tool itself, they might want to get both the real and the anonymised data out.
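For what it's worth, here is roughly the downstream step I mean, sketched with pandas and made-up file and column names (real_count / anon_count are not what the export is necessarily called):

```python
import pandas as pd

# Read the exported combined view (hypothetical file and column names).
combined = pd.read_csv("combined_view.csv")

# Per-row distortion, so rows can be annotated or dropped in whatever
# report or dashboard comes next.
combined["abs_error"] = (combined["anon_count"] - combined["real_count"]).abs()
combined["rel_error"] = combined["abs_error"] / combined["real_count"]

# e.g. only carry rows with less than 30% distortion into the presentation.
presentable = combined[combined["rel_error"] < 0.30]
print(presentable.head())
```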
As far as I can tell, the only reason we don't want them to is that we don't trust them to understand that the combined view contains personal information (or allows it to be inferred). I think we should just show a big fat warning window when they export the combined view, and be done with it. Our users aren't babies, and keeping a very useful function from them seems wrong.
@yoid2000 mostly for you.