docs: describe all data collection #3216

Merged
merged 6 commits into from
Jul 25, 2024

orndorffgrant
Collaborator

Fixes: #2894

@@ -0,0 +1,86 @@
What data does Canonical collect from Ubuntu Pro machines?
Contributor

I'd probably suggest changing this header to "what data does Canonical collect through the Ubuntu Pro Client"

The reason being that the contents of the page are relevant to people who haven't attached yet (and want the info to know if they want to attach), or who have detached and aren't using Pro anymore (can we consider their machine an Ubuntu Pro machine?)

Collaborator Author

Two of the sections are data not collected directly through the Ubuntu Pro Client, but the collection happens because you used the pro-client to attach to Pro. So I'm not sure "collect through the Ubuntu Pro Client" is quite accurate.

A detached machine shouldn't be considered an Ubuntu Pro machine - they are in the same bucket as never attached machines for the purpose of this doc.
None of this data is collected for unattached machines, and I don't think the current title would prevent someone wondering about data collection from looking at it. We could rename it to "What data does Canonical collect from Ubuntu machines that are attached to an Ubuntu Pro subscription", but that felt unnecessarily verbose.

IDK though, I don't have a strong opinion on the title here, just wanted to list my thoughts before changing it. What do you think?

Contributor

I think that depends on how long we keep data for. Is machine data purged when the machine is detached?

Collaborator Author

Data is not purged on detach, so a detached machine would have had data collected while it was attached to Ubuntu Pro. After it is detached, no more data will be collected, but data will exist on the backend for some time.

Contributor

If data isn't purged on detach, then I think there is a difference between detached and never-attached machines since users who care about this stuff want to know what we collect, why, and how long we expect to keep that data for.

I think tweaking the title to say "What data is collected from active Ubuntu Pro machines?" would be enough to satisfy the distinction, especially if we can also provide info on how long it takes before collected info is purged (although I wouldn't consider that a blocker).

Collaborator Author

I think that makes sense!

Collaborator

@a-dubs left a comment

left some suggested changes! thanks grant!

docs/explanations/data_collection.rst (6 resolved review threads, 5 of them outdated)
Contributor

@s-makin left a comment

Looks good! Some suggestions

docs/explanations/data_collection.rst (9 resolved review threads, 8 of them outdated)

These data elements are collected to ensure machines that are attached to a
particular Ubuntu Pro contract are compliant with the terms of that particular
contract.
Contributor

Could any of this info/data be considered as personally identifiable?

Do we know roughly how long the data is kept for?

Collaborator Author

I don't think any of this counts as personally identifiable on its own, but it is connected to an Ubuntu Pro account on the backend via a machine id.

> Do we know roughly how long the data is kept for?

I don't know the answer to this one. Tagging @pandrey2003 and @alnvdl-work

Contributor

I wouldn't consider this a blocker to getting this merged. We can always add a section about data retention at the bottom later.

**********************************************************

Some system data is sent to Canonical servers for the purpose of delivering
Ubuntu Pro services in compliance with the terms of the Ubuntu Pro subscriptio
Contributor

s/subscriptio/subscription

Collaborator Author

whoops 🙃


This document categorises data collection by method of collection.

APT package downloads
Contributor

I could be wrong here, but it seems that the APT package downloads and Livepatch downloads are not directly tied to data collection per se. They look more like data each service needs in order to work than a collection of some sort.

I think there is still value in those sections, but I would not put them under data collection. We could create something like "service data needs" for them.

Contributor

I think that's a good idea :)

Collaborator Author

Yeah, that's a good point: this is data that is sent to Canonical servers for the purposes of using the services. I'll rework the structure of this a bit to make that clearer.

Collaborator Author

@s-makin I've restructured the document and headers around this. All the content is the same, but I'm not sure about the new header "Data sent in order to provide service" - do you have any better ideas?

Contributor

@s-makin left a comment

LGTM, thanks for your patience!

@orndorffgrant merged commit 4ca5d45 into docs on Jul 25, 2024
5 of 7 checks passed
@orndorffgrant deleted the data-collection-explanation branch on July 25, 2024 at 13:29