docs: describe all data collection #3216
Changes from 4 commits
aecfa68
a1b56ae
cb529d4
5f51ad9
740dc8c
c6abb3c
@@ -0,0 +1,89 @@
What data does Canonical collect from Ubuntu Pro machines?
**********************************************************

Some system data is sent to Canonical servers for the purpose of delivering
Ubuntu Pro services in compliance with the terms of the Ubuntu Pro subscriptio
. This data is sent via a few different methods, depending on the service and
the purpose of that particular data element.

Review comment: s/subscriptio/subscription

Review comment: whoops 🙃

This document categorises data collection by method of collection.

APT package downloads
=====================

Review comment: I can be wrong here, but it seems that the I think there is still value for those sections, but I would not put them under

Review comment: I think that's a good idea :)

Review comment: Yeah that's a good point, this is data that is sent to canonical servers for the purposes of using the services. I'll rework the structure of this a bit to make that more clear.

Review comment: @s-makin I've restructured the document and headers around this. All the content is the same, but I'm not sure about the new header "Data sent in order to provide service" - do you have any better ideas?

If you have any of the following services enabled, then the data collection
method described below will be in use whenever downloading packages for one of
these services.

orndorffgrant marked this conversation as resolved.

- ``anbox-cloud``
- ``cc-eal``
- ``cis``
- ``esm-apps``
- ``esm-infra``
- ``fips``
- ``fips-preview``
- ``fips-updates``
- ``realtime-kernel``
- ``ros``
- ``ros-updates``
- ``usg``

Whenever you ``apt install`` a package from a Pro service (or ``apt upgrade``
to a version of a package from a Pro service), ``apt`` will make a GET request
to ``esm.ubuntu.com`` that includes the package name and version, and HTTP
basic auth credentials that are tied to the Ubuntu Pro subscription.

For example, installing the ``hello`` package from ``esm-apps`` will result in
a request that looks something like this:

.. code-block:: text

    https://bearer:<token>@esm.ubuntu.com/apps/ubuntu/pool/main/h/hello/hello_2.10-2ubuntu4+esm1_amd64.deb

This request is necessary to download the Pro update and includes the
following data.

- Ubuntu codename (e.g. "Jammy")
- Package name (e.g. "hello")
- Package version (e.g. "2.10-2ubuntu4+esm1")
- Package architecture (e.g. "amd64")

Because this request needs to be authenticated and the authentication token is
tied to a particular Ubuntu Pro subscription, this data is inherently tied to
the Ubuntu Pro subscription that authenticated access to the package.
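
As a rough illustration of what the request above carries, the following Python sketch splits the example URL's file name into package name, version and architecture, and shows how HTTP basic auth credentials are typically turned into an ``Authorization`` header. It assumes the standard Debian ``name_version_arch.deb`` file naming and the Basic scheme from RFC 7617; the helper names ``parse_deb_url`` and ``basic_auth_header`` and the token value ``example-token`` are hypothetical and not part of ``apt`` or the Pro client. The Ubuntu codename is not recoverable from this particular URL, so it is not extracted here.

```python
import base64
import posixpath
from urllib.parse import unquote, urlsplit


def parse_deb_url(url: str) -> dict:
    """Split a pool URL's file name into package name, version and architecture.

    Assumes the conventional Debian file name layout: name_version_arch.deb
    """
    filename = posixpath.basename(unquote(urlsplit(url).path))
    if not filename.endswith(".deb"):
        raise ValueError(f"not a .deb URL: {url}")
    parts = filename[: -len(".deb")].split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected file name: {filename}")
    name, version, architecture = parts
    return {"name": name, "version": version, "architecture": architecture}


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Example URL from the document, with the credentials left out.
EXAMPLE_URL = (
    "https://esm.ubuntu.com/apps/ubuntu/pool/main/h/hello/"
    "hello_2.10-2ubuntu4+esm1_amd64.deb"
)

info = parse_deb_url(EXAMPLE_URL)
# "example-token" is a placeholder, not a real subscription token.
header = basic_auth_header("bearer", "example-token")
print(info)
print(header)
```

Because the credentials travel with every such request, the server can associate the package name, version and architecture with the subscription that owns the token.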

Livepatch downloads
===================

If you have ``livepatch`` enabled, then the following data is sent in order to
download the correct kernel patches:

- Kernel version (e.g. "6.8.0-38.38-generic")
- Machine architecture (e.g. "amd64")

Similarly to APT package downloads, because this request needs to be
authenticated and the authentication token is tied to a particular Ubuntu Pro
subscription, this data is inherently tied to the Ubuntu Pro subscription that
authenticated access to the package.

Machine activity checks
=======================

Regardless of which services you have enabled, if a machine is attached to an
Ubuntu Pro subscription, the following data is collected and updated regularly
(default: every 6 hours).

s-makin marked this conversation as resolved.

- Distribution (e.g. "Ubuntu")
- Release codename (e.g. "Noble")
- Kernel version (e.g. "6.8.0-38.38-generic")
- Machine architecture (e.g. "amd64")
- Is the machine a desktop? (e.g. "true")
- Virtualisation type (e.g. "Docker")
- Services enabled (e.g. "ros" and "realtime-kernel generic variant")
- When the machine was attached (e.g. "2024-07-24T13:54:07+00:00")
- Version of ``ubuntu-pro-client`` (e.g. "33.2~24.04")

These data elements are collected to ensure machines that are attached to a
particular Ubuntu Pro contract are compliant with the terms of that particular
contract.
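
To make the shape of this data more concrete, here is a minimal Python sketch of a record holding the listed elements, filled with the example values from the list. The class name ``MachineActivity``, every field name, and the JSON serialisation are assumptions made for illustration only; the document does not describe the real wire format or schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import timedelta

# Default refresh interval stated in the document.
DEFAULT_UPDATE_INTERVAL = timedelta(hours=6)


@dataclass
class MachineActivity:
    """Hypothetical record; field names are illustrative, not the real schema."""

    distribution: str
    release_codename: str
    kernel_version: str
    architecture: str
    is_desktop: bool
    virtualisation_type: str
    enabled_services: list[str]
    attached_at: str  # ISO 8601 timestamp
    client_version: str


example = MachineActivity(
    distribution="Ubuntu",
    release_codename="Noble",
    kernel_version="6.8.0-38.38-generic",
    architecture="amd64",
    is_desktop=True,
    virtualisation_type="Docker",
    # How a service variant is actually encoded is not specified in the document.
    enabled_services=["ros", "realtime-kernel generic variant"],
    attached_at="2024-07-24T13:54:07+00:00",
    client_version="33.2~24.04",
)

print(json.dumps(asdict(example), indent=2))
```

None of the fields in this sketch is a name, email address or similar identifier; the list describes system properties only.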

Review comment: Could any of this info/data be considered as personally identifiable? Do we know roughly how long data is kept for?

Review comment: I don't think any of this counts as personally identifiable on its own, but it is connected to an Ubuntu Pro account on the backend via a machine id.
I don't know the answer to this one. Tagging @pandrey2003 and @alnvdl-work

Review comment: I won't consider this as a blocker to getting this merged. We can always add a section later at the bottom here about data retention.

Review comment: I'd probably suggest changing this header to "what data does Canonical collect through the Ubuntu Pro Client"
The reason being that the contents of the page are relevant to people who haven't attached yet (and want the info to know if they want to attach), or who have detached and aren't using Pro anymore (can we consider their machine an Ubuntu Pro machine?)

Review comment: Two of the sections are data not collected directly through the Ubuntu Pro Client, but the collection happens because you used the pro-client to attach to Pro. So I'm not sure "collect through the Ubuntu Pro Client" is quite accurate.
A detached machine shouldn't be considered an Ubuntu Pro machine - they are in the same bucket as never attached machines for the purpose of this doc.
None of this data is collected for unattached machines. And I don't think the current title would prevent someone wondering about data collection from looking at it. We could rename it to "What data does Canonical collect from Ubuntu machines that are attached to an Ubuntu Pro subscription" but that felt unnecessarily verbose
IDK though, I don't have a strong opinion on the title here, just wanted to list my thoughts before changing it. What do you think?

Review comment: I think that depends on how long we keep data for. Is machine data purged when the machine is detached?

Review comment: Data is not purged on detach - so a detached machine would have had data collected while it was Ubuntu Pro, and after it is detached, no more data will be collected, but data will exist on the backend for some time.

Review comment: If data isn't purged on detach, then I think there is a difference between detached and never-attached machines since users who care about this stuff want to know what we collect, why, and how long we expect to keep that data for.
I think tweaking the title to say "What data is collected from active Ubuntu Pro machines?" would be enough to satisfy the distinction, especially if we can also provide info on how long it takes before collected info is purged (although I wouldn't consider that a blocker).

Review comment: I think that makes sense!