-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
consolidating machineInfo and activityInfo + reporting docker more accurately (SC-1503) #2622
Conversation
Jira: SC-1503 GitHub Issues: No GitHub issues are fixed by this PR. (No commits have Fixes: #### references) Launchpad Bugs: No Launchpad bugs are fixed by this PR. (No commits have LP: #### references) Documentation: The changes in this PR do not require documentation changes. 👍 this comment to confirm that this is correct. |
ae06224
to
1063142
Compare
6e7609b
to
5c44845
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm
Question: is this new docker detection step, and the information we send to the Contracts Server, documented somewhere we need to update?
try: | ||
contract.request_updated_contract( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checking, can't we update request_updated_contract
with this extra logic ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and that could solve the specific need of this PR, but the refactor stemmed from two observations:
- The exception cases below with comments claiming to run after a partial attach are wrong - they run after failed attaches as well. That is due to the attachment and enablement being combined in the one request_updated_contract function and that function doesn't separate the exceptions raised in attachment vs during enablement (my first idea of refactoring was do exactly that).
- The request_updated_contract function was kind-of two-functions-in-one. Most of it's uses were as "refresh", and one of its uses was as "attach". Since there was only one usage of the "attach" variant of request_updated_contract, I thought i made sense to bring the attach logic up into our "attach_with_token" function. That leaves only the refresh code behind, which is what every other caller was using.
The other thing is that I think attach_with_token
should be the place where "attach"y things happen, and it mostly is, so it would be confusing to put some actions related to "attach" (saving attachment date, and sending machine activity ping) down in request_updated_contract.
machine_id = system.get_machine_id(self.cfg) | ||
return { | ||
"machineId": machine_id, | ||
def _get_activity_info(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't this function already returns the compatible structure we need to send to the Contract server ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns exactly this structure: https://github.com/canonical/ua-contracts/blob/develop/docs/contracts.yaml#L275
That is used directly as the body for update_activity_token
and used directly as the value of "activityInfo" for the body of update_contract_machine
.
Everything else is a special case that requires the same values but only a subset of them, or in add_contract_machine needs lastAttachment
even though we're not attached yet.
if "docker" in proc_1_cgroup or "buildkit" in proc_1_cgroup: | ||
return "docker" | ||
elif "buildah" in proc_1_cgroup: | ||
return "podman" | ||
else: | ||
return "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those these names appear in specific patterns in the string (i.e. beginning or end of the line) ? If yes, should we have a more restrictive regex for them or do you think name clashing is really unlikely here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is what the file looks like in each situation, using the following dockerfile:
from ubuntu:focal
run cat /proc/1/cgroup | tee result.txt
docker build .
13:perf_event:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
12:blkio:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
11:rdma:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
10:pids:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
9:cpuset:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
8:net_cls,net_prio:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
7:devices:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
6:freezer:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
5:cpu,cpuacct:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
4:hugetlb:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
3:memory:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
2:misc:/
1:name=systemd:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
0::/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
env DOCKER_BUILDKIT=1 docker build . -t test
docker run -it test cat /result.txt
13:perf_event:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
12:blkio:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
11:rdma:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
10:pids:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
9:cpuset:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
8:net_cls,net_prio:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
7:devices:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
6:freezer:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
5:cpu,cpuacct:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
4:hugetlb:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
3:memory:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
2:misc:/
1:name=systemd:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
0::/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
podman build .
13:perf_event:/
12:blkio:/user.slice
11:rdma:/
10:pids:/user.slice/user-1000.slice/[email protected]
9:cpuset:/
8:net_cls,net_prio:/
7:devices:/user.slice
6:freezer:/
5:cpu,cpuacct:/user.slice
4:hugetlb:/
3:memory:/user.slice/user-1000.slice/[email protected]
2:misc:/
1:name=systemd:/user.slice/user-1000.slice/[email protected]/app.slice/app-org.kde.konsole-1b54bbb6634a43a78eb7033cd3b3ac17.scope/buildah-buildah2868387759
0::/user.slice/user-1000.slice/[email protected]/app.slice/app-org.kde.konsole-1b54bbb6634a43a78eb7033cd3b3ac17.scope
The least obvious one is the podman one where "buildah" only shows up in the "name" field.
So, we could try to make things more strict, but my thought was that it is really unlikely for a clash here. It would only happen in an environment where systemd-detect-virt
wasn't installed (so we fall back to this function), then the user would have to be doing something tricky/non-standard for one of the magic words to show up there.
And since we're only doing this to get a sense of usage, not for strict tracking, I think its okay.
5c44845
to
ffc1175
Compare
@lucasmoura @renanrodrigo I think this is good to go now. |
@orndorffgrant I think we just need to fix a test for xenial and we should be good here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I think this can be merged with the required csteam review
@orndorffgrant we are currently at a sprint, it would be great if this could wait until next week s.t. I can pay proper attention to it |
@TeodorPt Yep it can wait til next week - all good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delays. The new payload per discussions in US039 LGTM. I have some questions but they shouldn't influence the solution here.
"machineId": request_body.get("machineId"), | ||
"activityInfo": activity_info, | ||
"architecture": activity_info.get("architecture"), | ||
"os": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was also a version
in the old type, is that not checked in the old airgapped? cc @pandrey2003
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, seems like the airgapped neither checks nor uses this field. However, thanks for re-checking!
As a matter of fact, all the provided properties here are correct 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking! I referenced the airgapped implementation when writing this function which is why I didn't include version
.
@@ -1558,6 +1564,10 @@ def action_attach(args, *, cfg): | |||
event.info(msg.msg, file_type=sys.stderr) | |||
event.error(error_msg=msg.msg, error_code=msg.name) | |||
ret = 1 | |||
|
|||
contract_client = contract.UAContractClient(cfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required? attach_with_token
above will anyway send the activity info. I see this is how it is approved in the spec, but I don't recall exactly why. It doesn't hurt the server (besides some extra requests), just wondering what the rationale is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We added this because the attach request itself can't include any services that got enabled. At spec time we decided we wanted data of what services got enabled immediately after attach (this could include enableByDefault: true
services, or services that the user configured the pro-client to immediately enable). To get that data we decided to immediately send an activity ping after the post-attach-service-enables to get that info recorded.
@@ -212,7 +208,7 @@ def update_activity_token(self): | |||
machine_token = self.cfg.machine_token.get("machineToken") | |||
machine_id = system.get_machine_id(self.cfg) | |||
|
|||
request_data = self._get_activity_info(machine_id) | |||
request_data = self._get_activity_info() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily for this PR, but for the overall approach:
I think in this method, lower below, you are saving the response from the UPDATE_ACTIVITY_TOKEN request in a file. As far as I can tell, in the current implementation the endpoint will return the entire payload (with the new activity ID, token and all the extra information you send as is). Is this fine? If not, should we change the behavior s.t. we only return the activity identifiers and not the metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an excellent point! And something nobody else noticed.
I think this behavior is fine since the extra data is minimal and non-sensitive and its presence doesn't hurt the client - we treat that state file as a json blob, so extra fields are ignored.
In the future we hope to improve/organize our on-disk state and at that time we may think about discarding the extra metadata on the client side, but even then I don't think it hurts for the server to send it back.
4ad50f8
to
0b4dd32
Compare
CI failures are unrelated to these changes and fixed in #2672 |
Why is this needed?
This PR implements the following two specs:
Test Steps
To test the docker detection changes, use the following Dockerfile
Run the following against a deb of pro-client from before and after these changes.
debs built before these changes will display "failed" in the last step (or in result.txt). After these changes, it should accurately detect docker or podman, respectively.
Checklist
Does this PR require extra reviews?