Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consolidating machineInfo and activityInfo + reporting docker more accurately (SC-1503) #2622

Merged
merged 11 commits into from
Jul 27, 2023

Conversation

orndorffgrant
Copy link
Collaborator

@orndorffgrant orndorffgrant commented Jun 16, 2023

Why is this needed?

This PR implements the following two specs:

Test Steps

To test the docker detection changes, use the following Dockerfile

from ubuntu:focal

run systemd-detect-virt || echo "systemd-detect-virt not available"

run apt update

copy ./ua.deb /ua.deb
run apt install /ua.deb -y

run python3 -c "from uaclient import system; print(system.get_virt_type() or 'failed')" | tee /result.txt

Run the following against a deb of pro-client from before and after these changes.

# docker old
docker build .

# docker new
env DOCKER_BUILDKIT=1 docker build . -t test
docker run -it test cat /result.txt

# podman
podman build .

debs built before these changes will display "failed" in the last step (or in result.txt). After these changes, it should accurately detect docker or podman, respectively.

Checklist

  • I have updated or added any unit tests accordingly
  • I have updated or added any integration tests accordingly
  • [n/a] I have updated or added any documentation accordingly

Does this PR require extra reviews?

  • Yes - csteam
  • No

@github-actions
Copy link

github-actions bot commented Jun 16, 2023

Jira: SC-1503

GitHub Issues: No GitHub issues are fixed by this PR. (No commits have Fixes: #### references)

Launchpad Bugs: No Launchpad bugs are fixed by this PR. (No commits have LP: #### references)

Documentation: The changes in this PR do not require documentation changes.

👍 this comment to confirm that this is correct.

@orndorffgrant orndorffgrant force-pushed the machine-activity-info-v2 branch 7 times, most recently from ae06224 to 1063142 Compare June 21, 2023 19:44
@orndorffgrant orndorffgrant changed the title wip: machine activity v2 consolidating machineInfo and activityInfo + reporting docker more accurately (SC-1503) Jun 21, 2023
@orndorffgrant orndorffgrant force-pushed the machine-activity-info-v2 branch 3 times, most recently from 6e7609b to 5c44845 Compare July 3, 2023 17:49
@orndorffgrant orndorffgrant marked this pull request as ready for review July 3, 2023 17:54
Copy link
Member

@renanrodrigo renanrodrigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm
Question: is this new docker detection step, and the information we send to the Contracts Server, documented somewhere we need to update?

uaclient/system.py Show resolved Hide resolved
features/airgapped.feature Outdated Show resolved Hide resolved
uaclient/testing/fakes.py Show resolved Hide resolved
uaclient/contract.py Show resolved Hide resolved
try:
contract.request_updated_contract(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking, can't we update request_updated_contract with this extra logic ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and that could solve the specific need of this PR, but the refactor stemmed from two observations:

  1. The exception cases below with comments claiming to run after a partial attach are wrong - they run after failed attaches as well. That is due to the attachment and enablement being combined in the one request_updated_contract function and that function doesn't separate the exceptions raised in attachment vs during enablement (my first idea of refactoring was do exactly that).
  2. The request_updated_contract function was kind-of two-functions-in-one. Most of it's uses were as "refresh", and one of its uses was as "attach". Since there was only one usage of the "attach" variant of request_updated_contract, I thought i made sense to bring the attach logic up into our "attach_with_token" function. That leaves only the refresh code behind, which is what every other caller was using.

The other thing is that I think attach_with_token should be the place where "attach"y things happen, and it mostly is, so it would be confusing to put some actions related to "attach" (saving attachment date, and sending machine activity ping) down in request_updated_contract.

machine_id = system.get_machine_id(self.cfg)
return {
"machineId": machine_id,
def _get_activity_info(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't this function already returns the compatible structure we need to send to the Contract server ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns exactly this structure: https://github.com/canonical/ua-contracts/blob/develop/docs/contracts.yaml#L275

That is used directly as the body for update_activity_token and used directly as the value of "activityInfo" for the body of update_contract_machine.

Everything else is a special case that requires the same values but only a subset of them, or in add_contract_machine needs lastAttachment even though we're not attached yet.

Comment on lines +196 to +209
if "docker" in proc_1_cgroup or "buildkit" in proc_1_cgroup:
return "docker"
elif "buildah" in proc_1_cgroup:
return "podman"
else:
return ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those these names appear in specific patterns in the string (i.e. beginning or end of the line) ? If yes, should we have a more restrictive regex for them or do you think name clashing is really unlikely here ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is what the file looks like in each situation, using the following dockerfile:

from ubuntu:focal
run cat /proc/1/cgroup | tee result.txt

docker build .

13:perf_event:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
12:blkio:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
11:rdma:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
10:pids:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
9:cpuset:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
8:net_cls,net_prio:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
7:devices:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
6:freezer:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
5:cpu,cpuacct:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
4:hugetlb:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
3:memory:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
2:misc:/
1:name=systemd:/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232
0::/docker/43820adb8df766d314faece4ddf62332a0eee1fc1c5154ffd4b883fb88cb1232

env DOCKER_BUILDKIT=1 docker build . -t test
docker run -it test cat /result.txt

13:perf_event:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
12:blkio:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
11:rdma:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
10:pids:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
9:cpuset:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
8:net_cls,net_prio:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
7:devices:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
6:freezer:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
5:cpu,cpuacct:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
4:hugetlb:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
3:memory:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
2:misc:/
1:name=systemd:/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5
0::/docker/buildkit/mc0szzrnjv8hx2hqwzvmzriq5

podman build .

13:perf_event:/
12:blkio:/user.slice
11:rdma:/
10:pids:/user.slice/user-1000.slice/[email protected]
9:cpuset:/
8:net_cls,net_prio:/
7:devices:/user.slice
6:freezer:/
5:cpu,cpuacct:/user.slice
4:hugetlb:/
3:memory:/user.slice/user-1000.slice/[email protected]
2:misc:/
1:name=systemd:/user.slice/user-1000.slice/[email protected]/app.slice/app-org.kde.konsole-1b54bbb6634a43a78eb7033cd3b3ac17.scope/buildah-buildah2868387759
0::/user.slice/user-1000.slice/[email protected]/app.slice/app-org.kde.konsole-1b54bbb6634a43a78eb7033cd3b3ac17.scope

The least obvious one is the podman one where "buildah" only shows up in the "name" field.

So, we could try to make things more strict, but my thought was that it is really unlikely for a clash here. It would only happen in an environment where systemd-detect-virt wasn't installed (so we fall back to this function), then the user would have to be doing something tricky/non-standard for one of the magic words to show up there.

And since we're only doing this to get a sense of usage, not for strict tracking, I think its okay.

uaclient/contract.py Outdated Show resolved Hide resolved
@orndorffgrant
Copy link
Collaborator Author

@lucasmoura @renanrodrigo I think this is good to go now.
@renanrodrigo I don't think we have any documentation on what info is sent in attach/refresh/activity API calls, so I created #2655 to add something.

@lucasmoura
Copy link
Contributor

@orndorffgrant I think we just need to fix a test for xenial and we should be good here

Copy link
Member

@renanrodrigo renanrodrigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
I think this can be merged with the required csteam review

@orndorffgrant
Copy link
Collaborator Author

@TeodorPt When you have a moment, can you take a quick look at one commit in this PR? this one: 045baf1
Specifically, the _get_activity_info function.

I'd like to make sure this aligns with your expectations before merging/releasing.

@TeodorPt
Copy link
Contributor

@orndorffgrant we are currently at a sprint, it would be great if this could wait until next week s.t. I can pay proper attention to it

@orndorffgrant
Copy link
Collaborator Author

@TeodorPt Yep it can wait til next week - all good!

Copy link
Contributor

@TeodorPt TeodorPt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delays. The new payload per discussions in US039 LGTM. I have some questions but they shouldn't influence the solution here.

"machineId": request_body.get("machineId"),
"activityInfo": activity_info,
"architecture": activity_info.get("architecture"),
"os": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was also a version in the old type, is that not checked in the old airgapped? cc @pandrey2003

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, seems like the airgapped neither checks nor uses this field. However, thanks for re-checking!

As a matter of fact, all the provided properties here are correct 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking! I referenced the airgapped implementation when writing this function which is why I didn't include version.

@@ -1558,6 +1564,10 @@ def action_attach(args, *, cfg):
event.info(msg.msg, file_type=sys.stderr)
event.error(error_msg=msg.msg, error_code=msg.name)
ret = 1

contract_client = contract.UAContractClient(cfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required? attach_with_token above will anyway send the activity info. I see this is how it is approved in the spec, but I don't recall exactly why. It doesn't hurt the server (besides some extra requests), just wondering what the rationale is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We added this because the attach request itself can't include any services that got enabled. At spec time we decided we wanted data of what services got enabled immediately after attach (this could include enableByDefault: true services, or services that the user configured the pro-client to immediately enable). To get that data we decided to immediately send an activity ping after the post-attach-service-enables to get that info recorded.

@@ -212,7 +208,7 @@ def update_activity_token(self):
machine_token = self.cfg.machine_token.get("machineToken")
machine_id = system.get_machine_id(self.cfg)

request_data = self._get_activity_info(machine_id)
request_data = self._get_activity_info()
Copy link
Contributor

@TeodorPt TeodorPt Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily for this PR, but for the overall approach:

I think in this method, lower below, you are saving the response from the UPDATE_ACTIVITY_TOKEN request in a file. As far as I can tell, in the current implementation the endpoint will return the entire payload (with the new activity ID, token and all the extra information you send as is). Is this fine? If not, should we change the behavior s.t. we only return the activity identifiers and not the metadata?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an excellent point! And something nobody else noticed.

I think this behavior is fine since the extra data is minimal and non-sensitive and its presence doesn't hurt the client - we treat that state file as a json blob, so extra fields are ignored.

In the future we hope to improve/organize our on-disk state and at that time we may think about discarding the extra metadata on the client side, but even then I don't think it hurts for the server to send it back.

@orndorffgrant
Copy link
Collaborator Author

CI failures are unrelated to these changes and fixed in #2672

@orndorffgrant orndorffgrant merged commit f64181d into main Jul 27, 2023
21 of 23 checks passed
@orndorffgrant orndorffgrant deleted the machine-activity-info-v2 branch July 27, 2023 12:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants