Add encrypted scratch space for container sandbox #2212
Conversation
src/cloud-api-adaptor/podvm-mkosi/mkosi.skeleton/usr/lib/repart.d/30-scratch.conf
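For reference, a systemd-repart drop-in for such a scratch partition could look roughly like this (a sketch based on the repart.d(5) format; the actual contents of `30-scratch.conf` in this PR may differ):

```ini
# illustrative drop-in: create a LUKS-encrypted partition that grows
# into whatever free space is left on the disk at boot
[Partition]
Type=linux-generic
Label=scratch
Encrypt=key-file
```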
Would it also make sense to extend this approach to use an add-on disk, as available with some instance types? Not needed for the initial iteration, but is this something we should consider and lay the foundation for now?
I think it would be better to also add an annotation (e.g. …)
For future extensibility, I think we can freely iterate. We don't currently have a version contract between the PodVM and CAA. We would have to look at the user-facing API, maybe, to make this possible.
I agree. I didn't spot an existing fitting annotation in kata at first glance, so we might have to add it.
Yes, a new annotation will need to be added.
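To illustrate the direction, a per-pod setting might eventually be expressed as a pod annotation like the following (the annotation name is purely hypothetical; as noted above, kata would first need to grow such an annotation):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: large-image-workload
  annotations:
    # hypothetical name, not an existing kata annotation
    io.katacontainers.config.hypervisor.scratch_space_size: "8Gi"
spec:
  runtimeClassName: kata-remote
  containers:
    - name: workload
      image: example.com/pytorch-cuda:latest
```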
We discussed it in the last community call and concluded to start with a CAA-wide option. Once this is merged, we can start adding something to kata-runtime to make it tweakable per pod.
I think commits 1 & 2 need some tidying as there is new code added in 1 that is changed in 2 which was a bit confusing. Additionally it would have been nicer (for me to review) to split the resource refactor out from commit 1, into a previous commit. It's not a blocker, maybe adding a sentence to explain that in the commit message #1 would be helpful for future people looking back at this?
```diff
@@ -76,6 +76,7 @@ else
 	touch resources/buildBootableImage
 	nix develop ..#podvm-mkosi --command mkosi --environment=VARIANT_ID=production
 	qemu-img convert -f raw -O qcow2 build/system.raw build/podvm-$(PODVM_DISTRO)-$(ARCH).qcow2
+	qemu-img resize build/podvm-$(PODVM_DISTRO)-$(ARCH).qcow2 +100M
```
Is there any significance to +100M, or is it just a starter amount of space?
It needs to have a bit of size so repart will not stumble over it at launch, attempting to create a partition without space. Not sure if it needs to be +100M; it could also work with just +10M, I'd have to test.
Ok, I don't have an issue with the 100M, just wanted to check if there was special logic behind it.
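For illustration, the resize step discussed above and a quick verification might look like this on the command line (the image filename is an example):

```sh
# grow the qcow2 by 100 MiB of unpartitioned space; systemd-repart
# claims it at first boot to create the encrypted scratch partition
qemu-img resize build/podvm-fedora-amd64.qcow2 +100M

# verify the new virtual size
qemu-img info build/podvm-fedora-amd64.qcow2
```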
This is only being done on the x86 path here because only Azure supports this scratch space at the moment, I assume? Do you see any obstacles in the code for s390x support if/when needed?
I don't know much about partitioning schemes on s390x. If systemd-repart works the same way there, it should work out of the box. The cloud-provider code would need to consider the `.storage` property when provisioning a VM, and then repart would grab whatever is available and allocate it to the scratch space.
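As a rough illustration of the provider-side logic (a hypothetical sketch; the type and field names are assumptions, not the actual cloud-api-adaptor API):

```go
package main

import "fmt"

// Config is a stand-in for a provider config; 0 means "keep the
// image's default disk size".
type Config struct {
	RootVolumeSizeGB int
}

// effectiveDiskSize sketches how a provider could honor the CAA-wide
// option: any size beyond the image default becomes free space that
// systemd-repart can claim for the encrypted scratch partition.
func effectiveDiskSize(cfg Config, imageDefaultGB int) int {
	if cfg.RootVolumeSizeGB > imageDefaultGB {
		return cfg.RootVolumeSizeGB
	}
	return imageDefaultGB
}

func main() {
	fmt.Println(effectiveDiskSize(Config{RootVolumeSizeGB: 30}, 1)) // prints 30
}
```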
The CI wasn't building debug images due to a string/bool mismatch.

Signed-off-by: Magnus Kulke <[email protected]>
This adds the configuration for an encrypted scratch space on an mkosi image. At boot a /dev/sda4 partition will be created and encrypted with LUKS using an ephemeral key. The partition will use the space available on the image volume. By default the qcow2 image has 100MB allocated for this space; that amount will only work for very small images, hence we do not mount the scratch space to `/run/kata-containers` by default. If the kata-agent service unit encounters a `/run/peerpod/mount-scratch` file, it will mount the encrypted partition `/dev/sda4` to `/run/kata-containers`. This file is provisioned by `process-user-data`, configured via the CAA daemonset. A new CAA parameter has been introduced that allows specifying the disk size. If the disk size is larger than 0, a cloud provider can attempt to honor it to create space for an encrypted scratch partition that can be mounted to `/run/kata-containers`.

Signed-off-by: Magnus Kulke <[email protected]>
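The marker-file check described in the commit message could be sketched roughly like this (assumptions: the mapper device name and how the check is wired into the service unit may differ from the actual PR):

```sh
#!/bin/sh
# mount the encrypted scratch partition only if process-user-data
# has dropped the marker file (device name is an assumption)
if [ -f /run/peerpod/mount-scratch ]; then
    mount /dev/mapper/encrypted_disk /run/kata-containers
fi
```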
This looks like a good starting point to me. Thanks @mkulke!
/lgtm
fixes #1921
This is a PoC to address the problem of secure scratch space into which we can download and unpack images for the sandbox, write files at runtime, etc.
Currently this space is either backed by storage that is not protected by the TEE (packer) or by tmpfs mounts that consume RAM at runtime (mkosi). For larger images like pytorch w/ cuda this implies, in theory, that we have to set aside 8GB of RAM just to be able to download and unpack the image. tmpfs mounts are by default limited to a fraction of RAM, so we might end up having to overprovision the machine to be able to run a workload.
The proposed change introduces a CAA-wide parameter that allows specifying the disk size. A cloud provider can use this as an indicator for the size of the to-be-provisioned podvm disk. A zero value means we stick to the image's default size.
If a size has been given (the mkosi image is ~0.5GB), the image will be extended to that size. At launch the podvm will claim that space and create an ad-hoc encrypted volume on it. Depending on the presence of a marker file in user-data, the agent unit will mount that volume to `/run/kata-containers` prior to launching kata-agent.

I'm not super happy about the API (I would prefer to specify it per pod) or the implementation (the marker file), but I wanted to push early to get feedback. (Also, it's only implemented for Azure at the moment.)
FWIW it does what it's supposed to do; see some numbers in #1958.