diff --git a/content/en/docs/tutorials/security/seccomp.md b/content/en/docs/tutorials/security/seccomp.md
index 01e50eaecc889..033bf0177361f 100644
--- a/content/en/docs/tutorials/security/seccomp.md
+++ b/content/en/docs/tutorials/security/seccomp.md
@@ -39,6 +39,13 @@ profiles that give only the necessary privileges to your container processes.
 In order to complete all steps in this tutorial, you must install
 [kind](/docs/tasks/tools/#kind) and [kubectl](/docs/tasks/tools/#kubectl).
 
+The commands used in the tutorial assume that you are using
+[Docker](https://www.docker.com/) as your container runtime. (The cluster that `kind` creates may
+use a different container runtime internally). You could also use
+[Podman](https://podman.io/) but in that case, you would have to follow specific
+[instructions](https://kind.sigs.k8s.io/docs/user/rootless/) in order to complete the tasks
+successfully.
+
 This tutorial shows some examples that are still beta (since v1.25) and
 others that use only generally available seccomp functionality. You should
 make sure that your cluster is
@@ -154,111 +161,7 @@ audit.json fine-grained.json violation.json
 You have verified that these seccomp profiles are available to the kubelet
 running within kind.
 
-## Enable the use of `RuntimeDefault` as the default seccomp profile for all workloads
-
-{{< feature-state state="stable" for_k8s_version="v1.27" >}}
-
-To use seccomp profile defaulting, you must run the kubelet with the
-`--seccomp-default`
-[command line flag](/docs/reference/command-line-tools-reference/kubelet)
-enabled for each node where you want to use it.
-
-If enabled, the kubelet will use the `RuntimeDefault` seccomp profile by default, which is
-defined by the container runtime, instead of using the `Unconfined` (seccomp disabled) mode.
-The default profiles aim to provide a strong set
-of security defaults while preserving the functionality of the workload. It is
-possible that the default profiles differ between container runtimes and their
-release versions, for example when comparing those from CRI-O and containerd.
-
-{{< note >}}
-Enabling the feature will neither change the Kubernetes
-`securityContext.seccompProfile` API field nor add the deprecated annotations of
-the workload. This provides users the possibility to rollback anytime without
-actually changing the workload configuration. Tools like
-[`crictl inspect`](https://github.com/kubernetes-sigs/cri-tools) can be used to
-verify which seccomp profile is being used by a container.
-{{< /note >}}
-
-Some workloads may require a lower amount of syscall restrictions than others.
-This means that they can fail during runtime even with the `RuntimeDefault`
-profile. To mitigate such a failure, you can:
-
-- Run the workload explicitly as `Unconfined`.
-- Disable the `SeccompDefault` feature for the nodes. Also making sure that
-  workloads get scheduled on nodes where the feature is disabled.
-- Create a custom seccomp profile for the workload.
-
-If you were introducing this feature into production-like cluster, the Kubernetes project
-recommends that you enable this feature gate on a subset of your nodes and then
-test workload execution before rolling the change out cluster-wide.
-
-You can find more detailed information about a possible upgrade and downgrade strategy
-in the related Kubernetes Enhancement Proposal (KEP):
-[Enable seccomp by default](https://github.com/kubernetes/enhancements/tree/9a124fd29d1f9ddf2ff455c49a630e3181992c25/keps/sig-node/2413-seccomp-by-default#upgrade--downgrade-strategy).
-
-Kubernetes {{< skew currentVersion >}} lets you configure the seccomp profile
-that applies when the spec for a Pod doesn't define a specific seccomp profile.
-However, you still need to enable this defaulting for each node where you would
-like to use it.
-
-If you are running a Kubernetes {{< skew currentVersion >}} cluster and want to
-enable the feature, either run the kubelet with the `--seccomp-default` command
-line flag, or enable it through the [kubelet configuration
-file](/docs/tasks/administer-cluster/kubelet-config-file/). To enable the
-feature gate in [kind](https://kind.sigs.k8s.io), ensure that `kind` provides
-the minimum required Kubernetes version and enables the `SeccompDefault` feature
-[in the kind configuration](https://kind.sigs.k8s.io/docs/user/quick-start/#enable-feature-gates-in-your-cluster):
-
-```yaml
-kind: Cluster
-apiVersion: kind.x-k8s.io/v1alpha4
-nodes:
-  - role: control-plane
-    image: kindest/node:v1.23.0@sha256:49824ab1727c04e56a21a5d8372a402fcd32ea51ac96a2706a12af38934f81ac
-    kubeadmConfigPatches:
-      - |
-        kind: JoinConfiguration
-        nodeRegistration:
-          kubeletExtraArgs:
-            seccomp-default: "true"
-  - role: worker
-    image: kindest/node:v1.23.0@sha256:49824ab1727c04e56a21a5d8372a402fcd32ea51ac96a2706a12af38934f81ac
-    kubeadmConfigPatches:
-      - |
-        kind: JoinConfiguration
-        nodeRegistration:
-          kubeletExtraArgs:
-            seccomp-default: "true"
-```
-
-If the cluster is ready, then running a pod:
-
-```shell
-kubectl run --rm -it --restart=Never --image=alpine alpine -- sh
-```
-
-Should now have the default seccomp profile attached. This can be verified by
-using `docker exec` to run `crictl inspect` for the container on the kind
-worker:
-
-```shell
-docker exec -it kind-worker bash -c \
-    'crictl inspect $(crictl ps --name=alpine -q) | jq .info.runtimeSpec.linux.seccomp'
-```
-
-```json
-{
-  "defaultAction": "SCMP_ACT_ERRNO",
-  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
-  "syscalls": [
-    {
-      "names": ["..."]
-    }
-  ]
-}
-```
-
-## Create Pod that uses the container runtime default seccomp profile
+## Create a Pod that uses the container runtime default seccomp profile
 
 Most container runtimes provide a sane set of default syscalls that are allowed
 or not. You can adopt these defaults for your workload by setting the seccomp
@@ -290,7 +193,7 @@ NAME READY STATUS RESTARTS AGE
 default-pod 1/1 Running 0 20s
 ```
 
-Finally, now that you saw that work OK, clean up:
+Delete the Pod before moving to the next section:
 
 ```shell
 kubectl delete pod default-pod --wait --now
@@ -323,7 +226,7 @@ This profile does not restrict any syscalls, so the Pod should start
 successfully.
 
 ```shell
-kubectl get pod/audit-pod
+kubectl get pod audit-pod
 ```
 
 ```
@@ -332,7 +235,7 @@ audit-pod 1/1 Running 0 30s
 ```
 
 In order to be able to interact with this endpoint exposed by this
-container, create a NodePort {{< glossary_tooltip text="Services" term_id="service" >}}
+container, create a NodePort {{< glossary_tooltip text="Service" term_id="service" >}}
 that allows access to the endpoint from inside the kind control plane container.
 
 ```shell
@@ -356,7 +259,7 @@ at the port exposed by this Service. Use `docker exec` to run the `curl` command
 container belonging to that control plane container:
 
 ```shell
-# Change 6a96207fed4b to the control plane container ID you saw from "docker ps"
+# Change 6a96207fed4b to the control plane container ID and 32373 to the port number you saw from "docker ps"
 docker exec -it 6a96207fed4b curl localhost:32373
 ```
 
@@ -366,15 +269,16 @@ just made some syscalls!
 
 You can see that the process is running, but what syscalls did it actually make?
 Because this Pod is running in a local cluster, you should be able to see those
-in `/var/log/syslog`. Open up a new terminal window and `tail` the output for
+in `/var/log/syslog` on your local system. Open up a new terminal window and `tail` the output for
 calls from `http-echo`:
 
 ```shell
+# The log path on your computer might be different from "/var/log/syslog"
 tail -f /var/log/syslog | grep 'http-echo'
 ```
 
-You should already see some logs of syscalls made by `http-echo`, and if you
-`curl` the endpoint in the control plane container you will see more written.
+You should already see some logs of syscalls made by `http-echo`, and if you run `curl` again inside
+the control plane container you will see more output written to the log.
 For example:
 
 ```
@@ -393,14 +297,14 @@ looking at the `syscall=` entry on each line. While these are unlikely to
 encompass all syscalls it uses, it can serve as a basis for a seccomp profile
 for this container.
 
-Clean up that Pod and Service before moving to the next section:
+Delete the Service and the Pod before moving to the next section:
 
 ```shell
 kubectl delete service audit-pod --wait
 kubectl delete pod audit-pod --wait --now
 ```
 
-## Create Pod with a seccomp profile that causes violation
+## Create a Pod with a seccomp profile that causes violation
 
 For demonstration, apply a profile to the Pod that does not allow for any
 syscalls.
@@ -419,7 +323,7 @@ The Pod creates, but there is an issue.
 If you check the status of the Pod, you should see that it failed to start.
 
 ```shell
-kubectl get pod/violation-pod
+kubectl get pod violation-pod
 ```
 
 ```
@@ -433,13 +337,13 @@ syscalls. Here seccomp has been instructed to error on any syscall by setting
 ability to do anything meaningful. What you really want is to give workloads
 only the privileges they need.
 
-Clean up that Pod before moving to the next section:
+Delete the Pod before moving to the next section:
 
 ```shell
 kubectl delete pod violation-pod --wait --now
 ```
 
-## Create Pod with a seccomp profile that only allows necessary syscalls
+## Create a Pod with a seccomp profile that only allows necessary syscalls
 
 If you take a look at the `fine-grained.json` profile, you will notice some of the syscalls
 seen in syslog of the first example where the profile set `"defaultAction":
@@ -497,7 +401,7 @@ fine-pod NodePort 10.111.36.142 5678:32373/TCP 72s
 Use `curl` to access that endpoint from inside the kind control plane container:
 
 ```shell
-# Change 6a96207fed4b to the control plane container ID you saw from "docker ps"
+# Change 6a96207fed4b to the control plane container ID and 32373 to the port number you saw from "docker ps"
 docker exec -it 6a96207fed4b curl localhost:32373
 ```
 
@@ -511,13 +415,117 @@ the list is invoked. This is an ideal situation from a security perspective, but
 required some effort in analyzing the program. It would be nice if there was a
 simple way to get closer to this security without requiring as much effort.
 
-Clean up that Pod and Service before moving to the next section:
+Delete the Service and the Pod before moving to the next section:
 
 ```shell
 kubectl delete service fine-pod --wait
 kubectl delete pod fine-pod --wait --now
 ```
 
+## Enable the use of `RuntimeDefault` as the default seccomp profile for all workloads
+
+{{< feature-state state="stable" for_k8s_version="v1.27" >}}
+
+To use seccomp profile defaulting, you must run the kubelet with the
+`--seccomp-default`
+[command line flag](/docs/reference/command-line-tools-reference/kubelet)
+enabled for each node where you want to use it.
+
+If enabled, the kubelet will use the `RuntimeDefault` seccomp profile by default, which is
+defined by the container runtime, instead of using the `Unconfined` (seccomp disabled) mode.
+The default profiles aim to provide a strong set
+of security defaults while preserving the functionality of the workload. It is
+possible that the default profiles differ between container runtimes and their
+release versions, for example when comparing those from CRI-O and containerd.
+
+{{< note >}}
+Enabling the feature will neither change the Kubernetes
+`securityContext.seccompProfile` API field nor add the deprecated annotations of
+the workload. This provides users the possibility to rollback anytime without
+actually changing the workload configuration. Tools like
+[`crictl inspect`](https://github.com/kubernetes-sigs/cri-tools) can be used to
+verify which seccomp profile is being used by a container.
+{{< /note >}}
+
+Some workloads may require a lower amount of syscall restrictions than others.
+This means that they can fail during runtime even with the `RuntimeDefault`
+profile. To mitigate such a failure, you can:
+
+- Run the workload explicitly as `Unconfined`.
+- Disable the `SeccompDefault` feature for the nodes. Also making sure that
+  workloads get scheduled on nodes where the feature is disabled.
+- Create a custom seccomp profile for the workload.
+
+If you were introducing this feature into production-like cluster, the Kubernetes project
+recommends that you enable this feature gate on a subset of your nodes and then
+test workload execution before rolling the change out cluster-wide.
+
+You can find more detailed information about a possible upgrade and downgrade strategy
+in the related Kubernetes Enhancement Proposal (KEP):
+[Enable seccomp by default](https://github.com/kubernetes/enhancements/tree/9a124fd29d1f9ddf2ff455c49a630e3181992c25/keps/sig-node/2413-seccomp-by-default#upgrade--downgrade-strategy).
+
+Kubernetes {{< skew currentVersion >}} lets you configure the seccomp profile
+that applies when the spec for a Pod doesn't define a specific seccomp profile.
+However, you still need to enable this defaulting for each node where you would
+like to use it.
+
+If you are running a Kubernetes {{< skew currentVersion >}} cluster and want to
+enable the feature, either run the kubelet with the `--seccomp-default` command
+line flag, or enable it through the [kubelet configuration
+file](/docs/tasks/administer-cluster/kubelet-config-file/). To enable the
+feature gate in [kind](https://kind.sigs.k8s.io), ensure that `kind` provides
+the minimum required Kubernetes version and enables the `SeccompDefault` feature
+[in the kind configuration](https://kind.sigs.k8s.io/docs/user/quick-start/#enable-feature-gates-in-your-cluster):
+
+```yaml
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+nodes:
+  - role: control-plane
+    image: kindest/node:v1.23.0@sha256:49824ab1727c04e56a21a5d8372a402fcd32ea51ac96a2706a12af38934f81ac
+    kubeadmConfigPatches:
+      - |
+        kind: JoinConfiguration
+        nodeRegistration:
+          kubeletExtraArgs:
+            seccomp-default: "true"
+  - role: worker
+    image: kindest/node:v1.23.0@sha256:49824ab1727c04e56a21a5d8372a402fcd32ea51ac96a2706a12af38934f81ac
+    kubeadmConfigPatches:
+      - |
+        kind: JoinConfiguration
+        nodeRegistration:
+          kubeletExtraArgs:
+            seccomp-default: "true"
+```
+
+If the cluster is ready, then running a pod:
+
+```shell
+kubectl run --rm -it --restart=Never --image=alpine alpine -- sh
+```
+
+Should now have the default seccomp profile attached. This can be verified by
+using `docker exec` to run `crictl inspect` for the container on the kind
+worker:
+
+```shell
+docker exec -it kind-worker bash -c \
+    'crictl inspect $(crictl ps --name=alpine -q) | jq .info.runtimeSpec.linux.seccomp'
+```
+
+```json
+{
+  "defaultAction": "SCMP_ACT_ERRNO",
+  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
+  "syscalls": [
+    {
+      "names": ["..."]
+    }
+  ]
+}
+```
+
 ## {{% heading "whatsnext" %}}
 
 You can learn more about Linux seccomp: