Skip to content

Commit

Permalink
Merge pull request #41962 from mrgiles/31536_seccomp_tutorial_add_pre…
Browse files Browse the repository at this point in the history
…reqs

Add docker prereq, mention podman and rearrange sections
  • Loading branch information
k8s-ci-robot authored Jul 31, 2023
2 parents f0f9b32 + ee171ac commit 2692867
Showing 1 changed file with 127 additions and 119 deletions.
246 changes: 127 additions & 119 deletions content/en/docs/tutorials/security/seccomp.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,13 @@ profiles that give only the necessary privileges to your container processes.
In order to complete all steps in this tutorial, you must install
[kind](/docs/tasks/tools/#kind) and [kubectl](/docs/tasks/tools/#kubectl).

The commands used in the tutorial assume that you are using
[Docker](https://www.docker.com/) as your container runtime. (The cluster that `kind` creates may
use a different container runtime internally). You could also use
[Podman](https://podman.io/) but in that case, you would have to follow specific
[instructions](https://kind.sigs.k8s.io/docs/user/rootless/) in order to complete the tasks
successfully.

This tutorial shows some examples that are still beta (since v1.25) and
others that use only generally available seccomp functionality. You should
make sure that your cluster is
Expand Down Expand Up @@ -154,111 +161,7 @@ audit.json fine-grained.json violation.json
You have verified that these seccomp profiles are available to the kubelet
running within kind.

## Enable the use of `RuntimeDefault` as the default seccomp profile for all workloads

{{< feature-state state="stable" for_k8s_version="v1.27" >}}

To use seccomp profile defaulting, you must run the kubelet with the
`--seccomp-default`
[command line flag](/docs/reference/command-line-tools-reference/kubelet)
enabled for each node where you want to use it.

If enabled, the kubelet will use the `RuntimeDefault` seccomp profile by default, which is
defined by the container runtime, instead of using the `Unconfined` (seccomp disabled) mode.
The default profiles aim to provide a strong set
of security defaults while preserving the functionality of the workload. It is
possible that the default profiles differ between container runtimes and their
release versions, for example when comparing those from CRI-O and containerd.

{{< note >}}
Enabling the feature will neither change the Kubernetes
`securityContext.seccompProfile` API field nor add the deprecated annotations of
the workload. This provides users the possibility to rollback anytime without
actually changing the workload configuration. Tools like
[`crictl inspect`](https://github.com/kubernetes-sigs/cri-tools) can be used to
verify which seccomp profile is being used by a container.
{{< /note >}}

Some workloads may require a lower amount of syscall restrictions than others.
This means that they can fail during runtime even with the `RuntimeDefault`
profile. To mitigate such a failure, you can:

- Run the workload explicitly as `Unconfined`.
- Disable the `SeccompDefault` feature for the nodes. Also making sure that
workloads get scheduled on nodes where the feature is disabled.
- Create a custom seccomp profile for the workload.

If you were introducing this feature into production-like cluster, the Kubernetes project
recommends that you enable this feature gate on a subset of your nodes and then
test workload execution before rolling the change out cluster-wide.

You can find more detailed information about a possible upgrade and downgrade strategy
in the related Kubernetes Enhancement Proposal (KEP):
[Enable seccomp by default](https://github.com/kubernetes/enhancements/tree/9a124fd29d1f9ddf2ff455c49a630e3181992c25/keps/sig-node/2413-seccomp-by-default#upgrade--downgrade-strategy).

Kubernetes {{< skew currentVersion >}} lets you configure the seccomp profile
that applies when the spec for a Pod doesn't define a specific seccomp profile.
However, you still need to enable this defaulting for each node where you would
like to use it.

If you are running a Kubernetes {{< skew currentVersion >}} cluster and want to
enable the feature, either run the kubelet with the `--seccomp-default` command
line flag, or enable it through the [kubelet configuration
file](/docs/tasks/administer-cluster/kubelet-config-file/). To enable the
feature gate in [kind](https://kind.sigs.k8s.io), ensure that `kind` provides
the minimum required Kubernetes version and enables the `SeccompDefault` feature
[in the kind configuration](https://kind.sigs.k8s.io/docs/user/quick-start/#enable-feature-gates-in-your-cluster):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
image: kindest/node:v1.23.0@sha256:49824ab1727c04e56a21a5d8372a402fcd32ea51ac96a2706a12af38934f81ac
kubeadmConfigPatches:
- |
kind: JoinConfiguration
nodeRegistration:
kubeletExtraArgs:
seccomp-default: "true"
- role: worker
image: kindest/node:v1.23.0@sha256:49824ab1727c04e56a21a5d8372a402fcd32ea51ac96a2706a12af38934f81ac
kubeadmConfigPatches:
- |
kind: JoinConfiguration
nodeRegistration:
kubeletExtraArgs:
seccomp-default: "true"
```
If the cluster is ready, then running a pod:
```shell
kubectl run --rm -it --restart=Never --image=alpine alpine -- sh
```

Should now have the default seccomp profile attached. This can be verified by
using `docker exec` to run `crictl inspect` for the container on the kind
worker:

```shell
docker exec -it kind-worker bash -c \
'crictl inspect $(crictl ps --name=alpine -q) | jq .info.runtimeSpec.linux.seccomp'
```

```json
{
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
"syscalls": [
{
"names": ["..."]
}
]
}
```

## Create Pod that uses the container runtime default seccomp profile
## Create a Pod that uses the container runtime default seccomp profile

Most container runtimes provide a sane set of default syscalls that are allowed
or not. You can adopt these defaults for your workload by setting the seccomp
Expand Down Expand Up @@ -290,7 +193,7 @@ NAME READY STATUS RESTARTS AGE
default-pod 1/1 Running 0 20s
```

Finally, now that you saw that work OK, clean up:
Delete the Pod before moving to the next section:

```shell
kubectl delete pod default-pod --wait --now
Expand Down Expand Up @@ -323,7 +226,7 @@ This profile does not restrict any syscalls, so the Pod should start
successfully.

```shell
kubectl get pod/audit-pod
kubectl get pod audit-pod
```

```
Expand All @@ -332,7 +235,7 @@ audit-pod 1/1 Running 0 30s
```

In order to be able to interact with this endpoint exposed by this
container, create a NodePort {{< glossary_tooltip text="Services" term_id="service" >}}
container, create a NodePort {{< glossary_tooltip text="Service" term_id="service" >}}
that allows access to the endpoint from inside the kind control plane container.

```shell
Expand All @@ -356,7 +259,7 @@ at the port exposed by this Service. Use `docker exec` to run the `curl` command
container belonging to that control plane container:

```shell
# Change 6a96207fed4b to the control plane container ID you saw from "docker ps"
# Change 6a96207fed4b to the control plane container ID and 32373 to the port number you saw from "docker ps"
docker exec -it 6a96207fed4b curl localhost:32373
```

Expand All @@ -366,15 +269,16 @@ just made some syscalls!

You can see that the process is running, but what syscalls did it actually make?
Because this Pod is running in a local cluster, you should be able to see those
in `/var/log/syslog`. Open up a new terminal window and `tail` the output for
in `/var/log/syslog` on your local system. Open up a new terminal window and `tail` the output for
calls from `http-echo`:

```shell
# The log path on your computer might be different from "/var/log/syslog"
tail -f /var/log/syslog | grep 'http-echo'
```

You should already see some logs of syscalls made by `http-echo`, and if you
`curl` the endpoint in the control plane container you will see more written.
You should already see some logs of syscalls made by `http-echo`, and if you run `curl` again inside
the control plane container you will see more output written to the log.

For example:
```
Expand All @@ -393,14 +297,14 @@ looking at the `syscall=` entry on each line. While these are unlikely to
encompass all syscalls it uses, it can serve as a basis for a seccomp profile
for this container.

Clean up that Pod and Service before moving to the next section:
Delete the Service and the Pod before moving to the next section:

```shell
kubectl delete service audit-pod --wait
kubectl delete pod audit-pod --wait --now
```

## Create Pod with a seccomp profile that causes violation
## Create a Pod with a seccomp profile that causes violation

For demonstration, apply a profile to the Pod that does not allow for any
syscalls.
Expand All @@ -419,7 +323,7 @@ The Pod creates, but there is an issue.
If you check the status of the Pod, you should see that it failed to start.

```shell
kubectl get pod/violation-pod
kubectl get pod violation-pod
```

```
Expand All @@ -433,13 +337,13 @@ syscalls. Here seccomp has been instructed to error on any syscall by setting
ability to do anything meaningful. What you really want is to give workloads
only the privileges they need.

Clean up that Pod before moving to the next section:
Delete the Pod before moving to the next section:

```shell
kubectl delete pod violation-pod --wait --now
```

## Create Pod with a seccomp profile that only allows necessary syscalls
## Create a Pod with a seccomp profile that only allows necessary syscalls

If you take a look at the `fine-grained.json` profile, you will notice some of the syscalls
seen in syslog of the first example where the profile set `"defaultAction":
Expand Down Expand Up @@ -497,7 +401,7 @@ fine-pod NodePort 10.111.36.142 <none> 5678:32373/TCP 72s
Use `curl` to access that endpoint from inside the kind control plane container:

```shell
# Change 6a96207fed4b to the control plane container ID you saw from "docker ps"
# Change 6a96207fed4b to the control plane container ID and 32373 to the port number you saw from "docker ps"
docker exec -it 6a96207fed4b curl localhost:32373
```

Expand All @@ -511,13 +415,117 @@ the list is invoked. This is an ideal situation from a security perspective, but
required some effort in analyzing the program. It would be nice if there was a
simple way to get closer to this security without requiring as much effort.

Clean up that Pod and Service before moving to the next section:
Delete the Service and the Pod before moving to the next section:

```shell
kubectl delete service fine-pod --wait
kubectl delete pod fine-pod --wait --now
```

## Enable the use of `RuntimeDefault` as the default seccomp profile for all workloads

{{< feature-state state="stable" for_k8s_version="v1.27" >}}

To use seccomp profile defaulting, you must run the kubelet with the
`--seccomp-default`
[command line flag](/docs/reference/command-line-tools-reference/kubelet)
enabled for each node where you want to use it.

If enabled, the kubelet will use the `RuntimeDefault` seccomp profile by default, which is
defined by the container runtime, instead of using the `Unconfined` (seccomp disabled) mode.
The default profiles aim to provide a strong set
of security defaults while preserving the functionality of the workload. It is
possible that the default profiles differ between container runtimes and their
release versions, for example when comparing those from CRI-O and containerd.

{{< note >}}
Enabling the feature will neither change the Kubernetes
`securityContext.seccompProfile` API field nor add the deprecated annotations of
the workload. This provides users the possibility to rollback anytime without
actually changing the workload configuration. Tools like
[`crictl inspect`](https://github.com/kubernetes-sigs/cri-tools) can be used to
verify which seccomp profile is being used by a container.
{{< /note >}}

Some workloads may require a lower amount of syscall restrictions than others.
This means that they can fail during runtime even with the `RuntimeDefault`
profile. To mitigate such a failure, you can:

- Run the workload explicitly as `Unconfined`.
- Disable the `SeccompDefault` feature for the nodes. Also making sure that
workloads get scheduled on nodes where the feature is disabled.
- Create a custom seccomp profile for the workload.

If you were introducing this feature into production-like cluster, the Kubernetes project
recommends that you enable this feature gate on a subset of your nodes and then
test workload execution before rolling the change out cluster-wide.

You can find more detailed information about a possible upgrade and downgrade strategy
in the related Kubernetes Enhancement Proposal (KEP):
[Enable seccomp by default](https://github.com/kubernetes/enhancements/tree/9a124fd29d1f9ddf2ff455c49a630e3181992c25/keps/sig-node/2413-seccomp-by-default#upgrade--downgrade-strategy).

Kubernetes {{< skew currentVersion >}} lets you configure the seccomp profile
that applies when the spec for a Pod doesn't define a specific seccomp profile.
However, you still need to enable this defaulting for each node where you would
like to use it.

If you are running a Kubernetes {{< skew currentVersion >}} cluster and want to
enable the feature, either run the kubelet with the `--seccomp-default` command
line flag, or enable it through the [kubelet configuration
file](/docs/tasks/administer-cluster/kubelet-config-file/). To enable the
feature gate in [kind](https://kind.sigs.k8s.io), ensure that `kind` provides
the minimum required Kubernetes version and enables the `SeccompDefault` feature
[in the kind configuration](https://kind.sigs.k8s.io/docs/user/quick-start/#enable-feature-gates-in-your-cluster):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
image: kindest/node:v1.23.0@sha256:49824ab1727c04e56a21a5d8372a402fcd32ea51ac96a2706a12af38934f81ac
kubeadmConfigPatches:
- |
kind: JoinConfiguration
nodeRegistration:
kubeletExtraArgs:
seccomp-default: "true"
- role: worker
image: kindest/node:v1.23.0@sha256:49824ab1727c04e56a21a5d8372a402fcd32ea51ac96a2706a12af38934f81ac
kubeadmConfigPatches:
- |
kind: JoinConfiguration
nodeRegistration:
kubeletExtraArgs:
seccomp-default: "true"
```
If the cluster is ready, then running a pod:
```shell
kubectl run --rm -it --restart=Never --image=alpine alpine -- sh
```

Should now have the default seccomp profile attached. This can be verified by
using `docker exec` to run `crictl inspect` for the container on the kind
worker:

```shell
docker exec -it kind-worker bash -c \
'crictl inspect $(crictl ps --name=alpine -q) | jq .info.runtimeSpec.linux.seccomp'
```

```json
{
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
"syscalls": [
{
"names": ["..."]
}
]
}
```

## {{% heading "whatsnext" %}}

You can learn more about Linux seccomp:
Expand Down

0 comments on commit 2692867

Please sign in to comment.