Cluster was created with the following command:
kcli create kube generic -P masters=1 -P workers=1 -P master_memory=4096 -P numcpus=2 -P worker_memory=4096 -P sdn=calico -P version=1.24 -P ingress=true -P ingress_method=nginx -P metallb=true -P engine=crio -P domain=linuxera.org caps-cluster
-
Create a namespace:
NAMESPACE=test-capabilities
kubectl create ns ${NAMESPACE}
-
Create a pod running our application with UID 0:
cat <<EOF | kubectl -n ${NAMESPACE} create -f -
apiVersion: v1
kind: Pod
metadata:
  name: reversewords-app-captest-root
spec:
  containers:
  - image: quay.io/mavazque/reversewords:ubi8
    name: reversewords
    securityContext:
      runAsUser: 0
  dnsPolicy: ClusterFirst
  restartPolicy: Never
status: {}
EOF
-
Let's review the thread capability sets:
kubectl -n ${NAMESPACE} exec -ti reversewords-app-captest-root -- grep Cap /proc/1/status
-
We can see that the permitted and effective sets hold some capabilities. If we decode them, this is what we get:
CapInh: 00000000000005fb
CapPrm: 00000000000005fb
CapEff: 00000000000005fb
CapBnd: 00000000000005fb
CapAmb: 0000000000000000
$ capsh --decode=00000000000005fb
0x00000000000005fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service
These are the default capabilities in CRI-O 1.23, which match the ones decoded above:
default_capabilities = [
    "CHOWN",
    "DAC_OVERRIDE",
    "FSETID",
    "FOWNER",
    "SETGID",
    "SETUID",
    "SETPCAP",
    "NET_BIND_SERVICE",
    "KILL",
]
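If `capsh` is not available where you are working, the decoding step can be approximated in plain shell. The following is a minimal sketch and not part of the original setup: `decode_caps` is a hypothetical helper name, and its table only covers capability numbers 0 to 10 (the numbering from `linux/capability.h`), which is enough for the masks that appear in this post.

~~~sh
# Hypothetical helper: decode a hex capability mask, as printed in
# /proc/<pid>/status, into a comma-separated list of capability names.
# Only capability numbers 0..10 are known to this sketch.
decode_caps() {
  mask=$(( 0x$1 ))
  names="cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service"
  bit=0
  out=""
  for name in $names; do
    # Test bit number $bit of the mask
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      out="${out:+$out,}$name"
    fi
    bit=$(( bit + 1 ))
  done
  printf '%s\n' "$out"
}

decode_caps 00000000000005fb
~~~

For the mask above, this prints the same nine names that `capsh --decode` reports. Higher-numbered capabilities would need more entries in the table.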
-
Now, let's run the same application pod but with a nonroot UID:
cat <<EOF | kubectl -n ${NAMESPACE} create -f -
apiVersion: v1
kind: Pod
metadata:
  name: reversewords-app-captest-nonroot
spec:
  containers:
  - image: quay.io/mavazque/reversewords:ubi8
    name: reversewords
    securityContext:
      runAsUser: 1024
  dnsPolicy: ClusterFirst
  restartPolicy: Never
status: {}
EOF
-
If we review the thread capability sets this is what we get:
kubectl -n ${NAMESPACE} exec -ti reversewords-app-captest-nonroot -- grep Cap /proc/1/status
CapInh: 00000000000005fb
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000000005fb
CapAmb: 0000000000000000
-
The permitted and effective sets were cleared; as you may remember, this is expected. The problem is that Kubernetes doesn't support ambient capabilities and, as you can see, the ambient set is empty. That leaves us with only two options: file capabilities or capability-aware applications.
- In this first deployment we are going to run our app with root uid and drop every runtime capability but NET_BIND_SERVICE.
⚠️ Notice that if you are running OpenShift you will need to grant the proper SCC to the default service account of the namespace, because by default OpenShift does not allow managing capabilities in your deployments. To fix that you can, for instance, run `oc adm policy add-scc-to-user privileged -z default -n $NAMESPACE`. That command lets the `default` service account in the namespace where the app runs use the privileged SCC.
~~~sh
cat <<EOF | kubectl -n ${NAMESPACE} create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: reversewords-app-rootuid
  name: reversewords-app-rootuid
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reversewords-app-rootuid
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: reversewords-app-rootuid
    spec:
      containers:
      - image: quay.io/mavazque/reversewords:ubi8
        name: reversewords
        resources: {}
        env:
        - name: APP_PORT
          value: "80"
        securityContext:
          runAsUser: 0
          capabilities:
            drop:
            - all
            add:
            - NET_BIND_SERVICE
status: {}
EOF
~~~
-
If we get the application logs we can see that it started properly:
kubectl -n ${NAMESPACE} logs deployment/reversewords-app-rootuid
2022/07/06 15:14:51 Starting Reverse Api v0.0.21 Release: NotSet
2022/07/06 15:14:51 Listening on port 80
-
If we look at the capability sets this is what we get:
kubectl -n ${NAMESPACE} exec -ti deployment/reversewords-app-rootuid -- grep Cap /proc/1/status
CapInh: 0000000000000400
CapPrm: 0000000000000400
CapEff: 0000000000000400
CapBnd: 0000000000000400
CapAmb: 0000000000000000
-
NET_BIND_SERVICE is present in the effective and permitted sets, so it worked as expected.
-
Now we drop all of the runtime's default capabilities, add the NET_BIND_SERVICE capability on top, and request that the app run with a non-root UID. Through the environment variables we configure the app to listen on port 80.
cat <<EOF | kubectl -n ${NAMESPACE} create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: reversewords-app-nonrootuid
  name: reversewords-app-nonrootuid
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reversewords-app-nonrootuid
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: reversewords-app-nonrootuid
    spec:
      containers:
      - image: quay.io/mavazque/reversewords:ubi8
        name: reversewords
        resources: {}
        env:
        - name: APP_PORT
          value: "80"
        securityContext:
          runAsUser: 1024
          capabilities:
            drop:
            - all
            add:
            - NET_BIND_SERVICE
status: {}
EOF
-
Let's check the logs:
kubectl -n ${NAMESPACE} logs deployment/reversewords-app-nonrootuid
2022/07/06 15:17:11 Starting Reverse Api v0.0.21 Release: NotSet
2022/07/06 15:17:11 Listening on port 80
2022/07/06 15:17:11 listen tcp :80: bind: permission denied
-
The application failed to bind to port 80. Let's update the configuration so that the pod stays up and we can check its capability sets:
# Patch the app so it binds to port 8080
kubectl -n ${NAMESPACE} patch deployment reversewords-app-nonrootuid -p '{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"reversewords"}],"containers":[{"$setElementOrder/env":[{"name":"APP_PORT"}],"env":[{"name":"APP_PORT","value":"8080"}],"name":"reversewords"}]}}}}'
# Get capability sets
kubectl -n ${NAMESPACE} exec -ti deployment/reversewords-app-nonrootuid -- grep Cap /proc/1/status
CapInh: 0000000000000400
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000000000000400
CapAmb: 0000000000000000
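The patch above carries `$setElementOrder` directives, which only describe list ordering. A shorter strategic merge patch should have the same effect here, since containers and environment variables are merged by `name`. The sketch below only builds the patch string; `app_port_patch` is a hypothetical helper name, and the commands that would apply it are left as comments because they need access to the cluster.

~~~sh
# Hypothetical helper: print a strategic merge patch that sets APP_PORT
# on the "reversewords" container to the port given as first argument.
app_port_patch() {
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"reversewords","env":[{"name":"APP_PORT","value":"%s"}]}]}}}}\n' "$1"
}

app_port_patch 8080

# To apply it (requires a reachable cluster):
#   kubectl -n ${NAMESPACE} patch deployment reversewords-app-nonrootuid -p "$(app_port_patch 8080)"
# For a plain environment variable change, kubectl set env is an alternative:
#   kubectl -n ${NAMESPACE} set env deployment/reversewords-app-nonrootuid APP_PORT=8080
~~~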
-
We don't have NET_BIND_SERVICE in the `effective` and `permitted` sets. For this to work the capability would need to be in the ambient set, but that is not supported yet on Kubernetes, so we will need to make use of file capabilities.
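Reading the hex masks by eye gets tedious, so a small filter can report where a given capability is present. This is a minimal sketch: `report_net_bind` is a hypothetical helper name, and it relies on NET_BIND_SERVICE being capability number 10, that is, mask `0x400`. The sample input is the non-root output shown above.

~~~sh
# Hypothetical helper: read "CapXxx: <hex>" lines on stdin and report
# whether CAP_NET_BIND_SERVICE (bit 10, mask 0x400) is in each set.
report_net_bind() {
  while read -r label hex; do
    if [ $(( 0x$hex & 0x400 )) -ne 0 ]; then
      echo "$label present"
    else
      echo "$label absent"
    fi
  done
}

report_net_bind <<'EOF'
CapInh: 0000000000000400
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000000000000400
CapAmb: 0000000000000000
EOF
~~~

When piping real `kubectl exec` output into the function, run it without `-ti`, since a TTY may add carriage returns that break the arithmetic.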
-
We have an image with the file capabilities configured, let's update the previous deployment to use port 80 and this new image:
kubectl -n ${NAMESPACE} patch deployment reversewords-app-nonrootuid -p '{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"reversewords"}],"containers":[{"$setElementOrder/env":[{"name":"APP_PORT"}],"env":[{"name":"APP_PORT","value":"80"}],"image":"quay.io/mavazque/reversewords-captest:latest","name":"reversewords"}]}}}}'
-
Let's check the logs for the app:
kubectl -n ${NAMESPACE} logs deployment/reversewords-app-nonrootuid
2022/07/06 15:26:30 Starting Reverse Api v0.0.21 Release: NotSet
2022/07/06 15:26:30 Listening on port 80
-
If we check the capabilities now, we can see that the permitted and effective sets have acquired the requested capability:
kubectl -n ${NAMESPACE} exec -ti deployment/reversewords-app-nonrootuid -- grep Cap /proc/1/status
CapInh: 0000000000000400
CapPrm: 0000000000000400
CapEff: 0000000000000400
CapBnd: 0000000000000400
CapAmb: 0000000000000000
-
We can check the file capabilities configured in our binary as well:
kubectl -n ${NAMESPACE} exec -ti deployment/reversewords-app-nonrootuid -- getcap /usr/bin/reverse-words
/usr/bin/reverse-words = cap_net_bind_service+eip
❗ Notice that we get the same result if we set the file capabilities to inheritable and effective, or to permitted and effective. See Transformation of capabilities during execve.
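The execve rule behind that note can be tried out with shell arithmetic. The sketch below is a simplification of the formula in capabilities(7) for a non-root process, under the assumption that the ambient set is empty (as in every output above): P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & P(bounding)). When the file effective bit is set, the new effective set equals the new permitted set. `new_permitted` is a hypothetical helper name.

~~~sh
# Hypothetical helper: new_permitted P_INH P_BND F_INH F_PRM
# All arguments are hex masks without the 0x prefix.
# Simplified rule for a non-root process with an empty ambient set:
#   P'(permitted) = (P_INH & F_INH) | (F_PRM & P_BND)
new_permitted() {
  printf '%016x\n' $(( (0x$1 & 0x$3) | (0x$4 & 0x$2) ))
}

# Process sets taken from the non-root pod: inheritable=0x400, bounding=0x400
new_permitted 400 400 0   0     # binary without file capabilities
new_permitted 400 400 400 400   # cap_net_bind_service+eip
new_permitted 400 400 400 0     # cap_net_bind_service+ei
new_permitted 400 400 0   400   # cap_net_bind_service+ep
~~~

The first call yields an all-zero mask, which matches the failed bind on port 80. The other three all yield `0000000000000400`, which is why the three file capability variants behave the same in this setup.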
Now, let's set `allowPrivilegeEscalation` to `false` in the securityContext spec of the deployment and check the status:
~~~sh
kubectl -n ${NAMESPACE} patch deployment reversewords-app-nonrootuid -p '{"spec":{"template":{"spec":{"$setElementOrder/containers": [{"name":"reversewords"}],"containers":[{"name":"reversewords","securityContext":{"allowPrivilegeEscalation":false}}]}}}}'
~~~