
KubeArmor fetches wrong pod name/namespace when applying whitelist policy on long-running GKE clusters (BPF-LSM enforcement) #1780

Open
gusfcarvalho opened this issue Jun 10, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@gusfcarvalho

gusfcarvalho commented Jun 10, 2024

Bug Report

General Information

Issue

This is a consistent issue when running kubearmor on any long-lived cluster. We have a set of policies in protected-namespace where we whitelist only a few pods based on labels:

Config

apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: default-deny
  namespace: protected-namespace
spec:
  action: Block
  process:
    matchDirectories:
    - dir: /
      recursive: true
  selector:
    matchLabels: {}
---
apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: allow-app
  namespace: protected-namespace
spec:
  action: Allow
  file:
    matchDirectories:
    - dir: /
      recursive: true
  process:
    matchDirectories:
    ## list of matchDirectories here
    matchPaths:
    ## list of allowed paths here
  selector:
    matchLabels:
      kubearmor-whitelist-profile: app

Symptom

A pod in unprotected-namespace cannot run because it is not allowed to execute correct-binary: permission denied.

Extra information

With this configuration, after KubeArmor has been running on the cluster for a while, it starts denying applications in unprotected namespaces as well, even though there are no KubeArmorPolicy objects in those namespaces:

From karmor logs, I could see:

{
"Timestamp":1718029258,
"UpdatedTime":"2024-06-10T14:20:58.078820Z",
"ClusterName":"default",
"HostName":"***-bz88",
"NamespaceName":"protected",
"Owner":{"Ref":"Pod","Name":"wrong-pod-name","Namespace":"protected"},
"PodName":"wrong-pod-name",
"Labels":"wrong-pod-labels",
"ContainerID":"wrong-id",
"ContainerName":"wrong-container",
"ContainerImage":"wrong-image",
"ProcessName":"/binary-from-unprotected-pod",
"PolicyName":"default-deny",
"Severity":"1",
"Type":"MatchedPolicy",
"Operation":"Process",
"Resource":"/binary-from-unprotected-pod",
"Data":"lsm=SECURITY_BPRM_CHECK",
"Enforcer":"BPFLSM",
"Action":"Block",
"Result":"Permission denied",
"Cwd":"/"
}

This recurs on every cluster where we run the KubeArmor operator. As a workaround, if we kubectl delete pods --all -n <kubearmor-namespace>, the system goes back to running as expected (until a few days/weeks later, when the issue starts again).

Versions

Kubearmor: v1.3.4
Cluster: gke 1.27.13-gke.1000000

Expected behavior

I would expect KubeArmor to consistently report the correct pod names and namespaces :).

@gusfcarvalho gusfcarvalho added the bug Something isn't working label Jun 10, 2024
@daemon1024
Member

Hey @gusfcarvalho

I want to try reproducing this issue. Could you share the scale of the cluster (number of nodes and pods per node) so we can replicate the scenario? It is not reproducible in our normal test clusters.

If at all possible, could you redact sensitive information and share the logs from the KubeArmor pods?

Thanks!

@gusfcarvalho
Author

gusfcarvalho commented Jun 11, 2024

We see this issue on a cluster with 8-9 nodes running about 60 pods across 10 namespaces. We also see it on bigger clusters.

The main pain point is that everything works with a 'fresh' pod; it takes a few days for the issue to kick in (so I would not expect it to show up in any e2e tests).

@gusfcarvalho
Author

gusfcarvalho commented Jun 11, 2024

If at all possible, could you redact sensitive information and share the logs from the KubeArmor pods?

Sure! This issue recurs so often that they're actually easy to fetch. Do you need logs from kubearmor-bpf-containerd only?

From what I can see, the pod logs only contain many entries like:

2024-06-11 05:24:33.556758      INFO    Detected a Pod (deleted/<>)
2024-06-11 05:24:42.843611      INFO    Successfully deleted visibility map with key={PidNS:40265*** MntNS:40265***} from the kernel
2024-06-11 05:24:42.850623      INFO    Detected a container (<>)
2024-06-11 05:25:46.128971      INFO    Updating container rules for aeb***
2024-06-11 05:25:47.422789      INFO    Detected a Pod (modified/<>)

The only place where I can see wrong information is in karmor. My guess is that the keys are somehow getting mismatched.
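
To sketch the kind of mismatch I'm guessing at (the types and map below are hypothetical, just to illustrate the guess, and are not KubeArmor's actual implementation): if container metadata is looked up from a map keyed by namespace IDs like the {PidNS, MntNS} key in the log above, a stale or reused entry would attribute events from one container to a completely different pod.

package main

import "fmt"

// NsKey mirrors the {PidNS, MntNS} key shown in the log line above.
type NsKey struct {
    PidNS uint32
    MntNS uint32
}

// ContainerMeta is a hypothetical stand-in for the pod/container metadata
// attached to an alert.
type ContainerMeta struct {
    Namespace string
    PodName   string
}

func main() {
    byNS := map[NsKey]ContainerMeta{}

    // A pod in the protected namespace registers its namespace IDs.
    key := NsKey{PidNS: 4026531111, MntNS: 4026531112}
    byNS[key] = ContainerMeta{Namespace: "protected", PodName: "wrong-pod-name"}

    // If that pod is deleted but the entry is never cleaned up (or the same
    // namespace IDs are later reused by a new container in another
    // namespace), the next lookup attributes the new container's events to
    // the old pod.
    if meta, ok := byNS[key]; ok {
        fmt.Printf("event attributed to %s/%s\n", meta.Namespace, meta.PodName)
    }
}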

@gusfcarvalho
Author

This issue still persists. Any updates? Feel free to DM me on the Kubernetes Slack at @gusfcarvalho.

@gusfcarvalho
Author

Hello! 😄 Any updates on this?

@carlosrodfern
Contributor

carlosrodfern commented Aug 12, 2024

I'm having a similar issue. I have a policy with a selector matching specific labels within a specific namespace. Two days later, I began to see relay alert logs pointing to policy violations in a different namespace, on containers with no labels matching the selector. I'm running KubeArmor v1.4.0 on 17 nodes at the moment, on AWS EKS.

@carlosrodfern
Contributor

carlosrodfern commented Aug 13, 2024

I deleted all the pods in the KubeArmor namespace, and some hours later it was already misapplying policies.

A very simple policy:

apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: generic-maint-tools-access
  namespace: myapplication
spec:
  action: Audit
  message: Restricted maintenance tool access attempt detected
  process:
    matchDirectories:
      - dir: /sbin/
        recursive: true
      - dir: /usr/sbin/
        recursive: true
  selector:
    matchLabels:
      security: generic
  severity: 1
  tags:
    - PCI_DSS
    - MITRE
    - MITRE_T1553_Subvert_Trust_Controls

After recreating the pods, all bpf-containerd pods log this:

Detected a Security Policy (added/myapplication/generic-maint-tools-access)

However, a few hours later, I get these logs:

{
...
  "NamespaceName": "anothernamespace",
  "Owner":
    { "Ref": "StatefulSet", "Name": "somests", "Namespace": "anothernamespace" },
  "PodName": "somepod-2",
  "Labels": "...",
  "ContainerID": "...",
  "ContainerName": "somecontainername",
  "ContainerImage": "...",
  "HostPPID": ...,
  "HostPID": ...,
  "PPID": ...,
  "PID": ...,
  "UID": ...,
  "ParentProcessName": "...",
  "ProcessName": "...",
  "PolicyName": "generic-maint-tools-access",
  "Severity": "1",
  "Tags": "PCI_DSS,MITRE,MITRE_T1553_Subvert_Trust_Controls",
  "ATags": ["PCI_DSS", "MITRE", "MITRE_T1553_Subvert_Trust_Controls"],
  "Message": "Restricted maintenance tool access attempt detected",
  "Type": "MatchedPolicy",
  "Source": "....",
  "Operation": "Process",
  "Resource": "...",
  "Data": "syscall=SYS_EXECVE",
  "Enforcer": "eBPF Monitor",
  "Action": "Audit",
  "Result": "Passed",
  "Cwd": "/",
  ...
}

I have no other policy named generic-maint-tools-access in any other namespace, nor any cluster policy with that name.

@carlosrodfern
Contributor

carlosrodfern commented Aug 13, 2024

It looks like the bug is right here?

if kl.MatchIdentities(policy.Spec.Selector.Identities, identities) || kl.ContainsElement(policy.Spec.Selector.NamespaceList, namespaceName) || matchClusterSecurityPolicyRule(policy) {

The policy is added to the return value whenever this condition is true, including when only matchClusterSecurityPolicyRule(policy) returns true.

...but that function doesn't check whether the passed policy is a cluster policy. When matchExpressions is empty, it adds one namespace (whichever one comes back first in the k8s client response and hasn't been added yet) to the NamespaceList of any existing policy (cluster or not), then returns true, and the policy is added to the GetSecurityPolicies(..) response.
It appears that over time, as matchClusterSecurityPolicyRule(..) is called, the NamespaceList in each policy keeps growing by one namespace at a time, which can explain why it takes time for @gusfcarvalho to see this behavior.

Should matchClusterSecurityPolicyRule(..) then check the policy type, also receive the namespaceName to match against the cluster policy, and fix the updating of NamespaceList?

I may be missing something.
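
A tiny sketch of the effect I mean (the types and the loop are hypothetical stand-ins, only to illustrate the growth, not the real KubeArmor code): every call leaks one more not-yet-seen namespace into the namespaced policy's NamespaceList, so it eventually matches everywhere.

package main

import "fmt"

// Hypothetical, minimal stand-ins for the selector fields named above; this
// only illustrates how NamespaceList could keep growing over time.
type Selector struct {
    MatchExpressions []string // empty for a plain namespaced policy
    NamespaceList    []string
}

type Policy struct {
    Name     string
    Selector Selector
}

func contains(list []string, s string) bool {
    for _, v := range list {
        if v == s {
            return true
        }
    }
    return false
}

// matchClusterRule mimics the behavior described above: with empty
// matchExpressions it appends the first namespace it has not seen yet and
// reports a match, regardless of whether the policy is a cluster policy.
func matchClusterRule(p *Policy, clusterNamespaces []string) bool {
    if len(p.Selector.MatchExpressions) == 0 {
        for _, ns := range clusterNamespaces {
            if !contains(p.Selector.NamespaceList, ns) {
                p.Selector.NamespaceList = append(p.Selector.NamespaceList, ns)
                return true
            }
        }
    }
    return false
}

func main() {
    nsInCluster := []string{"protected-namespace", "unprotected-namespace", "kube-system"}
    deny := &Policy{Name: "default-deny", Selector: Selector{NamespaceList: []string{"protected-namespace"}}}

    // Each evaluation adds one more namespace to the namespaced policy.
    matchClusterRule(deny, nsInCluster)
    matchClusterRule(deny, nsInCluster)
    fmt.Println(deny.Selector.NamespaceList)
    // Output: [protected-namespace unprotected-namespace kube-system]
}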

@carlosrodfern
Contributor

carlosrodfern commented Aug 13, 2024


Actually, from what I can understand of CreateSecurityPolicy(..) and how it is used, cluster policies are already built with the NamespaceList properly initialized, so the function matchClusterSecurityPolicyRule(policy) could simply be removed, and the || kl.ContainsElement(policy.Spec.Selector.NamespaceList, namespaceName) that is already present could do the matching.

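As a rough sketch of what that simplification would look like (the Policy struct and policiesFor below are hypothetical stand-ins, not the actual GetSecurityPolicies(..) code): once cluster policies carry a fully initialized NamespaceList, namespace membership alone decides whether a policy is returned.

package main

import "fmt"

// Hypothetical stand-ins, only to show the shape of the proposed check.
type Policy struct {
    Name          string
    NamespaceList []string
}

func containsElement(list []string, s string) bool {
    for _, v := range list {
        if v == s {
            return true
        }
    }
    return false
}

// policiesFor keeps only the policies whose NamespaceList contains the
// container's namespace, which is the role kl.ContainsElement(..) plays in
// the condition quoted above once matchClusterSecurityPolicyRule is dropped.
func policiesFor(namespace string, all []Policy) []Policy {
    var out []Policy
    for _, p := range all {
        if containsElement(p.NamespaceList, namespace) {
            out = append(out, p)
        }
    }
    return out
}

func main() {
    ksp := Policy{Name: "generic-maint-tools-access", NamespaceList: []string{"myapplication"}}
    csp := Policy{Name: "some-cluster-policy", NamespaceList: []string{"myapplication", "anothernamespace"}}
    fmt.Println(policiesFor("anothernamespace", []Policy{ksp, csp})) // only the cluster policy
}
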
I'm trying to test that change, but I'm having issues building with the Dockerfile: bpf.c:28:10: fatal error: asm/unistd.h: No such file or directory... Hoping the maintainers can chime in soon.

@carlosrodfern
Contributor


I got past the build issues, created an image with this proposed fix, and deployed it internally. I'll report back on whether it solves my scenario.

carlosrodfern added a commit to carlosrodfern/KubeArmor that referenced this issue Aug 13, 2024
Regular policies are mistakenly applied at cluster level over time

Fixes: kubearmor#1780

Signed-off-by: Carlos Rodriguez-Fernandez <[email protected]>
carlosrodfern added a commit to carlosrodfern/KubeArmor that referenced this issue Aug 13, 2024
Regular policies are mistakenly modified and applied at cluster level over time

Fixes: kubearmor#1780

Signed-off-by: Carlos Rodriguez-Fernandez <[email protected]>
carlosrodfern added a commit to carlosrodfern/KubeArmor that referenced this issue Aug 14, 2024
Regular policies are mistakenly modified and applied at cluster level over time.

The `if` condition in `GetSecurityPolicies(..)` returns true if
`matchClusterSecurityPolicyRule(..)` evaluates to `true`. That function doesn't
check whether the passed policy is a cluster policy, and when the
`matchExpressions` is empty, it ends up adding one namespace (whatever comes
back in the k8s client response first that hasn't been added yet) to
NamespaceList of all existing policies (cluster or not), it then returns `true`
and the policy is added to the `GetSecurityPolicies(..)` response. Over time,
as `matchClusterSecurityPolicyRule(..)` is called, the list of `NamespaceList`
in each policy keeps increasing, by one ns at a time, which explains the fact
that it may take time to display this behavior.

The cluster policies are built already with the `NamespaceList` properly
initialized, so the function `matchClusterSecurityPolicyRule(..)` is just
removed, letting the `|| kl.ContainsElement(..)` already present in the `if`
condition do the cluster policy matching.

Fixes: kubearmor#1780

Signed-off-by: Carlos Rodriguez-Fernandez <[email protected]>
@Prateeknandle
Collaborator

Prateeknandle commented Aug 14, 2024


Hey @carlosrodfern, thanks for the detailed explanation here. You are correct that there is a bug in this check: matchClusterSecurityPolicyRule gets executed for KSPs (KubeArmorSecurityPolicy) as well, and it will indeed return true in the case you mentioned. But we cannot remove the check, because there is no namespace watcher that would update the NamespaceList when a new namespace is created after the cluster policy has been applied. In the case of the NotIn operator it is important to update the NamespaceList when a new namespace is added/created.

So rather than removing the check, we would recommend adding a condition in matchClusterSecurityPolicyRule that checks whether the policy is of KSP type and returns false early if it is.

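Roughly along these lines (the struct and the IsClusterPolicy field are hypothetical stand-ins, since this thread doesn't show how a cluster policy is actually distinguished from a KSP; the cluster-policy matching itself is elided):

package main

import "fmt"

// Minimal stand-in for the policy type, used only for this sketch.
type SecurityPolicy struct {
    IsClusterPolicy bool
    NamespaceList   []string
}

// matchClusterSecurityPolicyRule with the suggested guard: plain KSPs return
// false early, so their NamespaceList is never modified here, while cluster
// policies keep the existing matching and NamespaceList updates (which the
// NotIn operator relies on when new namespaces appear).
func matchClusterSecurityPolicyRule(policy *SecurityPolicy) bool {
    if !policy.IsClusterPolicy {
        return false
    }
    // ... existing cluster-policy matching logic would remain here ...
    return false
}

func main() {
    ksp := &SecurityPolicy{IsClusterPolicy: false, NamespaceList: []string{"myapplication"}}
    fmt.Println(matchClusterSecurityPolicyRule(ksp)) // false: KSPs never match via this path
    fmt.Println(ksp.NamespaceList)                   // unchanged
}
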
BTW, on a side note, I don't think this is the bug that was causing the problem for @gusfcarvalho, because he is using v1.3.4, which does not include these changes; they were added later and are only in v1.4.0.

@carlosrodfern
Contributor


Thank you @Prateeknandle for looking into this. I'll create a separate issue and correct the PR.
