
KubeArmor fetches wrong pod name/namespace when applying whitelist policy on long-running GKE clusters (BPF-LSM enforcement) #1780

Open
gusfcarvalho opened this issue Jun 10, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@gusfcarvalho

gusfcarvalho commented Jun 10, 2024

Bug Report

General Information

Issue

This is a consistent issue when running kubearmor on any long-lived cluster. We have a set of policies in protected-namespace where we whitelist only a few pods based on labels:

Config

apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: default-deny
  namespace: protected-namespace
spec:
  action: Block
  process:
    matchDirectories:
    - dir: /
      recursive: true
  selector:
    matchLabels: {}
---
apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: allow-app
  namespace: protected-namespace
spec:
  action: Allow
  file:
    matchDirectories:
    - dir: /
      recursive: true
  process:
    matchDirectories:
    ## list of matchDirectories here
    matchPaths:
    ## list of allowed paths here
  selector:
    matchLabels:
      kubearmor-whitelist-profile: app

Symptom

A pod in unprotected-namespace cannot run because it is not allowed to execute correct-binary: permission denied.

Extra information

With this configuration, after KubeArmor has been running on the cluster for a while, it starts denying applications in unprotected namespaces as well, even though there are no KubeArmorPolicy objects in those namespaces:

From karmor logs, I could see:

{
"Timestamp":1718029258,
"UpdatedTime":"2024-06-10T14:20:58.078820Z",
"ClusterName":"default",
"HostName":"***-bz88",
"NamespaceName":"protected",
"Owner":{"Ref":"Pod","Name":"wrong-pod-name","Namespace":"protected"},
"PodName":"wrong-pod-name",
"Labels":"wrong-pod-labels",
"ContainerID":"wrong-id",
"ContainerName":"wrong-container",
"ContainerImage":"wrong-image",
"ProcessName":"/binary-from-unprotected-pod",
"PolicyName":"default-deny",
"Severity":"1",
"Type":"MatchedPolicy",
"Operation":"Process",
"Resource":"/binary-from-unprotected-pod",
"Data":"lsm=SECURITY_BPRM_CHECK",
"Enforcer":"BPFLSM",
"Action":"Block",
"Result":"Permission denied",
"Cwd":"/"
}

This recurs on every cluster where we run the KubeArmor operator. As a workaround, if we kubectl delete pods --all -n <kubearmor-namespace>, the system goes back to running as expected (until a few days/weeks later, when the issue starts again).

Versions

Kubearmor: v1.3.4
Cluster: gke 1.27.13-gke.1000000

Expected behavior

I would expect KubeArmor to consistently report the correct pod names and namespaces :).

@gusfcarvalho gusfcarvalho added the bug Something isn't working label Jun 10, 2024
@daemon1024
Member

Hey @gusfcarvalho

I want to try reproducing this issue. Could you share the scale of the cluster (number of nodes and pods per node) so we can replicate the scenario? It is not reproducible in our normal test clusters.

If at all possible, could you redact sensitive information and share the logs from the KubeArmor pods?

Thanks!

@gusfcarvalho
Author

gusfcarvalho commented Jun 11, 2024

We see this issue on a cluster with 8-9 nodes running about 60 pods across 10 namespaces. We also see it on bigger clusters.

The main pain point is that everything works with a 'fresh' pod; it takes a few days for the issue to kick in (so I would not expect it to show up in any e2e tests).

@gusfcarvalho
Author

gusfcarvalho commented Jun 11, 2024

If at all possible, could you redact sensitive information and share the logs from the KubeArmor pods?

Sure! This issue recurs so often that they're actually easy to fetch. Do you need logs from kubearmor-bpf-containerd only?

From what I can see, the pod logs only contain many entries like:

2024-06-11 05:24:33.556758      INFO    Detected a Pod (deleted/<>)
2024-06-11 05:24:42.843611      INFO    Successfully deleted visibility map with key={PidNS:40265*** MntNS:40265***} from the kernel
2024-06-11 05:24:42.850623      INFO    Detected a container (<>)
2024-06-11 05:25:46.128971      INFO    Updating container rules for aeb***
2024-06-11 05:25:47.422789      INFO    Detected a Pod (modified/<>)

The only place where I can see wrong information is in karmor. My guess is that the keys are somehow getting mismatched.
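
To sketch the kind of mismatch I'm guessing at (the types and map below are hypothetical, just to illustrate the guess, and are not KubeArmor's actual implementation): if container metadata is looked up from a map keyed by namespace IDs like the {PidNS, MntNS} key in the log above, a stale or reused entry would attribute events from one container to a completely different pod.

package main

import "fmt"

// NsKey mirrors the {PidNS, MntNS} key shown in the log line above.
type NsKey struct {
    PidNS uint32
    MntNS uint32
}

// ContainerMeta is a hypothetical stand-in for the pod/container metadata
// attached to an alert.
type ContainerMeta struct {
    Namespace string
    PodName   string
}

func main() {
    byNS := map[NsKey]ContainerMeta{}

    // A pod in the protected namespace registers its namespace IDs.
    key := NsKey{PidNS: 4026531111, MntNS: 4026531112}
    byNS[key] = ContainerMeta{Namespace: "protected", PodName: "wrong-pod-name"}

    // If that pod is deleted but the entry is never cleaned up (or the same
    // namespace IDs are later reused by a new container in another
    // namespace), the next lookup attributes the new container's events to
    // the old pod.
    if meta, ok := byNS[key]; ok {
        fmt.Printf("event attributed to %s/%s\n", meta.Namespace, meta.PodName)
    }
}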

@gusfcarvalho
Author

This issue still persists. Any updates? Feel free to DM me on the Kubernetes Slack at @gusfcarvalho.

@gusfcarvalho
Author

Hello! 😄 Any updates on this?

@carlosrodfern
Contributor

carlosrodfern commented Aug 12, 2024

I'm having a similar issue. I have a policy with a selector matching specific labels within a specific namespace. Two days later, I began to see relay alert logs pointing to policy violations in a different namespace, on containers with no labels matching the selector. I'm running KubeArmor v1.4.0 on 17 nodes at the moment, on AWS EKS.

@carlosrodfern
Contributor

carlosrodfern commented Aug 13, 2024

I deleted all the pods in the KubeArmor namespace, and some hours later it was already misapplying policies.

A very simple policy:

apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: generic-maint-tools-access
  namespace: myapplication
spec:
  action: Audit
  message: Restricted maintenance tool access attempt detected
  process:
    matchDirectories:
      - dir: /sbin/
        recursive: true
      - dir: /usr/sbin/
        recursive: true
  selector:
    matchLabels:
      security: generic
  severity: 1
  tags:
    - PCI_DSS
    - MITRE
    - MITRE_T1553_Subvert_Trust_Controls

After recreating the pods, all bpf-containerd pods log this:

Detected a Security Policy (added/myapplication/generic-maint-tools-access)

However, a few hours later, I get these logs:

{
...
  "NamespaceName": "anothernamespace",
  "Owner":
    { "Ref": "StatefulSet", "Name": "somests", "Namespace": "anothernamespace" },
  "PodName": "somepod-2",
  "Labels": "...",
  "ContainerID": "...",
  "ContainerName": "somecontainername",
  "ContainerImage": "...",
  "HostPPID": ...,
  "HostPID": ...,
  "PPID": ...,
  "PID": ...,
  "UID": ...,
  "ParentProcessName": "...",
  "ProcessName": "...",
  "PolicyName": "generic-maint-tools-access",
  "Severity": "1",
  "Tags": "PCI_DSS,MITRE,MITRE_T1553_Subvert_Trust_Controls",
  "ATags": ["PCI_DSS", "MITRE", "MITRE_T1553_Subvert_Trust_Controls"],
  "Message": "Restricted maintenance tool access attempt detected",
  "Type": "MatchedPolicy",
  "Source": "....",
  "Operation": "Process",
  "Resource": "...",
  "Data": "syscall=SYS_EXECVE",
  "Enforcer": "eBPF Monitor",
  "Action": "Audit",
  "Result": "Passed",
  "Cwd": "/",
  ...
}

I have no other policy named generic-maint-tools-access in any other namespace, nor any cluster policy with that name.

@carlosrodfern
Contributor

carlosrodfern commented Aug 13, 2024

It looks like the bug is right here?

if kl.MatchIdentities(policy.Spec.Selector.Identities, identities) || kl.ContainsElement(policy.Spec.Selector.NamespaceList, namespaceName) || matchClusterSecurityPolicyRule(policy) {

The policy is added to the return value whenever this condition is true, including when only matchClusterSecurityPolicyRule(policy) returns true.

...but that function doesn't check whether the passed policy is a cluster policy. When matchExpressions is empty, it adds one namespace (whichever one comes back first in the k8s client response and hasn't been added yet) to the NamespaceList of any existing policy (cluster or not), then returns true, and the policy is added to the GetSecurityPolicies(..) response.
It appears that over time, as matchClusterSecurityPolicyRule(..) is called, the NamespaceList in each policy keeps growing by one namespace at a time, which can explain why it takes time for @gusfcarvalho to see this behavior.

Should matchClusterSecurityPolicyRule(..) then check the policy type, also receive the namespaceName to match against the cluster policy, and fix the updating of NamespaceList?

I may be missing something.
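
A tiny sketch of the effect I mean (the types and the loop are hypothetical stand-ins, only to illustrate the growth, not the real KubeArmor code): every call leaks one more not-yet-seen namespace into the namespaced policy's NamespaceList, so it eventually matches everywhere.

package main

import "fmt"

// Hypothetical, minimal stand-ins for the selector fields named above; this
// only illustrates how NamespaceList could keep growing over time.
type Selector struct {
    MatchExpressions []string // empty for a plain namespaced policy
    NamespaceList    []string
}

type Policy struct {
    Name     string
    Selector Selector
}

func contains(list []string, s string) bool {
    for _, v := range list {
        if v == s {
            return true
        }
    }
    return false
}

// matchClusterRule mimics the behavior described above: with empty
// matchExpressions it appends the first namespace it has not seen yet and
// reports a match, regardless of whether the policy is a cluster policy.
func matchClusterRule(p *Policy, clusterNamespaces []string) bool {
    if len(p.Selector.MatchExpressions) == 0 {
        for _, ns := range clusterNamespaces {
            if !contains(p.Selector.NamespaceList, ns) {
                p.Selector.NamespaceList = append(p.Selector.NamespaceList, ns)
                return true
            }
        }
    }
    return false
}

func main() {
    nsInCluster := []string{"protected-namespace", "unprotected-namespace", "kube-system"}
    deny := &Policy{Name: "default-deny", Selector: Selector{NamespaceList: []string{"protected-namespace"}}}

    // Each evaluation adds one more namespace to the namespaced policy.
    matchClusterRule(deny, nsInCluster)
    matchClusterRule(deny, nsInCluster)
    fmt.Println(deny.Selector.NamespaceList)
    // Output: [protected-namespace unprotected-namespace kube-system]
}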

@carlosrodfern
Contributor

carlosrodfern commented Aug 13, 2024


Actually, from what I can understand of CreateSecurityPolicy(..) and how it is used, cluster policies are already built with the NamespaceList properly initialized, so the function matchClusterSecurityPolicyRule(policy) could simply be removed, and the || kl.ContainsElement(policy.Spec.Selector.NamespaceList, namespaceName) that is already present could do the matching.

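As a rough sketch of what that simplification would look like (the Policy struct and policiesFor below are hypothetical stand-ins, not the actual GetSecurityPolicies(..) code): once cluster policies carry a fully initialized NamespaceList, namespace membership alone decides whether a policy is returned.

package main

import "fmt"

// Hypothetical stand-ins, only to show the shape of the proposed check.
type Policy struct {
    Name          string
    NamespaceList []string
}

func containsElement(list []string, s string) bool {
    for _, v := range list {
        if v == s {
            return true
        }
    }
    return false
}

// policiesFor keeps only the policies whose NamespaceList contains the
// container's namespace, which is the role kl.ContainsElement(..) plays in
// the condition quoted above once matchClusterSecurityPolicyRule is dropped.
func policiesFor(namespace string, all []Policy) []Policy {
    var out []Policy
    for _, p := range all {
        if containsElement(p.NamespaceList, namespace) {
            out = append(out, p)
        }
    }
    return out
}

func main() {
    ksp := Policy{Name: "generic-maint-tools-access", NamespaceList: []string{"myapplication"}}
    csp := Policy{Name: "some-cluster-policy", NamespaceList: []string{"myapplication", "anothernamespace"}}
    fmt.Println(policiesFor("anothernamespace", []Policy{ksp, csp})) // only the cluster policy
}
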
I'm trying to test that change, but I'm having issues building with the Dockerfile: bpf.c:28:10: fatal error: asm/unistd.h: No such file or directory... Hoping the maintainers can chime in soon.

@carlosrodfern
Contributor


I got past the build issues, created an image with this proposed fix, and deployed it internally. I'll report back on whether it solves my scenario.

carlosrodfern added a commit to carlosrodfern/KubeArmor that referenced this issue Aug 13, 2024
Regular policies are mistakenly applied at cluster level over time

Fixes: kubearmor#1780

Signed-off-by: Carlos Rodriguez-Fernandez <[email protected]>
carlosrodfern added a commit to carlosrodfern/KubeArmor that referenced this issue Aug 13, 2024
Regular policies are mistakenly modified and applied at cluster level over time

Fixes: kubearmor#1780

Signed-off-by: Carlos Rodriguez-Fernandez <[email protected]>
carlosrodfern added a commit to carlosrodfern/KubeArmor that referenced this issue Aug 14, 2024
Regular policies are mistakenly modified and applied at cluster level over time.

The `if` condition in `GetSecurityPolicies(..)` returns true if
`matchClusterSecurityPolicyRule(..)` evaluates to `true`. That function doesn't
check whether the passed policy is a cluster policy, and when the
`matchExpressions` is empty, it ends up adding one namespace (whatever comes
back in the k8s client response first that hasn't been added yet) to
NamespaceList of all existing policies (cluster or not), it then returns `true`
and the policy is added to the `GetSecurityPolicies(..)` response. Over time,
as `matchClusterSecurityPolicyRule(..)` is called, the list of `NamespaceList`
in each policy keeps increasing, by one ns at a time, which explains the fact
that it may take time to display this behavior.

The cluster policies are built already with the `NamespaceList` properly
initialized, so the function `matchClusterSecurityPolicyRule(..)` is just
removed, letting the `|| kl.ContainsElement(..)` already present in the `if`
condition do the cluster policy matching.

Fixes: kubearmor#1780

Signed-off-by: Carlos Rodriguez-Fernandez <[email protected]>
@Prateeknandle
Collaborator

Prateeknandle commented Aug 14, 2024


Hey @carlosrodfern, thanks for the detailed explanation here. You are correct that there is a bug in this check: matchClusterSecurityPolicyRule gets executed for KSPs (KubeArmorSecurityPolicy) as well, and it will indeed return true in the case you mentioned. But we cannot remove the check, because there is no namespace watcher that would update the NamespaceList when a new namespace is created after the cluster policy has been applied. In the case of the NotIn operator it is important to update the NamespaceList when a new namespace is added/created.

So rather than removing the check, we would recommend adding a condition in matchClusterSecurityPolicyRule that checks whether the policy is of KSP type and returns false early if it is.

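Roughly along these lines (the struct and the IsClusterPolicy field are hypothetical stand-ins, since this thread doesn't show how a cluster policy is actually distinguished from a KSP; the cluster-policy matching itself is elided):

package main

import "fmt"

// Minimal stand-in for the policy type, used only for this sketch.
type SecurityPolicy struct {
    IsClusterPolicy bool
    NamespaceList   []string
}

// matchClusterSecurityPolicyRule with the suggested guard: plain KSPs return
// false early, so their NamespaceList is never modified here, while cluster
// policies keep the existing matching and NamespaceList updates (which the
// NotIn operator relies on when new namespaces appear).
func matchClusterSecurityPolicyRule(policy *SecurityPolicy) bool {
    if !policy.IsClusterPolicy {
        return false
    }
    // ... existing cluster-policy matching logic would remain here ...
    return false
}

func main() {
    ksp := &SecurityPolicy{IsClusterPolicy: false, NamespaceList: []string{"myapplication"}}
    fmt.Println(matchClusterSecurityPolicyRule(ksp)) // false: KSPs never match via this path
    fmt.Println(ksp.NamespaceList)                   // unchanged
}
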
BTW, on a side note, I don't think this is the bug that was causing the problem for @gusfcarvalho, because he is using v1.3.4, which does not include these changes; they were added later and are only in v1.4.0.

@carlosrodfern
Contributor


Thank you @Prateeknandle for looking into this. I'll create a separate issue and correct the PR.
