Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Emqx gets OOMKilled immediately after deployment #1055

Open
Fluder-Paradyne opened this issue Jun 7, 2024 · 9 comments
Open

New Emqx gets OOMKilled immediately after deployment #1055

Fluder-Paradyne opened this issue Jun 7, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@Fluder-Paradyne
Copy link
Contributor

Fluder-Paradyne commented Jun 7, 2024

Describe the bug
A clear and concise description of what the bug is.
I installed the cert-manager, then operator, then applied getting started yaml from docs.
The pods immediately fails saying OOMKilled, no logs.

I have tried other example deployment files, same issue with this file

Am I making some mistake ?

I have tried giving it more resources like this, but same issue

apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
spec:
  image: emqx:5
  coreTemplate:
    spec:
      resources:
        requests:
          cpu: 1
          memory: 512Mi
        limits:
          cpu: 1
          memory: 1024Mi

To Reproduce
Steps to reproduce the behavior:
Install operator using helm version : 2.2.23
And after wait to get conditions met from operator.
Then apply this

apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
   name: emqx
spec:
   image: emqx:5
Screenshot 2024-06-08 at 00 14 31

Expected behavior
The pods should run and cluster start

Anything else we need to know?:
enqx operator logs

{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"apps.emqx.io/v1beta4, Kind=EmqxPlugin","path":"/mutate-apps-emqx-io-v1beta4-emqxplugin"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-apps-emqx-io-v1beta4-emqxplugin"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"apps.emqx.io/v1beta4, Kind=EmqxPlugin","path":"/validate-apps-emqx-io-v1beta4-emqxplugin"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-apps-emqx-io-v1beta4-emqxplugin"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/convert"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.builder","msg":"Conversion webhook enabled","GVK":"apps.emqx.io/v1beta4, Kind=EmqxPlugin"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"apps.emqx.io/v1beta4, Kind=EmqxBroker","path":"/mutate-apps-emqx-io-v1beta4-emqxbroker"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-apps-emqx-io-v1beta4-emqxbroker"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"apps.emqx.io/v1beta4, Kind=EmqxBroker","path":"/validate-apps-emqx-io-v1beta4-emqxbroker"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-apps-emqx-io-v1beta4-emqxbroker"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.builder","msg":"Conversion webhook enabled","GVK":"apps.emqx.io/v1beta4, Kind=EmqxBroker"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"apps.emqx.io/v1beta4, Kind=EmqxEnterprise","path":"/mutate-apps-emqx-io-v1beta4-emqxenterprise"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-apps-emqx-io-v1beta4-emqxenterprise"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"apps.emqx.io/v1beta4, Kind=EmqxEnterprise","path":"/validate-apps-emqx-io-v1beta4-emqxenterprise"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-apps-emqx-io-v1beta4-emqxenterprise"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.builder","msg":"Conversion webhook enabled","GVK":"apps.emqx.io/v1beta4, Kind=EmqxEnterprise"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.builder","msg":"skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called","GVK":"apps.emqx.io/v2beta1, Kind=Rebalance"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"apps.emqx.io/v2beta1, Kind=Rebalance","path":"/validate-apps-emqx-io-v2beta1-rebalance"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-apps-emqx-io-v2beta1-rebalance"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.builder","msg":"Conversion webhook enabled","GVK":"apps.emqx.io/v2beta1, Kind=Rebalance"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-06-10T07:08:46Z","msg":"starting server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8080","secure":false}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.webhook","msg":"Starting webhook server"}
I0610 07:08:46.566017       1 leaderelection.go:250] attempting to acquire leader lease emqx-operator-system/19fd6fcc.emqx.io...
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443}
{"level":"info","ts":"2024-06-10T07:08:46Z","logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
I0610 07:09:18.210702       1 leaderelection.go:260] successfully acquired lease emqx-operator-system/19fd6fcc.emqx.io
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting EventSource","controller":"emqxplugin","controllerGroup":"apps.emqx.io","controllerKind":"EmqxPlugin","source":"kind source: *v1beta4.EmqxPlugin"}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting Controller","controller":"emqxplugin","controllerGroup":"apps.emqx.io","controllerKind":"EmqxPlugin"}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting EventSource","controller":"emqxenterprise","controllerGroup":"apps.emqx.io","controllerKind":"EmqxEnterprise","source":"kind source: *v1beta4.EmqxEnterprise"}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting Controller","controller":"emqxenterprise","controllerGroup":"apps.emqx.io","controllerKind":"EmqxEnterprise"}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting EventSource","controller":"rebalance","controllerGroup":"apps.emqx.io","controllerKind":"Rebalance","source":"kind source: *v2beta1.Rebalance"}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting Controller","controller":"rebalance","controllerGroup":"apps.emqx.io","controllerKind":"Rebalance"}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting EventSource","controller":"emqx","controllerGroup":"apps.emqx.io","controllerKind":"EMQX","source":"kind source: *v2beta1.EMQX"}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting Controller","controller":"emqx","controllerGroup":"apps.emqx.io","controllerKind":"EMQX"}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting EventSource","controller":"emqxbroker","controllerGroup":"apps.emqx.io","controllerKind":"EmqxBroker","source":"kind source: *v1beta4.EmqxBroker"}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting Controller","controller":"emqxbroker","controllerGroup":"apps.emqx.io","controllerKind":"EmqxBroker"}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting workers","controller":"emqxplugin","controllerGroup":"apps.emqx.io","controllerKind":"EmqxPlugin","worker count":1}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting workers","controller":"emqxbroker","controllerGroup":"apps.emqx.io","controllerKind":"EmqxBroker","worker count":1}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting workers","controller":"rebalance","controllerGroup":"apps.emqx.io","controllerKind":"Rebalance","worker count":1}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting workers","controller":"emqxenterprise","controllerGroup":"apps.emqx.io","controllerKind":"EmqxEnterprise","worker count":1}
{"level":"info","ts":"2024-06-10T07:09:18Z","msg":"Starting workers","controller":"emqx","controllerGroup":"apps.emqx.io","controllerKind":"EMQX","worker count":1}
{"level":"info","ts":"2024-06-10T07:23:53Z","msg":"got different pod template for EMQX core nodes, will create new statefulSet","controller":"emqx","controllerGroup":"apps.emqx.io","controllerKind":"EMQX","EMQX":{"name":"emqx","namespace":"emqx"},"namespace":"emqx","name":"emqx","reconcileID":"944ac40c-c12b-4f43-8ca8-efaa01c98c60","statefulSet":{"name":"emqx-core-6bd465679f","namespace":"emqx"},"patch":"{should create new StatefulSet}"}
{"level":"info","ts":"2024-06-10T07:23:54Z","logger":"KubeAPIWarningLogger","msg":"unknown field \"spec.coreTemplate.metadata.creationTimestamp\""}

Environment details::

  • Kubernetes version: 1.29
  • Cloud-provider/provisioner: aws
  • emqx-operator version: 2.2.23
  • Install method: helm chart
@Fluder-Paradyne Fluder-Paradyne added the bug Something isn't working label Jun 7, 2024
@Rory-Z
Copy link
Member

Rory-Z commented Jun 11, 2024

So strange, could you please check the failed pod and share it , just running kubectl get pod $name -o json and kubectl logs -p $name
And please make sure your node have enough resource

@Fluder-Paradyne
Copy link
Contributor Author

Did it just now again

Screenshot 2024-06-11 at 09 31 09

Get pod output

kubectl get pod $name -o json
{
    "apiVersion": "v1",
    "items": [
        {
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {
                "creationTimestamp": "2024-06-11T03:58:41Z",
                "generateName": "emqx-core-75ff7c75f9-",
                "labels": {
                    "apps.emqx.io/db-role": "core",
                    "apps.emqx.io/instance": "emqx",
                    "apps.emqx.io/managed-by": "emqx-operator",
                    "apps.emqx.io/pod-template-hash": "75ff7c75f9",
                    "apps.kubernetes.io/pod-index": "0",
                    "controller-revision-hash": "emqx-core-75ff7c75f9-df4bcf689",
                    "statefulset.kubernetes.io/pod-name": "emqx-core-75ff7c75f9-0"
                },
                "name": "emqx-core-75ff7c75f9-0",
                "namespace": "emqx",
                "ownerReferences": [
                    {
                        "apiVersion": "apps/v1",
                        "blockOwnerDeletion": true,
                        "controller": true,
                        "kind": "StatefulSet",
                        "name": "emqx-core-75ff7c75f9",
                        "uid": "19197aa0-ba43-4a3b-ab12-8c203107ff1f"
                    }
                ],
                "resourceVersion": "32867086",
                "uid": "63b85614-e6f1-4b46-8006-44bf1cf0dcb3"
            },
            "spec": {
                "containers": [
                    {
                        "env": [
                            {
                                "name": "EMQX_DASHBOARD__LISTENERS__HTTP__BIND",
                                "value": "18083"
                            },
                            {
                                "name": "POD_NAME",
                                "valueFrom": {
                                    "fieldRef": {
                                        "apiVersion": "v1",
                                        "fieldPath": "metadata.name"
                                    }
                                }
                            },
                            {
                                "name": "EMQX_CLUSTER__DISCOVERY_STRATEGY",
                                "value": "dns"
                            },
                            {
                                "name": "EMQX_CLUSTER__DNS__RECORD_TYPE",
                                "value": "srv"
                            },
                            {
                                "name": "EMQX_CLUSTER__DNS__NAME",
                                "value": "emqx-headless.emqx.svc.cluster.local"
                            },
                            {
                                "name": "EMQX_HOST",
                                "value": "$(POD_NAME).$(EMQX_CLUSTER__DNS__NAME)"
                            },
                            {
                                "name": "EMQX_NODE__DATA_DIR",
                                "value": "data"
                            },
                            {
                                "name": "EMQX_NODE__ROLE",
                                "value": "core"
                            },
                            {
                                "name": "EMQX_NODE__COOKIE",
                                "valueFrom": {
                                    "secretKeyRef": {
                                        "key": "node_cookie",
                                        "name": "emqx-node-cookie"
                                    }
                                }
                            },
                            {
                                "name": "EMQX_API_KEY__BOOTSTRAP_FILE",
                                "value": "\"/opt/emqx/data/bootstrap_api_key\""
                            }
                        ],
                        "image": "emqx:5",
                        "imagePullPolicy": "IfNotPresent",
                        "livenessProbe": {
                            "failureThreshold": 3,
                            "httpGet": {
                                "path": "/status",
                                "port": "dashboard",
                                "scheme": "HTTP"
                            },
                            "initialDelaySeconds": 60,
                            "periodSeconds": 30,
                            "successThreshold": 1,
                            "timeoutSeconds": 1
                        },
                        "name": "emqx",
                        "ports": [
                            {
                                "containerPort": 18083,
                                "name": "dashboard",
                                "protocol": "TCP"
                            }
                        ],
                        "readinessProbe": {
                            "failureThreshold": 12,
                            "httpGet": {
                                "path": "/status",
                                "port": "dashboard",
                                "scheme": "HTTP"
                            },
                            "initialDelaySeconds": 10,
                            "periodSeconds": 5,
                            "successThreshold": 1,
                            "timeoutSeconds": 1
                        },
                        "resources": {
                            "limits": {
                                "cpu": "1",
                                "memory": "1Gi"
                            },
                            "requests": {
                                "cpu": "1",
                                "memory": "512Mi"
                            }
                        },
                        "securityContext": {
                            "runAsGroup": 1000,
                            "runAsNonRoot": true,
                            "runAsUser": 1000
                        },
                        "terminationMessagePath": "/dev/termination-log",
                        "terminationMessagePolicy": "File",
                        "volumeMounts": [
                            {
                                "mountPath": "/opt/emqx/data/bootstrap_api_key",
                                "name": "bootstrap-api-key",
                                "readOnly": true,
                                "subPath": "bootstrap_api_key"
                            },
                            {
                                "mountPath": "/opt/emqx/etc/emqx.conf",
                                "name": "bootstrap-config",
                                "readOnly": true,
                                "subPath": "emqx.conf"
                            },
                            {
                                "mountPath": "/opt/emqx/log",
                                "name": "emqx-core-log"
                            },
                            {
                                "mountPath": "/opt/emqx/data",
                                "name": "emqx-core-data"
                            },
                            {
                                "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
                                "name": "kube-api-access-5lp9j",
                                "readOnly": true
                            }
                        ]
                    }
                ],
                "dnsPolicy": "ClusterFirst",
                "enableServiceLinks": true,
                "hostname": "emqx-core-75ff7c75f9-0",
                "nodeName": "ip-10-16-253-98.ec2.internal",
                "preemptionPolicy": "PreemptLowerPriority",
                "priority": 0,
                "readinessGates": [
                    {
                        "conditionType": "apps.emqx.io/on-serving"
                    }
                ],
                "restartPolicy": "Always",
                "schedulerName": "default-scheduler",
                "securityContext": {
                    "fsGroup": 1000,
                    "fsGroupChangePolicy": "Always",
                    "runAsGroup": 1000,
                    "runAsUser": 1000,
                    "supplementalGroups": [
                        1000
                    ]
                },
                "serviceAccount": "default",
                "serviceAccountName": "default",
                "subdomain": "emqx-headless",
                "terminationGracePeriodSeconds": 30,
                "tolerations": [
                    {
                        "effect": "NoExecute",
                        "key": "node.kubernetes.io/not-ready",
                        "operator": "Exists",
                        "tolerationSeconds": 300
                    },
                    {
                        "effect": "NoExecute",
                        "key": "node.kubernetes.io/unreachable",
                        "operator": "Exists",
                        "tolerationSeconds": 300
                    }
                ],
                "volumes": [
                    {
                        "name": "emqx-core-data",
                        "persistentVolumeClaim": {
                            "claimName": "emqx-core-data-emqx-core-75ff7c75f9-0"
                        }
                    },
                    {
                        "name": "bootstrap-api-key",
                        "secret": {
                            "defaultMode": 420,
                            "secretName": "emqx-bootstrap-api-key"
                        }
                    },
                    {
                        "configMap": {
                            "defaultMode": 420,
                            "name": "emqx-configs"
                        },
                        "name": "bootstrap-config"
                    },
                    {
                        "emptyDir": {},
                        "name": "emqx-core-log"
                    },
                    {
                        "name": "kube-api-access-5lp9j",
                        "projected": {
                            "defaultMode": 420,
                            "sources": [
                                {
                                    "serviceAccountToken": {
                                        "expirationSeconds": 3607,
                                        "path": "token"
                                    }
                                },
                                {
                                    "configMap": {
                                        "items": [
                                            {
                                                "key": "ca.crt",
                                                "path": "ca.crt"
                                            }
                                        ],
                                        "name": "kube-root-ca.crt"
                                    }
                                },
                                {
                                    "downwardAPI": {
                                        "items": [
                                            {
                                                "fieldRef": {
                                                    "apiVersion": "v1",
                                                    "fieldPath": "metadata.namespace"
                                                },
                                                "path": "namespace"
                                            }
                                        ]
                                    }
                                }
                            ]
                        }
                    }
                ]
            },
            "status": {
                "conditions": [
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": null,
                        "status": "",
                        "type": "apps.emqx.io/on-serving"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2024-06-11T03:58:45Z",
                        "status": "True",
                        "type": "PodReadyToStartContainers"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2024-06-11T03:58:41Z",
                        "status": "True",
                        "type": "Initialized"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2024-06-11T03:58:41Z",
                        "message": "containers with unready status: [emqx]",
                        "reason": "ContainersNotReady",
                        "status": "False",
                        "type": "Ready"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2024-06-11T03:58:41Z",
                        "message": "containers with unready status: [emqx]",
                        "reason": "ContainersNotReady",
                        "status": "False",
                        "type": "ContainersReady"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2024-06-11T03:58:41Z",
                        "status": "True",
                        "type": "PodScheduled"
                    }
                ],
                "containerStatuses": [
                    {
                        "containerID": "containerd://48de18c9bb87de65ae106acc273547ae180526e876f3fef214a8b04e6ed219b3",
                        "image": "docker.io/library/emqx:5",
                        "imageID": "docker.io/library/emqx@sha256:4750557065e94e53e0ec96f9a804c1bb4bc3c6001a5ef517f9b873eea886e0ae",
                        "lastState": {
                            "terminated": {
                                "containerID": "containerd://48de18c9bb87de65ae106acc273547ae180526e876f3fef214a8b04e6ed219b3",
                                "exitCode": 137,
                                "finishedAt": "2024-06-11T03:59:02Z",
                                "reason": "OOMKilled",
                                "startedAt": "2024-06-11T03:59:01Z"
                            }
                        },
                        "name": "emqx",
                        "ready": false,
                        "restartCount": 2,
                        "started": false,
                        "state": {
                            "waiting": {
                                "message": "back-off 20s restarting failed container=emqx pod=emqx-core-75ff7c75f9-0_emqx(63b85614-e6f1-4b46-8006-44bf1cf0dcb3)",
                                "reason": "CrashLoopBackOff"
                            }
                        }
                    }
                ],
                "hostIP": "10.16.253.98",
                "hostIPs": [
                    {
                        "ip": "10.16.253.98"
                    }
                ],
                "phase": "Running",
                "podIP": "10.16.253.84",
                "podIPs": [
                    {
                        "ip": "10.16.253.84"
                    }
                ],
                "qosClass": "Burstable",
                "startTime": "2024-06-11T03:58:41Z"
            }
        },
        {
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {
                "creationTimestamp": "2024-06-11T03:58:41Z",
                "generateName": "emqx-core-75ff7c75f9-",
                "labels": {
                    "apps.emqx.io/db-role": "core",
                    "apps.emqx.io/instance": "emqx",
                    "apps.emqx.io/managed-by": "emqx-operator",
                    "apps.emqx.io/pod-template-hash": "75ff7c75f9",
                    "apps.kubernetes.io/pod-index": "1",
                    "controller-revision-hash": "emqx-core-75ff7c75f9-df4bcf689",
                    "statefulset.kubernetes.io/pod-name": "emqx-core-75ff7c75f9-1"
                },
                "name": "emqx-core-75ff7c75f9-1",
                "namespace": "emqx",
                "ownerReferences": [
                    {
                        "apiVersion": "apps/v1",
                        "blockOwnerDeletion": true,
                        "controller": true,
                        "kind": "StatefulSet",
                        "name": "emqx-core-75ff7c75f9",
                        "uid": "19197aa0-ba43-4a3b-ab12-8c203107ff1f"
                    }
                ],
                "resourceVersion": "32867077",
                "uid": "7d5d46db-4937-49b8-87b7-0130fe4f614b"
            },
            "spec": {
                "containers": [
                    {
                        "env": [
                            {
                                "name": "EMQX_DASHBOARD__LISTENERS__HTTP__BIND",
                                "value": "18083"
                            },
                            {
                                "name": "POD_NAME",
                                "valueFrom": {
                                    "fieldRef": {
                                        "apiVersion": "v1",
                                        "fieldPath": "metadata.name"
                                    }
                                }
                            },
                            {
                                "name": "EMQX_CLUSTER__DISCOVERY_STRATEGY",
                                "value": "dns"
                            },
                            {
                                "name": "EMQX_CLUSTER__DNS__RECORD_TYPE",
                                "value": "srv"
                            },
                            {
                                "name": "EMQX_CLUSTER__DNS__NAME",
                                "value": "emqx-headless.emqx.svc.cluster.local"
                            },
                            {
                                "name": "EMQX_HOST",
                                "value": "$(POD_NAME).$(EMQX_CLUSTER__DNS__NAME)"
                            },
                            {
                                "name": "EMQX_NODE__DATA_DIR",
                                "value": "data"
                            },
                            {
                                "name": "EMQX_NODE__ROLE",
                                "value": "core"
                            },
                            {
                                "name": "EMQX_NODE__COOKIE",
                                "valueFrom": {
                                    "secretKeyRef": {
                                        "key": "node_cookie",
                                        "name": "emqx-node-cookie"
                                    }
                                }
                            },
                            {
                                "name": "EMQX_API_KEY__BOOTSTRAP_FILE",
                                "value": "\"/opt/emqx/data/bootstrap_api_key\""
                            }
                        ],
                        "image": "emqx:5",
                        "imagePullPolicy": "IfNotPresent",
                        "livenessProbe": {
                            "failureThreshold": 3,
                            "httpGet": {
                                "path": "/status",
                                "port": "dashboard",
                                "scheme": "HTTP"
                            },
                            "initialDelaySeconds": 60,
                            "periodSeconds": 30,
                            "successThreshold": 1,
                            "timeoutSeconds": 1
                        },
                        "name": "emqx",
                        "ports": [
                            {
                                "containerPort": 18083,
                                "name": "dashboard",
                                "protocol": "TCP"
                            }
                        ],
                        "readinessProbe": {
                            "failureThreshold": 12,
                            "httpGet": {
                                "path": "/status",
                                "port": "dashboard",
                                "scheme": "HTTP"
                            },
                            "initialDelaySeconds": 10,
                            "periodSeconds": 5,
                            "successThreshold": 1,
                            "timeoutSeconds": 1
                        },
                        "resources": {
                            "limits": {
                                "cpu": "1",
                                "memory": "1Gi"
                            },
                            "requests": {
                                "cpu": "1",
                                "memory": "512Mi"
                            }
                        },
                        "securityContext": {
                            "runAsGroup": 1000,
                            "runAsNonRoot": true,
                            "runAsUser": 1000
                        },
                        "terminationMessagePath": "/dev/termination-log",
                        "terminationMessagePolicy": "File",
                        "volumeMounts": [
                            {
                                "mountPath": "/opt/emqx/data/bootstrap_api_key",
                                "name": "bootstrap-api-key",
                                "readOnly": true,
                                "subPath": "bootstrap_api_key"
                            },
                            {
                                "mountPath": "/opt/emqx/etc/emqx.conf",
                                "name": "bootstrap-config",
                                "readOnly": true,
                                "subPath": "emqx.conf"
                            },
                            {
                                "mountPath": "/opt/emqx/log",
                                "name": "emqx-core-log"
                            },
                            {
                                "mountPath": "/opt/emqx/data",
                                "name": "emqx-core-data"
                            },
                            {
                                "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
                                "name": "kube-api-access-j6xpw",
                                "readOnly": true
                            }
                        ]
                    }
                ],
                "dnsPolicy": "ClusterFirst",
                "enableServiceLinks": true,
                "hostname": "emqx-core-75ff7c75f9-1",
                "nodeName": "ip-10-16-253-98.ec2.internal",
                "preemptionPolicy": "PreemptLowerPriority",
                "priority": 0,
                "readinessGates": [
                    {
                        "conditionType": "apps.emqx.io/on-serving"
                    }
                ],
                "restartPolicy": "Always",
                "schedulerName": "default-scheduler",
                "securityContext": {
                    "fsGroup": 1000,
                    "fsGroupChangePolicy": "Always",
                    "runAsGroup": 1000,
                    "runAsUser": 1000,
                    "supplementalGroups": [
                        1000
                    ]
                },
                "serviceAccount": "default",
                "serviceAccountName": "default",
                "subdomain": "emqx-headless",
                "terminationGracePeriodSeconds": 30,
                "tolerations": [
                    {
                        "effect": "NoExecute",
                        "key": "node.kubernetes.io/not-ready",
                        "operator": "Exists",
                        "tolerationSeconds": 300
                    },
                    {
                        "effect": "NoExecute",
                        "key": "node.kubernetes.io/unreachable",
                        "operator": "Exists",
                        "tolerationSeconds": 300
                    }
                ],
                "volumes": [
                    {
                        "name": "emqx-core-data",
                        "persistentVolumeClaim": {
                            "claimName": "emqx-core-data-emqx-core-75ff7c75f9-1"
                        }
                    },
                    {
                        "name": "bootstrap-api-key",
                        "secret": {
                            "defaultMode": 420,
                            "secretName": "emqx-bootstrap-api-key"
                        }
                    },
                    {
                        "configMap": {
                            "defaultMode": 420,
                            "name": "emqx-configs"
                        },
                        "name": "bootstrap-config"
                    },
                    {
                        "emptyDir": {},
                        "name": "emqx-core-log"
                    },
                    {
                        "name": "kube-api-access-j6xpw",
                        "projected": {
                            "defaultMode": 420,
                            "sources": [
                                {
                                    "serviceAccountToken": {
                                        "expirationSeconds": 3607,
                                        "path": "token"
                                    }
                                },
                                {
                                    "configMap": {
                                        "items": [
                                            {
                                                "key": "ca.crt",
                                                "path": "ca.crt"
                                            }
                                        ],
                                        "name": "kube-root-ca.crt"
                                    }
                                },
                                {
                                    "downwardAPI": {
                                        "items": [
                                            {
                                                "fieldRef": {
                                                    "apiVersion": "v1",
                                                    "fieldPath": "metadata.namespace"
                                                },
                                                "path": "namespace"
                                            }
                                        ]
                                    }
                                }
                            ]
                        }
                    }
                ]
            },
            "status": {
                "conditions": [
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": null,
                        "status": "",
                        "type": "apps.emqx.io/on-serving"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2024-06-11T03:58:45Z",
                        "status": "True",
                        "type": "PodReadyToStartContainers"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2024-06-11T03:58:41Z",
                        "status": "True",
                        "type": "Initialized"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2024-06-11T03:58:41Z",
                        "message": "containers with unready status: [emqx]",
                        "reason": "ContainersNotReady",
                        "status": "False",
                        "type": "Ready"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2024-06-11T03:58:41Z",
                        "message": "containers with unready status: [emqx]",
                        "reason": "ContainersNotReady",
                        "status": "False",
                        "type": "ContainersReady"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2024-06-11T03:58:41Z",
                        "status": "True",
                        "type": "PodScheduled"
                    }
                ],
                "containerStatuses": [
                    {
                        "containerID": "containerd://082125f05b7dca038e6a1910f72366e5852609065f415a82861ee38fad656dea",
                        "image": "docker.io/library/emqx:5",
                        "imageID": "docker.io/library/emqx@sha256:4750557065e94e53e0ec96f9a804c1bb4bc3c6001a5ef517f9b873eea886e0ae",
                        "lastState": {
                            "terminated": {
                                "containerID": "containerd://082125f05b7dca038e6a1910f72366e5852609065f415a82861ee38fad656dea",
                                "exitCode": 137,
                                "finishedAt": "2024-06-11T03:59:00Z",
                                "reason": "OOMKilled",
                                "startedAt": "2024-06-11T03:59:00Z"
                            }
                        },
                        "name": "emqx",
                        "ready": false,
                        "restartCount": 2,
                        "started": false,
                        "state": {
                            "waiting": {
                                "message": "back-off 20s restarting failed container=emqx pod=emqx-core-75ff7c75f9-1_emqx(7d5d46db-4937-49b8-87b7-0130fe4f614b)",
                                "reason": "CrashLoopBackOff"
                            }
                        }
                    }
                ],
                "hostIP": "10.16.253.98",
                "hostIPs": [
                    {
                        "ip": "10.16.253.98"
                    }
                ],
                "phase": "Running",
                "podIP": "10.16.255.30",
                "podIPs": [
                    {
                        "ip": "10.16.255.30"
                    }
                ],
                "qosClass": "Burstable",
                "startTime": "2024-06-11T03:58:41Z"
            }
        }
    ],
    "kind": "List",
    "metadata": {
        "resourceVersion": ""
    }
}
Screenshot 2024-06-11 at 09 32 57

yea, if I have some logs it would have been great, that's why it is very strange.
I tried chaging log level to debug but no results.

Also changed increased the limit to 1024Mi.

I am able to run the docker image in my laptop and when I monitored it's resource consumption it didn't exceed 250mb.

@Fluder-Paradyne
Copy link
Contributor Author

Screenshot 2024-06-11 at 10 50 24 stats of the node where the pod is scheduled

@Rory-Z
Copy link
Member

Rory-Z commented Jun 11, 2024

I'm sorry, I have no idea.
Cloud you please cancel the resource limit of emqx, and retry, let's watch how many memory does it use, and maybe we can get some log of emqx. I think it will be helpful for us.

@Fluder-Paradyne
Copy link
Contributor Author

Hi, It worked when I removed limit, but the memory consumption is low

Screenshot 2024-06-11 at 12 56 32

to test this theory I put back the limit and it started crashing again

Screenshot 2024-06-11 at 12 58 02

This is the file I applied, hopefully this help you in figuring this out.

Normally when the limits is lower than requests it won't get applied, maybe the default is too low.

apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx
spec:
  image: emqx:5
  config:
    data: |
      log.console.level = debug
  coreTemplate:
    spec:
      resources:
        requests:
          cpu: 2
          memory: 1024Mi
        limits:
          cpu: 2
          memory: 1024Mi
      ## EMQX custom resources do not support updating this field at runtime
      volumeClaimTemplates:
        ## More content: https://docs.aws.amazon.com/eks/latest/userguide/storage-classes.html
        ## Please manage the Amazon EBS CSI driver as an Amazon EKS add-on.
        ## For more documentation please refer to: https://docs.aws.amazon.com/zh_cn/eks/latest/userguide/managing-ebs-csi.html
        storageClassName: gp2
        resources:
          requests:
            storage: 10Gi
        accessModes:
          - ReadWriteOnce
  dashboardServiceTemplate:
    metadata:
      ## More content: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/service/annotations/
      annotations:
        ## Specifies whether the NLB is Internet-facing or internal. If not specified, defaults to internal.
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        ## Specify the availability zone to which the NLB will route traffic. Specify at least one subnet, either subnetID or subnetName (subnet name label) can be used.
        service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-xxx1,subnet-xxx2
    spec:
      type: LoadBalancer
      ## More content: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/service/nlb/
      loadBalancerClass: service.k8s.aws/nlb
  listenersServiceTemplate:
    metadata:
      ## More content: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/service/annotations/
      annotations:
        ## Specifies whether the NLB is Internet-facing or internal. If not specified, defaults to internal.
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        ## Specify the availability zone to which the NLB will route traffic. Specify at least one subnet, either subnetID or subnetName (subnet name label) can be used.
        service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-xxx1,subnet-xxx2
    spec:
      type: LoadBalancer
      ## More content: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/service/nlb/
      loadBalancerClass: service.k8s.aws/nlb

@Rory-Z
Copy link
Member

Rory-Z commented Jun 11, 2024

@zmstone @id Do we have any suggestion for emqx memory, could you please take a look

@id
Copy link

id commented Jun 12, 2024

Regular docker image starts just fine with these limits

docker run -d --name emqx --memory=512m --kernel-memory=1024m --cpus=1 emqx:5

Perhaps there is a memory spike when operator applies config via API? Can we get core dump to see what exactly caused it?

@Fluder-Paradyne
Copy link
Contributor Author

Fluder-Paradyne commented Jun 12, 2024

I am not sure how to take a core dump, if there are some commands that I can run, will replicate and send it.

@Rory-Z
Copy link
Member

Rory-Z commented Jun 12, 2024

Can we get core dump to see what exactly caused it?

Looks emqx didn't print log to console, is there any way to print core dump log to console? @id

Other, looks emqx didn't print any log, it's too strange

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants