Merge pull request #18495 from newrelic/nb-k8s-troubleshooting-NR-266842
feat(K8s): Updating the K8s troubleshooting section
Showing 49 changed files with 472 additions and 780 deletions.
2 changes: 1 addition & 1 deletion
2
...tes-pixie/kubernetes-integration/advanced-configuration/k8s-version2/errors.mdx
32 changes: 0 additions & 32 deletions
32
.../kubernetes-integration/advanced-configuration/k8s-version2/troubleshooting.mdx
This file was deleted.
36 changes: 36 additions & 0 deletions
36
...tegration/advanced-configuration/k8s-version2/troubleshooting/missing-nodes.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
--- | ||
title: "Missing nodes for version 2" | ||
type: troubleshooting | ||
tags: | ||
- Integrations | ||
- Kubernetes integration v2 | ||
- Troubleshooting | ||
redirects: | ||
- /docs/kubernetes-pixie/kubernetes-integration/advanced-configuration/k8s-version2/troubleshooting | ||
metaDescription: Some troubleshooting tips if you're not seeing data show up for your New Relic Kubernetes integration. | ||
freshnessValidatedDate: never | ||
--- | ||
|
||
## Problem | ||
|
||
You [deployed the infrastructure agent](/docs/infrastructure/infrastructure-monitoring/get-started/choose-infra-install-method/) and completed the [Kubernetes installation procedure](/install/kubernetes/) but not all nodes show up. | ||
|
||
## Solution | ||
|
||
Follow these steps: | ||
|
||
1. Confirm that you can schedule the infrastructure agent on each node by running this command: | ||
|
||
```shell | ||
kubectl describe daemonset newrelic-infra | ||
``` | ||
|
||
2. Confirm that the time on all nodes is accurate. Nodes that are more than 2 minutes ahead or behind will not show up in the Cluster explorer. The following NRQL query can be used to check if this is the case: | ||
|
||
```sql | ||
FROM K8sNodeSample | ||
SELECT latest(nr.ingestTimeMs - timestamp) / 1000 AS 'Clock offset seconds' | ||
FACET nodeName LIMIT max SINCE 1 DAY AGO | ||
``` | ||
|
||
3. [Retrieve the logs from the infrastructure agent](/docs/kubernetes-pixie/kubernetes-integration/advanced-configuration/k8s-version2/overview/#logs-version2) on the nodes that do not appear in the cluster explorer and confirm there are no [error messages](/docs/kubernetes-pixie/kubernetes-integration/advanced-configuration/k8s-version2/errors/). |
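
As a rough illustration of the 2-minute rule in step 2, the sketch below classifies a single `Clock offset seconds` value taken from the NRQL query results. The function name `clock_offset_status` is hypothetical and not part of any New Relic tooling, and the 120-second threshold is simply the 2 minutes mentioned above.

```shell
# Hypothetical helper: classifies a clock offset (in seconds, possibly
# negative or fractional) against the 2-minute window described in step 2.
clock_offset_status() {
  awk -v o="$1" 'BEGIN {
    if (o < 0) o = -o                       # ahead or behind both count
    print (o > 120 ? "out of range" : "ok")
  }'
}

# Example: clock_offset_status -135.2   prints "out of range"
```

A node reported as `out of range` would be a candidate for a time synchronization check before you move on to the agent logs.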
2 changes: 1 addition & 1 deletion
2
...pixie/kubernetes-integration/advanced-configuration/k8s-version2/upgrade-v2.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
--- | ||
title: Upgrade from v2 | ||
title: Upgrade from version 2 | ||
tags: | ||
- Integrations | ||
- Kubernetes integration v2 | ||
|
31 changes: 31 additions & 0 deletions
31
...kubernetes-integration/troubleshooting/common-error-messages/error-messages.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
--- | ||
title: Error messages | ||
type: troubleshooting | ||
tags: | ||
- Integrations | ||
- Kubernetes integration | ||
- Troubleshooting | ||
metaDescription: 'Some of the more common error messages found in the infrastructure agent logs for New Relic Kubernetes integration.' | ||
redirects: | ||
- /docs/integrations/kubernetes-integration/troubleshooting/kubernetes-integration-troubleshooting-error-messages | ||
- /docs/integrations/host-integrations/troubleshooting/kubernetes-integration-troubleshooting-error-messages | ||
freshnessValidatedDate: 2024-09-02 | ||
--- | ||
|
||
You may see error messages in your terminal while installing the Kubernetes integration, or in your New Relic infrastructure logs after the integration is installed. | ||
|
||
These are the possible error messages you can see: | ||
|
||
* [Error sending events](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/error-sending-events) | ||
* [Failed to discover kube-state-metrics](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/failed-discover-kube) | ||
* [Invalid New Relic license](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/invalid-nr-license) | ||
* [Installation error due to Dockerhub and registry.k8s.io](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/installation-error-dockerhub-registry) | ||
* [Pod is not starting](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/pod-not-starting) | ||
* [Repo newrelic not found](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/repo-newrelic-not-found) | ||
* [Unable to connect to the server](/docs/kubernetes-pixie/kubernetes-integration/troubleshooting/common-error-messages/unable-connect-server) | ||
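
When you scan a long log, it may help to map each line to one of the pages above. The following is a minimal sketch, assuming the match strings are the ones that appear in the sample messages on those pages. The function name `classify_agent_error` is hypothetical, and the printed values are the page slugs from the links in the list.

```shell
# Hypothetical helper: maps a log or terminal line to the slug of the
# matching troubleshooting page. The patterns are taken from the sample
# messages in these docs and may not cover every variant of each error.
classify_agent_error() {
  case "$1" in
    *"Invalid license key"*)             echo "invalid-nr-license" ;;
    *"Error sending events"*)            echo "error-sending-events" ;;
    *"discovering KSM endpoints"*)       echo "failed-discover-kube" ;;
    *"repo newrelic not found"*)         echo "repo-newrelic-not-found" ;;
    *"Unable to connect to the server"*) echo "unable-connect-server" ;;
    *"pod is not starting"*)             echo "pod-not-starting" ;;
    *)                                   echo "unknown" ;;
  esac
}

# Example: classify_agent_error "repo newrelic not found"
# prints "repo-newrelic-not-found"
```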
|
||
|
||
|
||
|
||
|
||
|
26 changes: 26 additions & 0 deletions
26
...etes-integration/troubleshooting/common-error-messages/error-sending-events.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
--- | ||
title: 'Error sending events' | ||
type: troubleshooting | ||
tags: | ||
- Integrations | ||
- Kubernetes integration | ||
- Troubleshooting | ||
metaDescription: Some troubleshooting tips if you receive an error when sending events. | ||
freshnessValidatedDate: 2024-09-02 | ||
--- | ||
|
||
## Problem | ||
|
||
The agent can't connect to the New Relic servers and you see an error like the following in the logs of the `agent` or `forwarder` containers: | ||
|
||
```shell | ||
2018-04-09T18:16:35.497195185Z time="2018-04-09T18:16:35Z" level=error | ||
msg="metric sender can't process 1 times" error="Error sending events: | ||
Post https://api.newrelic.com/metrics/events/bulk: | ||
net/http: request canceled (Client.Timeout exceeded while awaiting headers)" | ||
``` | ||
|
||
## Solution | ||
|
||
Depending on the exact nature of the error, the message in the logs may differ. To address this problem, see the [New Relic networks documentation](/docs/new-relic-solutions/get-started/networks/#infrastructure) and the [Troubleshooting New Relic infrastructure agent networking issue](https://github.com/newrelic/infrastructure-agent/blob/master/docs/network_troubleshooting.md?) GitHub page. | ||
|
33 changes: 33 additions & 0 deletions
33
...etes-integration/troubleshooting/common-error-messages/failed-discover-kube.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
--- | ||
title: 'Failed to discover kube-state-metrics' | ||
type: troubleshooting | ||
tags: | ||
- Integrations | ||
- Kubernetes integration | ||
- Troubleshooting | ||
metaDescription: Some troubleshooting tips if kube-state-metrics is not found. | ||
freshnessValidatedDate: 2024-09-02 | ||
--- | ||
|
||
## Problem | ||
|
||
The Kubernetes integration requires `kube-state-metrics`. If this is missing, you'll see an error like the following in the `nrk8s-ksm` container logs: | ||
|
||
```shell | ||
time="2022-06-21T09:12:20Z" level=error msg="retrieving scraper data: retrieving ksm data: discovering KSM endpoints: timeout discovering endpoints" | ||
``` | ||
|
||
## Solution | ||
|
||
Check whether any of the following applies: | ||
|
||
* `kube-state-metrics` has not been deployed into the cluster. | ||
* `kube-state-metrics` is deployed using a custom deployment. | ||
* There are multiple versions of `kube-state-metrics` running and the Kubernetes integration is not finding the correct one. | ||
|
||
The Kubernetes integration automatically detects `kube-state-metrics` in your cluster. By default, it looks for the label `app.kubernetes.io/name=kube-state-metrics` across all namespaces. | ||
|
||
|
||
<Callout variant="tip"> | ||
You can change the discovery behavior in the `ksm.config` of the [Helm chart](https://github.com/newrelic/nri-kubernetes/blob/main/charts/newrelic-infrastructure/values.yaml) values. | ||
</Callout> |
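
To see what the default discovery label would match, you can list pods with that label across all namespaces and count the results. The sketch below is one way to interpret the count. The function name `ksm_discovery_status` is hypothetical, and it assumes its input is the output of `kubectl get pods` run with `--no-headers`.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and reports how many pods carry the default kube-state-metrics label.
ksm_discovery_status() {
  count=$(grep -c . || true)   # number of non-empty lines, 0 if none
  case "$count" in
    0) echo "missing" ;;       # nothing matches the default label
    1) echo "ok" ;;
    *) echo "multiple" ;;      # several pods: could be replicas or several deployments
  esac
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers \
#     -l app.kubernetes.io/name=kube-state-metrics | ksm_discovery_status
```

A `multiple` result doesn't prove that different versions are running, because several replicas of one deployment give the same result. It only suggests a closer look at which instance the integration is finding.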
20 changes: 20 additions & 0 deletions
20
...troubleshooting/common-error-messages/installation-error-dockerhub-registry.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
--- | ||
title: 'Installation error due to Dockerhub and registry.k8s.io' | ||
type: troubleshooting | ||
tags: | ||
- Integrations | ||
- Kubernetes integration | ||
- Troubleshooting | ||
metaDescription: Some troubleshooting tips if you have an installation error due to Dockerhub and registry.k8s.io. | ||
freshnessValidatedDate: 2024-09-02 | ||
--- | ||
|
||
## Problem | ||
|
||
You have a problem with the [New Relic dockerhub](https://hub.docker.com/u/newrelic) and Google's [`registry.k8s.io`](https://github.com/kubernetes/registry.k8s.io) during the installation. | ||
|
||
|
||
## Solution | ||
|
||
Check that you've added both domains to your allow list, because the installation pulls the container images from these locations. You can [test connectivity to `registry.k8s.io`](https://kubernetes.io/blog/2023/03/10/image-registry-redirect/#how-can-i-check-if-i-am-impacted) to find the extra Google registry domains to add to your allow list. `registry.k8s.io` usually redirects to your local registry domain, for example `asia-northeast1-docker.pkg.dev`, depending on your region. | ||
|
24 changes: 24 additions & 0 deletions
24
...rnetes-integration/troubleshooting/common-error-messages/invalid-nr-license.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
--- | ||
title: 'Invalid New Relic license' | ||
type: troubleshooting | ||
tags: | ||
- Integrations | ||
- Kubernetes integration | ||
- Troubleshooting | ||
metaDescription: Some troubleshooting tips if the New Relic license is invalid. | ||
freshnessValidatedDate: 2024-09-02 | ||
--- | ||
|
||
## Problem | ||
|
||
You are getting this error in the logs of the `agent` or `forwarder` containers: | ||
|
||
```shell | ||
2018-04-09T14:20:17.750893186Z time="2018-04-09T14:20:17Z" level=error | ||
msg="metric sender can't process 0 times" error="InventoryIngest: events | ||
were not accepted: 401 401 Unauthorized Invalid license key." | ||
``` | ||
|
||
## Solution | ||
|
||
Make sure you're using a valid <InlinePopover type="licenseKey"/>. |
36 changes: 36 additions & 0 deletions
36
...bernetes-integration/troubleshooting/common-error-messages/pod-not-starting.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
--- | ||
title: 'Pod is not starting' | ||
type: troubleshooting | ||
tags: | ||
- Integrations | ||
- Kubernetes integration | ||
- Troubleshooting | ||
metaDescription: Some troubleshooting tips if the Pod is not starting. | ||
freshnessValidatedDate: 2024-09-02 | ||
--- | ||
|
||
## Problem | ||
|
||
You see the error `nrk8s-kubelet pod is not starting` in the output when you follow the guided installation. | ||
|
||
## Solution | ||
|
||
This error indicates that the Kubernetes kubelet pod can't be started within 5 minutes, and the installation script fails due to this timeout. | ||
|
||
In this case, you can run this command to see the pod's status and restarts: | ||
|
||
```bash | ||
kubectl get pods -o wide -n newrelic | grep nrk8s-kubelet | ||
``` | ||
|
||
Check the following: | ||
|
||
* If the pod is in `ImagePullBackOff` status, make sure your network allows image pulls from the [right domains](/docs/new-relic-solutions/get-started/networks). | ||
|
||
|
||
* If the pod is in `Pending` or `ContainerCreating` status, please run these commands to find out the possible reasons from the [debug logs](/docs/kubernetes-pixie/kubernetes-integration/advanced-configuration/get-logs-version/#verbose-logging): | ||
|
||
```bash | ||
kubectl logs newrelic-bundle-nrk8s-kubelet -n newrelic | ||
kubectl logs newrelic-bundle-nrk8s-kubelet -n newrelic -c kubelet | ||
``` |
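
The checks above can be sketched as a small lookup from pod status to next step. The function name `kubelet_pod_hint` is hypothetical and the hint wording is only a paraphrase of the bullets above. It assumes you pass the STATUS column from the `kubectl get pods` output.

```shell
# Hypothetical helper: maps a pod STATUS value to the next step suggested
# in the list above.
kubelet_pod_hint() {
  case "$1" in
    ImagePullBackOff)          echo "check network access to the image domains" ;;
    Pending|ContainerCreating) echo "check the debug logs" ;;
    *)                         echo "no hint for status $1" ;;
  esac
}

# Usage against a live cluster (STATUS is the third column):
#   kubectl get pods -n newrelic --no-headers | grep nrk8s-kubelet | awk '{print $1, $3}'
# Example: kubelet_pod_hint Pending   prints "check the debug logs"
```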
26 changes: 26 additions & 0 deletions
26
...s-integration/troubleshooting/common-error-messages/repo-newrelic-not-found.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
--- | ||
title: 'Repo newrelic not found' | ||
type: troubleshooting | ||
tags: | ||
- Integrations | ||
- Kubernetes integration | ||
- Troubleshooting | ||
metaDescription: Some troubleshooting tips if the newrelic repo is not found. | ||
freshnessValidatedDate: 2024-09-02 | ||
--- | ||
|
||
## Problem | ||
|
||
You see this error message during your [Kubernetes integration installation](/install/kubernetes/) with Helm or a manifest: | ||
|
||
```shell | ||
repo newrelic not found | ||
``` | ||
|
||
## Solution | ||
|
||
Add the `newrelic` repo to your Helm repositories by running this command: | ||
|
||
```shell | ||
helm repo add newrelic https://helm-charts.newrelic.com | ||
``` |
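
If you script the installation, you may want to add the repo only when it's missing. The sketch below checks the output of `helm repo list` for an entry named `newrelic`. The function name `has_newrelic_repo` is hypothetical, and it assumes the default table output, with a header row followed by one `NAME URL` row per repo.

```shell
# Hypothetical helper: succeeds if `helm repo list` output on stdin
# contains a repo whose name is exactly "newrelic".
has_newrelic_repo() {
  awk 'NR > 1 && $1 == "newrelic" { found = 1 } END { exit !found }'
}

# Usage:
#   helm repo list 2>/dev/null | has_newrelic_repo \
#     || helm repo add newrelic https://helm-charts.newrelic.com
```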
24 changes: 24 additions & 0 deletions
24
...tes-integration/troubleshooting/common-error-messages/unable-connect-server.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
--- | ||
title: 'Unable to connect to the server' | ||
type: troubleshooting | ||
tags: | ||
- Integrations | ||
- Kubernetes integration | ||
- Troubleshooting | ||
metaDescription: Some troubleshooting tips if you're having issues with the networking connection. | ||
freshnessValidatedDate: 2024-09-02 | ||
--- | ||
|
||
## Problem | ||
|
||
You see this error in the output when you're following the guided install: | ||
|
||
```shell | ||
Unable to connect to the server: dial tcp [7777:777:7777:7777:77::77]:443: i/o timeout | ||
``` | ||
|
||
## Solution | ||
|
||
This indicates that you're experiencing a network connection issue between the Kubernetes client and the Kubernetes API server. Make sure your Kubernetes client can connect to your Kubernetes API server before running the guided install again. | ||
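
One quick way to confirm connectivity before retrying is to ask the API server for its cluster information. This is a minimal sketch, assuming `kubectl` is already configured for the target cluster. The function name `check_api_server` and the 10-second timeout are illustrative choices, not part of the guided install.

```shell
# Hypothetical helper: reports whether kubectl can reach the API server.
# The 10-second request timeout is an arbitrary value for this sketch.
check_api_server() {
  if kubectl cluster-info --request-timeout=10s >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# Usage: check_api_server
```

If the result is `unreachable`, resolve the connection problem (for example, kubeconfig context, VPN, or firewall rules) before running the guided install again.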