Debugging FAQ
lavalamp edited this page Dec 17, 2014 · 49 revisions
Tips that may help you debug why Kubernetes isn't working.
Of course, also take a look at the documentation, especially the getting-started guides.
When asking for help, please indicate your hosting platform (GCE, Vagrant, physical machines, etc.), OS distribution (Debian, CoreOS, Fedora, etc.), and special networking setup (Flannel, OVS, etc.).
- Of your pod: `cluster/kubectl.sh logs <podname> [<containername>]`
- Of Kubernetes system components: depending on the Linux distribution, the logs of system components, including Docker, will be in /var/log or /tmp, or can be accessed using journalctl on systemd-based systems such as Fedora, RHEL7, or CoreOS. Salt logs on minions are in /var/log/salt/minion.
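Which of those locations applies can be checked mechanically; a minimal sketch, assuming that the presence of journalctl indicates a systemd host (the unit names are examples):

```shell
# Decide where to look for component logs based on the init system.
if command -v journalctl >/dev/null 2>&1; then
  hint="systemd host: try 'journalctl -u docker' (and similar units)"
else
  hint="non-systemd host: look under /var/log, e.g. /var/log/salt/minion"
fi
echo "$hint"
```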
- If you don't see much that's useful in the logs, try turning on verbose logging on the Kubernetes component you suspect has a problem, using `--v` or `--vmodule`, at least at level 4. See https://github.com/golang/glog for more details.
- You can see which containers have been created on a node with `docker ps -a`.
- You can see what's happening in Kubernetes with `cluster/kubecfg.sh list events`.
- You can browse the contents of etcd with `etcdctl`.
- Ensure all backend components are running
  - on master: apiserver, controller, scheduler, etcd
    - IMPORTANT: Some older turnup instructions don't include the scheduler. Ensure the scheduler is running on the master host.
  - on nodes: proxy, kubelet, docker
- Ensure all Kubernetes components have `--etcd_servers` set correctly on the command line (if not, you should see error messages in their logs)
  - If it's not set, your networking setup may be broken, since the value is usually initialized from the IP address of kubernetes-master, such as in cluster/saltbase/salt/apiserver/default
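The running-components check can be scripted; a sketch, assuming the binaries run under the process names listed above (adjust them to match however your turnup scripts launch the binaries):

```shell
# Report which master components have a running process.
status=""
for comp in apiserver controller scheduler etcd; do
  if pgrep -x "$comp" >/dev/null 2>&1; then
    status="$status $comp:up"
  else
    status="$status $comp:down"
  fi
done
echo "$status"
```

On a healthy master all four should report up; run the same loop with proxy, kubelet, and docker on the nodes.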
- Pods stay `Pending`
  - Use `cluster/kubectl.sh describe pod <podname>`; there should be events describing the last thing that happened to the pod.
  - Use `cluster/kubectl.sh get events` to see all cluster events.
  - If you see scheduler events but not kubelet events, look at the kubelet logs.
  - Do kubelet and apiserver agree about what the kubelet is called? (The 'Host' should match the node's hostname(), or the value passed to kubelet's `--hostname_override`.)
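The hostname-agreement question can be answered on the node itself; a sketch, assuming `--hostname_override` (if used) appears on the kubelet's command line:

```shell
# Compare the node's hostname with any --hostname_override passed to kubelet.
node=$(hostname)
override=$(ps -eo args 2>/dev/null | sed -n 's/.*--hostname_override=\([^ ]*\).*/\1/p' | head -n 1)
echo "hostname: $node"
echo "override: ${override:-<none>}"
```

If the override differs from what the apiserver uses for the node, the two will disagree and kubelet events won't show up.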
- `dev-build-and-up.sh` waits forever at `Waiting for cluster initialization`
  - Try `cluster/kube-down.sh` and `hack/dev-build-and-up.sh` again
  - If it still hangs, ctrl-c and try `hack/dev-build-and-push.sh`
  - Check whether all the VMs exist -- typically one master VM and N minions
    - If so, check whether you can ssh into them
    - Check serial console output, if available
  - If it still doesn't work, see provider-specific issues below
- `dev-build-and-up.sh` reports `Docker failed to install on kubernetes-minion-1`
  - Verify that you can ssh into the minions
  - Check /var/log/salt/minion to see what part of the installation failed
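Pulling the failing step out of that log can be scripted; a sketch (the log path is from above; the error keywords are guesses, so widen them as needed):

```shell
# Summarize failures from the salt minion log, if present.
log=/var/log/salt/minion
if [ -r "$log" ]; then
  summary=$(grep -iE 'error|failed' "$log" | tail -n 20)
else
  summary="no salt log at $log"
fi
echo "$summary"
```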
- `SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure` in the minion's salt log
  - Try `python -c "import urllib2; req = urllib2.Request('https://get.docker.io/gpg'); response = urllib2.urlopen(req); print response.read()"`. If it fails with the above message, then Docker enabled SNI on docker.io and/or redirected to docker.com (which has SNI enabled).
- kubecfg cannot reach apiserver
  - Ensure KUBERNETES_MASTER or KUBE_MASTER_IP is set, or use `-h`
  - Ensure apiserver is running (especially if you receive 502 Bad Gateway)
    - Check that the process is running on the master
    - Check its logs
- You were able to create a `replicationController` but see no pods
  - The replication controller didn't create the pods. Check that the controller is running, and look at its logs.
- kubecfg hangs forever or a pod stays in state `Waiting` forever
  - Check whether hosts are being assigned to your pods. If not, the pods aren't being scheduled. If they are, check the Kubelet and Docker logs.
  - Ensure the kubelet is looking in the right place in etcd for its pods. If you see something like `DEBUG: get /registry/hosts/127.0.0.1/kubelet` in a kubelet's logs, check whether the apiserver is using the same name or IP for that minion. If not, check the value of the `--hostname_override` command-line flag on the kubelet.
  - It could also be that the image fetch is not working (e.g., because you mistyped the image name). Check the Docker logs.
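The etcd-key check from the kubelet log can be automated; a sketch, with the log path an assumption that varies by distribution and init system:

```shell
# Extract which /registry/hosts/... keys the kubelet has been querying.
log=/var/log/kubelet.log   # assumption: adjust for your distro
if [ -r "$log" ]; then
  keys=$(grep -o '/registry/hosts/[^ "]*' "$log" | sort -u)
else
  keys="no kubelet log at $log"
fi
echo "$keys"
```

Each distinct key should name the same host the apiserver uses for that minion.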
- apiserver reports `Error synchronizing container: Get http://:10250/podInfo?podID=foo: dial tcp :10250: connection refused`
  - This just means that pod foo has not yet been scheduled (see #1285)
  - Check whether the scheduler is running properly
  - If the scheduler is running, possibly no minion addresses were passed to the apiserver using `--machines` (see `hack/local-cluster-up.sh` for an example)
- Cannot connect to the container
  - Try to telnet to the minion at its service port, and/or to the pod's IP and port
  - Check whether the container has been created in Docker: `sudo docker ps -a`
    - If you don't see the container, there could be a problem with the pod configuration, image, Docker, or Kubelet
    - If you see containers created every 10 seconds, then container creation is failing or the container's process is failing
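If telnet isn't installed on the minion, bash can probe a port by itself; a sketch using bash's /dev/tcp redirection (the host and port below are placeholders for your service or pod endpoint):

```shell
# Probe a host:port; prints "reachable" or "unreachable".
probe() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 unreachable"
  fi
}
result=$(probe 127.0.0.1 1)   # port 1 is almost certainly closed
echo "$result"
```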
- Why does PUT return `{"kind":"Status","creationTimestamp":null,"apiVersion":"v1beta1","status":"failure","message":"replicationController \"fooController\" cannot be updated: 105: Key already exists (/registry/controllers/fooController) [25464]","reason":"conflict","details":{"id":"fooController","kind":"replicationController"},"code":409}`?
  - We use `resourceVersion` for optimistic concurrency. The value assigned by the system at the last mutation of the object must be provided when performing an update, in order to prevent accidentally clobbering another client's update. kubecfg achieves this by doing a GET of the object, extracting the resourceVersion, and inserting it into the JSON of the PUT; this defeats the purpose of the concurrency control, but works for single-user scenarios.
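The mechanism can be illustrated with a toy version; a sketch in pure shell, with a variable standing in for the apiserver's store:

```shell
# Toy model of resourceVersion optimistic concurrency: an update must carry
# the version it last read, or it is rejected as a 409 conflict.
store_version=1
put() {  # put <resourceVersion read by the client>
  if [ "$1" -eq "$store_version" ]; then
    store_version=$((store_version + 1))
    last_status="200 OK: new resourceVersion $store_version"
  else
    last_status="409 conflict: expected resourceVersion $store_version, got $1"
  fi
  echo "$last_status"
}
put 1   # fresh client: succeeds, version becomes 2
put 1   # stale client replaying version 1: 409 conflict
```

The second PUT fails exactly the way the JSON above does: the version it carries no longer matches the store.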
- `x509: certificate has expired or is not yet valid`
  - Check whether the current time matches on client and server. Use `ntpdate` for one-time clock synchronization.
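A quick skew check before digging into the certificate itself; the remote-side command is shown only as a comment, so both timestamps here come from the local clock:

```shell
# Flag clock skew before suspecting the certificate.
local_time=$(date +%s)
remote_time=$local_time   # in practice: remote_time=$(ssh <node> date +%s)
skew=$((local_time - remote_time))
abs=${skew#-}
if [ "$abs" -le 60 ]; then
  verdict="clock skew OK (${skew}s)"
else
  verdict="clock skew ${skew}s: run ntpdate or fix the clock"
fi
echo "$verdict"
```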
- `make clean` or `rm -rf Godeps/_workspace/pkg output _output`
- My minions can't talk to the master
  - Check the output of `ip addr`, `ip route show`, `traceroute kubernetes-master`, and `iptables --list`. Does the minion have the expected IP address and subnet? Is traffic going through the expected gateway? Is the master dropping the packets?
If you use the "Host-only networking" feature with VirtualBox, make sure net-tools is installed.
For more details see https://wiki.archlinux.org/index.php/VirtualBox
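The commands above can be bundled into a single report to attach when asking for help; a minimal sketch (traceroute is omitted here because it can run for a long time):

```shell
# Collect the network diagnostics listed above into one report.
ran=0
for cmd in "ip addr" "ip route show" "iptables --list"; do
  echo "== $cmd =="
  $cmd 2>&1 | head -n 5
  ran=$((ran + 1))
done
```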
TODO
- Ensure you can ssh to an instance, which may require enabling billing and/or creating an ssh key. Create an instance if you don't have one, then use `gcutil ssh` to ssh into it.
- `gcutil listfirewalls ; gcutil getfirewall default-ssh`
  - If `default-ssh` doesn't exist, do `gcutil addfirewall --description "SSH allowed from anywhere" --allowed=tcp:22 default-ssh`
- `gcutil listnetworks`