Simple explanations, real commands, and quick fixes for problems every Kubernetes engineer faces.
Introduction
Kubernetes is powerful, but when something breaks in production, the error messages are rarely helpful at first glance.
This post walks through the most common real‑world Kubernetes issues, explains why they happen, and shows the exact commands you can use to diagnose and fix them.
No theory. No fluff. Just practical debugging.
1. Pod stuck in CrashLoopBackOff
What is it?
The container starts, crashes, and Kubernetes keeps restarting it, waiting longer between attempts each time.
Why it happens
- Application crashes on startup
- Missing or incorrect environment variables
- Invalid config files
- Application bug
How to fix it
Check pod status
kubectl get pods -n <namespace>
Check container logs
kubectl logs <pod-name> -n <namespace>
Check logs from the previous crash
kubectl logs <pod-name> -n <namespace> --previous
Describe the pod and inspect events
kubectl describe pod <pod-name> -n <namespace>
💡 Tip:
Always scroll to the Events section in kubectl describe.
That’s where Kubernetes usually tells you why the pod failed.
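The exit code of the last crash often narrows down the cause before you read any logs. Here is a minimal sketch, assuming a bash-compatible shell. The function name last_termination is made up for this post and is not a kubectl built-in:

```shell
# Hypothetical helper (not part of kubectl): print the name, last
# termination reason, and exit code of every container in a pod.
# Usage: last_termination <pod-name> <namespace>
last_termination() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

As a rough guide, exit code 137 means the container was killed with SIGKILL (often an OOM kill), while 1 usually points at an application or config error.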
2. Pod stuck in Pending state
What is it?
The pod is created but never starts running.
Why it happens
- Not enough CPU or memory on nodes
- Node selectors or affinity rules match no node, or node taints are not tolerated
- PersistentVolumeClaim is not bound
How to fix it
Describe the pod
kubectl describe pod <pod-name> -n <namespace>
Look for messages like:
- Insufficient cpu
- Insufficient memory
- No nodes available
Check node resources
kubectl describe nodes
Check PVC status
kubectl get pvc -n <namespace>
⚠️ Common cause:
A pod stays Pending until its PVC is bound, so an unbound PVC can block it indefinitely.
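To find Pending pods quickly and read only the events that belong to one of them, something like the following can help. This is a sketch, assuming a bash-compatible shell; pending_pods and pod_events are illustrative names, not kubectl commands:

```shell
# Hypothetical helpers (names are illustrative, not kubectl built-ins).

# List every Pending pod across all namespaces.
pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}

# Show the events recorded for one pod, sorted by creation time.
# Usage: pod_events <pod-name> <namespace>
pod_events() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.metadata.creationTimestamp
}
```

Scheduling failures normally show up in these events with the reason FailedScheduling.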
3. ImagePullBackOff / ErrImagePull
What is it?
Kubernetes cannot pull the container image.
Why it happens
- Wrong image name or tag
- Private registry without credentials
- Registry network issues
How to fix it
Check image details
kubectl describe pod <pod-name> -n <namespace>
Create image pull secret (private registry)
kubectl create secret docker-registry my-registry-secret \
--docker-server=your-registry.io \
--docker-username=<username> \
--docker-password=<password> \
-n <namespace>
Reference the secret in the Deployment's pod spec (under spec.template.spec)
imagePullSecrets:
  - name: my-registry-secret
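If many workloads in a namespace pull from the same private registry, attaching the secret to the namespace's default ServiceAccount avoids editing every Deployment. This is a sketch, assuming your pods run under the default ServiceAccount; the function name attach_pull_secret is hypothetical:

```shell
# Hypothetical helper: attach an image pull secret to the "default"
# ServiceAccount of a namespace.
# Usage: attach_pull_secret <secret-name> <namespace>
attach_pull_secret() {
  kubectl patch serviceaccount default -n "$2" \
    -p "{\"imagePullSecrets\": [{\"name\": \"$1\"}]}"
}
# Example: attach_pull_secret my-registry-secret my-namespace
```

Only pods created after the patch pick up the secret, so existing pods need to be recreated.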
4. Pod killed with OOMKilled
What is it?
The container used more memory than allowed and was killed by the kernel.
Why it happens
- Memory limits are too low
- Application has a memory spike or leak
How to fix it
Confirm OOMKilled
kubectl describe pod <pod-name> -n <namespace>
Check current memory usage (requires metrics-server in the cluster)
kubectl top pod <pod-name> -n <namespace>
Update memory limits
kubectl edit deployment <deployment-name> -n <namespace>
Example:
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"
💡 Best practice:
Always set both requests and limits, and leave headroom above the observed peak usage so normal spikes do not hit the limit.
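If you prefer not to open an editor, the same change can be applied with kubectl set resources. This is a sketch, assuming a bash-compatible shell; set_memory is a hypothetical wrapper name and the values are only examples:

```shell
# Hypothetical helper: set the memory request and limit on the containers
# of a Deployment without opening an editor.
# Usage: set_memory <deployment-name> <namespace> <request> <limit>
set_memory() {
  kubectl set resources deployment "$1" -n "$2" \
    --requests="memory=$3" --limits="memory=$4"
}
# Example: set_memory my-app my-namespace 256Mi 512Mi
```

Changing the pod template this way triggers a rolling update of the Deployment, just like kubectl edit does.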
5. Service not accessible / connection refused
What is it?
The application is running, but you cannot reach it via the Service.
Why it happens
- Service selector does not match pod labels
- Wrong port mapping
- No endpoints created
How to fix it
Check Service
kubectl get svc -n <namespace>
Check endpoints
kubectl get endpoints <service-name> -n <namespace>
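If endpoints are <none>, the Service has no Ready pods behind it: either the selector does not match the pod labels, or the matching pods are failing their readiness checks.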
Test from inside the cluster
kubectl run debug-pod --rm -it --restart=Never \
--image=busybox \
-n <namespace> -- wget -qO- http://<service-name>:<port>
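A selector mismatch is easiest to spot when the Service selector and the pod labels are printed next to each other. This is a minimal sketch, assuming a bash-compatible shell; compare_selector is a hypothetical name:

```shell
# Hypothetical helper: print a Service's selector followed by the labels
# of the pods in the same namespace, so mismatches are easy to spot.
# Usage: compare_selector <service-name> <namespace>
compare_selector() {
  echo "Service selector:"
  kubectl get svc "$1" -n "$2" -o jsonpath='{.spec.selector}'
  echo
  echo "Pod labels:"
  kubectl get pods -n "$2" --show-labels
}
```

Every key/value pair in the selector must appear in the labels of the pods you expect the Service to route to.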
6. Node shows NotReady
What is it?
The control plane is no longer getting a healthy status from the node, so no new pods are scheduled onto it.
Why it happens
- Kubelet stopped
- Node out of disk or memory
- Network issues
How to fix it
Check node status
kubectl get nodes
Describe the node
kubectl describe node <node-name>
Restart kubelet on the node
sudo systemctl restart kubelet
Safely drain the node (if needed), then uncordon it once it is healthy again
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# After the node is repaired and reports Ready:
kubectl uncordon <node-name>
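The node conditions show which of the causes above applies. Here is a sketch that prints them in compact form, assuming a bash-compatible shell; node_conditions is a hypothetical name:

```shell
# Hypothetical helper: print each node condition with its status and reason.
# Usage: node_conditions <node-name>
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\t"}{.reason}{"\n"}{end}'
}
```

A healthy node reports Ready=True and False for MemoryPressure, DiskPressure, and PIDPressure. Ready=Unknown usually means the kubelet has stopped reporting; on systemd-based nodes, journalctl -u kubelet on the node itself shows the kubelet's own logs.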
Quick Reference Cheat Sheet
| Problem | Command | Common Fix |
|---|---|---|
| CrashLoopBackOff | kubectl logs --previous | Fix app config |
| Pod Pending | kubectl describe pod | Add capacity / fix scheduling rules / bind PVC |
| ImagePullBackOff | kubectl describe pod | Fix image / auth |
| OOMKilled | kubectl top pod | Increase memory |
| Service unreachable | kubectl get endpoints | Fix selector |
| Node NotReady | kubectl describe node | Fix node issue |
Final takeaway
Almost every Kubernetes issue can be diagnosed using just three commands:
kubectl get
kubectl describe
kubectl logs
Master these, and you’ll solve 90% of real‑world Kubernetes problems without panic.
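If you run these three checks often, they can be wrapped in one small function. This is a sketch, assuming a bash-compatible shell; triage_pod is a hypothetical name and the 50-line log tail is an arbitrary choice:

```shell
# Hypothetical helper: run the three basic checks for one pod, in order.
# Usage: triage_pod <pod-name> <namespace>
triage_pod() {
  kubectl get pod "$1" -n "$2" -o wide
  kubectl describe pod "$1" -n "$2"
  # --tail limits the output; "|| true" keeps the function from failing
  # when the pod has not produced any logs yet.
  kubectl logs "$1" -n "$2" --tail=50 || true
}
```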
