Top Real‑Time Kubernetes Issues & How to Fix Them

Simple explanations, real commands, and quick fixes for problems every Kubernetes engineer faces.


Introduction


Kubernetes is powerful, but when something breaks in production, the error messages are rarely helpful at first glance.

This post walks through the most common real‑world Kubernetes issues, explains why they happen, and shows the exact commands you can use to diagnose and fix them.

No theory. No fluff. Just practical debugging.


1. Pod stuck in CrashLoopBackOff


What is it?

The container starts, crashes, and Kubernetes keeps restarting it in a loop.

Why it happens

  • Application crashes on startup
  • Missing or incorrect environment variables
  • Invalid config files
  • Application bug

How to fix it

Check pod status

kubectl get pods -n <namespace>

Check container logs

kubectl logs <pod-name> -n <namespace>

Check logs from the previous crash

kubectl logs <pod-name> -n <namespace> --previous

Describe the pod and inspect events

kubectl describe pod <pod-name> -n <namespace>

💡 Tip:
Always scroll to the Events section in kubectl describe.
That’s where Kubernetes usually tells you why the pod failed.
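
In a busy namespace it can help to filter the pod list down to the pods that are actually crash-looping. A minimal sketch, assuming the default kubectl get pods column layout (NAME, READY, STATUS, RESTARTS, AGE); the helper name crashloop_pods is hypothetical:

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# the names of pods whose STATUS column is CrashLoopBackOff.
# Assumes the default layout, where STATUS is the third column
# (this is not the case with -A / --all-namespaces).
crashloop_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}
```

Usage: kubectl get pods -n <namespace> | crashloop_pods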


2. Pod stuck in Pending state


What is it?

The pod is created but never starts running.

Why it happens

  • Not enough CPU or memory on nodes
  • Node selector or affinity rules match no node, or a node taint has no matching toleration
  • PersistentVolumeClaim is not bound

How to fix it

Describe the pod

kubectl describe pod <pod-name> -n <namespace>

Look for messages like:

  • Insufficient cpu
  • Insufficient memory
  • 0/N nodes are available (followed by the reason each node was rejected)

Check node resources


kubectl describe nodes

Check PVC status


kubectl get pvc -n <namespace>

⚠️ Common cause:
A pod that mounts a PVC stays Pending until that PVC is bound.
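
When a namespace has many claims, the unbound ones can be picked out of the PVC list. A minimal sketch, assuming the default kubectl get pvc layout where NAME and STATUS are the first two columns; the helper name unbound_pvcs is hypothetical:

```shell
# Hypothetical helper: reads `kubectl get pvc` output on stdin and prints
# "<name> <status>" for every claim whose STATUS column is not Bound.
# Assumes NAME and STATUS are the first two columns.
unbound_pvcs() {
  awk 'NR > 1 && $2 != "Bound" { print $1, $2 }'
}
```

Usage: kubectl get pvc -n <namespace> | unbound_pvcs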


3. ImagePullBackOff / ErrImagePull


What is it?

Kubernetes cannot pull the container image.

Why it happens

  • Wrong image name or tag
  • Private registry without credentials
  • Registry network issues

How to fix it

Check image details


kubectl describe pod <pod-name> -n <namespace>
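
The describe output can be long, so it may help to keep only the lines about the failed pull. A minimal sketch; the helper name pull_errors is hypothetical, and the patterns are the pull-failure reasons and messages the kubelet commonly reports, not an exhaustive list:

```shell
# Hypothetical helper: reads `kubectl describe pod` output on stdin and keeps
# only the lines that mention an image pull failure.
pull_errors() {
  grep -E 'Failed to pull image|ErrImagePull|ImagePullBackOff|Back-off pulling image'
}
```

Usage: kubectl describe pod <pod-name> -n <namespace> | pull_errors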

Create image pull secret (private registry)

kubectl create secret docker-registry my-registry-secret \
  --docker-server=your-registry.io \
  --docker-username=<username> \
  --docker-password=<password> \
  -n <namespace>

Reference the secret in the Deployment's pod spec (under spec.template.spec)

imagePullSecrets:
  - name: my-registry-secret

4. Pod killed with OOMKilled


What is it?

The container used more memory than allowed and was killed by the kernel.

Why it happens

  • Memory limits are too low
  • Application has a memory spike or leak

How to fix it

Confirm OOMKilled

kubectl describe pod <pod-name> -n <namespace>

Check current memory usage (requires metrics-server in the cluster)

kubectl top pod <pod-name> -n <namespace>

Update memory limits

kubectl edit deployment <deployment-name> -n <namespace>

Example (set under each container in the pod template):

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"

💡 Best practice:
Always set both requests and limits, and don’t make limits too tight.
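
To judge how tight a limit is, the usage reported by kubectl top pod can be compared with the configured limit. A minimal sketch, assuming both values carry the Mi suffix as in the example above; the helper name mem_pct is hypothetical:

```shell
# Hypothetical helper: prints how much of a memory limit is in use, as a
# whole-number percentage (rounded down).
# Assumes both arguments use the Mi suffix, e.g. "480Mi"; other units
# (Gi, M, plain bytes) are not handled.
mem_pct() {
  used=${1%Mi}     # strip the unit suffix from the usage value
  limit=${2%Mi}    # strip the unit suffix from the limit value
  echo $(( used * 100 / limit ))
}

mem_pct 480Mi 512Mi   # prints 93
```

A container that regularly sits this close to its limit is a likely candidate for the next OOMKilled.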


5. Service not accessible / connection refused

What is it?

The application is running, but you cannot reach it via the Service.

Why it happens

  • Service selector does not match pod labels
  • Wrong port mapping
  • No endpoints created

How to fix it

Check Service

kubectl get svc -n <namespace>

Check endpoints

kubectl get endpoints <service-name> -n <namespace>

If endpoints are <none>, either the Service selector matches no pod labels or none of the matching pods is Ready.
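
A selector mismatch can be checked by comparing the SELECTOR column of kubectl get svc -o wide with the LABELS column of kubectl get pods --show-labels. A minimal sketch of that comparison, assuming both are passed as comma-separated key=value strings; the helper name selector_matches is hypothetical:

```shell
# Hypothetical helper: succeeds (exit 0) only if every key=value pair in the
# Service selector ($1) also appears in the pod's label list ($2).
# Both arguments are comma-separated strings, e.g. "app=web,tier=frontend".
# The body runs in a subshell so the IFS change does not leak to the caller.
selector_matches() (
  IFS=,
  for pair in $1; do
    case ",$2," in
      *",$pair,"*) ;;    # this pair is present, keep checking
      *) exit 1 ;;       # missing pair: the selector does not match
    esac
  done
  exit 0
)
```

Usage: selector_matches "app=web" "app=web,pod-template-hash=abc" && echo "selector matches"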

Test from inside the cluster (use the Service port; without a port, wget assumes 80)

kubectl run debug-pod --rm -it --restart=Never \
  --image=busybox \
  -n <namespace> -- wget -qO- http://<service-name>:<port>

6. Node shows NotReady

What is it?

The kubelet on the node is not reporting a healthy status, so the scheduler places no new pods on it.

Why it happens

  • Kubelet stopped
  • Node out of disk or memory
  • Network issues

How to fix it

Check node status

kubectl get nodes

Describe the node

kubectl describe node <node-name>
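
On a larger cluster, the unhealthy nodes can be filtered out of the node list first. A minimal sketch, assuming the default kubectl get nodes layout where STATUS is the second column; the helper name notready_nodes is hypothetical:

```shell
# Hypothetical helper: reads `kubectl get nodes` output on stdin and prints
# the names of nodes whose STATUS starts with NotReady
# (this also covers "NotReady,SchedulingDisabled").
notready_nodes() {
  awk 'NR > 1 && $2 ~ /^NotReady/ { print $1 }'
}
```

Usage: kubectl get nodes | notready_nodes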

Restart kubelet (run this on the node itself, for example over SSH)

sudo systemctl restart kubelet

Safely drain the node (if needed)

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

Once the node is healthy again, allow scheduling on it

kubectl uncordon <node-name>

Quick Reference Cheat Sheet

Problem             | Command                 | Common Fix
CrashLoopBackOff    | kubectl logs --previous | Fix app config
Pod Pending         | kubectl describe pod    | Add resources
ImagePullBackOff    | kubectl describe pod    | Fix image / auth
OOMKilled           | kubectl top pod         | Increase memory
Service unreachable | kubectl get endpoints   | Fix selector
Node NotReady       | kubectl describe node   | Fix node issue

Final takeaway

Most of the issues above can be diagnosed with just three commands:

kubectl get
kubectl describe
kubectl logs

Master these, and you'll be able to diagnose most real‑world Kubernetes problems without panic.
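
The three commands can be chained into a single first-pass check for one pod. A minimal sketch; the helper name triage_pod is hypothetical, and falling back to current logs when no previous container exists is an assumption about what is most useful:

```shell
# Hypothetical helper: run get, describe, and logs against a single pod.
# Arguments: pod name, then namespace (defaults to "default").
triage_pod() {
  pod=$1
  ns=${2:-default}
  kubectl get pod "$pod" -n "$ns" -o wide
  kubectl describe pod "$pod" -n "$ns"
  # Prefer logs from the previous (crashed) container; if there is none,
  # fall back to the logs of the current container.
  kubectl logs "$pod" -n "$ns" --previous 2>/dev/null || kubectl logs "$pod" -n "$ns"
}
```

Usage: triage_pod <pod-name> <namespace>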

