Top Real‑Time Kubernetes Issues & How to Fix Them

Simple explanations, real commands, and quick fixes for problems every Kubernetes engineer faces.

Introduction

Kubernetes is powerful, but when something breaks in production, the error messages are rarely helpful at first glance.

This post walks through the most common real‑world Kubernetes issues, explains why they happen, and shows the exact commands you can use to diagnose and fix them.

No theory. No fluff. Just practical debugging.

1. Pod stuck in `CrashLoopBackOff`

What is it?

The container starts, crashes, and Kubernetes keeps restarting it, waiting longer between attempts each time (the "back-off").

Why it happens

Application crashes on startup
Missing or incorrect environment variables
Invalid config files
Application bug

How to fix it

Check pod status

kubectl get pods -n <namespace>

Check container logs

kubectl logs <pod-name> -n <namespace>

Check logs from the previous crash

kubectl logs <pod-name> -n <namespace> --previous

Describe the pod and inspect events

kubectl describe pod <pod-name> -n <namespace>

💡 Tip:
Always scroll to the Events section in kubectl describe.
That’s where Kubernetes usually tells you why the pod failed.
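The Exit Code that `kubectl describe` prints under Last State often narrows the cause before you even open the logs. The helper below is only a sketch: `explain_exit_code` is a made-up name, not part of kubectl, and the mapping covers common Linux conventions (codes above 128 usually mean the process was ended by signal number "code minus 128"). Your application may use these codes differently.

```shell
# Hypothetical helper: map a container exit code to a likely cause.
# The meanings follow common shell/Linux conventions; an application
# is free to use any of these codes for its own purposes.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; check whether the process is meant to keep running" ;;
    1)   echo "general application error; read the logs from the previous run" ;;
    126) echo "command found but not executable; check file permissions in the image" ;;
    127) echo "command not found; check the command/args and the image contents" ;;
    137) echo "killed by SIGKILL (128+9); often the OOM killer, see OOMKilled" ;;
    143) echo "terminated by SIGTERM (128+15); for example a failed liveness probe or a shutdown" ;;
    *)   echo "no common meaning; read the logs from the previous run" ;;
  esac
}

# Example (requires access to a cluster): read the exit code of the first
# container's last termination and explain it.
#   code=$(kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')
#   explain_exit_code "$code"
```

Treat the output as a hint about where to look next, not as a diagnosis.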

2. Pod stuck in `Pending` state

What is it?

The pod is created but never starts running.

Why it happens

Not enough CPU or memory on nodes
Node selector, affinity rules, or untolerated taints rule out every node
PersistentVolumeClaim is not bound

How to fix it

Describe the pod

kubectl describe pod <pod-name> -n <namespace>

Look for messages like:

Insufficient cpu
Insufficient memory
No nodes available
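If several pods are Pending at once, it can be quicker to pull the scheduler's messages for the whole namespace instead of describing pods one by one. A minimal sketch, assuming your user is allowed to list events; `scheduling_failures` is a hypothetical helper name:

```shell
# Hypothetical helper: list scheduler failures in a namespace,
# sorted by creation time (oldest first).
scheduling_failures() {
  # $1 = namespace
  kubectl get events -n "$1" \
    --field-selector reason=FailedScheduling \
    --sort-by=.metadata.creationTimestamp
}

# Usage (requires access to a cluster):
#   scheduling_failures <namespace>
```

Events are only kept for a limited time, so an empty result does not prove the scheduler never complained.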

Check node resources


kubectl describe nodes

Check PVC status


kubectl get pvc -n <namespace>

⚠️ Common cause:
A pod that mounts a PVC stays Pending until that PVC is bound.
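In a namespace with many claims, filtering the PVC list down to the ones that are not Bound saves some scrolling. This is a sketch that relies on STATUS being the second column of the default `kubectl get pvc` output; `unbound_pvcs` is a hypothetical name:

```shell
# Hypothetical helper: print only the PVCs that are not Bound.
unbound_pvcs() {
  # $1 = namespace. Column 2 of `kubectl get pvc` is STATUS.
  kubectl get pvc -n "$1" --no-headers | awk '$2 != "Bound"'
}

# Usage (requires access to a cluster):
#   unbound_pvcs <namespace>
#   kubectl describe pvc <pvc-name> -n <namespace>   # events explain why it is unbound
```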

3. `ImagePullBackOff` / `ErrImagePull`

What is it?

Kubernetes cannot pull the container image.

Why it happens

Wrong image name or tag
Private registry without credentials
Registry network issues

How to fix it

Check image details


kubectl describe pod <pod-name> -n <namespace>

Create image pull secret (private registry)

kubectl create secret docker-registry my-registry-secret \
  --docker-server=your-registry.io \
  --docker-username=<username> \
  --docker-password=<password> \
  -n <namespace>

Reference the secret in the Deployment's pod template (under spec.template.spec)

imagePullSecrets:
  - name: my-registry-secret
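If many workloads in the namespace pull from the same registry, an alternative is to attach the secret to the service account the pods run as, so that pods created afterwards pick it up without a per-Deployment entry. A sketch under that assumption; `attach_pull_secret` is a hypothetical helper, and it targets the `default` service account unless you pass another one:

```shell
# Hypothetical helper: attach an image pull secret to a service account so
# that pods created afterwards with that account use it automatically.
attach_pull_secret() {
  # $1 = namespace, $2 = secret name, $3 = service account (default: "default")
  kubectl patch serviceaccount "${3:-default}" -n "$1" \
    -p "{\"imagePullSecrets\": [{\"name\": \"$2\"}]}"
}

# Usage (requires access to a cluster):
#   attach_pull_secret <namespace> my-registry-secret
```

Existing pods are not changed; delete or restart them so they are recreated with the secret.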

4. Pod killed with `OOMKilled`

What is it?

The container used more memory than allowed and was killed by the kernel.

Why it happens

Memory limits are too low
Application has a memory spike or leak

How to fix it

Confirm OOMKilled

kubectl describe pod <pod-name> -n <namespace>
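In the describe output, the line to look for is Reason: OOMKilled under Last State. For scripts or quick checks, the same field can be read directly with a JSONPath query. A small sketch; `was_oomkilled` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if any container in the pod was last
# terminated with reason OOMKilled.
was_oomkilled() {
  # $1 = pod name, $2 = namespace
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}' \
    | grep -q 'OOMKilled'
}

# Usage (requires access to a cluster):
#   was_oomkilled <pod-name> <namespace> && echo "last restart was an OOM kill"
```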

Check current memory usage (requires metrics-server in the cluster)

kubectl top pod <pod-name> -n <namespace>

Update memory limits

kubectl edit deployment <deployment-name> -n <namespace>

Example:

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"

💡 Best practice:
Always set both requests and limits, and leave the limit some headroom above the application's normal peak usage.
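If you would rather not open an editor, `kubectl set resources` can apply the same change in one command. A sketch using the example values from this section; `set_memory` is a hypothetical wrapper, and without a `-c <container>` flag the command applies to every container in the pod template:

```shell
# Hypothetical helper: set the memory request and limit on a Deployment.
# Changing the pod template triggers a rolling update.
set_memory() {
  # $1 = deployment, $2 = namespace, $3 = request, $4 = limit
  kubectl set resources deployment "$1" -n "$2" \
    --requests="memory=$3" --limits="memory=$4"
}

# Usage (requires access to a cluster):
#   set_memory <deployment-name> <namespace> 256Mi 512Mi
```

If the Deployment is managed from Git or Helm, make the change there as well, or the next deploy will revert it.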

5. Service not accessible / connection refused

What is it?

The application is running, but you cannot reach it via the Service.

Why it happens

Service selector does not match pod labels
Wrong port mapping
No endpoints created

How to fix it

Check Service

kubectl get svc -n <namespace>

Check endpoints

kubectl get endpoints <service-name> -n <namespace>

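If endpoints are <none>, the Service has no ready pods behind it: either the selector does not match the pod labels, or the matching pods are not passing their readiness checks.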
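To see which pods a Service actually selects, you can feed its selector back into `kubectl get pods`. The sketch below assumes the wide output of `kubectl get svc`, where the last column is SELECTOR in `key=value,key=value` form; `pods_for_service` is a hypothetical helper name:

```shell
# Hypothetical helper: show the pods that a Service's selector matches.
pods_for_service() {
  # $1 = service name, $2 = namespace
  # With -o wide, the last column of `kubectl get svc` is SELECTOR,
  # printed as key=value pairs separated by commas.
  selector=$(kubectl get svc "$1" -n "$2" -o wide --no-headers | awk '{print $NF}')
  if [ -z "$selector" ] || [ "$selector" = "<none>" ]; then
    echo "service $1 has no selector" >&2
    return 1
  fi
  kubectl get pods -n "$2" -l "$selector" --show-labels
}

# Usage (requires access to a cluster):
#   pods_for_service <service-name> <namespace>
```

No pods listed points to a label mismatch. Pods listed but not Ready (for example 0/1 in the READY column) points to a failing readiness probe.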

Test from inside the cluster

kubectl run debug-pod --rm -it --restart=Never \
  --image=busybox \
  -n <namespace> -- wget -qO- http://<service-name>:<port>

6. Node shows `NotReady`

What is it?

The node is not reporting as healthy, so Kubernetes will not schedule new pods on it.

Why it happens

Kubelet stopped
Node out of disk or memory
Network issues

How to fix it

Check node status

kubectl get nodes

Describe the node

kubectl describe node <node-name>
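The part of the describe output that matters most here is the Conditions table (Ready, MemoryPressure, DiskPressure, PIDPressure). If you only want that, a JSONPath query prints it compactly. A sketch; `node_conditions` is a hypothetical helper name:

```shell
# Hypothetical helper: print each node condition as TYPE=STATUS plus its reason.
node_conditions() {
  # $1 = node name
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\t"}{.reason}{"\n"}{end}'
}

# Usage (requires access to a cluster):
#   node_conditions <node-name>
```

Ready=Unknown generally means the control plane has stopped hearing from the kubelet, which points at the kubelet or the network rather than at resource pressure.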

Restart kubelet (run this on the node itself, for example over SSH)

sudo systemctl restart kubelet
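A restart may hide the reason the kubelet stopped, so it is worth reading its recent logs as well. This sketch assumes the node runs the kubelet as a systemd unit named `kubelet` (as the restart command above does); `kubelet_recent_logs` is a hypothetical helper name, and you may need `sudo` depending on the node's setup:

```shell
# Hypothetical helper: show recent kubelet logs from the systemd journal.
kubelet_recent_logs() {
  # $1 = how far back to look (default: "15 minutes ago")
  journalctl -u kubelet --since "${1:-15 minutes ago}" --no-pager
}

# Usage (run on the node):
#   kubelet_recent_logs
#   kubelet_recent_logs "1 hour ago"
```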

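Safely drain the node for maintenance (if needed), then uncordon it once it is healthy again

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Run this only after the node is repaired and reports Ready:
kubectl uncordon <node-name>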

Quick Reference Cheat Sheet

Problem	Command	Common Fix
CrashLoopBackOff	`kubectl logs --previous`	Fix app config
Pod Pending	`kubectl describe pod`	Add resources
ImagePullBackOff	`kubectl describe pod`	Fix image / auth
OOMKilled	`kubectl top pod`	Increase memory
Service unreachable	`kubectl get endpoints`	Fix selector
Node NotReady	`kubectl describe node`	Fix node issue

Final takeaway

Almost every Kubernetes issue can be diagnosed using just three commands:

kubectl get
kubectl describe
kubectl logs

Master these, and you’ll solve most real‑world Kubernetes problems without panic.
