If your Kubernetes cluster has ever shown:
- Pods stuck in Pending
- Containers getting OOMKilled
- APIs slowing down without errors
There is a very high chance your resource requests and limits are wrong.
This post explains how requests and limits actually work in production, not how they are usually explained.
Requests vs Limits Are NOT a Min/Max Pair
Most engineers think:
Request = minimum
Limit = maximum
This mental model is dangerously incomplete.
In reality:
- Requests are used mainly by the scheduler, to decide placement
- Limits are enforced at runtime by the Linux kernel, through cgroups
They operate at different layers of the system.
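To make the two layers concrete, here is a minimal Python sketch of how a container's CPU and memory fields could be translated into cgroup v1 values on a node. It assumes the kubelet's usual conversions (1 CPU of request = 1024 cpu.shares, a 100 ms CFS period, memory limit in bytes); the helper name `to_cgroup_v1` and its integer inputs (millicores, MiB) are hypothetical, memory requests are left out, and cgroup v2 nodes use cpu.weight, cpu.max and memory.max instead.

```python
from typing import Optional

CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms
SHARES_PER_CPU = 1024    # 1 full CPU of request == 1024 cpu.shares
MIN_SHARES = 2           # smallest value the kernel accepts for cpu.shares


def to_cgroup_v1(cpu_request_m: int,
                 cpu_limit_m: Optional[int],
                 memory_limit_mib: Optional[int]) -> dict:
    """Hypothetical helper: translate pod resource fields into cgroup v1 values."""
    # The CPU request becomes a relative weight; it only matters when CPUs are contended.
    shares = max(MIN_SHARES, cpu_request_m * SHARES_PER_CPU // 1000)
    # The CPU limit becomes a hard quota of CPU time per period (-1 means no limit).
    quota = -1 if cpu_limit_m is None else cpu_limit_m * CFS_PERIOD_US // 1000
    # The memory limit becomes a byte ceiling; exceeding it triggers the OOM killer.
    mem = -1 if memory_limit_mib is None else memory_limit_mib * 1024 * 1024
    return {
        "cpu.shares": shares,
        "cpu.cfs_period_us": CFS_PERIOD_US,
        "cpu.cfs_quota_us": quota,
        "memory.limit_in_bytes": mem,
    }


print(to_cgroup_v1(cpu_request_m=250, cpu_limit_m=500, memory_limit_mib=256))
```

The request only shows up as a weight, while the limits become hard ceilings, which is why they behave so differently at runtime.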
1. What Requests Really Do (Scheduling First)
A request tells Kubernetes:
“Do not place this pod on a node unless that node still has this much unrequested CPU/memory.”
If no node can satisfy the request, the pod stays Pending and never starts.
kubectl describe pod <pod-name>
kubectl get events --sort-by=.lastTimestamp
If you see:
Insufficient cpu
Insufficient memory
the pod was blocked before runtime.
✅ Requests affect placement; they do not guarantee performance.
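The scheduler's filtering step can be approximated in a few lines. This is a simplified Python sketch, not the real kube-scheduler code: it assumes each node is described only by its allocatable CPU (millicores) and memory (MiB) plus the sum of requests already placed on it, and the node names and numbers are made up. Note that it never looks at actual usage, only at requests.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    alloc_cpu_m: int        # allocatable CPU, millicores
    alloc_mem_mib: int      # allocatable memory, MiB
    requested_cpu_m: int    # sum of CPU requests of pods already on the node
    requested_mem_mib: int  # sum of memory requests of pods already on the node


def fit_reasons(node: Node, cpu_m: int, mem_mib: int) -> list[str]:
    """Return why a pod with these requests does not fit (empty list means it fits)."""
    reasons = []
    if node.requested_cpu_m + cpu_m > node.alloc_cpu_m:
        reasons.append("Insufficient cpu")
    if node.requested_mem_mib + mem_mib > node.alloc_mem_mib:
        reasons.append("Insufficient memory")
    return reasons


def schedule(nodes: list[Node], cpu_m: int, mem_mib: int) -> str:
    """Pick the first feasible node; the real scheduler also scores feasible nodes."""
    feasible = [n.name for n in nodes if not fit_reasons(n, cpu_m, mem_mib)]
    return feasible[0] if feasible else "Pending"


# Hypothetical nodes with made-up numbers.
nodes = [
    Node("node-a", 4000, 16_384, 3500, 8_000),
    Node("node-b", 4000, 16_384, 1000, 15_000),
]
print(schedule(nodes, cpu_m=1000, mem_mib=2048))  # Pending
print(fit_reasons(nodes[0], 1000, 2048))          # ['Insufficient cpu']
```

Both nodes may be nearly idle in practice; the pod still stays Pending because the decision is made on requested totals alone.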
2. What Limits Really Do (Runtime Enforcement)
A limit tells the kernel:
“This container must not exceed this amount.”
What happens next depends on the resource.
CPU Limits → Throttling (Silent Performance Kill)
CPU is compressible.
If a container tries to use more CPU time than its limit allows:
- It is throttled
- It keeps running
- Latency increases
- No crash
- No error logs
This is why CPU‑limited APIs can feel “slow” without obvious failures.
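A rough model shows why throttling surfaces as latency rather than errors. A CPU limit is enforced as a CFS quota of CPU time per scheduling period (100 ms by default), so a 500m limit allows 50 ms of CPU time every 100 ms. The Python sketch below is a hypothetical model, not kernel code: it assumes perfectly parallel, always-runnable threads, no other load on the node, and work that starts at the beginning of a period.

```python
CFS_PERIOD_US = 100_000  # default CFS period: 100 ms


def cfs_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time the container may use per period, e.g. 500m -> 50 ms."""
    return limit_millicores * period_us // 1000


def wall_time_us(cpu_work_us: int, limit_millicores: int, threads: int = 1,
                 period_us: int = CFS_PERIOD_US) -> float:
    """Hypothetical model: wall-clock time needed to finish `cpu_work_us` of CPU work."""
    if limit_millicores <= 0:
        raise ValueError("limit must be positive")
    if cpu_work_us <= 0:
        return 0.0
    quota = cfs_quota_us(limit_millicores, period_us)
    if quota >= threads * period_us:
        # The quota is never exhausted within a period: no throttling.
        return cpu_work_us / threads
    # Each period the threads burn through `quota`, then stay throttled until it ends.
    full_periods, remainder = divmod(cpu_work_us, quota)
    if remainder == 0:
        # The work finishes exactly as the last period's quota runs out.
        full_periods -= 1
        remainder = quota
    return full_periods * period_us + remainder / threads


print(wall_time_us(200_000, limit_millicores=2000))            # 200000.0 (no throttling)
print(wall_time_us(200_000, limit_millicores=500))             # 350000.0
print(wall_time_us(200_000, limit_millicores=500, threads=4))  # 312500.0
```

Under these assumptions, 200 ms of single-threaded work takes 350 ms of wall-clock time with a 500m limit, and the same work split across 4 threads takes 312.5 ms instead of 50 ms: multi-threaded processes exhaust the quota early in each period and then sit idle, with nothing logged.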
Memory Limits → OOMKill (Hard Kill)
Memory is not compressible.
If a container exceeds its memory limit:
- The kernel's OOM killer kills the process
- The container restarts (according to the pod's restartPolicy)
- Status shows OOMKilled
- Exit code is usually 137
kubectl describe pod <pod-name>
kubectl top pod <pod-name>
If Last State shows Reason: OOMKilled, your memory limit is below the container's real peak usage.
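Exit code 137 is not Kubernetes-specific: by shell convention, a process killed by signal N reports 128 + N, and the OOM killer sends SIGKILL (signal 9). The small Python helper below is hypothetical and assumes Linux signal numbers. Keep in mind that 137 alone only means "killed by SIGKILL"; the Reason: OOMKilled field is what attributes it to the memory limit.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal convention (Linux)."""
    if code == 0:
        return "exited successfully"
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit code {code} (no matching signal)"
    return f"exited with application error {code}"


for code in (0, 1, 137, 143):
    print(code, describe_exit_code(code))
```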
3. Why Pods Stay Pending After “Small Changes”
This happens often in production:
- Someone increases a memory request “to be safe”
- The request now exceeds the allocatable memory of every node
- The autoscaler cannot add a node large enough to fit it
- The pod stays Pending forever
kubectl describe pod <pod-name>
kubectl describe nodes
Kubernetes is not broken — it is protecting the cluster.
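The arithmetic behind "requests exceed node allocatable memory" is simple. Kubernetes defines node allocatable as capacity minus kube-reserved, system-reserved and the hard eviction threshold, so a node sold as 16 GiB offers pods noticeably less. The Python sketch below uses made-up reservation values, and `can_ever_schedule` is a hypothetical helper: if one pod's request is larger than the allocatable memory of every node shape the autoscaler can create, no amount of scaling helps.

```python
def allocatable_mib(capacity_mib: int, kube_reserved_mib: int,
                    system_reserved_mib: int, eviction_hard_mib: int) -> int:
    """Memory a node offers to pods: capacity minus reservations and eviction threshold."""
    return capacity_mib - kube_reserved_mib - system_reserved_mib - eviction_hard_mib


def can_ever_schedule(pod_request_mib: int, node_shapes_alloc_mib: list[int]) -> bool:
    """True if at least one node shape, even completely empty, could hold the pod."""
    return any(pod_request_mib <= alloc for alloc in node_shapes_alloc_mib)


# Hypothetical 16 GiB node shape with illustrative reservations.
alloc = allocatable_mib(16_384, kube_reserved_mib=1_024,
                        system_reserved_mib=512, eviction_hard_mib=100)
print(alloc)                               # 14748
print(can_ever_schedule(12_288, [alloc]))  # True: 12 GiB fits
print(can_ever_schedule(16_384, [alloc]))  # False: "16Gi to be safe" never fits
```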
4. Requests ≠ Reserved Resources
A critical misunderstanding:
- Requests are accounting values
- They are not exclusive reservations
- Containers can use more than they request, so nodes can still be under pressure even if all requests fit
This explains why:
- Pods are scheduled successfully
- But runtime performance is unstable
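A toy example shows how both statements can be true at once. Because containers may use more memory than they request (up to their limit, or without bound if no limit is set), the scheduler's bookkeeping and the node's real usage drift apart. The pod names and numbers below are hypothetical, and this Python sketch considers memory only.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    request_mib: int  # what the scheduler counts
    usage_mib: int    # what the container actually uses right now


ALLOCATABLE_MIB = 8_192  # hypothetical node allocatable memory

pods = [
    Pod("api", 2_048, 3_500),
    Pod("worker", 2_048, 2_900),
    Pod("cache", 3_072, 2_500),
]

requested = sum(p.request_mib for p in pods)
used = sum(p.usage_mib for p in pods)

print("scheduler view:", requested, "MiB requested, fits =", requested <= ALLOCATABLE_MIB)
print("runtime view:", used, "MiB in use, pressure =", used > ALLOCATABLE_MIB)

# Pods using more than they requested are the first eviction candidates under
# memory pressure. Simplified: sorted by overage only, ignoring pod priority.
over = sorted((p for p in pods if p.usage_mib > p.request_mib),
              key=lambda p: p.usage_mib - p.request_mib, reverse=True)
print([p.name for p in over])
```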
5. Production‑Safe Resource Strategy
For latency‑sensitive services
- Set realistic CPU requests
- Avoid very tight CPU limits
- Monitor throttling
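One common way to monitor throttling is the ratio of throttled CFS periods to total CFS periods, which cAdvisor exposes as the counters container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total. The Python sketch below computes that ratio from two samples of each counter; the sample values are made up, and the 25% threshold is an arbitrary illustrative choice.

```python
def throttle_ratio(periods_before: int, periods_after: int,
                   throttled_before: int, throttled_after: int) -> float:
    """Fraction of CFS periods in the window in which the container was throttled."""
    periods = periods_after - periods_before
    throttled = throttled_after - throttled_before
    if periods <= 0:
        return 0.0  # no CPU activity in the window, or a counter reset
    return throttled / periods


# Hypothetical counter samples taken a few minutes apart.
ratio = throttle_ratio(periods_before=120_000, periods_after=123_000,
                       throttled_before=40_000, throttled_after=41_200)
print(f"{ratio:.0%} of periods throttled")  # 40% of periods throttled
if ratio > 0.25:  # illustrative threshold
    print("CPU limit is likely too tight for this workload")
```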
For memory‑heavy workloads
- Measure baseline usage
- Set memory limits above the peak working set, with headroom
- Avoid guessing
kubectl top pod
kubectl top nodes
Always measure before setting limits.
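To make "measure first" concrete, here is one possible heuristic as a Python sketch: collect repeated readings (for example from kubectl top pod), set the CPU request near a high percentile of observed usage, and set the memory limit above the observed peak with some headroom. The p90 percentile, the 25% headroom, and the function names are assumptions to tune per workload, and the sample data is made up.

```python
import math


def percentile(samples: list[int], p: float) -> int:
    """Nearest-rank percentile of `samples` (0 < p <= 100)."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def suggest_resources(cpu_m: list[int], mem_mib: list[int],
                      cpu_pct: float = 90, mem_headroom: float = 1.25) -> dict:
    """Hypothetical sizing heuristic from observed usage samples."""
    return {
        # Request enough CPU to cover typical busy periods.
        "cpu_request_m": percentile(cpu_m, cpu_pct),
        # Memory limit above the highest observed usage, plus headroom.
        "memory_limit_mib": math.ceil(max(mem_mib) * mem_headroom),
    }


cpu_samples = [120, 150, 180, 200, 220, 250, 300, 320, 400, 650]  # millicores
mem_samples = [310, 330, 345, 360, 372, 380, 390, 395, 398, 400]  # MiB
print(suggest_resources(cpu_samples, mem_samples))
# {'cpu_request_m': 400, 'memory_limit_mib': 500}
```

A short measurement window can miss rare peaks, so the memory headroom matters more than the exact percentile.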
The One‑Line Rule That Saves Incidents
- Requests decide WHERE a pod runs
- Limits decide HOW it fails
Once you internalize this, Kubernetes behavior becomes predictable.
InfraDecode Takeaway
Most Kubernetes outages are not caused by bugs —
they are caused by incorrect mental models.
Fix the model, and the system makes sense.
— InfraDecode
