Kubernetes Requests vs Limits: The Mental Model Most Teams Get Wrong

If your Kubernetes cluster has ever shown:

Pods stuck in Pending
Containers getting OOMKilled
APIs slowing down without errors

If so, there is a good chance your resource requests and limits are misconfigured.

This post explains how requests and limits actually work in production, not how they are usually explained.

Requests vs Limits Are NOT a Min/Max Pair

Most engineers think:

Request = minimum
Limit = maximum

This mental model is dangerously incomplete.

In reality:

Requests are used mainly by the scheduler (CPU requests also set a container's relative CPU share when a node is contended)
Limits are enforced at runtime by the Linux kernel

They operate at different layers of the system.

1. What Requests Really Do (Scheduling Only)

A request tells Kubernetes:

“Do not place this pod unless this much CPU/memory is available.”

If no node has enough unrequested allocatable capacity to satisfy the request, the pod stays Pending and never starts.

kubectl describe pod <pod-name>
kubectl get events --sort-by=.lastTimestamp

If you see:

Insufficient cpu
Insufficient memory

In that case the pod is blocked at scheduling time, before any container starts.

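✅ Requests drive placement. They do not cap usage at runtime, although CPU requests do set a container's relative share when the node's CPU is contended.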
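The placement check described above can be sketched in a few lines. This is a simplified illustration, not the real scheduler (which also weighs taints, affinity, and more); the `Resources` type and `insufficient_resources` function are hypothetical names. Note that only requests enter the calculation, never actual usage.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_millicores: int
    memory_mib: int


def insufficient_resources(allocatable, already_requested, pod_request):
    """Return the resources a node is short of; an empty list means the pod fits."""
    short = []
    if already_requested.cpu_millicores + pod_request.cpu_millicores > allocatable.cpu_millicores:
        short.append("cpu")
    if already_requested.memory_mib + pod_request.memory_mib > allocatable.memory_mib:
        short.append("memory")
    return short


# Hypothetical node: 4 CPUs and 16 GiB allocatable, mostly requested already.
node = Resources(cpu_millicores=4000, memory_mib=16384)
requested = Resources(cpu_millicores=3500, memory_mib=8192)
pod = Resources(cpu_millicores=1000, memory_mib=1024)

print(insufficient_resources(node, requested, pod))
```

Here the node could be idle in terms of real CPU usage and the pod would still be rejected, because 3500m + 1000m of requests exceeds the 4000m allocatable.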

2. What Limits Really Do (Runtime Enforcement)

A limit tells the kernel:

“This container must not exceed this amount.”

What happens next depends on the resource.

CPU Limits → Throttling (Silent Performance Kill)

CPU is compressible.

If a container exceeds its CPU limit:

It is throttled
It keeps running
Latency increases
No crash
No error logs

This is why CPU‑limited APIs can feel “slow” without obvious failures.
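To see why throttling shows up as latency, here is a rough model of how a CPU limit stretches wall-clock time. It assumes the default 100 ms CFS period, a single busy thread, and work that begins at the start of a period; `wall_time_ms` is a hypothetical helper, not a Kubernetes or kernel API.

```python
import math


def wall_time_ms(cpu_ms, limit_millicores, period_ms=100.0):
    """Estimate wall-clock time for a single thread needing cpu_ms of CPU time."""
    if limit_millicores <= 0:
        raise ValueError("limit_millicores must be positive")
    if cpu_ms <= 0:
        return 0.0
    # CPU time allowed per period; one thread cannot use more than the period itself.
    quota_ms = min(period_ms, limit_millicores * period_ms / 1000.0)
    # Every period except the last is used up to the quota, then the thread waits.
    full_periods = math.ceil(cpu_ms / quota_ms) - 1
    remaining = cpu_ms - full_periods * quota_ms
    return full_periods * period_ms + remaining


print(wall_time_ms(200, 500))   # 200 ms of work under a 500m limit
print(wall_time_ms(200, 1000))  # same work under a 1000m limit
```

Under these assumptions a request that needs 200 ms of CPU takes 350 ms of wall time with a 500m limit, and nothing in the logs says why.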

Memory Limits → OOMKill (Hard Kill)

Memory is not compressible.

If a container exceeds its memory limit:

Kernel kills the process
The container restarts (the pod is not rescheduled)
Status shows OOMKilled
Exit code is usually 137 (128 + 9, the SIGKILL signal)

kubectl describe pod <pod-name>
kubectl top pod <pod-name>

If you see Reason: OOMKilled, the container tried to use more memory than its limit allows: either the limit is set below real usage, or the process is leaking memory.
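If you would rather check this programmatically, the same information is in the pod's status as returned by kubectl get pod <pod-name> -o json. The sketch below scans that structure for OOMKilled terminations; `sample_pod` is made-up data and `oom_killed_containers` is a hypothetical helper.

```python
def oom_killed_containers(pod):
    """Return (container name, exit code) for containers terminated with reason OOMKilled."""
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # A restarted container reports the kill under lastState, not state.
        for key in ("state", "lastState"):
            terminated = (status.get(key) or {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                hits.append((status["name"], terminated.get("exitCode")))
                break
    return hits


# Made-up excerpt shaped like `kubectl get pod -o json` output.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "api",
                "restartCount": 3,
                "state": {"running": {}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {"name": "sidecar", "restartCount": 0, "state": {"running": {}}},
        ]
    }
}

print(oom_killed_containers(sample_pod))
```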

3. Why Pods Stay Pending After “Small Changes”

This happens often in production:

Someone increases memory request “to be safe”
Requests now exceed node allocatable memory
Autoscaler cannot add larger nodes
Pod stays Pending forever

kubectl describe pod <pod-name>
kubectl describe nodes

Kubernetes is not broken — it is protecting the cluster.
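A quick sanity check before raising a request is to compare it against the allocatable memory of every node shape you can actually get. The sketch below does that for a small subset of Kubernetes quantity syntax (plain bytes and whole-number Ki/Mi/Gi/Ti values); the function names and node sizes are hypothetical.

```python
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_bytes(quantity):
    """Parse a whole-number memory quantity such as '512Mi' or '16Gi' into bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already bytes


def can_ever_schedule(memory_request, node_allocatable):
    """True if at least one node shape has enough allocatable memory for the request."""
    need = memory_bytes(memory_request)
    return any(memory_bytes(alloc) >= need for alloc in node_allocatable)


# Hypothetical: nodes sold as 16 GiB expose less than that as allocatable.
allocatable = ["14861Mi", "14861Mi"]

print(can_ever_schedule("12Gi", allocatable))
print(can_ever_schedule("16Gi", allocatable))
```

With these numbers the "safe" bump from 12Gi to 16Gi is exactly what makes the pod unschedulable: no node of that size can ever hold it, no matter how empty the cluster is.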

4. Requests ≠ Reserved Resources

A critical misunderstanding:

Requests are accounting values
They are not exclusive reservations
Nodes can still be under pressure even if requests fit

This explains why:

Pods are scheduled successfully
But runtime performance is unstable
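One way to make this visible is to compare the sum of requests and the sum of limits against a node's allocatable memory. The sketch below uses made-up numbers and a hypothetical `memory_commitment` helper.

```python
def memory_commitment(allocatable_mib, pods):
    """Return requested and limited memory as fractions of node allocatable."""
    requested = sum(p["request_mib"] for p in pods)
    limited = sum(p["limit_mib"] for p in pods)
    return {
        "request_ratio": requested / allocatable_mib,
        "limit_ratio": limited / allocatable_mib,
    }


# Hypothetical node with 8 GiB allocatable and three identical pods.
pods = [{"request_mib": 2048, "limit_mib": 4096} for _ in range(3)]

print(memory_commitment(8192, pods))
```

Requests fit at 75% of the node, so all three pods schedule. Their limits add up to 150%, so if they all grow toward their limits at once, the node runs out of memory even though the scheduler's accounting was satisfied.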

5. Production‑Safe Resource Strategy

For latency‑sensitive services

Set realistic CPU requests
Avoid very tight CPU limits
Monitor throttling
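For the throttling part, the kernel already keeps the counters you need in the container's cgroup cpu.stat file: nr_periods and nr_throttled. Where that file lives depends on your cgroup version and runtime, so the sketch below only parses its contents; `throttled_fraction` is a hypothetical helper and the sample text is made up.

```python
def throttled_fraction(cpu_stat_text):
    """Fraction of CFS periods in which the cgroup was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no CPU limit set, or no periods elapsed yet
    return stats.get("nr_throttled", 0) / periods


# Made-up cpu.stat contents.
sample = """nr_periods 1000
nr_throttled 250
throttled_usec 4200000
"""

print(throttled_fraction(sample))
```

The counters are cumulative, so in practice you would compare two readings taken some time apart rather than a single snapshot.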

For memory‑heavy workloads

Measure baseline usage
Set memory limits above working set
Avoid guessing

kubectl top pod
kubectl top nodes

Always measure before setting limits.
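As a starting point, you can turn a set of observed memory readings into a candidate limit. The heuristic below (observed peak plus 25% headroom, rounded up to a 64 MiB step) is purely illustrative; the numbers and the `suggest_memory_limit_mib` name are assumptions, and the right headroom depends on your workload.

```python
import math


def suggest_memory_limit_mib(samples_mib, headroom=0.25, round_to_mib=64):
    """Suggest a memory limit: observed peak plus headroom, rounded up to a step."""
    if not samples_mib:
        raise ValueError("need at least one memory sample")
    target = max(samples_mib) * (1 + headroom)
    return math.ceil(target / round_to_mib) * round_to_mib


# Made-up readings in MiB, for example collected from `kubectl top pod` over a week.
samples = [410, 455, 480, 512, 498]

print(suggest_memory_limit_mib(samples))
```

The longer the sampling window and the more traffic peaks it covers, the more you can trust the result.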

The One‑Line Rule That Saves Incidents

Requests decide WHERE a pod runs
Limits decide HOW it fails

Once you internalize this, Kubernetes behavior becomes predictable.

InfraDecode Takeaway

Most Kubernetes outages are not caused by bugs —
they are caused by incorrect mental models.

Fix the model, and the system makes sense.

— InfraDecode
