Kubernetes Requests vs Limits: The Mental Model Most Teams Get Wrong

If your Kubernetes cluster has ever shown:

  • Pods stuck in Pending
  • Containers getting OOMKilled
  • APIs slowing down without errors

there is a good chance your resource requests and limits are misconfigured.

This post explains how requests and limits actually work in production, not how they are usually explained.


Requests vs Limits Are NOT a Min/Max Pair

Most engineers think:

Request = minimum
Limit = maximum

This mental model is dangerously incomplete.

In reality:

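  • Requests are used mainly by the scheduler when it picks a node (CPU requests also act as relative CPU weights when a node is contended)
  • Limits are enforced at runtime by the Linux kernel, through cgroups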

They operate at different layers of the system.


1. What Requests Really Do (Scheduling Only)

A request tells Kubernetes:

“Do not place this pod unless this much CPU/memory is available.”

If no node can satisfy the request, the pod is never scheduled and stays Pending.

kubectl describe pod <pod-name>
kubectl get events --sort-by=.lastTimestamp

If you see:

  • Insufficient cpu
  • Insufficient memory

the scheduler could not place the pod, so no container ever started.

✅ Requests decide placement. At runtime they matter only when a node is contended, where CPU requests act as relative weights.
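
The placement check described above can be sketched as a toy model. This is not the real kube-scheduler, which also applies taints, affinity, and scoring. It only illustrates that the decision compares requests against allocatable capacity and never looks at live usage. The `Node` class, the node names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Toy node: allocatable capacity plus the requests already placed on it."""
    name: str
    allocatable_cpu_m: int       # millicores
    allocatable_mem_mi: int      # MiB
    requested_cpu_m: int = 0     # sum of CPU requests of pods already on the node
    requested_mem_mi: int = 0    # sum of memory requests of pods already on the node


def unfit_reasons(node: Node, cpu_m: int, mem_mi: int) -> List[str]:
    """Return why a pod with these requests does not fit; an empty list means it fits."""
    reasons = []
    if node.requested_cpu_m + cpu_m > node.allocatable_cpu_m:
        reasons.append("Insufficient cpu")
    if node.requested_mem_mi + mem_mi > node.allocatable_mem_mi:
        reasons.append("Insufficient memory")
    return reasons


def schedulable_nodes(nodes: List[Node], cpu_m: int, mem_mi: int) -> List[str]:
    """Names of nodes where the requests fit. Actual usage is never consulted."""
    return [n.name for n in nodes if not unfit_reasons(n, cpu_m, mem_mi)]


nodes = [
    Node("node-a", allocatable_cpu_m=2000, allocatable_mem_mi=4096,
         requested_cpu_m=1800, requested_mem_mi=1024),
    Node("node-b", allocatable_cpu_m=2000, allocatable_mem_mi=4096,
         requested_cpu_m=500, requested_mem_mi=4000),
]

# 500m CPU / 512Mi fits nowhere: node-a lacks CPU, node-b lacks memory.
print(schedulable_nodes(nodes, cpu_m=500, mem_mi=512))   # []
# A smaller pod fits on node-a only.
print(schedulable_nodes(nodes, cpu_m=100, mem_mi=128))   # ['node-a']
```

An empty result in this model corresponds to a pod that stays Pending with "Insufficient cpu" or "Insufficient memory" in its events.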


2. What Limits Really Do (Runtime Enforcement)

A limit tells the kernel:

“This container must not exceed this amount.”

What happens next depends on the resource.


CPU Limits → Throttling (Silent Performance Kill)

CPU is compressible.

If a container tries to use more CPU than its limit allows:

  • It is throttled
  • It keeps running
  • Latency increases
  • No crash
  • No error logs

This is why CPU‑limited APIs can feel “slow” without obvious failures.
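
To see where the extra latency comes from, here is a rough model of CPU throttling. It assumes the limit is enforced as a CFS quota over the default 100 ms period, that the burst is single-threaded, and that it starts exactly at a period boundary. Real workloads are messier, and the function names are only illustrative.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms


def cpu_limit_to_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) the container may use per period."""
    return limit_millicores * period_us // 1000


def wall_time_ms(cpu_work_ms: int, limit_millicores: int, period_ms: int = 100) -> int:
    """Wall-clock time for a single-threaded burst needing cpu_work_ms of CPU.

    Idealized: the burst starts at a period boundary and nothing else in the
    container consumes quota.
    """
    if cpu_work_ms <= 0:
        return 0
    quota_ms = limit_millicores * period_ms // 1000
    if quota_ms <= 0:
        raise ValueError("limit too small for this millisecond-granularity model")
    if quota_ms >= period_ms:
        return cpu_work_ms  # one thread cannot exhaust a quota of a full CPU or more
    full_periods = (cpu_work_ms - 1) // quota_ms   # periods that end in throttling
    remaining_ms = cpu_work_ms - full_periods * quota_ms
    return full_periods * period_ms + remaining_ms


print(cpu_limit_to_quota_us(500))   # 50000 (50 ms of CPU per 100 ms period)
print(wall_time_ms(80, 500))        # 130: run 50 ms, wait 50 ms, run 30 ms
print(wall_time_ms(80, 1000))       # 80: no throttling
```

In this model a request that needs 80 ms of CPU takes 130 ms under a 500m limit. Nothing fails and nothing is logged; the time is simply spent waiting for the next period.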


Memory Limits → OOMKill (Hard Kill)

Memory is not compressible.

If a container exceeds its memory limit:

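  • The kernel's OOM killer terminates a process in the container
  • The container restarts (subject to the pod's restartPolicy)
  • Status shows OOMKilled
  • Exit code is usually 137 (128 + 9, meaning SIGKILL)

kubectl describe pod <pod-name>
kubectl top pod <pod-name>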

If you see Reason: OOMKilled, the container hit its memory limit: either the limit is set below real usage, or the process is leaking memory.
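
The exit code follows the common shell convention that a value above 128 means "terminated by signal (code minus 128)". A small sketch of that decoding, assuming a POSIX system; `describe_exit_code` is a hypothetical helper, not part of any Kubernetes tooling.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal-number convention."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            return f"exited with status {code}"  # not a known signal number
        return f"killed by {sig.name}"
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by SIGKILL
print(describe_exit_code(143))  # killed by SIGTERM
```

Note that 137 only says the process received SIGKILL. It is the OOMKilled reason in the pod status that ties the kill to the memory limit.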


3. Why Pods Stay Pending After “Small Changes”

This happens often in production:

  • Someone increases memory request “to be safe”
  • The request now exceeds the allocatable memory of every node
  • Autoscaler cannot add larger nodes
  • Pod stays Pending indefinitely

kubectl describe pod <pod-name>
kubectl describe nodes

Kubernetes is not broken — it is protecting the cluster.
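
The arithmetic behind this failure is easy to miss because allocatable memory is smaller than the node's advertised capacity. A minimal sketch, assuming allocatable is capacity minus the kubelet's reserved amounts and its hard eviction threshold. The node shape and every number below are hypothetical.

```python
def allocatable_mi(capacity_mi: int, kube_reserved_mi: int,
                   system_reserved_mi: int, eviction_hard_mi: int) -> int:
    """Memory the scheduler can hand out on an otherwise empty node."""
    return capacity_mi - kube_reserved_mi - system_reserved_mi - eviction_hard_mi


def can_ever_schedule(request_mi: int, node_shapes: list) -> bool:
    """True if at least one node shape could hold the request when empty."""
    return any(request_mi <= allocatable_mi(**shape) for shape in node_shapes)


# Hypothetical node pool: one "8 GiB" machine type.
shapes = [
    {"capacity_mi": 8192, "kube_reserved_mi": 512,
     "system_reserved_mi": 256, "eviction_hard_mi": 100},
]

print(allocatable_mi(**shapes[0]))       # 7324
print(can_ever_schedule(6144, shapes))   # True:  a 6Gi request fits
print(can_ever_schedule(8192, shapes))   # False: an 8Gi request never fits
```

If the autoscaler can only add more nodes of this shape, the second pod stays Pending no matter how many nodes are added. DaemonSet pods on each node would shrink the usable amount further.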


4. Requests ≠ Reserved Resources

A critical misunderstanding:

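  • Requests are accounting values that the scheduler sums per node
  • They are not exclusive reservations: capacity a pod requested but is not using can be consumed by other containers
  • Nodes can still be under pressure even if requests fit, because actual usage is not part of the scheduling decision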

This explains why:

  • Pods are scheduled successfully
  • But runtime performance is unstable
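
A toy calculation makes the gap between the scheduler's view and the node's real load concrete. The pods, their usage figures, and the `node_view` helper are all hypothetical.

```python
def node_view(allocatable_cpu_m: int, pods: list) -> dict:
    """Compare what the scheduler counts with what the pods actually demand.

    pods is a list of (request_millicores, attempted_usage_millicores) tuples.
    """
    requested = sum(request for request, _ in pods)
    demanded = sum(usage for _, usage in pods)
    return {
        "fits_for_scheduler": requested <= allocatable_cpu_m,
        "requested_pct": 100 * requested // allocatable_cpu_m,
        "demanded_pct": 100 * demanded // allocatable_cpu_m,
    }


# Four pods that each request 500m but try to use 1500m, on a 4000m node.
view = node_view(4000, [(500, 1500)] * 4)
print(view)
# {'fits_for_scheduler': True, 'requested_pct': 50, 'demanded_pct': 150}
```

The scheduler sees a node that is half full, while the workloads are asking for one and a half times what the node has.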

5. Production‑Safe Resource Strategy

For latency‑sensitive services

  • Set realistic CPU requests
  • Avoid very tight CPU limits
  • Monitor throttling
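
One way to monitor throttling is to read the counters the kernel keeps in the container's cgroup `cpu.stat` file. The sketch below parses that text and computes the share of periods in which the container was throttled. It assumes the `nr_periods` and `nr_throttled` fields are present. The file's location varies with cgroup version and container runtime, so the sample text here is hard-coded and its values are made up.

```python
def throttle_ratio(cpu_stat_text: str) -> float:
    """Fraction of CFS periods in which the cgroup was throttled (0.0 to 1.0)."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no CPU limit set, or no periods elapsed yet
    return stats.get("nr_throttled", 0) / periods


# Made-up sample in the cgroup v2 layout.
SAMPLE_CPU_STAT = """\
usage_usec 84000000
user_usec 60000000
system_usec 24000000
nr_periods 1000
nr_throttled 250
throttled_usec 5000000
"""

print(throttle_ratio(SAMPLE_CPU_STAT))  # 0.25
```

These counters are cumulative, so in practice you would compare two readings taken some time apart and look at the ratio of the differences.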

For memory‑heavy workloads

  • Measure baseline usage
  • Set memory limits above working set
  • Avoid guessing

kubectl top pod
kubectl top nodes

Always measure before setting limits.
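
As a sketch of what "measure, then set" could look like for memory, the helper below takes observed working-set samples and proposes a limit above the peak. The 25% headroom is an arbitrary assumption for illustration, not a recommendation from Kubernetes, and the samples are invented. Pick a margin that matches how spiky your workload is.

```python
def suggest_memory_limit_mi(working_set_samples_mi: list, headroom_pct: int = 25) -> int:
    """Peak observed working set plus a headroom margin, rounded up to whole MiB."""
    if not working_set_samples_mi:
        raise ValueError("need at least one measurement; do not guess")
    peak = max(working_set_samples_mi)
    # Negating before and after floor division rounds the result up.
    return -(-peak * (100 + headroom_pct) // 100)


# Invented samples (MiB), for example collected from repeated `kubectl top pod` runs.
samples = [410, 455, 430, 612, 480]
print(suggest_memory_limit_mi(samples))  # 765
```

A short sampling window can miss the real peak, so the measurements should cover the busiest period the service normally sees.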


The One‑Line Rule That Saves Incidents

  • Requests decide WHERE a pod runs
  • Limits decide HOW it fails

Once you internalize this, Kubernetes behavior becomes predictable.


InfraDecode Takeaway

Many Kubernetes outages are not caused by bugs;
they are caused by incorrect mental models.

Fix the model, and the system makes sense.

InfraDecode

