🧠 Introduction
Most of us start using LLMs like this:
Ask question → get answer
But in real-world DevOps, that’s not enough.
When a pod fails, you don’t just need an answer — you need a system that:
- Checks logs
- Reads pod status
- Looks at events
- Decides what to do next
👉 That’s where AI agents come in.
In this post, we’ll build a dynamic tool-calling Kubernetes troubleshooting agent that:
- Decides which action to take
- Calls tools like kubectl logs (simulated)
- Uses the results to find the root cause
💡 What is Tool Calling?
By default, LLMs only generate text.
But with tool calling:
👉 Instead of answering directly, the model can say:
Call this tool → with these arguments
Example:
{
  "tool": "get_logs",
  "arguments": {
    "pod": "myapp"
  }
}
👉 Your code executes it → returns result → model reasons again.
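To make the "your code executes it" step concrete, here is a minimal sketch of how a payload like the one above might be dispatched to a Python function. The `get_logs` stub, the `AVAILABLE_TOOLS` dict, and the `dispatch` helper are illustrative assumptions, not part of any library.

```python
import json

def get_logs(pod):
    # Hypothetical stand-in for a real log lookup
    return {"pod": pod, "logs": "ERROR: connection refused to DB"}

# Only tools listed here can be run on the model's behalf
AVAILABLE_TOOLS = {"get_logs": get_logs}

def dispatch(tool_call_json):
    """Parse a tool-call payload and run the matching Python function."""
    call = json.loads(tool_call_json)
    func = AVAILABLE_TOOLS[call["tool"]]
    return func(**call["arguments"])

payload = '{"tool": "get_logs", "arguments": {"pod": "myapp"}}'
result = dispatch(payload)
print(result["pod"])
```

The returned dict is what would be serialized and handed back to the model for the next round of reasoning.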
🔧 Architecture We’re Building
User Input
↓
LLM (decision)
↓
Tool Call (JSON)
↓
Python executes tool
↓
Result back to LLM
↓
Final answer
⚙️ Step 1: Setup (Colab + Groq)
!pip install groq
from groq import Groq
client = Groq(api_key="YOUR_API_KEY")
MODEL = "llama-3.1-8b-instant"
🛠️ Step 2: Define Tools (Simulating Kubernetes)
These represent real DevOps actions.
def get_logs(pod, namespace="default", tail_lines=200):
    # Simulated: namespace and tail_lines are accepted but not used yet
    return {
        "pod": pod,
        "logs": "ERROR: connection refused to DB\nERROR startup failed"
    }

def describe_pod(pod, namespace="default"):
    # Simulated pod status, as kubectl describe pod would report it
    return {
        "pod": pod,
        "status": "CrashLoopBackOff",
        "restartCount": 5
    }

def get_events(namespace="default"):
    # Simulated recent events for the namespace
    return {
        "events": ["Readiness probe failed", "Back-off restarting container"]
    }
📐 Step 3: Tool Schema (How LLM Understands Tools)
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_logs",
            "description": "Fetch logs for troubleshooting",
            "parameters": {
                "type": "object",
                "properties": {
                    "pod": {"type": "string"}
                },
                "required": ["pod"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "describe_pod",
            "description": "Describe pod status",
            "parameters": {
                "type": "object",
                "properties": {
                    "pod": {"type": "string"}
                },
                "required": ["pod"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_events",
            "description": "Fetch cluster events",
            "parameters": {
                "type": "object",
                "properties": {
                    "namespace": {"type": "string"}
                },
                "required": ["namespace"]
            }
        }
    }
]
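Since the schema is what the model sees, it can also double as a light check on the arguments the model sends back. The sketch below assumes a hypothetical helper, `missing_required`, and a trimmed copy of one schema entry. It only checks that required keys are present, not their types.

```python
# Trimmed copy of one entry from a tool schema list (illustrative)
GET_EVENTS_SCHEMA = {
    "type": "function",
    "function": {
        "name": "get_events",
        "description": "Fetch cluster events",
        "parameters": {
            "type": "object",
            "properties": {"namespace": {"type": "string"}},
            "required": ["namespace"],
        },
    },
}

def missing_required(schema, args):
    """Return the required parameter names that are absent from args."""
    required = schema["function"]["parameters"].get("required", [])
    return [name for name in required if name not in args]

print(missing_required(GET_EVENTS_SCHEMA, {}))
print(missing_required(GET_EVENTS_SCHEMA, {"namespace": "default"}))
```

A non-empty result could be returned to the model as an error message so it can retry with complete arguments.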
🔁 Step 4: Dynamic Agent Loop (Core Logic)
This is where the “agent behavior” happens.
import json

# Maps tool names the model may request to the Python functions that run them
TOOL_REGISTRY = {
    "get_logs": get_logs,
    "describe_pod": describe_pod,
    "get_events": get_events
}

def dynamic_tool_agent(user_input):
    messages = [
        {"role": "system", "content": "You are a Kubernetes troubleshooting agent."},
        {"role": "user", "content": user_input}
    ]
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        tools=TOOLS,
        tool_choice="auto"
    )
    msg = response.choices[0].message
    if msg.tool_calls:
        # Keep the assistant's tool request in the history so the results have context
        messages.append(msg)
        for tool_call in msg.tool_calls:
            tool_name = tool_call.function.name
            args = json.loads(tool_call.function.arguments)
            # Look up the requested tool and run it with the model's arguments
            result = TOOL_REGISTRY[tool_name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,  # ties the result to its request
                "content": json.dumps(result)
            })
        # Second call: the model reasons over the tool results
        final = client.chat.completions.create(
            model=MODEL,
            messages=messages
        )
        return final.choices[0].message.content
    return msg.content
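The function above handles one round of tool calls. For chains where the model inspects one result before requesting the next tool, the same idea can be wrapped in a bounded loop. The sketch below is an illustration under stated assumptions: `run_agent_loop`, `call_model`, and the scripted `fake_model` are hypothetical names, and the model call is passed in as a plain function returning a dict so the loop can be exercised without an API key. With a real client, `call_model` would wrap `client.chat.completions.create` and convert the response into the same shape.

```python
import json

def describe_pod(pod):
    # Simulated tool, mirroring the stubs used in this post
    return {"pod": pod, "status": "CrashLoopBackOff"}

def get_logs(pod):
    # Simulated tool
    return {"pod": pod, "logs": "ERROR: connection refused to DB"}

REGISTRY = {"describe_pod": describe_pod, "get_logs": get_logs}

def run_agent_loop(call_model, user_input, max_steps=5):
    """Ask the model, run any requested tools, and repeat until it answers."""
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        reply = call_model(messages)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]  # no tools requested: final answer
        messages.append(reply)
        for call in tool_calls:
            result = REGISTRY[call["name"]](**call["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    return "Stopped: step limit reached without a final answer."

def fake_model(messages):
    """Scripted stand-in for an LLM: describe the pod, read logs, then answer."""
    tool_results = [m for m in messages if m.get("role") == "tool"]
    if len(tool_results) == 0:
        return {"role": "assistant", "content": None,
                "tool_calls": [{"id": "1", "name": "describe_pod",
                                "arguments": {"pod": "myapp"}}]}
    if len(tool_results) == 1:
        return {"role": "assistant", "content": None,
                "tool_calls": [{"id": "2", "name": "get_logs",
                                "arguments": {"pod": "myapp"}}]}
    return {"role": "assistant",
            "content": "Root cause: the app cannot reach its database."}

answer = run_agent_loop(fake_model, "Pod myapp is crashing repeatedly")
print(answer)
```

The `max_steps` cap is a design choice to keep a confused model from looping forever; the right limit depends on how deep a troubleshooting chain is expected to go.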
🧪 Step 5: Run the Agent
response = dynamic_tool_agent("Pod myapp is crashing repeatedly")
print(response)
🔥 What’s Happening Internally
When you run:
"Pod myapp is crashing"
👉 A typical run looks like this (the model picks the tools, so the exact sequence can vary):
- Calls describe_pod
- Sees CrashLoopBackOff
- Calls get_logs
- Finds DB issue
- Returns fix
🚀 Why This is Powerful
✅ Before (Static logic)
IF crash → logs
IF issue → describe
✅ Now (Dynamic AI Agent)
- LLM decides ✅
- Tools executed dynamically ✅
- Multi-step reasoning ✅
🧠 Real DevOps Mapping
| Agent Tool | Real Command |
|---|---|
| get_logs() | kubectl logs |
| describe_pod() | kubectl describe pod |
| get_events() | kubectl get events |
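To move from simulation toward a real cluster, each stub could shell out to the matching command from the table. The following is a rough sketch, assuming kubectl is installed and configured for the target cluster; `build_logs_command` and `run_kubectl` are hypothetical helper names. Building the command as an argument list rather than a shell string keeps model-supplied values from being interpreted by a shell.

```python
import subprocess

def build_logs_command(pod, namespace="default", tail_lines=200):
    """Build the kubectl argument list for fetching recent pod logs."""
    return ["kubectl", "logs", pod, "-n", namespace, f"--tail={tail_lines}"]

def run_kubectl(command, timeout=30):
    """Run a kubectl command and return its output; requires cluster access."""
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    return {
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }

def get_logs(pod, namespace="default", tail_lines=200):
    # Same signature as the simulated tool, but backed by a real command
    return run_kubectl(build_logs_command(pod, namespace, tail_lines))

# Printing the command only; nothing is executed against a cluster here
print(build_logs_command("myapp"))
```

Returning stderr and the exit code alongside stdout gives the model something to reason about when the command itself fails, for example when the pod name does not exist.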
⚠️ Key Learnings
✅ LLM does NOT execute tools
It only returns structured intent
✅ Tool results must go back to LLM
Otherwise, no reasoning loop
✅ JSON must be parsed safely
Never trust LLM outputs blindly
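As one way to act on that last point, the sketch below wraps tool lookup and argument parsing in checks and returns an error payload instead of raising, so the failure can be sent back to the model. `safe_execute` and the stub tool are hypothetical names used for illustration.

```python
import inspect
import json

def describe_pod(pod, namespace="default"):
    # Simulated tool
    return {"pod": pod, "status": "CrashLoopBackOff"}

REGISTRY = {"describe_pod": describe_pod}

def safe_execute(tool_name, raw_arguments):
    """Validate a model-issued tool call before running it."""
    func = REGISTRY.get(tool_name)
    if func is None:
        return {"error": f"unknown tool: {tool_name}"}
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return {"error": "arguments were not valid JSON"}
    if not isinstance(args, dict):
        return {"error": "arguments must be a JSON object"}
    try:
        # bind() raises TypeError on missing or unexpected parameters
        inspect.signature(func).bind(**args)
    except TypeError as exc:
        return {"error": f"bad arguments: {exc}"}
    return func(**args)

print(safe_execute("describe_pod", '{"pod": "myapp"}'))
print(safe_execute("delete_cluster", "{}"))
print(safe_execute("describe_pod", '{"pod": '))
```

This checks only the shape of the call; it does not judge whether the requested action is safe to perform, which matters more once tools can modify the cluster.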
InfraDecode Takeaway
“Chatbots answer questions.
Agents solve problems.”
— InfraDecode
