Stopping Rogue Agents: Observability and Guardrails for Production AI

Your AI agent just sent 10,000 requests to a premium external API in an hour, costing hundreds of dollars, and it’s still running. You thought you had observability, but your traditional monitoring dashboards show green. This isn’t just a bug; it’s a new class of financial and operational risk that demands a fundamentally different approach to production.

The New Frontier of Failure: Understanding Rogue Agents

Traditional observability stacks are blind to the unique failure modes of AI agents. An agent can appear “healthy” by conventional metrics – low CPU, ample memory, no HTTP 500s – while silently burning through budget or making disastrous decisions. This is the realm of rogue agents, runaway scenarios, and ‘botsitting’.

A rogue agent is one operating outside its intended parameters, often due to misinterpretation or an emergent property of its prompting. A runaway scenario is a specific instance where a rogue agent enters an uncontrolled loop or repeatedly executes costly actions. Botsitting is the manual, often frantic, human intervention required to halt or correct such an agent.

Consider a customer support agent designed to manage refunds. It encounters a malformed request, misinterprets “refund” as “process payment,” and attempts 500 payment processor API calls in 30 minutes. Each failed attempt costs $1.00 and generates 100 tokens of LLM output for retries and error parsing. That’s $500 in direct API costs, 50,000 unneeded tokens, and a senior engineer manually killing processes for hours. Your standard APM reports zero errors because the external API returned 200s for invalid requests, and the LLM calls were successful.
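A repetition check is one way to flag the kind of loop described in the refund scenario, since error-rate metrics stay silent when every call returns a 200. The sketch below is a minimal illustration, not a production detector: the `RepeatDetector` class, its `window` and `max_repeats` defaults, and the tool and argument names are all hypothetical.

```python
from collections import deque


class RepeatDetector:
    """Flags an agent that keeps issuing the same tool call.

    Hypothetical helper: `window` and `max_repeats` are illustrative defaults.
    """

    def __init__(self, window: int = 20, max_repeats: int = 5):
        self.recent = deque(maxlen=window)  # most recent (tool, args) keys
        self.max_repeats = max_repeats

    def record(self, tool_name: str, tool_args: dict) -> bool:
        """Record a call; return True if it now looks like a runaway loop."""
        # Sort the items so argument order does not affect equality
        key = (tool_name, tuple(sorted((k, repr(v)) for k, v in tool_args.items())))
        self.recent.append(key)
        return self.recent.count(key) >= self.max_repeats


detector = RepeatDetector(window=20, max_repeats=5)
# Six identical payment attempts: the fifth and sixth are flagged
flags = [detector.record("process_payment", {"order_id": "A-17"}) for _ in range(6)]
```

A check like this runs inside the agent loop, so it can fire on the fifth identical call rather than the five-hundredth.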

Beyond Logs: The Observability Stack for Agentic Workflows

To effectively monitor and understand agent behavior, we need to move past basic application metrics. We must capture the agent’s internal “thought process” and granular resource utilization, not just its external interactions. The critical data points are tool calls, their parameters, LLM prompts and responses, token counts, and latency at each step.

Cost attribution is a major challenge. An agent’s total cost is a mosaic of LLM provider charges (OpenAI, Anthropic, Google) and external tool API costs. We need to map these expenses granularly, down to individual agent runs and even specific tool invocations. This level of detail enables accurate budget tracking and identifies cost-heavy decision paths.
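As a rough sketch of per-run attribution, the snippet below totals LLM and tool costs for a single run. Everything named here is an assumption for illustration: the `RunCost` class, the model name `example-model`, and especially the prices, which are placeholders rather than any real provider’s rates.

```python
from dataclasses import dataclass, field

# Hypothetical prices in USD; real values come from your provider price lists.
LLM_PRICE_PER_1K = {"example-model": {"prompt": 0.003, "completion": 0.015}}
TOOL_PRICE_PER_CALL = {"payment_processor": 1.00, "search": 0.0}


@dataclass
class RunCost:
    """Accumulates LLM and tool spend for one agent run."""
    run_id: str
    llm_usd: float = 0.0
    tool_usd: float = 0.0
    by_tool: dict = field(default_factory=dict)  # tool name -> spend in USD

    @property
    def total_usd(self) -> float:
        return self.llm_usd + self.tool_usd

    def add_llm_call(self, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        price = LLM_PRICE_PER_1K[model]
        self.llm_usd += prompt_tokens / 1000 * price["prompt"]
        self.llm_usd += completion_tokens / 1000 * price["completion"]

    def add_tool_call(self, tool: str) -> None:
        cost = TOOL_PRICE_PER_CALL.get(tool, 0.0)  # unknown tools count as free
        self.tool_usd += cost
        self.by_tool[tool] = self.by_tool.get(tool, 0.0) + cost


# Replay the refund scenario: 500 paid calls, 100 completion tokens each
run = RunCost(run_id="run-001")
for _ in range(500):
    run.add_tool_call("payment_processor")
    run.add_llm_call("example-model", prompt_tokens=0, completion_tokens=100)
```

With these placeholder prices the tool calls dominate the bill, which is the kind of split a per-invocation breakdown makes visible.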

OpenTelemetry provides the instrumentation patterns we need. In LangChain, a custom CallbackHandler can emit spans for each Thought, Action, and Observation step. This gives us a trace of the agent’s reasoning.

import json

from langchain_core.tracers import LangChainTracer  # or subclass BaseCallbackHandler for OTel only
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Set up the OTel provider (simplified: spans are printed to the console)
resource = Resource.create({"service.name": "agent-service"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

class AgentOtelCallback(LangChainTracer):  # Keeps LangSmith tracing and adds OTel spans
    def on_agent_action(self, action, **kwargs):
        # This span marks the decision itself, so it closes as soon as it is annotated
        with tracer.start_as_current_span(f"AgentAction: {action.tool}") as span:
            # tool_input may be a dict; OTel attributes must be primitives, so serialize it
            span.set_attribute("tool_input", json.dumps(action.tool_input, default=str))
            # Add more attributes as needed
        return super().on_agent_action(action, **kwargs)

    def on_tool_end(self, output, **kwargs):
        # Capture tool output (truncated) as an event on the current span, if any
        trace.get_current_span().add_event("tool_end", {"output": str(output)[:1000]})
        return super().on_tool_end(output, **kwargs)

    def on_llm_end(self, response, **kwargs):
        with tracer.start_as_current_span("LLMCall") as span:
            # llm_output is provider-specific; OpenAI-style models report token_usage
            token_usage = (response.llm_output or {}).get("token_usage") or {}
            for key in ("prompt_tokens", "completion_tokens"):
                if token_usage.get(key) is not None:  # None is not a valid attribute value
                    span.set_attribute(key, token_usage[key])
            # Add prompt/response as events or attributes if not too large
        return super().on_llm_end(response, **kwargs)

import time

# Wrap LLM client calls to capture prompt, response, token usage, and latency
def instrumented_llm_call(model_id, prompt_messages, client_func):
    with tracer.start_as_current_span(f"LLMCall:{model_id}") as span:
        span.set_attribute("llm.model_id", model_id)
        start_time = time.perf_counter()  # monotonic clock, safe for measuring durations
        response = client_func(model_id, prompt_messages)
        span.set_attribute("llm.latency_ms", (time.perf_counter() - start_time) * 1000)

        # Extract token usage from the response object; the shape depends on the provider.
        # Example for OpenAI-style responses:
        usage = getattr(response, "usage", None)
        if usage is not None:
            span.set_attribute("llm.prompt_tokens", usage.prompt_tokens)
            span.set_attribute("llm.completion_tokens", usage.completion_tokens)
        # Store prompt/response content as span events or link to external storage
        return response

This instrumentation provides a rich, structured dataset. It allows us to build dashboards that show costs per agent run, identify specific tool call sequences that lead to high spending, and visualize the agent’s decision-making flow.
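To make the "tool call sequences that lead to high spending" idea concrete, here is a small aggregation sketch. It assumes span data has already been flattened into records with `run_id`, `tool`, and `cost_usd` fields; that record shape, the sample data, and the `spend_by_sequence` function are hypothetical, not the output format of any particular exporter.

```python
from collections import defaultdict

# Hypothetical flattened span records, listed in execution order.
spans = [
    {"run_id": "r1", "tool": "lookup_order", "cost_usd": 0.00},
    {"run_id": "r1", "tool": "issue_refund", "cost_usd": 0.10},
    {"run_id": "r2", "tool": "lookup_order", "cost_usd": 0.00},
    {"run_id": "r2", "tool": "process_payment", "cost_usd": 1.00},
    {"run_id": "r2", "tool": "process_payment", "cost_usd": 1.00},
    {"run_id": "r3", "tool": "lookup_order", "cost_usd": 0.00},
    {"run_id": "r3", "tool": "issue_refund", "cost_usd": 0.10},
]


def spend_by_sequence(records):
    """Group runs by their ordered tool sequence and total the spend for each."""
    sequences = defaultdict(list)  # run_id -> ordered tool names
    run_cost = defaultdict(float)  # run_id -> total cost in USD
    for rec in records:
        sequences[rec["run_id"]].append(rec["tool"])
        run_cost[rec["run_id"]] += rec["cost_usd"]

    totals = defaultdict(lambda: {"runs": 0, "cost_usd": 0.0})
    for run_id, tools in sequences.items():
        entry = totals[" > ".join(tools)]
        entry["runs"] += 1
        entry["cost_usd"] += run_cost[run_id]
    # Most expensive sequences first
    return sorted(totals.items(), key=lambda kv: kv[1]["cost_usd"], reverse=True)


ranked = spend_by_sequence(spans)
for sequence, stats in ranked:
    print(f"{sequence}: {stats['runs']} run(s), ${stats['cost_usd']:.2f}")
```

In this sample the single run that repeats `process_payment` outranks the two ordinary refund runs combined, which is the signal a per-sequence dashboard is meant to surface.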

Ironclad Guardrails: Proactive Control & Cost Governance

Reactive monitoring is insufficient; we need proactive guardrails to prevent agents from spiraling. The goal is to enforce budget constraints before issues escalate.

Human-in-the-loop (HITL) processes are a critical safety net. When an agent exceeds a predefined cost threshold or makes a suspicious number of tool calls, its execution should pause.

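# Example: agent state serialization for HITL
import json
import os

import redis
from fastapi import FastAPI, HTTPException
from slack_sdk import WebClient

# Assumed setup: a local Redis, a FastAPI app, and slack_sdk for notifications
app = FastAPI()
redis_client = redis.Redis(host="localhost", port=6379, decode_responses=True)
slack_client = WebClient(token=os.environ.get("SLACK_BOT_TOKEN"))
SLACK_CHANNEL = "#agent-alerts"  # placeholder channel name

# Store agent state and notify a reviewer
def pause_agent(agent_id, current_state, reason):
    redis_client.set(f"agent:{agent_id}:state", json.dumps(current_state))
    redis_client.set(f"agent:{agent_id}:status", "paused")
    slack_client.chat_postMessage(
        channel=SLACK_CHANNEL,
        text=f"Agent {agent_id} paused: {reason}. Review at /resume/{agent_id}",
    )

# API endpoint to resume
@app.post("/resume/{agent_id}")
def resume_agent(agent_id: str, action: str):  # action is 'approve' or 'deny'
    if action == "approve":
        raw_state = redis_client.get(f"agent:{agent_id}:state")
        if raw_state is None:
            raise HTTPException(status_code=404, detail="No paused state for this agent")
        state = json.loads(raw_state)
        # Rehydrate the agent from `state` and continue execution
        redis_client.set(f"agent:{agent_id}:status", "running")
        return {"agent_id": agent_id, "status": "running"}
    # Anything other than 'approve' is treated as a denial: log and terminate
    redis_client.set(f"agent:{agent_id}:status", "terminated")
    return {"agent_id": agent_id, "status": "terminated"}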

This pattern serializes the agent’s current state, notifies a human, and awaits a decision via an API endpoint. It allows for inspection and intervention without losing context.
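The pause itself needs a trigger. One possible shape for the threshold check is sketched below; the `BudgetGuard` class and its limits are hypothetical and would need tuning per agent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetGuard:
    """Tracks spend and tool calls for one agent run (thresholds are illustrative)."""
    max_cost_usd: float = 5.00
    max_tool_calls: int = 25
    cost_usd: float = 0.0
    tool_calls: int = 0

    def record_tool_call(self, cost_usd: float) -> None:
        self.tool_calls += 1
        self.cost_usd += cost_usd

    def pause_reason(self) -> Optional[str]:
        """Return a human-readable reason to pause, or None to keep running."""
        if self.cost_usd > self.max_cost_usd:
            return f"cost ${self.cost_usd:.2f} exceeded ${self.max_cost_usd:.2f}"
        if self.tool_calls > self.max_tool_calls:
            return f"{self.tool_calls} tool calls exceeded limit of {self.max_tool_calls}"
        return None


guard = BudgetGuard(max_cost_usd=5.00, max_tool_calls=25)
reason = None
for step in range(100):
    guard.record_tool_call(cost_usd=1.00)  # simulate a $1.00 tool call
    reason = guard.pause_reason()
    if reason:
        # This is where pause_agent(agent_id, state, reason) would be called
        break
```

Checking after every tool call keeps the overshoot to at most one call beyond the limit, at the cost of a small amount of bookkeeping in the agent loop.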

For immediate, programmatic control, serverless functions can act as kill switches. These functions trigger based on observability alerts (e.g., high token usage, excessive API calls) and take decisive action.

# Update feature flag in AWS AppConfig to disable an agent feature
aws appconfig start-deployment \
    --application-id "my-agent-app" \
    --environment-id "prod" \
    --configuration-profile-id "agent-feature-flags" \
    --configuration-version "new-version-with-feature-off" \
    --deployment-strategy-id "instant-rollback"

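# Overwrite the stored API key so the agent cannot load it on its next fetch.
# This does not invalidate the key at the provider; revoke it there as well.
aws secretsmanager update-secret --secret-id "agent-api-key-123" --secret-string "REVOKED_KEY"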

# Update Redis-backed rate limits for external tool access
redis-cli SET agent:tool:payment_processor:rate_limit 0 EX 600

These actions can instantly cut off an agent’s access to external resources or disable its functionality. They are a last line of defense against runaway costs and unwanted actions.
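A Redis-backed limit only works if the agent’s tool wrapper reads it before each call. The sketch below shows one way that check could look. The `check_tool_allowed` function and `ToolBlocked` exception are hypothetical; the key format mirrors the `redis-cli` command above, and a plain dict stands in for the Redis client so the example runs without a server.

```python
class ToolBlocked(Exception):
    """Raised when a kill switch has disabled a tool."""


def check_tool_allowed(store, tool_name: str, calls_this_window: int) -> None:
    """Raise ToolBlocked if the limit stored for `tool_name` is used up.

    `store` is anything with a `.get(key)` method: a redis.Redis client in
    production, a plain dict in tests. A missing key means no limit is set.
    """
    raw = store.get(f"agent:tool:{tool_name}:rate_limit")
    if raw is None:
        return
    limit = int(raw)  # Redis returns bytes or str; int() accepts both
    if calls_this_window >= limit:
        raise ToolBlocked(f"{tool_name}: limit {limit} reached")


# Stand-in for Redis after the kill switch has set the limit to 0
fake_store = {"agent:tool:payment_processor:rate_limit": b"0"}
try:
    check_tool_allowed(fake_store, "payment_processor", calls_this_window=0)
    blocked = False
except ToolBlocked:
    blocked = True
```

Because the key in the `redis-cli` example expires after 600 seconds, the tool becomes available again unless someone renews the block, so the expiry should be chosen with the review process in mind.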

Debugging the Non-Deterministic: Strategies for Agent RCA

Debugging non-deterministic agentic systems is fundamentally different from traditional step-through debugging. The same input can yield different execution paths, making root cause analysis (RCA) challenging.

One powerful technique is programmatic trace comparison against ‘golden’ traces. A ‘golden’ trace represents a known-good execution for a specific input. When an agent misbehaves, we compare its actual trace against this baseline.

def compare_traces(actual_trace, golden_trace):
    """Return a list of human-readable deviations from the golden trace.

    Both traces are assumed to expose `.tool_calls`, a list of objects
    with `.tool_name` and `.params`.
    """
    diffs = []
    actual_calls, golden_calls = actual_trace.tool_calls, golden_trace.tool_calls
    if len(actual_calls) != len(golden_calls):
        diffs.append(f"Tool call count mismatch: {len(actual_calls)} vs {len(golden_calls)}")
    # Compare the overlapping steps even when the lengths differ, to show where the runs diverge
    for i, (actual_call, golden_call) in enumerate(zip(actual_calls, golden_calls)):
        if actual_call.tool_name != golden_call.tool_name:
            diffs.append(f"Tool name mismatch at step {i}: {actual_call.tool_name} vs {golden_call.tool_name}")
        # Extend here to deep-compare token usage, thought process, and final output
        if actual_call.params != golden_call.params:
            diffs.append(f"Parameter mismatch at step {i} for {actual_call.tool_name}")
    return diffs

# Example usage
# actual_trace = get_trace_from_observability_system("run_id_X")
# golden_trace = load_golden_trace("scenario_Y")
# issues = compare_traces(actual_trace, golden_trace)
# if issues: print("Trace deviations found:", issues)
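Because agent runs are non-deterministic, a step-by-step equality check can be noisy: one extra tool call shifts every later step. As a complementary sketch, the standard-library `difflib.SequenceMatcher` can align the two tool-name sequences and report only the segments that differ. The `diff_tool_sequences` function and the tool names are hypothetical.

```python
from difflib import SequenceMatcher


def diff_tool_sequences(actual, golden):
    """Return a similarity ratio and the non-matching edit operations."""
    matcher = SequenceMatcher(a=golden, b=actual, autojunk=False)
    changes = []
    for tag, g1, g2, a1, a2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is 'replace', 'delete', or 'insert' relative to the golden trace
            changes.append((tag, golden[g1:g2], actual[a1:a2]))
    return matcher.ratio(), changes


golden = ["lookup_order", "check_policy", "issue_refund"]
actual = ["lookup_order", "check_policy", "process_payment", "process_payment"]
ratio, changes = diff_tool_sequences(actual, golden)
```

Here `changes` holds a single `replace` entry showing that `issue_refund` was swapped for two `process_payment` calls, and `ratio` gives a rough score that could feed an alert threshold.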

Key attributes to compare include the sequence of tool calls, their parameters, the token usage at each LLM
