Bedrock Agents vs Direct Nova Pro API: Cost and Latency at Scale
At 1,000 multi-step requests per day, Bedrock Agents costs roughly $864/month. Running the same workload against Nova Pro directly with a custom orchestration layer costs about $137/month. That gap — 6× more expensive for the managed service — surprises teams who assume managed means cheaper. It isn’t. You’re paying for convenience, and at scale the convenience bill gets large.
This isn’t an argument against Bedrock Agents. There are workloads where Agents is exactly the right tool. But the cost tradeoff needs to be understood before you commit your architecture to it, because moving away from Agents once you’ve built multi-step workflows on top of it is non-trivial.
Why Agents Costs So Much More
The cost delta comes almost entirely from token overhead. Every Bedrock Agents invocation injects a large orchestration prompt — AWS’s ReAct-style orchestration template — into the conversation before your actual user message. This template includes:
- The agent’s system instructions
- All tool definitions (action groups) in full
- The current session memory
- Orchestration formatting instructions
A typical orchestration template runs 3,000-5,000 input tokens. If your average user message is 200 tokens and your tool definitions add another 800 tokens, you’re looking at 4,000-6,000 input tokens per model invocation within the agent — compared to 1,000 tokens for the equivalent direct API call.
Multi-step requests multiply this. An agent that needs 3 model invocations to complete a task (initial planning, tool execution, final synthesis) burns the orchestration overhead 3 times. At Nova Pro’s $0.80 per 1M input tokens, a 5,000-token orchestration template costs $0.004 per invocation. Across 3 invocations per request × 1,000 requests/day × 30 days, that’s $360/month just in orchestration overhead — before counting your actual payload tokens or output costs.
The direct API approach has none of this. You send exactly the tokens you choose to send.
The Exact Cost Calculation
At 1,000 multi-step requests/day, 3 LLM calls per request, 30-day month:
Bedrock Agents:
- Input: 5,000 tokens/call × 3 calls × 30K requests = 450M tokens × $0.80/M = $360
- Output: 600 tokens/call × 3 calls × 30K requests = 54M tokens × $3.20/M = $173
- Session fees: $0.0003/session × 30K sessions = $9
- Action group invocations: minimal for most workloads
- Total: ~$542/month (closer to $864 with heavier tool definitions and longer outputs)
Direct Nova Pro API:
- Input: 1,500 tokens/call × 3 calls × 30K requests = 135M tokens × $0.80/M = $108
- Output: 600 tokens/call × 3 calls × 30K requests = 54M tokens × $3.20/M = $173 (same output cost)
- No session fees, no orchestration overhead
- Total: ~$281/month base; closer to ~$137 once Nova Pro’s 300K context lets you collapse many workflows from 3 calls per request down to 1
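The arithmetic in the two breakdowns above can be reproduced in a few lines. All token counts and prices are the assumptions stated in this post, not measured billing:

```python
# Reproduces the monthly cost arithmetic from the breakdowns above.
REQUESTS_PER_MONTH = 1_000 * 30          # 1,000 requests/day, 30-day month
INPUT_PRICE = 0.80 / 1_000_000           # Nova Pro, $ per input token
OUTPUT_PRICE = 3.20 / 1_000_000          # Nova Pro, $ per output token

def monthly_cost(input_tokens_per_call: int, output_tokens_per_call: int,
                 calls_per_request: int = 3, session_fee: float = 0.0) -> float:
    total_calls = calls_per_request * REQUESTS_PER_MONTH
    return (total_calls * input_tokens_per_call * INPUT_PRICE
            + total_calls * output_tokens_per_call * OUTPUT_PRICE
            + session_fee * REQUESTS_PER_MONTH)

agents = monthly_cost(5_000, 600, session_fee=0.0003)  # ≈ $542
direct = monthly_cost(1_500, 600)                      # ≈ $281
```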
That last point is the real unlock: Nova Pro’s 300K token context window means many “multi-step” tasks can be collapsed into a single inference call. Feed the entire conversation history, all tool results, and the full task context in one shot. One call instead of three eliminates the per-call overhead entirely.
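When the tools a request needs are predictable, you can fetch their results up front and send everything in one call. A minimal sketch of assembling that single-call payload — the function name and message shape are illustrative, built for the Converse API message format:

```python
def build_single_call_messages(history: list[dict], tool_results: dict[str, str],
                               task: str) -> list[dict]:
    """Collapse a would-be multi-step agent run into one Converse call by
    pre-fetching tool results and inlining them with the full history."""
    results_block = "\n".join(
        f"[{name}]\n{output}" for name, output in tool_results.items()
    )
    final_turn = {
        "role": "user",
        "content": [{"text": f"Tool results gathered up front:\n{results_block}\n\nTask: {task}"}],
    }
    return history + [final_turn]
```

The returned list goes straight into `messages` on a single `converse` call; with 300K tokens of headroom, even long histories plus bulky tool output usually fit.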
Nova Pro’s dependencyFailedException in Agents
There’s a specific failure mode that affects Nova Pro when used as a Bedrock Agents orchestrator: dependencyFailedException. This error occurs when the agent’s orchestration loop enters a state where it can’t determine the next action — usually because Nova Pro’s response format for tool calls diverges from what the Agents runtime expects.
The Agents runtime was built and heavily tested with Anthropic Claude. Nova Pro is a first-party Amazon model, but it follows a different internal prompting convention than Claude, and the Agents ReAct template doesn’t always elicit cleanly parseable tool-call decisions from Nova Pro. The result is an unretryable exception that kills the session.
Workarounds if you’re committed to Agents + Nova Pro:
- Keep action group definitions short — under 500 tokens total. Longer tool descriptions increase the likelihood of malformed orchestration responses.
- Reduce the number of action groups per agent. An agent with 3 action groups fails less often than one with 8.
- Add session-level retry logic in your application code.
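The third workaround can be a small generic wrapper. This is a sketch: it takes any zero-arg callable plus the exception types to treat as retryable (with boto3, that would be something like `client.exceptions.DependencyFailedException`), and assumes the callable starts a fresh session each attempt:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_session_retry(call: Callable[[], T],
                       retryable: tuple[type[BaseException], ...],
                       attempts: int = 3, backoff_s: float = 1.0) -> T:
    """Retry a full agent invocation when the session dies mid-orchestration.

    `call` should start a FRESH session (new sessionId) each attempt -- a
    dependencyFailedException kills the session, so retrying inside the
    same session won't help.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return call()
        except retryable as exc:
            last_exc = exc
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise last_exc
```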
Alternatively, use Claude Sonnet 4 or Claude 3.5 Haiku as the Agents orchestrator and Nova Pro for the actual content generation. Wire a RETURN_CONTROL action that passes the task to Nova Pro via a Lambda function, then returns the result to the Agents session. You get Agents’ orchestration reliability with Nova Pro’s cost profile for the heavy LLM work.
What You Give Up Going Direct
Building on the raw Bedrock InvokeModel/Converse API means you implement yourself:
Conversation memory. Bedrock Agents manages session state across turns. In a direct API setup, you track the full message history and pass it on every call. For short conversations, this is fine — Nova Pro’s 300K context handles histories up to roughly 200K tokens before you need to think about summarization. For multi-day sessions (customer service, long-running projects), implement your own DynamoDB-backed session store.
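A minimal sketch of that session store — in-memory here, with the dict swapped for a DynamoDB table keyed on `session_id` in production; the class name and the 4-characters-per-token heuristic are assumptions, not a library API:

```python
import json
from collections import defaultdict

class SessionStore:
    """Minimal conversation-memory store (in-memory; back it with DynamoDB
    for multi-day sessions)."""

    def __init__(self, max_history_tokens: int = 200_000):
        self._sessions: dict[str, list[dict]] = defaultdict(list)
        self.max_history_tokens = max_history_tokens

    def append(self, session_id: str, message: dict) -> None:
        self._sessions[session_id].append(message)

    def history(self, session_id: str) -> list[dict]:
        return list(self._sessions[session_id])

    def needs_summarization(self, session_id: str) -> bool:
        # Crude estimate: ~4 characters per token for English text.
        chars = sum(len(json.dumps(m)) for m in self._sessions[session_id])
        return chars / 4 > self.max_history_tokens
```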
Tool dispatch. Agents automatically invokes Lambda functions when the model requests a tool call and feeds the result back. Direct API means you write the tool dispatch loop: check if the model response includes a tool call, invoke the function, append the result, call the model again. About 40 lines of Python. It’s not complex, but you own the retry logic and error handling.
Observability and trace. Bedrock Agents provides a full trace of every orchestration step — which tool was called, what the model reasoned before calling it, how long each step took. The trace is invaluable for debugging multi-step failures. Direct API gives you CloudWatch metrics and whatever you instrument yourself.
Knowledge base integration. Agents has native Bedrock Knowledge Base integration with hybrid search (semantic + keyword). Direct API can call the retrieve API separately, but you handle the retrieval-then-generate pattern yourself.
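The prompt-assembly half of that retrieval-then-generate pattern is a pure function; you call the Knowledge Base `retrieve` API (bedrock-agent-runtime) yourself and splice the chunks in. A sketch — the function name and prompt wording are illustrative:

```python
def build_rag_messages(user_question: str, retrieved_chunks: list[str]) -> list[dict]:
    """Assemble a Converse-API message that inlines retrieved passages.

    Bedrock Agents does this splice automatically; going direct, you do it
    after calling `retrieve` yourself.
    """
    context = "\n\n".join(
        f'<passage index="{i}">\n{chunk}\n</passage>'
        for i, chunk in enumerate(retrieved_chunks)
    )
    prompt = (
        "Answer using only the passages below. Cite passage indexes.\n\n"
        f"{context}\n\nQuestion: {user_question}"
    )
    return [{"role": "user", "content": [{"text": prompt}]}]
```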
When to Use Each
Use Bedrock Agents when:
- You need the managed trace for debugging and compliance (regulated industries often require audit trails of AI decision-making)
- Your workflow involves 5+ action groups or complex conditional branching between tools
- You’re building for an audience that needs a no-code/low-code way to extend the agent’s capabilities (Agents’ action group framework is accessible to non-developers)
- You need built-in Knowledge Base retrieval and the retrieval-augmented generation pattern is central to your use case
- Team bandwidth is limited — the orchestration engineering work is non-trivial and Agents eliminates it
Use direct Nova Pro API when:
- Request volume exceeds 500/day and cost is a primary constraint
- Your workflows are well-understood and don’t require dynamic orchestration
- You need sub-2-second response latency — Agents adds 300-800ms of orchestration overhead per step
- Your tools are few and stable (1-3 functions that don’t change often)
- You’re already using an open-source orchestration library (LangChain, LlamaIndex, custom ReAct loop) — bolt-on Agents integration on top of an existing orchestration layer creates more complexity than it removes
The Bedrock Agents and MCP DevOps guide covers the Agents path in depth, including how to wire action groups to existing infrastructure automation tools. If cost is the deciding factor, the AWS FinOps Well-Architected patterns post covers how to apply cost allocation and budgets to AI workloads at the account level. For teams building with Terraform, the Terraform MCP AI agents for infrastructure post shows a direct-API orchestration pattern that avoids the Agents overhead while still using Nova Pro for IaC generation tasks.
A Minimal Direct Orchestration Loop
If you decide to go direct, here’s a minimal but complete tool dispatch loop for Nova Pro:
```python
import boto3
import json
from typing import Callable

bedrock_rt = boto3.client('bedrock-runtime', region_name='us-east-1')

TOOLS = [
    {
        "toolSpec": {
            "name": "query_database",
            "description": "Execute a read-only SQL query against the production database",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "SQL SELECT statement"}
                    },
                    "required": ["query"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "get_service_status",
            "description": "Get current health status of an AWS service or ECS task",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "service_name": {"type": "string"}
                    },
                    "required": ["service_name"]
                }
            }
        }
    }
]

def dispatch_tool(tool_name: str, tool_input: dict, tool_functions: dict[str, Callable]) -> str:
    if tool_name not in tool_functions:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
    try:
        result = tool_functions[tool_name](**tool_input)
        return json.dumps(result) if not isinstance(result, str) else result
    except Exception as e:
        return json.dumps({"error": str(e)})

def run_agent(
    user_message: str,
    system_prompt: str,
    tool_functions: dict[str, Callable],
    max_iterations: int = 10
) -> str:
    messages = [{"role": "user", "content": [{"text": user_message}]}]
    for _ in range(max_iterations):
        response = bedrock_rt.converse(
            modelId="amazon.nova-pro-v1:0",
            system=[{"text": system_prompt}],
            messages=messages,
            toolConfig={"tools": TOOLS},
            inferenceConfig={"maxTokens": 2048, "temperature": 0.1}
        )
        output_message = response["output"]["message"]
        messages.append(output_message)
        stop_reason = response["stopReason"]

        # Model finished — return the text response
        if stop_reason == "end_turn":
            for block in output_message["content"]:
                if "text" in block:
                    return block["text"]

        # Model wants to use a tool
        if stop_reason == "tool_use":
            tool_results = []
            for block in output_message["content"]:
                if "toolUse" not in block:
                    continue
                tool_use = block["toolUse"]
                result = dispatch_tool(
                    tool_use["name"],
                    tool_use["input"],
                    tool_functions
                )
                tool_results.append({
                    "toolResult": {
                        "toolUseId": tool_use["toolUseId"],
                        "content": [{"text": result}]
                    }
                })
            messages.append({"role": "user", "content": tool_results})
    return "Max iterations reached without a final response."

# Usage
def query_database(query: str) -> dict:
    # Your actual DB implementation
    return {"rows": [], "count": 0}

def get_service_status(service_name: str) -> dict:
    # Your actual status check
    return {"service": service_name, "status": "healthy"}

response = run_agent(
    user_message="How many failed ECS tasks do we have in the last hour?",
    system_prompt="You are an infrastructure assistant. Use tools to answer questions accurately.",
    tool_functions={
        "query_database": query_database,
        "get_service_status": get_service_status,
    }
)
print(response)
```
This loop handles the complete tool dispatch cycle, stops on natural completion, and respects a max-iterations guard. Add your own retry logic on ThrottlingException and ServiceUnavailableException around the converse call if you’re running at production throughput.
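That retry logic can be a thin wrapper around the `converse` call. A sketch: it matches errors by the botocore-style `response["Error"]["Code"]` attribute rather than importing specific exception classes, and uses full-jitter exponential backoff:

```python
import random
import time

RETRYABLE_CODES = {"ThrottlingException", "ServiceUnavailableException"}

def converse_with_backoff(call, max_attempts: int = 5, base_delay_s: float = 0.5):
    """Wrap a converse invocation with exponential backoff + jitter.

    `call` is a zero-arg closure around bedrock_rt.converse(...). Any exception
    carrying a botocore-style response["Error"]["Code"] in RETRYABLE_CODES is
    retried; everything else propagates immediately.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            if code not in RETRYABLE_CODES or attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random fraction of the capped backoff window.
            time.sleep(random.uniform(0, base_delay_s * (2 ** attempt)))
```

Inside `run_agent`, replace the bare `bedrock_rt.converse(...)` with `converse_with_backoff(lambda: bedrock_rt.converse(...))`.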
The total code here is about 80 lines. Bedrock Agents gives you this loop plus observability tooling plus Knowledge Base integration plus a console UI for non-developers. Whether that’s worth 6× the token cost depends on your team, your use case, and how much orchestration complexity you’re willing to own.