Bedrock Agents vs Direct Nova Pro API: Cost and Latency at Scale
At 1,000 multi-step requests per day, Bedrock Agents costs roughly $864/month. Running the same workload against Nova Pro directly with a custom orchestration layer costs about $137/month. That gap — 6× more expensive for the managed service — surprises teams who assume managed means cheaper. It isn’t. You’re paying for convenience, and at scale the convenience bill gets large.
This isn’t an argument against Bedrock Agents. There are workloads where Agents is exactly the right tool. But the cost tradeoff needs to be understood before you commit your architecture to it, because moving away from Agents once you’ve built multi-step workflows on top of it is non-trivial.
Why Agents Costs So Much More
The cost delta comes almost entirely from token overhead. Every Bedrock Agents invocation injects a large orchestration prompt — AWS’s ReAct-style orchestration template — into the conversation before your actual user message. This template includes:
- The agent’s system instructions
- All tool definitions (action groups) in full
- The current session memory
- Orchestration formatting instructions
A typical orchestration template runs 3,000-5,000 input tokens. If your average user message is 200 tokens and your tool definitions add another 800 tokens, you’re looking at 4,000-6,000 input tokens per model invocation within the agent — compared to 1,000 tokens for the equivalent direct API call.
Multi-step requests multiply this. An agent that needs 3 model invocations to complete a task (initial planning, tool execution, final synthesis) burns the orchestration overhead 3 times. At Nova Pro’s $0.80 per 1M input tokens, a 5,000-token orchestration template costs $0.004 per invocation. Across 3 invocations per request × 1,000 requests/day × 30 days, that’s $360/month just in orchestration overhead — before counting your actual payload tokens or output costs.
The direct API approach has none of this. You send exactly the tokens you choose to send.
The Exact Cost Calculation
At 1,000 multi-step requests/day, 3 LLM calls per request, 30-day month:
Bedrock Agents:
- Input: 5,000 tokens/call × 3 calls × 30K requests = 450M tokens × $0.80/M = $360
- Output: 600 tokens/call × 3 calls × 30K requests = 54M tokens × $3.20/M = $173
- Session fees: $0.0003/session × 30K sessions = $9
- Action group invocations: minimal for most workloads
- Total: ~$542/month (closer to $864 with heavier tool definitions and longer outputs)
Direct Nova Pro API:
- Input: 1,500 tokens/call × 3 calls × 30K requests = 135M tokens × $0.80/M = $108
- Output: 600 tokens/call × 3 calls × 30K requests = 54M tokens × $3.20/M = $173 (same output cost)
- No session fees, no orchestration overhead
- Total: ~$281/month base; closer to ~$137 once Nova Pro’s 300K context lets you collapse many workflows from 3 calls per request down to 1
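The arithmetic in the two breakdowns above can be reproduced in a few lines. All token counts and prices are the assumptions stated in this post, not measured billing:

```python
# Reproduces the monthly cost arithmetic from the breakdowns above.
REQUESTS_PER_MONTH = 1_000 * 30          # 1,000 requests/day, 30-day month
INPUT_PRICE = 0.80 / 1_000_000           # Nova Pro, $ per input token
OUTPUT_PRICE = 3.20 / 1_000_000          # Nova Pro, $ per output token

def monthly_cost(input_tokens_per_call: int, output_tokens_per_call: int,
                 calls_per_request: int = 3, session_fee: float = 0.0) -> float:
    total_calls = calls_per_request * REQUESTS_PER_MONTH
    return (total_calls * input_tokens_per_call * INPUT_PRICE
            + total_calls * output_tokens_per_call * OUTPUT_PRICE
            + session_fee * REQUESTS_PER_MONTH)

agents = monthly_cost(5_000, 600, session_fee=0.0003)  # ≈ $542
direct = monthly_cost(1_500, 600)                      # ≈ $281
```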
That last point is the real unlock: Nova Pro’s 300K token context window means many “multi-step” tasks can be collapsed into a single inference call. Feed the entire conversation history, all tool results, and the full task context in one shot. One call instead of three eliminates the per-call overhead entirely.
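When the tools a request needs are predictable, you can fetch their results up front and send everything in one call. A minimal sketch of assembling that single-call payload — the function name and message shape are illustrative, built for the Converse API message format:

```python
def build_single_call_messages(history: list[dict], tool_results: dict[str, str],
                               task: str) -> list[dict]:
    """Collapse a would-be multi-step agent run into one Converse call by
    pre-fetching tool results and inlining them with the full history."""
    results_block = "\n".join(
        f"[{name}]\n{output}" for name, output in tool_results.items()
    )
    final_turn = {
        "role": "user",
        "content": [{"text": f"Tool results gathered up front:\n{results_block}\n\nTask: {task}"}],
    }
    return history + [final_turn]
```

The returned list goes straight into `messages` on a single `converse` call; with 300K tokens of headroom, even long histories plus bulky tool output usually fit.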
Nova Pro’s dependencyFailedException in Agents
There’s a specific failure mode that affects Nova Pro when used as a Bedrock Agents orchestrator: dependencyFailedException. This error occurs when the agent’s orchestration loop enters a state where it can’t determine the next action — usually because Nova Pro’s response format for tool calls diverges from what the Agents runtime expects.
The Agents runtime was built and heavily tested with Anthropic Claude. Nova Pro is a first-party Amazon model, but it follows a different internal prompting convention than Claude, and the Agents ReAct template doesn’t always elicit cleanly parseable tool-call decisions from Nova Pro. The result is an unretryable exception that kills the session.
Workarounds if you’re committed to Agents + Nova Pro:
- Keep action group definitions short — under 500 tokens total. Longer tool descriptions increase the likelihood of malformed orchestration responses.
- Reduce the number of action groups per agent. An agent with 3 action groups fails less often than one with 8.
- Add session-level retry logic in your application code.
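The third workaround can be a small generic wrapper. This is a sketch: it takes any zero-arg callable plus the exception types to treat as retryable (with boto3, that would be something like `client.exceptions.DependencyFailedException`), and assumes the callable starts a fresh session each attempt:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_session_retry(call: Callable[[], T],
                       retryable: tuple[type[BaseException], ...],
                       attempts: int = 3, backoff_s: float = 1.0) -> T:
    """Retry a full agent invocation when the session dies mid-orchestration.

    `call` should start a FRESH session (new sessionId) each attempt -- a
    dependencyFailedException kills the session, so retrying inside the
    same session won't help.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return call()
        except retryable as exc:
            last_exc = exc
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise last_exc
```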
Alternatively, use Claude Sonnet 4 or Claude 3.5 Haiku as the Agents orchestrator and Nova Pro for the actual content generation. Wire a RETURN_CONTROL action that passes the task to Nova Pro via a Lambda function, then returns the result to the Agents session. You get Agents’ orchestration reliability with Nova Pro’s cost profile for the heavy LLM work.
What You Give Up Going Direct
Building on the raw Bedrock InvokeModel/Converse API means you implement yourself:
Conversation memory. Bedrock Agents manages session state across turns. In a direct API setup, you track the full message history and pass it on every call. For short conversations, this is fine — Nova Pro’s 300K context handles histories up to roughly 200K tokens before you need to think about summarization. For multi-day sessions (customer service, long-running projects), implement your own DynamoDB-backed session store.
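A minimal sketch of that session store — in-memory here, with the dict swapped for a DynamoDB table keyed on `session_id` in production; the class name and the 4-characters-per-token heuristic are assumptions, not a library API:

```python
import json
from collections import defaultdict

class SessionStore:
    """Minimal conversation-memory store (in-memory; back it with DynamoDB
    for multi-day sessions)."""

    def __init__(self, max_history_tokens: int = 200_000):
        self._sessions: dict[str, list[dict]] = defaultdict(list)
        self.max_history_tokens = max_history_tokens

    def append(self, session_id: str, message: dict) -> None:
        self._sessions[session_id].append(message)

    def history(self, session_id: str) -> list[dict]:
        return list(self._sessions[session_id])

    def needs_summarization(self, session_id: str) -> bool:
        # Crude estimate: ~4 characters per token for English text.
        chars = sum(len(json.dumps(m)) for m in self._sessions[session_id])
        return chars / 4 > self.max_history_tokens
```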
Tool dispatch. Agents automatically invokes Lambda functions when the model requests a tool call and feeds the result back. Direct API means you write the tool dispatch loop: check if the model response includes a tool call, invoke the function, append the result, call the model again. About 40 lines of Python. It’s not complex, but you own the retry logic and error handling.
Observability and trace. Bedrock Agents provides a full trace of every orchestration step — which tool was called, what the model reasoned before calling it, how long each step took. The trace is invaluable for debugging multi-step failures. Direct API gives you CloudWatch metrics and whatever you instrument yourself.
Knowledge base integration. Agents has native Bedrock Knowledge Base integration with hybrid search (semantic + keyword). Direct API can call the retrieve API separately, but you handle the retrieval-then-generate pattern yourself.
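The prompt-assembly half of that retrieval-then-generate pattern is a pure function; you call the Knowledge Base `retrieve` API (bedrock-agent-runtime) yourself and splice the chunks in. A sketch — the function name and prompt wording are illustrative:

```python
def build_rag_messages(user_question: str, retrieved_chunks: list[str]) -> list[dict]:
    """Assemble a Converse-API message that inlines retrieved passages.

    Bedrock Agents does this splice automatically; going direct, you do it
    after calling `retrieve` yourself.
    """
    context = "\n\n".join(
        f'<passage index="{i}">\n{chunk}\n</passage>'
        for i, chunk in enumerate(retrieved_chunks)
    )
    prompt = (
        "Answer using only the passages below. Cite passage indexes.\n\n"
        f"{context}\n\nQuestion: {user_question}"
    )
    return [{"role": "user", "content": [{"text": prompt}]}]
```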
When to Use Each
Use Bedrock Agents when:
- You need the managed trace for debugging and compliance (regulated industries often require audit trails of AI decision-making)
- Your workflow involves 5+ action groups or complex conditional branching between tools
- You’re building for an audience that needs a no-code/low-code way to extend the agent’s capabilities (Agents’ action group framework is accessible to non-developers)
- You need built-in Knowledge Base retrieval and the retrieval-augmented generation pattern is central to your use case
- Team bandwidth is limited — the orchestration engineering work is non-trivial and Agents eliminates it
Use direct Nova Pro API when:
- Request volume exceeds 500/day and cost is a primary constraint
- Your workflows are well-understood and don’t require dynamic orchestration
- You need sub-2-second response latency — Agents adds 300-800ms of orchestration overhead per step
- Your tools are few and stable (1-3 functions that don’t change often)
- You’re already using an open-source orchestration library (LangChain, LlamaIndex, custom ReAct loop) — bolt-on Agents integration on top of an existing orchestration layer creates more complexity than it removes
The Bedrock Agents and MCP DevOps guide covers the Agents path in depth, including how to wire action groups to existing infrastructure automation tools. If cost is the deciding factor, the AWS FinOps Well-Architected patterns post covers how to apply cost allocation and budgets to AI workloads at the account level. For teams building with Terraform, the Terraform MCP AI agents for infrastructure post shows a direct-API orchestration pattern that avoids the Agents overhead while still using Nova Pro for IaC generation tasks.
A Minimal Direct Orchestration Loop
If you decide to go direct, here’s a minimal but complete tool dispatch loop for Nova Pro:
```python
import boto3
import json
from typing import Callable

bedrock_rt = boto3.client('bedrock-runtime', region_name='us-east-1')

TOOLS = [
    {
        "toolSpec": {
            "name": "query_database",
            "description": "Execute a read-only SQL query against the production database",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "SQL SELECT statement"}
                    },
                    "required": ["query"]
                }
            }
        }
    },
    {
        "toolSpec": {
            "name": "get_service_status",
            "description": "Get current health status of an AWS service or ECS task",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "service_name": {"type": "string"}
                    },
                    "required": ["service_name"]
                }
            }
        }
    }
]

def dispatch_tool(tool_name: str, tool_input: dict, tool_functions: dict[str, Callable]) -> str:
    if tool_name not in tool_functions:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
    try:
        result = tool_functions[tool_name](**tool_input)
        return json.dumps(result) if not isinstance(result, str) else result
    except Exception as e:
        return json.dumps({"error": str(e)})

def run_agent(
    user_message: str,
    system_prompt: str,
    tool_functions: dict[str, Callable],
    max_iterations: int = 10
) -> str:
    messages = [{"role": "user", "content": [{"text": user_message}]}]
    for _ in range(max_iterations):
        response = bedrock_rt.converse(
            modelId="amazon.nova-pro-v1:0",
            system=[{"text": system_prompt}],
            messages=messages,
            toolConfig={"tools": TOOLS},
            inferenceConfig={"maxTokens": 2048, "temperature": 0.1}
        )
        output_message = response["output"]["message"]
        messages.append(output_message)
        stop_reason = response["stopReason"]

        # Model finished — return the text response
        if stop_reason == "end_turn":
            for block in output_message["content"]:
                if "text" in block:
                    return block["text"]

        # Model wants to use a tool
        if stop_reason == "tool_use":
            tool_results = []
            for block in output_message["content"]:
                if "toolUse" not in block:
                    continue
                tool_use = block["toolUse"]
                result = dispatch_tool(
                    tool_use["name"],
                    tool_use["input"],
                    tool_functions
                )
                tool_results.append({
                    "toolResult": {
                        "toolUseId": tool_use["toolUseId"],
                        "content": [{"text": result}]
                    }
                })
            messages.append({"role": "user", "content": tool_results})
    return "Max iterations reached without a final response."

# Usage
def query_database(query: str) -> dict:
    # Your actual DB implementation
    return {"rows": [], "count": 0}

def get_service_status(service_name: str) -> dict:
    # Your actual status check
    return {"service": service_name, "status": "healthy"}

response = run_agent(
    user_message="How many failed ECS tasks do we have in the last hour?",
    system_prompt="You are an infrastructure assistant. Use tools to answer questions accurately.",
    tool_functions={
        "query_database": query_database,
        "get_service_status": get_service_status,
    }
)
print(response)
```
This loop handles the complete tool dispatch cycle, stops on natural completion, and respects a max-iterations guard. Add your own retry logic on ThrottlingException and ServiceUnavailableException around the converse call if you’re running at production throughput.
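That retry logic can be a thin wrapper around the `converse` call. A sketch: it matches errors by the botocore-style `response["Error"]["Code"]` attribute rather than importing specific exception classes, and uses full-jitter exponential backoff:

```python
import random
import time

RETRYABLE_CODES = {"ThrottlingException", "ServiceUnavailableException"}

def converse_with_backoff(call, max_attempts: int = 5, base_delay_s: float = 0.5):
    """Wrap a converse invocation with exponential backoff + jitter.

    `call` is a zero-arg closure around bedrock_rt.converse(...). Any exception
    carrying a botocore-style response["Error"]["Code"] in RETRYABLE_CODES is
    retried; everything else propagates immediately.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            if code not in RETRYABLE_CODES or attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random fraction of the capped backoff window.
            time.sleep(random.uniform(0, base_delay_s * (2 ** attempt)))
```

Inside `run_agent`, replace the bare `bedrock_rt.converse(...)` with `converse_with_backoff(lambda: bedrock_rt.converse(...))`.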
The total code here is about 80 lines. Bedrock Agents gives you this loop plus observability tooling plus Knowledge Base integration plus a console UI for non-developers. Whether that’s worth 6× the token cost depends on your team, your use case, and how much orchestration complexity you’re willing to own.