Aurora Serverless v2 + Bedrock: AI Database Queries in 2026

Written by Bits Lovers

I connected Bedrock to our Aurora cluster last month. The first thing I asked it was “show me all customers who churned in Q1 but came back in Q2” — in plain English. It generated the SQL, ran it through a Lambda action group, and returned a count in about four seconds. My first thought was: this is either the most useful thing we’ve built this year or a liability waiting to happen. Probably both.

That’s the honest frame for this post. Aurora Serverless v2 with Bedrock integration is genuinely capable in 2026. It also has sharp edges that people tend to gloss over when they’re excited about demos.

Aurora Serverless v2 in 2026

Before getting to the AI parts, you need to understand what Aurora Serverless v2 actually is today, because it’s meaningfully different from what it was at launch.

The fundamental model: you define a capacity range in ACUs (Aurora Capacity Units). An ACU is approximately 2 GB of memory plus proportional CPU. You set a minimum and a maximum. Aurora scales within that range automatically, in increments as small as 0.5 ACU, based on actual load.
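If you manage that capacity range in code rather than the console, it's a single parameter block on the cluster. Here's a minimal sketch using boto3's modify_db_cluster with ServerlessV2ScalingConfiguration — the validation helper is my own convention, not part of the AWS API:

```python
def scaling_config(min_acu: float, max_acu: float) -> dict:
    """Build the ServerlessV2ScalingConfiguration block for modify_db_cluster."""
    for value in (min_acu, max_acu):
        # Aurora scales in 0.5 ACU increments, so bounds must land on a half step
        if value * 2 != int(value * 2):
            raise ValueError("ACU values must be multiples of 0.5")
    if min_acu > max_acu:
        raise ValueError("MinCapacity cannot exceed MaxCapacity")
    return {"MinCapacity": min_acu, "MaxCapacity": max_acu}

def set_cluster_capacity(cluster_id: str, min_acu: float, max_acu: float):
    import boto3  # deferred so the validator is usable without AWS credentials
    rds = boto3.client("rds")
    return rds.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        ServerlessV2ScalingConfiguration=scaling_config(min_acu, max_acu),
        ApplyImmediately=True,
    )
```

The validator exists because a typo like MinCapacity=0 on a production cluster is exactly the mistake described below, and catching it in code review is cheaper than catching it at 2 AM.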

Instant scaling is the headline improvement in the 2026 engine. Prior to v2’s maturation, scaling events could introduce brief latency spikes as the cluster caught up to demand. The current behavior scales smoothly enough that most application workloads don’t notice it. I’ve watched our cluster go from 2 ACU to 24 ACU during a batch job without a single timeout. That’s a real improvement.

Minimum ACU of 0 is available for development and test clusters. This means the cluster can pause completely when idle and resume on first connection. Resume latency is around 30 seconds for PostgreSQL-compatible Aurora. For a production database that’s unacceptable. For a dev environment you’re running to stay below a $50/month budget, it’s fine. Know which you need.

Minimum ACU of 0.5 is the practical floor for production. At 0.5 ACU the cluster stays warm and responds to connections without the cold-start penalty. If you set your minimum to 0 on a production cluster to save money, you will eventually have a 30-second connection delay when the cluster resumes after an idle period at 2 AM, and that delay will happen during your most sensitive traffic window. Don’t do it.

The pricing math: you pay per ACU-hour. In us-east-1, that’s $0.12 per ACU-hour for Aurora PostgreSQL Serverless v2. A cluster sitting at 2 ACU all month costs about $175 in compute. A cluster that peaks at 16 ACU during business hours and drops to 2 ACU overnight costs significantly less than a provisioned r6g.xlarge running 24/7. Run the numbers for your actual traffic pattern before assuming Serverless is cheaper — for steady high-throughput workloads, it often isn’t.
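To sanity-check that math for your own traffic pattern, the arithmetic is one helper (my own sketch; 730 hours approximates a month):

```python
ACU_HOUR_USD = 0.12   # Aurora PostgreSQL Serverless v2, us-east-1
HOURS_PER_MONTH = 730

def monthly_compute_usd(avg_acu: float) -> float:
    """Compute-only monthly cost at a given average ACU level."""
    return round(avg_acu * ACU_HOUR_USD * HOURS_PER_MONTH, 2)

# A cluster parked at 2 ACU all month:
# monthly_compute_usd(2) -> 175.2, the "about $175" figure above
```

Plug in your measured average ACU, not your maximum — the whole point of Serverless v2 is that you pay for the average.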

What Bedrock Integration Actually Enables

The integration isn’t a native Aurora feature. You don’t flip a switch in the Aurora console and get AI queries. What you’re building is a Bedrock Agent with an action group that talks to Aurora. The agent handles the language understanding; Aurora handles the data. The connection between them is your code.

Three things this unlocks:

Natural language to SQL. You give the agent your schema, it generates a query, your Lambda runs it, the results come back. The quality depends heavily on how you describe your schema to the model. A table called usr_evt_log with columns named ts, uid, and evtyp will produce worse results than a table called user_events with columns named created_at, user_id, and event_type. This is not a Bedrock problem. It’s a schema naming problem you now have to care about if you want the AI layer to work.

Semantic search on your data. Using the pgvector extension on Aurora PostgreSQL, you can store vector embeddings alongside your rows. Bedrock generates embeddings for queries using Titan Embeddings. You store embeddings for documents or records at write time. At query time, you search by cosine similarity rather than exact match. This is retrieval-augmented generation (RAG) applied to your own database content.

Query optimization suggestions. You can feed slow query logs and EXPLAIN ANALYZE output to a Bedrock model and get back optimization suggestions. This works better than you’d expect and worse than you’d hope. More on that later.

Use Case 1: Natural Language to SQL

Here’s the actual setup that works in production.

You need three things: a Bedrock Agent with a Knowledge Base containing your schema, a Lambda action group that executes parameterized queries, and guardrails that block DDL and DML (the agent should never be able to DROP a table or DELETE rows).

The schema Knowledge Base is critical. Don’t just dump your full DDL. Write plain English descriptions of what each table contains, what the key columns mean, and how tables relate to each other. Include examples of common query patterns.

Table: orders
Purpose: Records all customer purchase transactions. One row per order.
Key columns:
  - order_id (UUID): Unique identifier for the order
  - customer_id (UUID): References customers.customer_id
  - status (VARCHAR): Current order state. Values: pending, processing, shipped, delivered, cancelled, refunded
  - created_at (TIMESTAMPTZ): When the order was placed, in UTC
  - total_amount (NUMERIC): Order total in USD, before tax
  - region (VARCHAR): Customer's billing region. Values: us-east, us-west, eu, apac

Common queries:
  - "orders by status last 30 days" → WHERE created_at > NOW() - INTERVAL '30 days' GROUP BY status
  - "revenue by region" → SUM(total_amount) GROUP BY region

With schema like this in the Knowledge Base, the agent’s SQL generation accuracy is substantially better. You’re not asking it to reverse-engineer intent from column names. You’re telling it what the data means.

The Lambda action group that runs queries:

import json
import boto3
import psycopg2
from botocore.exceptions import ClientError

def get_db_credentials():
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId="aurora/readonly-credentials")
    return json.loads(response["SecretString"])

def lambda_handler(event, context):
    action = event.get("actionGroup")
    api_path = event.get("apiPath")
    params = event.get("requestBody", {}).get("content", {}).get(
        "application/json", {}
    ).get("properties", [])
    
    sql = next((p["value"] for p in params if p["name"] == "sql"), None)
    
    if not sql:
        return build_response(400, {"error": "No SQL query provided"})
    
    # Enforce read-only: block any statement that isn't SELECT
    sql_stripped = sql.strip().upper()
    if not sql_stripped.startswith("SELECT") and not sql_stripped.startswith("WITH"):
        return build_response(403, {"error": "Only SELECT queries are permitted"})
    
    creds = get_db_credentials()
    conn = None
    
    try:
        conn = psycopg2.connect(
            host=creds["host"],
            port=creds["port"],
            database=creds["database"],
            user=creds["username"],
            password=creds["password"],
            connect_timeout=5
        )
        # The session-level guard also blocks writes hidden inside CTEs,
        # which the SELECT/WITH prefix check alone would miss
        conn.set_session(readonly=True)
        
        with conn.cursor() as cur:
            cur.execute(sql)
            columns = [desc[0] for desc in cur.description]
            rows = cur.fetchmany(500)  # cap results
        
        return build_response(200, {
            "columns": columns,
            "rows": rows,
            "row_count": len(rows)
        })
        
    except psycopg2.Error as e:
        return build_response(500, {"error": str(e)})
    finally:
        if conn is not None:
            conn.close()  # close even when the query fails

def build_response(status_code, body):
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": "query-aurora",
            "apiPath": "/execute-query",
            "httpStatusCode": status_code,
            "responseBody": {
                "application/json": {"body": json.dumps(body)}
            }
        }
    }

Two things in that code that are not optional: the read-only session (conn.set_session(readonly=True)) and the result cap (fetchmany(500)). Without the first, your natural language query bot can accidentally run UPDATE statements if the model gets confused about what’s being asked. Without the second, a broad query can return 200,000 rows, blow out your Lambda memory, and time out. Both have bitten me.

The result cap means your agent returns summaries or representative rows, not full result sets. That’s almost always what you want anyway. If someone asks “how many orders came in last month by region,” they want a five-row summary, not 80,000 individual order records.
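The SELECT/WITH prefix check in the handler is deliberately crude. Here's a slightly stricter validator I'd layer on top — a sketch of a defense-in-depth convention of my own, not a substitute for the read-only session, and note it will conservatively reject legitimate queries that happen to use words like "update" as identifiers:

```python
import re

# Keywords the agent must never run, even inside a CTE body
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|GRANT|CREATE|COPY)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Reject anything that isn't a single read-only statement."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # multiple statements stacked together
        return False
    head = stripped.lstrip("(").upper()
    if not (head.startswith("SELECT") or head.startswith("WITH")):
        return False
    # Conservative: false-positives on keyword-like identifiers are acceptable
    return not FORBIDDEN.search(stripped)
```

The interesting case is the data-modifying CTE — `WITH t AS (DELETE FROM orders RETURNING *) SELECT * FROM t` starts with WITH but is absolutely a write. The keyword scan catches it; the read-only session remains the real guarantee.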

Use Case 2: RAG on Database Content

This is where pgvector on Aurora PostgreSQL becomes genuinely useful.

The scenario: you have a database of support tickets, documentation articles, customer notes — text content that needs to be searched by meaning, not keyword. You want to ask “find tickets related to authentication failures on mobile” and get back the relevant tickets even if none of them contains those exact words.

First, enable the pgvector extension:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE document_embeddings (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_table VARCHAR(100) NOT NULL,
    source_id UUID NOT NULL,
    content TEXT NOT NULL,
    embedding vector(1024),  -- Titan Text Embeddings V2 default dimension
    created_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE (source_table, source_id)  -- required for ON CONFLICT upserts
);

CREATE INDEX ON document_embeddings 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

The IVFFlat index is important. Without it, similarity search does a full table scan. For a few thousand documents that’s fine. For hundreds of thousands it becomes a problem. The lists parameter should be roughly the square root of your row count.
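A tiny helper for that rule of thumb (my own sketch; pgvector's docs give finer-grained guidance, and since lists is fixed when the index is built, you rebuild the index as the table grows):

```python
import math

def ivfflat_lists(row_count: int) -> int:
    # Square-root heuristic from the text; clamp to a small floor for tiny tables
    return max(10, int(math.sqrt(row_count)))

def create_index_sql(lists: int) -> str:
    return (
        "CREATE INDEX ON document_embeddings "
        "USING ivfflat (embedding vector_cosine_ops) "
        f"WITH (lists = {lists});"
    )
```

At 10,000 documents that's lists = 100, which is exactly the value in the DDL above.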

Now the embedding pipeline. When a document is created or updated, you generate an embedding via Bedrock and store it:

import boto3
import json
import psycopg2

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed_text(text: str) -> list[float]:
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({
            "inputText": text,
            "dimensions": 1024,  # Titan V2 supports 256, 512, or 1024
            "normalize": True
        })
    )
    result = json.loads(response["body"].read())
    return result["embedding"]

def store_embedding(conn, source_table: str, source_id: str, content: str):
    # psycopg2 has no native pgvector adapter; send the embedding as a
    # pgvector text literal ("[0.1, 0.2, ...]") and cast it server-side
    embedding = str(embed_text(content))
    
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO document_embeddings 
                (source_table, source_id, content, embedding)
            VALUES (%s, %s, %s, %s::vector)
            ON CONFLICT (source_table, source_id) 
            DO UPDATE SET 
                content = EXCLUDED.content,
                embedding = EXCLUDED.embedding,
                created_at = NOW()
        """, (source_table, source_id, content, embedding))
    conn.commit()

def semantic_search(conn, query: str, limit: int = 10) -> list[dict]:
    # Same trick: pass the query embedding as a pgvector text literal
    query_embedding = str(embed_text(query))
    
    with conn.cursor() as cur:
        cur.execute("""
            SELECT 
                source_table,
                source_id,
                content,
                1 - (embedding <=> %s::vector) AS similarity
            FROM document_embeddings
            WHERE 1 - (embedding <=> %s::vector) > 0.75
            ORDER BY embedding <=> %s::vector
            LIMIT %s
        """, (query_embedding, query_embedding, query_embedding, limit))
        
        columns = [desc[0] for desc in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]

The <=> operator is cosine distance (lower is more similar). The 1 - (embedding <=> ...) conversion gives you cosine similarity (higher is more similar). The 0.75 threshold filters out low-confidence matches — adjust this based on your data. Too high and you miss relevant results. Too low and you get noise.
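If the operator semantics feel opaque, here is what 1 - (embedding <=> query) computes, written out in plain Python — a sketch for intuition only; in production the database does this:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical direction -> 1.0, orthogonal -> 0.0, opposite -> -1.0
```

Because the Titan embeddings are requested with normalize: True, the norms are 1 and similarity reduces to a plain dot product — which is part of why the 0.75 threshold is comparable across queries.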

Wire this into a Bedrock Agent action group and you have a semantic search capability sitting on top of your existing Aurora data. The agent retrieves the relevant chunks, synthesizes them into a response, and cites the source records. For internal knowledge bases and support tools, this is one of the more practical AI patterns I’ve seen this year.

The cost to be aware of: Titan Embeddings V2 charges $0.00002 per 1,000 input tokens. Embedding a support ticket corpus of 100,000 documents at an average of 300 tokens each is about $0.60 in embedding costs. Storage in Aurora depends on your data volume — a vector(1024) value is about 4 KB per row, so 100,000 rows adds roughly 400 MB to your storage footprint.
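The token math, as a one-liner you can rerun with your own corpus numbers (my sketch; the price constant is the rate quoted above):

```python
TITAN_V2_PER_1K_TOKENS = 0.00002  # input-token pricing, per 1,000 tokens

def embedding_cost_usd(doc_count: int, avg_tokens: int) -> float:
    """Total one-time cost to embed a corpus."""
    return round(doc_count * avg_tokens / 1000 * TITAN_V2_PER_1K_TOKENS, 2)

# 100,000 docs at ~300 tokens each:
# embedding_cost_usd(100_000, 300) -> 0.6
```

Embedding cost is effectively a rounding error; the recurring costs are Aurora storage and the query-time model calls.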

Use Case 3: Automated Query Optimization Suggestions

This one I have the most mixed feelings about.

The setup: you pull slow query logs from Aurora’s Performance Insights, extract queries above a latency threshold, run EXPLAIN ANALYZE on them, and pass the output to a Bedrock model asking for optimization suggestions.

The model is good at:

  • Identifying missing indexes from sequential scan patterns
  • Spotting N+1 query patterns in application logs
  • Explaining what an EXPLAIN ANALYZE output actually means in plain English
  • Suggesting query rewrites that eliminate redundant subqueries

The model is less reliable at:

  • Understanding your data distribution (cardinality assumptions can be wrong)
  • Knowing which indexes you’ve already tried and rejected
  • Accounting for concurrent workloads and lock contention
  • Suggesting Aurora-specific configuration tuning (shared_buffers, work_mem)

The safe way to use this: treat the output as a checklist of things to investigate, not a list of changes to apply. I had the model suggest adding a composite index on (user_id, created_at) for a frequently-run query. Reasonable suggestion. What it didn’t know was that we’d already tried that index and removed it because it was causing write contention on a high-insert table. The model can’t know that. Your team does.

Performance Insights is the right data source for this. It captures query text, execution counts, wait events, and average latency per query shape. Feed those metrics alongside EXPLAIN ANALYZE output for richer context:

from datetime import datetime, timedelta, timezone

import boto3

def get_slow_queries(cluster_id: str, hours: int = 24) -> list[dict]:
    pi = boto3.client("pi", region_name="us-east-1")
    
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(hours=hours)
    
    response = pi.get_resource_metrics(
        ServiceType="RDS",
        Identifier=f"db:{cluster_id}",
        MetricQueries=[{
            "Metric": "db.load.avg",
            "GroupBy": {
                "Group": "db.sql",
                "Dimensions": ["db.sql.statement"],
                "Limit": 25
            }
        }],
        StartTime=start_time,
        EndTime=end_time,
        PeriodInSeconds=3600
    )
    
    return response.get("MetricList", [])

Pass this to Bedrock with your EXPLAIN ANALYZE output and ask for specific, actionable suggestions. Vague prompts get vague answers. “What can I do to improve this query?” gets worse results than “This query runs in 4.2 seconds on average. It’s doing a sequential scan on orders (8M rows). What index would most likely reduce the scan cost, and are there any query rewrites worth considering?”
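A prompt builder along those lines — the helper name and parameters are hypothetical; what matters is the structure of latency, query, plan, then a narrow ask:

```python
def build_tuning_prompt(query: str, avg_seconds: float, explain_output: str,
                        table_stats: str = "") -> str:
    """Assemble a specific, metric-laden optimization prompt for the model."""
    parts = [
        f"This query averages {avg_seconds:.1f}s over the last 24h:",
        query.strip(),
        "EXPLAIN ANALYZE output:",
        explain_output.strip(),
    ]
    if table_stats:
        parts += ["Relevant table statistics:", table_stats.strip()]
    parts.append(
        "Identify the single change most likely to reduce the dominant "
        "cost node, and list any query rewrites worth testing. Be specific."
    )
    return "\n\n".join(parts)
```

Feed the result to your model call of choice. The narrow final instruction is what keeps the answer from drifting into generic indexing advice.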

Cost Model: What You’re Actually Paying

Let me put real numbers together for a mid-size production workload.

Aurora Serverless v2 costs (us-east-1, Aurora PostgreSQL):

  • ACU-hour: $0.12
  • Storage: $0.10/GB-month
  • I/O: $0.20 per million requests

A cluster scaling between 2-8 ACU during business hours, 2 ACU overnight:

  • Roughly 6 ACU average × 24h × 30 days = 4,320 ACU-hours
  • Compute: ~$518/month
  • Storage (100GB): $10/month
  • Total Aurora: ~$530/month

Compare to a db.r6g.xlarge (4 vCPU, 32 GB) at $0.48/hour: $346/month. Serverless v2 is more expensive for that steady-state scenario. Where it wins is variable workloads — a team whose database gets 10x more traffic on weekdays than weekends, or batch jobs that spike heavily and then drop to near zero.
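The break-even is easy to compute directly (my sketch, using the post's 24 × 30 = 720-hour month):

```python
HOURS_PER_MONTH = 720  # 24h x 30 days, matching the figures above

def serverless_monthly_usd(avg_acu: float, acu_hour: float = 0.12) -> float:
    return round(avg_acu * HOURS_PER_MONTH * acu_hour, 2)

def breakeven_avg_acu(provisioned_hourly: float, acu_hour: float = 0.12) -> float:
    """Average ACU level above which a provisioned instance is cheaper."""
    return round(provisioned_hourly / acu_hour, 1)

# serverless_monthly_usd(6) -> 518.4 (the ~$518 compute figure)
# breakeven_avg_acu(0.48) -> 4.0 (above ~4 average ACU, the r6g.xlarge wins)
```

The break-even number is the one to track: if your cluster's monthly average ACU drifts above the provisioned equivalent's hourly price divided by $0.12, it's time to re-run the comparison.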

Bedrock costs (Claude 3.5 Sonnet):

  • Natural language query interface, 20 queries/day, ~4K tokens each: ~$70/month
  • Embedding pipeline, 1,000 new documents/day at 300 tokens: ~$0.18/month
  • Query optimization suggestions, 5 analyses/week: ~$8/month

Total Bedrock overhead for these use cases: under $100/month. Not nothing, but not the dominant cost. Aurora compute is.

Aurora vs RDS vs DynamoDB for AI Workloads

The decision framework I use:

Choose Aurora Serverless v2 when:

  • You need pgvector for semantic search and you want it colocated with your transactional data
  • Your workload is unpredictable (batch jobs, dev environments, bursty analytics)
  • You want SQL joins between your embeddings and your business data
  • You’re already on PostgreSQL and don’t want to manage another data store

Choose RDS PostgreSQL when:

  • Your workload is steady and predictable (provisioned instances have better price/performance for constant load)
  • You need the full range of PostgreSQL extensions, some of which aren’t yet available on Aurora
  • You want blue-green deployments with minimal risk — see RDS Blue-Green Deployment for how we do zero-downtime schema migrations with that pattern

Choose DynamoDB when:

  • You need single-digit millisecond latency at any scale
  • Your access patterns are well-defined and key-based
  • You don’t need complex joins or vector search
  • Global table replication is a requirement

DynamoDB and Bedrock is a different integration story — you’re typically using DynamoDB as a session store or metadata store for agents, not as the primary search surface. For key-value lookups at high throughput, nothing beats it. See the DynamoDB 2026 patterns guide for the current best practices on table design and on-demand vs. provisioned capacity.

What about ElastiCache? If your semantic search results are relatively static (documents don’t change often), caching embedding results in ElastiCache can dramatically reduce both Bedrock API calls and Aurora query load. The ElastiCache 2026 guide covers the caching patterns worth applying here — particularly the write-through pattern for embedding caches where freshness matters.

For high-availability requirements across all of this: Aurora Serverless v2 supports Multi-AZ out of the box. The failover behavior for Serverless clusters is the same as provisioned — typically 30-60 seconds. If you’re building an AI query interface that’s customer-facing, design for that gap with appropriate timeouts and fallback messaging.
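One way to design for that gap is a bounded retry whose backoff roughly spans the failover window. A sketch, assuming your driver raises something catchable on a dropped connection — swap ConnectionError for psycopg2.OperationalError in practice:

```python
import time

def connect_with_retry(connect, attempts: int = 5, base_delay: float = 2.0):
    """Call connect() with exponential backoff across a failover.

    With the defaults the waits are 2 + 4 + 8 + 16 = 30s, roughly
    covering a 30-60 second failover window before giving up.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))
```

Pair this with user-facing fallback messaging: "results are taking longer than usual" beats a spinner that dies at 60 seconds.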

Monitoring AI Query Patterns with Performance Insights

The interesting monitoring challenge with AI-generated SQL is that the query shapes are unpredictable. A human engineer writes the same ten queries 90% of the time. An AI query interface generates novel SQL on every request.

Performance Insights handles this through query normalization — it groups queries with the same structure regardless of specific parameter values. That helps. But you’ll also see queries you’ve never seen before appearing in your slow query digest, and you need a process for reviewing them.

What to watch in Performance Insights when you add a Bedrock query interface:

DB load by SQL — Watch for new query shapes that didn’t exist before you enabled the AI interface. They’ll show up here. If you see a query doing a sequential scan on a large table, the AI generated something without a good index path.

Wait events — IO:DataFileRead increasing suggests more read I/O than before. Lock:relation appearing suggests the AI is generating queries that contend with concurrent writes. Neither is guaranteed to be a problem, but both are worth investigating.

Top SQL by average latency — Set an alert if any query shape exceeds 1 second average. AI-generated queries won’t stay within the performance boundaries your team implicitly knows to respect.

Add a query tagging convention to your Lambda action group so you can filter Performance Insights by AI-originated queries specifically:

/* bedrock-agent:natural-language-query */
SELECT region, COUNT(*) as order_count, SUM(total_amount) as revenue
FROM orders
WHERE created_at > NOW() - INTERVAL '90 days'
GROUP BY region
ORDER BY revenue DESC

That comment survives through to Performance Insights and lets you filter the dashboard to only show queries that came from the AI interface. Worth adding from day one.
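In the Lambda action group, the tag is a one-line prepend (a sketch; the comment text is the convention shown above):

```python
AI_QUERY_TAG = "/* bedrock-agent:natural-language-query */"

def tag_query(sql: str) -> str:
    """Prepend the marker comment, idempotently."""
    sql = sql.strip()
    if sql.startswith(AI_QUERY_TAG):
        return sql
    return f"{AI_QUERY_TAG}\n{sql}"
```

Call it immediately before cur.execute() so every AI-originated query carries the marker, and nothing else does.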

When NOT to Use Aurora Serverless

This section is the one most posts skip.

Steady high-throughput OLTP workloads. If your database is running at 60-80% capacity consistently throughout the day, provisioned Aurora or RDS will be cheaper. Serverless v2’s per-ACU pricing is optimized for variable workloads. For flat-load applications, you’re paying the flexibility premium without getting the benefit.

Sub-millisecond latency requirements. Serverless v2 scales smoothly but it’s not magic. During a rapid scale-up event, individual queries can see slightly elevated latency as capacity is added. For most workloads this is imperceptible. For financial trading systems or latency-SLA-critical APIs, it’s a risk you need to evaluate.

Maximum ACU needed is predictable and consistent. If you know you always need exactly 16 ACU and that never changes, provision 16 ACU on a standard instance. You’ll pay less and have more predictable behavior.

When your team needs fine-grained PostgreSQL tuning. Aurora abstracts some of the configuration knobs that experienced DBAs rely on for performance tuning. If you have engineers who tune shared_buffers, effective_cache_size, and autovacuum settings regularly, provisioned RDS gives you more direct control.

The Actual Decision

After a month of running Bedrock on top of Aurora Serverless v2, my honest summary:

The natural language query interface is genuinely useful for internal analytics tools where business users need to ask ad-hoc questions without writing SQL. It’s not suitable for customer-facing features where you need tight guarantees about query behavior. The read-only guard and result caps are non-negotiable.

The semantic search via pgvector is the most technically interesting piece. Colocating your vector embeddings with your relational data in one Aurora cluster, with SQL joins between them, is a genuinely different capability than maintaining a separate vector database. The operational simplicity is real.

The query optimization suggestions are useful as a starting point, not an endpoint. Treat the model’s recommendations as hypotheses that require validation against your actual workload, not instructions to execute blindly.

Aurora Serverless v2 in 2026 is mature enough for production. The scaling behavior is reliable. With a 0.5 ACU minimum there is no cold-start problem. The pgvector support is production-grade. If your workload is variable and you're already on PostgreSQL, it's the right default choice.

The Bedrock Agents DevOps guide covers the agent infrastructure that connects all of this together — if you want to go beyond the natural language query interface into full agentic workflows that coordinate multiple AWS services, that’s the right next read.

Start small. Deploy pgvector, embed one document corpus, wire up semantic search. You’ll learn more about what works for your data from one real implementation than from any number of architecture diagrams.
