Hybrid RAG on AWS: Amazon Bedrock and OpenSearch That Hold Up in Production

Written by Bits Lovers

On March 1, 2024, AWS added hybrid search support to Knowledge Bases for Amazon Bedrock backed by Amazon OpenSearch Serverless. On March 27, 2025, AWS added Amazon OpenSearch Managed Cluster as a supported Bedrock knowledge base vector store. On April 10, 2025, AWS extended Bedrock hybrid search to Aurora PostgreSQL and MongoDB Atlas vector stores. Those dates matter because the current production answer is not “Bedrock does hybrid RAG” in the abstract. The real question is which part Bedrock manages, which part OpenSearch manages, and where the integration boundaries still matter.

As of April 10, 2026, the current Bedrock user guide documents overrideSearchType=HYBRID for Amazon RDS, Amazon OpenSearch Serverless, and MongoDB vector stores that contain a filterable text field. AWS separately documents OpenSearch Managed Cluster as a supported Bedrock vector store, but not in that current hybrid-search note. That is enough ambiguity that I would not promise a fully Bedrock-managed hybrid path on Managed Cluster without validating it in the exact Region and account you plan to ship.

That sounds annoying, but the engineering answer is still straightforward:

  • If you want the fastest managed path, use Bedrock Knowledge Bases with OpenSearch Serverless.
  • If you want deeper ranking control, OpenSearch-native hybrid search should own retrieval and Bedrock should own generation.
  • If you want Bedrock Knowledge Bases with OpenSearch Managed Cluster, treat it as a vector-store integration first and confirm the retrieval behavior before you standardize on it.

Why Pure Semantic Search Still Loses in Production

Hybrid RAG survives contact with real systems because production questions are rarely clean semantic paraphrases.

Engineers ask for:

  • incident IDs
  • service names
  • exact error strings
  • feature flags
  • version numbers
  • AWS resource names

Embeddings are useful for intent. They are not magic for identifiers. A question like “why did checkout-api start timing out after build 2026.04.09.3?” has two different retrieval problems at once: semantic meaning and exact-token matching. If your retriever drops either side, the model gets weak context and the answer quality collapses.

That is why OpenSearch is so useful in RAG systems. You can keep lexical retrieval and vector retrieval in the same search layer instead of pretending one ranking strategy fits every query. If your team mostly knows OpenSearch from the observability guide, this is the same platform solving a different problem: not logs first, retrieval first.

The Current AWS Shape

There are really two valid architectures here.

Path 1: Bedrock-managed retrieval

Bedrock Knowledge Bases handles ingestion, chunking, embeddings, retrieval, and optional response generation. For many teams that is the right default. The Bedrock Retrieve and RetrieveAndGenerate APIs support overrideSearchType, and Bedrock can quick-create an OpenSearch Serverless vector store for you.

This is the path I would use when:

  • the team wants fewer moving parts
  • metadata filters are enough
  • retrieval relevance needs to be good, not endlessly customizable
  • the application is more important than the search platform

Path 2: OpenSearch-managed retrieval

OpenSearch owns lexical plus vector search, score normalization, and reranking behavior. Bedrock is then just the model endpoint. This is the better design when search quality is a core product capability, not just supporting infrastructure.

This is the path I would use when:

  • exact-match recall is critical
  • you want explicit weighting between lexical and neural clauses
  • you need index-level control, search pipelines, or custom query composition
  • you already run OpenSearch seriously and want RAG to fit that operating model

The Architecture Trap Most Teams Miss

The most important production detail is not about embeddings at all. It is network design.

The current Bedrock setup documentation for OpenSearch Managed Cluster says the OpenSearch domain must use public access for a Bedrock knowledge base. VPC-only OpenSearch domains are not supported for that Bedrock knowledge base integration path. For a lot of regulated environments, that single line is enough to change the architecture decision.

If you need private connectivity and still want Bedrock-managed retrieval, OpenSearch Serverless is a cleaner fit because Bedrock documents a VPC-endpoint path there. If you need the full managed-cluster feature set inside a private network boundary, the safer pattern is usually custom retrieval from your application tier into OpenSearch, followed by a Bedrock model call for answer generation.
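A minimal sketch of that split, keeping retrieval and generation as separate request builders so the OpenSearch side can be tuned without touching the model side. The index name, field names, and prompt shape here are hypothetical placeholders, not a documented Bedrock or OpenSearch contract:

```python
# Sketch: application-tier retrieval against OpenSearch, generation via Bedrock.
# "rag-docs", "passage_text", and the prompt layout are hypothetical.

def build_retrieval_request(question: str, k: int = 8) -> dict:
    """Lexical query body sent to a private OpenSearch domain."""
    return {
        "size": k,
        "query": {"match": {"passage_text": {"query": question}}},
    }

def build_generation_request(question: str, passages: list) -> dict:
    """Converse-style message body for a separate Bedrock model call."""
    context = "\n\n".join(passages)
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"text": f"Context:\n{context}\n\nQuestion: {question}"}
                ],
            }
        ]
    }

req = build_retrieval_request("why did checkout-api start timing out?")
print(req["size"], list(req["query"].keys()))
```

Because the two bodies never share state, a ranking change in the retrieval function cannot silently change prompt behavior, and vice versa.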

That separation also fits how I think about model operations in general. The model is one component. Retrieval is another. If you want the model side examples, Bedrock Agents for DevOps and Bedrock model lifecycle are better places to think about orchestration and model version control. Do not use model changes to hide retrieval weaknesses.

Bedrock Knowledge Bases Example

If you want the managed path, keep it simple and explicit:

import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = runtime.retrieve(
    knowledgeBaseId="KB12345678",
    retrievalQuery={"text": "why did checkout latency spike after deploy 2026.04.09.3"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 8,
            "overrideSearchType": "HYBRID"
        }
    }
)

for item in response["retrievalResults"]:
    print(item["score"], item["location"])

This is the right place to start when your requirements are mostly:

  • reliable chunk retrieval
  • metadata filtering
  • citations
  • low operational overhead

The current Bedrock docs also call out a practical constraint: hybrid search only applies when the vector store includes a filterable text field. That detail is easy to miss and it directly affects whether keyword-style retrieval can participate in the query.
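Metadata filtering composes with hybrid search in the same call. Shown here as the configuration dict alone; the "service" metadata key and its value are hypothetical document attributes, and the exact filter operators available depend on your vector store:

```python
# Sketch: hybrid retrieval plus a metadata filter. The "service" key is a
# hypothetical metadata attribute attached to documents at ingestion time.
retrieval_configuration = {
    "vectorSearchConfiguration": {
        "numberOfResults": 8,
        "overrideSearchType": "HYBRID",
        "filter": {
            "equals": {"key": "service", "value": "checkout-api"}
        },
    }
}

# Passed as retrievalConfiguration=retrieval_configuration in the same
# runtime.retrieve(...) call shown above.
```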

OpenSearch-Native Hybrid Example

When you want more ranking control, switch retrieval into OpenSearch.

OpenSearch’s current hybrid search documentation uses a search pipeline plus a hybrid query. The important part is not the syntax. The important part is that lexical and neural scores are normalized and combined in an explicit pipeline that you control.

PUT /_search/pipeline/rag-hybrid-pipeline
{
  "description": "Hybrid pipeline for production RAG",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [0.45, 0.55]
          }
        }
      }
    }
  ]
}

GET /rag-docs/_search?search_pipeline=rag-hybrid-pipeline
{
  "_source": {
    "excludes": ["passage_embedding"]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "passage_text": {
              "query": "checkout latency spike after deploy 2026.04.09.3"
            }
          }
        },
        {
          "neural": {
            "passage_embedding": {
              "query_text": "checkout latency spike after deploy 2026.04.09.3",
              "model_id": "your-model-id",
              "k": 20
            }
          }
        }
      ]
    }
  }
}
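The neural clause above assumes the index was created as a k-NN index with a knn_vector field and that passages are embedded at ingest time. A sketch of that prerequisite, with the dimension as a placeholder you would replace to match your embedding model:

PUT /rag-docs
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "passage_text": { "type": "text" },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 1024
      }
    }
  }
}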

That is what I would run when product search relevance matters enough that I want to tune weights, test ranking changes, and keep search behavior versioned separately from prompt behavior.

The Production Rules That Actually Matter

A few engineering rules matter much more than choosing the perfect model:

Keep a real text field.
Hybrid retrieval only works if lexical search has something useful to index. Do not throw away identifiers, titles, filenames, headings, or metadata-rich text during chunking.

Use metadata deliberately.
Environment, service, team, document type, and last-updated date often do more for RAG quality than swapping one FM for another. Bedrock’s filtering and OpenSearch’s query clauses are both more predictable than hoping the model infers scope from prose.
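Applied to the hybrid query above, that means scoping both clauses, not just one. A sketch assuming a hypothetical "service" keyword field; whether the neural clause accepts an inline filter depends on your OpenSearch version and k-NN engine, so validate it against your cluster:

GET /rag-docs/_search?search_pipeline=rag-hybrid-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "bool": {
            "must": [
              { "match": { "passage_text": "checkout latency spike" } }
            ],
            "filter": [
              { "term": { "service": "checkout-api" } }
            ]
          }
        },
        {
          "neural": {
            "passage_embedding": {
              "query_text": "checkout latency spike",
              "model_id": "your-model-id",
              "k": 20,
              "filter": {
                "term": { "service": "checkout-api" }
              }
            }
          }
        }
      ]
    }
  }
}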

Evaluate retrieval separately from generation.
Measure top-k recall and citation usefulness before you change prompts or models. If retrieval is weak, Bedrock Nova fine-tuning is not the first fix.
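A minimal sketch of that separation, assuming you maintain a small labeled set mapping each query to the document IDs a good answer needs. Nothing here touches a model; the retrieved lists would come from either the Bedrock or the OpenSearch path:

```python
# Sketch: top-k recall over a labeled query -> relevant-doc-ID set.
# The labeled data and doc IDs below are hypothetical examples.

def top_k_recall(retrieved: list, relevant: set, k: int = 5) -> float:
    """Fraction of relevant docs that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

labeled = {
    "checkout timeout after deploy": {"doc-12", "doc-40"},
}
retrieved_by_query = {
    "checkout timeout after deploy": ["doc-40", "doc-7", "doc-12", "doc-3"],
}

for query, relevant in labeled.items():
    recall = top_k_recall(retrieved_by_query[query], relevant, k=5)
    print(f"{query}: recall@5 = {recall:.2f}")
```

If this number is low, changing the prompt or the foundation model cannot fix it; only the retriever can.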

Pick your vector store based on operating model, not hype.
Bedrock documents that OpenSearch Serverless and OpenSearch Managed Cluster are the only supported Bedrock vector stores for binary embeddings. That matters if storage footprint and retrieval scale are starting to dominate cost. If you are evaluating retrieval layers more broadly, the detailed S3 Vectors vs Gemini File Search comparison is the right next read because the real decision is often not “which AWS vector store?” but “do I want a reusable vector platform or a model-coupled retrieval tool?”

Do not ignore ingestion freshness.
A stale index makes even a good hybrid retriever look bad. If your source system is batch-oriented, the AWS Glue ETL guide is closer to the ingestion pattern you need. If updates are event-driven, the AWS Kinesis guide is a better mental model.

What I Would Deploy

For most internal AI search systems, I would use this decision rule:

Use Bedrock Knowledge Bases + OpenSearch Serverless when you need:

  • fast time to value
  • managed ingestion
  • citations
  • good hybrid retrieval without a search team

Use OpenSearch-native hybrid retrieval + Bedrock model inference when you need:

  • explicit weighting between lexical and neural search
  • tighter control over ranking behavior
  • private network patterns that do not fit the Bedrock managed integration
  • retrieval quality as a product differentiator

Use Bedrock Knowledge Bases + OpenSearch Managed Cluster only when you have a clear reason for Managed Cluster specifically and you have validated the current retrieval behavior in your own environment.

Final Take

Hybrid RAG on AWS is mature enough to be useful, but not simple enough that the service names alone tell you the right design. Bedrock is best when you want managed RAG workflows. OpenSearch is best when retrieval itself is the engineering surface you need to control.

The mistake is trying to make one service do both jobs equally well for every case. Keep retrieval explicit, keep generation separate, and let the architecture match the thing you actually need to optimize.
