Amazon S3 Vectors vs Gemini File Search: Two Very Different Answers to the Same RAG Problem
AWS rolled out S3 Vectors in preview on July 15, 2025. Google put Gemini File Search into public preview on November 6, 2025. That changed the retrieval conversation. A year earlier, most teams were still starting with “which vector database are we going to run?” Now the first question is usually different: which part of retrieval do we actually want to own ourselves?
That is the comparison that matters. Treat S3 Vectors and Gemini File Search like equivalent services and you will end up optimizing the wrong layer. They do aim at the same business problem. You want a model to answer from private documents instead of making things up. But the engineering surface is very different.
- Amazon S3 Vectors gives you vector storage plus query APIs.
- Gemini File Search gives Gemini a managed retrieval tool inside the generation flow.
There is one more wrinkle. A lot of people say “Gemini file store” when they really mean the raw Files API. That is not the same thing. Google’s current docs make the distinction pretty clear: File API uploads are temporary, while File Search stores are the persistent retrieval container. So the real production comparison is Amazon S3 Vectors vs Gemini File Search stores, not S3 Vectors vs a short-lived file upload.
After going through the current docs, limits pages, changelogs, and pricing tables on April 11, 2026, my view is straightforward:
- Use S3 Vectors when retrieval is a platform concern and you want a durable, cheap vector layer you can integrate with multiple models and workflows.
- Use Gemini File Search when your application already lives in Gemini and you want the fastest path from documents to grounded answers with the fewest moving parts.
- Do not use either just because the product page looks clean. The wrong retrieval layer becomes technical debt fast.
The Real Difference in One Sentence
S3 Vectors gives you a place to store and search embeddings. Gemini File Search gives Gemini a built-in way to retrieve from documents during generation.
That sounds subtle. It is not.
With S3 Vectors, you still need to think about embedding generation, chunking strategy, metadata shape, query orchestration, reranking if you need it, and how generation happens after retrieval. If you are already comfortable with the production hybrid RAG trade-offs on AWS, that operating model will feel natural.
With Gemini File Search, Google handles much more of the retrieval pipeline for you. You create a File Search store, upload or import documents, let the system chunk and embed them, and then call generateContent with a File Search tool attached. That is a much higher-level abstraction. It is also a tighter dependency on one model stack.
If You Remember Only Three Facts, Make It These
- S3 Vectors is built for huge vector scale. AWS documents up to 2 billion vectors per index, 10,000 indexes per vector bucket, and dimensions from 1 to 4,096.
- Gemini File Search is built for managed retrieval convenience. Google documents 100 MB maximum per document, project storage caps by tier, and recommends keeping each File Search store under 20 GB for optimal retrieval latency.
- Their pricing models are not comparable unless you split retrieval cost from model-token cost. S3 charges for storage, PUT, and query processing. Gemini File Search charges for embeddings at indexing time, keeps storage free, and bills retrieved document tokens as normal model input tokens.
If your team misses point three, the cost analysis will be fiction.
Side-by-Side: What the Official Docs Actually Say
| Dimension | Amazon S3 Vectors | Gemini File Search |
|---|---|---|
| Product layer | Vector storage and query API | Built-in retrieval tool for Gemini |
| Persistence | Durable S3-backed vector storage | File Search store persists until deleted |
| Temporary input path | Not relevant, vectors are the persistent object | Raw File API objects are temporary and the docs say they are deleted after 48 hours |
| Maximum scale surface | Up to 2 billion vectors per index, 10,000 indexes per bucket | 100 MB per document, project-level caps from 1 GB to 1 TB depending on tier |
| Metadata | Up to 40 KB total metadata per vector, with up to 2 KB filterable metadata | Supports custom metadata and metadata filtering in retrieval |
| Query model | Query vectors directly with filters and top-k | Ask Gemini with File Search attached as a tool |
| Storage pricing | $0.06 per GB-month in the AWS pricing example | Storage is free |
| Query pricing | $2.50 per million Query API calls plus data processed | Retrieved document tokens billed as normal input tokens for the selected Gemini model |
| Best fit | Shared retrieval layer, multi-model systems, large corpora, infrequent queries | Gemini-native apps, rapid delivery, managed retrieval with low ops burden |
Graph 1: Control vs Convenience
This graph is opinionated. It is not vendor marketing. It is how these products feel in the hands of an engineer building a production retrieval layer.
This is the trade. S3 Vectors gives you more leverage. Gemini File Search gives you less plumbing.
Amazon S3 Vectors: What You Are Actually Buying
The current S3 Vectors feature page says the service is designed to store up to billions of vectors with sub-second query performance. AWS also publishes specific numbers on the product page: up to 2 billion vectors per index, up to 10,000 indexes per bucket, and a lowest warm-query latency figure of 100 milliseconds. The limitations page adds the operational detail that most architects actually care about:
- up to 2 billion vectors per index
- up to 10,000 vector indexes per bucket
- 1 to 4,096 dimensions per vector
- up to 40 KB of total metadata per vector
- up to 500 vectors per `PutVectors` call
- up to 100 top-k results per `QueryVectors` request
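One practical consequence of the 500-vector cap per `PutVectors` call is that bulk ingestion has to batch. A minimal sketch, with the actual boto3 call left as a comment because the bucket and index names are placeholders, not real resources:

```python
from itertools import islice

MAX_PUT_BATCH = 500  # documented PutVectors cap


def batched(vectors, size=MAX_PUT_BATCH):
    """Yield successive batches of at most `size` vectors."""
    it = iter(vectors)
    while batch := list(islice(it, size)):
        yield batch


# Hypothetical usage against a real client (names are illustrative):
# s3vectors = boto3.client("s3vectors")
# for batch in batched(all_vectors):
#     s3vectors.put_vectors(
#         vectorBucketName="docs-prod", indexName="runbooks", vectors=batch
#     )
```

The same batching concern applies on the read side: `QueryVectors` caps top-k at 100, so anything beyond that needs multiple queries or a rethink of the retrieval design.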
That is not a toy service. It is a storage system designed for very large corpora.
Just as important is the positioning AWS uses in its own docs. S3 Vectors is a fit for semantic search, RAG, agent memory, and tiered retrieval. AWS is explicit that OpenSearch still owns the high-QPS, low-latency side when you need real-time search at higher throughput. That is a good sign, not a weakness. It means the product has a clear place in the stack.
I would describe S3 Vectors like this: cheap, durable semantic storage that lets you stop pretending all vectors need premium query infrastructure all the time.
That matters when your vector corpus grows much faster than your query rate. Old support cases, PDFs, runbooks, video embeddings, large archives of logs transformed into incident summaries, chat transcripts for agent memory, compliance evidence stores, all of that tends to accumulate faster than it gets queried.
If your retrieval architecture already looks like a platform, S3 Vectors is attractive because it is not trying to become your entire application. It stores vectors. It filters on metadata. It returns neighbors. You decide what happens next.
Gemini File Search: What You Are Actually Buying
Gemini File Search lives much closer to the model.
The current Google docs show a very direct flow:
- create a File Search store
- upload or import documents into it
- let Google chunk and embed the content
- call `generateContent` with `file_search` configured as a tool
The docs are unusually clear on an important lifecycle detail. They say the temporary File object created by uploadToFileSearchStore is deleted after 48 hours, while the data imported into the File Search store is stored indefinitely until you delete it. That sentence clears up a common confusion: the raw Files API is not your persistent retrieval layer. The File Search store is.
Google also documents a strong set of practical limits:
- maximum file size per document: 100 MB
- project File Search store capacity by tier: 1 GB free, 10 GB Tier 1, 100 GB Tier 2, 1 TB Tier 3
- recommended maximum per store: under 20 GB for optimal retrieval latency
- backend size accounting is typically about 3x the original input size because embeddings are stored with the content
- File Search cannot currently be combined with some other built-in tools such as Google Search and URL Context in the same call
- File Search is not supported in the Live API
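The ~3x backend accounting and the 20 GB per-store guidance together make capacity easy to estimate before uploading anything. A back-of-envelope sketch, assuming the 20 GB figure refers to stored backend size rather than raw input (check the current docs if that distinction matters for you):

```python
BACKEND_MULTIPLIER = 3     # docs: stored size is roughly 3x the input
RECOMMENDED_STORE_GB = 20  # per-store guidance for retrieval latency


def estimated_store_gb(input_gb, multiplier=BACKEND_MULTIPLIER):
    """Rough backend footprint: content plus embeddings."""
    return input_gb * multiplier


def fits_recommendation(input_gb):
    """True if the estimated footprint stays under the 20 GB guidance."""
    return estimated_store_gb(input_gb) <= RECOMMENDED_STORE_GB
```

Under these assumptions, a 5 GB corpus lands around 15 GB of backend storage and fits, while an 8 GB corpus (~24 GB stored) is already over the guidance.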
Those are not deal-breakers. They are simply the signs of a managed product optimized for convenience, not for becoming your universal retrieval backbone.
The feature that makes Gemini File Search attractive is not raw scale. It is workflow compression. You are compressing ingestion, chunking, embedding, store management, retrieval, and answer generation into one model-centered path. For teams shipping Gemini-native apps, that can cut a lot of time.
The Pricing Comparison That Does Not Lie
This is where most comparisons go off the rails.
AWS publishes S3 Vectors prices as storage, PUT, and query processing. Google documents File Search pricing as embeddings at indexing time, free storage, free query embeddings, and then normal context-token charges for retrieved content. One product monetizes the retrieval substrate. The other monetizes the model interaction around retrieval.
Official S3 Vectors Example
AWS’s own pricing page includes a concrete example for 10 million vectors split across 40 indexes, with 1 million queries per month in us-east-1:
- storage: $3.54/month
- PUT: $1.97/month
- query: $5.87/month
- total: $11.38/month
For a much larger scenario with 400 million vectors and 10 million queries per month, AWS’s example totals $1,217.29/month.
That is very cheap for what it is. But remember what “it” is: vector storage and retrieval, not the full Gemini-or-Bedrock-style generation experience.
Official Gemini File Search Pricing
Google’s File Search pricing page says:
- embeddings at indexing time are charged at $0.15 per 1M tokens
- storage is free
- query-time embeddings are free
- retrieved document tokens are billed as normal context tokens under the chosen Gemini model
That last line is the important one. The retrieval tool itself looks cheap, but your real recurring bill is tied to how many retrieved tokens you feed into the model and which Gemini model you pick.
Normalized Example: 100,000 Documents
To make this concrete, assume:
- 100,000 documents
- 1,500 tokens per document
- chunked into 200-token chunks
- about 750,000 retrieval chunks total
- 1 million queries per month
- 2,000 retrieved tokens per query on average
Using AWS’s published S3 Vectors price formula for a 1,024-dimension vector with the same metadata assumptions as the official pricing example, the rough storage and retrieval side looks like this:
- S3 Vectors storage: about $0.26/month
- one full upload: about $0.88
- S3 query cost at 1M queries: about $10.69/month
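The storage figure is easy to sanity-check from first principles. A rough sketch assuming float32 vectors, which covers the raw vector bytes only; metadata under the example's assumptions accounts for the gap up to the ~$0.26 figure:

```python
VECTORS = 750_000          # retrieval chunks from the normalized example
DIMS = 1_024               # embedding dimensions
BYTES_PER_FLOAT32 = 4
PRICE_PER_GB_MONTH = 0.06  # S3 Vectors storage price in the AWS example

# Raw vector payload, before metadata and index overhead.
raw_gb = VECTORS * DIMS * BYTES_PER_FLOAT32 / 1e9  # ~3.07 GB
storage_cost = raw_gb * PRICE_PER_GB_MONTH         # ~$0.18/month
```

That lands around $0.18/month for the vectors themselves, so most of the quoted ~$0.26 is vector data, with the remainder being metadata billed under the same pricing assumptions.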
Using Google’s published File Search and Gemini pricing:
- File Search indexing: 150M tokens x $0.15 / 1M = $22.50 one time
- File Search storage: $0
- retrieved-token cost at 1M queries:
  - with `gemini-3.1-flash-lite-preview` input pricing: about $500/month
  - with `gemini-3-flash-preview` input pricing: about $1,000/month
  - with `gemini-3.1-pro-preview` input pricing: about $4,000/month
That sounds like a knockout punch for S3 Vectors. It is not. It just proves the two systems bill different things.
S3 Vectors is cheaper because it is not your generation layer. You still need embeddings generation and a model call after retrieval. Gemini File Search folds retrieval into the request path to Gemini, so the retrieved context becomes part of the model bill. You are paying for convenience and tight integration, not only for storage.
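The File Search side of that example is dominated by one multiplication: queries times retrieved tokens times the model's input rate. A sketch in which the per-1M-token rates are placeholders chosen to match the rough figures above, not quoted prices; the real numbers live on the Gemini pricing page:

```python
QUERIES_PER_MONTH = 1_000_000
TOKENS_PER_QUERY = 2_000          # average retrieved context per query
INDEXING_TOKENS = 150_000_000     # 100k docs x 1,500 tokens
INDEXING_RATE_PER_1M = 0.15       # documented File Search embedding rate

# Placeholder input rates per 1M tokens (assumptions, not quotes).
INPUT_RATE_PER_1M = {"flash-lite": 0.25, "flash": 0.50, "pro": 2.00}

# One-time indexing cost: $22.50 for the whole corpus.
indexing_once = INDEXING_TOKENS / 1_000_000 * INDEXING_RATE_PER_1M


def monthly_retrieved_token_cost(rate_per_1m):
    """Recurring bill driven by retrieved context fed into the model."""
    tokens = QUERIES_PER_MONTH * TOKENS_PER_QUERY
    return tokens / 1_000_000 * rate_per_1m
```

Run the numbers and the shape of the bill is obvious: indexing is a rounding error next to the recurring retrieved-token cost, and the model choice moves that recurring cost by almost an order of magnitude.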
Graph 2: Normalized Monthly Retrieval Economics
The bars below use that normalized example. They are directional, not a vendor quote.
The lesson is not “Google is expensive.” The lesson is this:
If your app issues lots of queries and returns lots of retrieved context, Gemini File Search cost is dominated by model input tokens. If your app stores a huge corpus but queries it modestly, S3 Vectors economics are hard to ignore.
How Each Path Looks in Real Systems
Path A: S3 Vectors as retrieval infrastructure
This is the better fit when you want retrieval to be reusable across more than one model or more than one application.
The basic workflow is:
- generate embeddings with your chosen model
- store them in S3 Vectors with filterable metadata
- query nearest neighbors
- pass selected chunks into a generation model
```python
import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
s3vectors = boto3.client("s3vectors", region_name="us-east-1")

# Embed the question with Titan, then query nearest neighbors in S3 Vectors.
body = json.dumps({"inputText": "How do I rotate database credentials safely?"})
embedding = json.loads(
    bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=body,
    )["body"].read()
)["embedding"]

result = s3vectors.query_vectors(
    vectorBucketName="docs-prod",
    indexName="runbooks",
    queryVector={"float32": embedding},
    topK=8,
    filter={"service": "database", "env": "prod"},
    returnMetadata=True,
)
```
This is clean. It is also your responsibility. You own chunking, ranking policy, token budgeting, and how the model sees the retrieved context. If you are building agent workflows that need retrieval plus tools plus operational guardrails, that control is often worth it.
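Token budgeting is one of those owned responsibilities, and it fits in a few lines. A minimal sketch that trims ranked chunks to a context budget, using a rough 4-characters-per-token heuristic rather than a real tokenizer (that heuristic is an assumption, not a property of any model):

```python
def select_within_budget(chunks, max_tokens=2_000, chars_per_token=4):
    """Keep highest-ranked chunks until the rough token budget is spent.

    `chunks` is assumed pre-sorted by relevance, e.g. by S3 Vectors
    distance. Token counts are estimated from character length.
    """
    selected, used = [], 0
    for chunk in chunks:
        estimate = len(chunk) // chars_per_token + 1
        if used + estimate > max_tokens:
            break
        selected.append(chunk)
        used += estimate
    return selected
```

In production you would swap the heuristic for the target model's tokenizer, but the point stands: with S3 Vectors, this policy is yours to write, and with File Search it is decided for you.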
Path B: Gemini File Search inside the generation call
This is the better fit when you want the shortest path from uploaded docs to grounded Gemini answers.
```python
import time

from google import genai
from google.genai import types

client = genai.Client()

# Create a persistent File Search store and upload one document into it.
store = client.file_search_stores.create(
    config={"display_name": "support-kb"}
)

operation = client.file_search_stores.upload_to_file_search_store(
    file="support-handbook.pdf",
    file_search_store_name=store.name,
    config={"display_name": "support-handbook"},
)

# The import is asynchronous; poll until chunking and embedding finish.
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)

# Ask Gemini with File Search attached as a tool.
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="What is our escalation path for production incidents?",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[store.name]
                )
            )
        ]
    ),
)
```
That is a much shorter path. It is exactly why File Search will be attractive to a lot of teams.
The catch is that the convenience comes with tighter coupling:
- your retrieval path is model-centric
- your cost grows with retrieved tokens fed into Gemini
- your tool-combination options are narrower than a custom retrieval pipeline
The Gotchas That Matter More Than the Marketing
1. S3 Vectors is not a full RAG platform
This seems obvious, but it gets missed. S3 Vectors does not eliminate the rest of the retrieval architecture. It gives you a much cheaper and larger vector substrate. You still need the rest of the pipeline. If you want fully managed answer generation on AWS, the closer comparison is not S3 Vectors vs Gemini File Search. It is S3 Vectors plus Bedrock Knowledge Bases vs Gemini File Search.
2. Gemini File Search is not just “free retrieval”
Storage is free. Query embeddings are free. That makes the product feel inexpensive at first glance. But the retrieved chunks are still billed as context tokens in the model call. If your prompts routinely pull back a lot of context, that becomes the real bill quickly.
3. Raw Gemini Files API is easy to misunderstand
The raw Files API has an expirationTime field in the API reference, and the File Search docs explicitly explain that temporary file objects created during upload get deleted after 48 hours. If you build around raw files and assume they are your long-term corpus, you will eventually rebuild the system.
4. S3 Vectors shines when query volume is not extreme
AWS says this outright in the product positioning. S3 Vectors is ideal for large, long-term vector data that does not need the high-throughput characteristics of an in-memory vector database. That makes it a strong fit for long-tail retrieval, archival corpora, and agent memory. It is not the thing I would reach for first if my core business metric depends on ultra-fast, high-QPS search on hot data.
5. Gemini File Search store size guidance is easy to ignore until latency gets weird
Google recommends keeping each File Search store under 20 GB for optimal retrieval latency. That is the sort of line teams skip over in week one and rediscover in month three. If your corpus is growing quickly, plan your partitioning early.
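If partitioning is on the roadmap, it can be planned with simple greedy packing: assign documents to stores so each store's input stays under a cap derived from the 20 GB guidance and the documented ~3x backend multiplier. The cap and the helper are illustrative, not an API:

```python
def plan_stores(doc_sizes_gb, cap_gb=20 / 3):
    """Greedily group documents into stores so no store exceeds cap_gb.

    cap_gb defaults to ~6.7 GB of raw input, assuming the ~3x backend
    multiplier against the 20 GB per-store latency guidance. Returns a
    list of stores, each a list of document indices.
    """
    stores, current, used = [], [], 0.0
    for i, size in enumerate(doc_sizes_gb):
        if used + size > cap_gb and current:
            stores.append(current)
            current, used = [], 0.0
        current.append(i)
        used += size
    if current:
        stores.append(current)
    return stores
```

Real partitioning usually follows a domain boundary (product area, tenant, language) rather than pure size, but size is the constraint that forces the split, so it is worth modeling first.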
When to Use Which
Use Amazon S3 Vectors when:
- you want a persistent retrieval layer that is not tied to one model vendor
- you expect very large corpora and relatively modest query rates
- you need explicit control over embeddings, metadata, and query orchestration
- you are already building on AWS and want S3, Bedrock, and OpenSearch to work together
- your roadmap includes multiple retrieval consumers, not just one Gemini app
Use Gemini File Search when:
- your application already lives inside the Gemini API
- your main priority is delivery speed, not retrieval-pipeline customization
- your corpus fits cleanly inside the documented File Search limits
- you want built-in retrieval without standing up a separate vector layer
- your team would rather tune prompts and store structure than operate retrieval infrastructure
Do not use S3 Vectors when:
- you actually need a fully managed end-to-end RAG workflow with minimal engineering effort
- your retrieval layer must serve hot, high-QPS search with very tight latency requirements
- your team is not prepared to own chunking, embedding lifecycle, and retrieval orchestration
Do not use Gemini File Search when:
- retrieval needs to be shared across different model providers
- you need very large persistent corpora with broad platform reuse
- you need tighter control over ranking, multi-stage retrieval, or custom orchestration
- your expected recurring cost is driven by huge volumes of retrieved context tokens
My Decision Framework
If I were building an internal enterprise knowledge assistant today, I would ask these questions in order:
- Do I want retrieval to be a reusable platform service or a feature inside one model path?
- Is my corpus growth rate higher than my query rate?
- Do I want the cheapest possible vector substrate, or the shortest possible path to grounded answers?
- Will I likely change models in the next year?
If the answer pattern is platform, scale, cost efficiency, and portability, I would choose S3 Vectors and build retrieval as infrastructure.
If the answer pattern is Gemini app, fast implementation, managed retrieval, and limited ops surface, I would choose Gemini File Search.
If the team cannot answer those questions clearly, I would start with Gemini File Search for speed or S3 Vectors for platform reuse, but I would not pretend the choice is reversible without migration work. This is the same lesson you see when comparing model-coupled retrieval with database-centered RAG on Aurora and pgvector: the retrieval layer becomes part of the product architecture much earlier than most teams expect.
Final Take
Amazon S3 Vectors and Gemini File Search both help a model answer questions from private data. That is where the similarity ends.
S3 Vectors is the better answer when you need a durable, cheap, scalable vector foundation. Gemini File Search is the better answer when you need managed retrieval inside Gemini with minimal ceremony. If you choose between them as if they were the same abstraction, you will optimize the wrong thing.
Official References
- AWS News Blog, “Introducing Amazon S3 Vectors: First cloud storage with native vector support at scale (preview)” - https://aws.amazon.com/blogs/aws/introducing-amazon-s3-vectors-first-cloud-storage-with-native-vector-support-at-scale/
- AWS product page, “Amazon S3 Vectors” - https://aws.amazon.com/s3/features/vectors/
- AWS pricing page, “S3 Vectors pricing” - https://aws.amazon.com/s3/pricing/
- AWS docs, “Limitations and restrictions” for S3 Vectors - https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-limitations.html
- Google AI for Developers, “File Search” - https://ai.google.dev/gemini-api/docs/file-search
- Google AI for Developers, “Using files” API reference - https://ai.google.dev/api/files
- Google AI for Developers, “Gemini Developer API pricing” - https://ai.google.dev/gemini-api/docs/pricing
- Google AI for Developers, “Release notes” - https://ai.google.dev/gemini-api/docs/changelog