AWS Lambda S3 Files: Mount S3 as a File System and Kill the /tmp Copy Loop
On April 21, 2026, AWS Lambda removed one of the dumbest pieces of plumbing in a lot of serverless code: the “download from S3, write to /tmp, process, upload to S3” loop. Lambda functions can now mount Amazon S3 buckets through Amazon S3 Files and use normal file operations against a local path.
That sentence sounds small until you count the wrappers sitting in real repositories. I have written this exact shape for image jobs, CSV cleanup jobs, one-off reporting jobs, and “temporary” data pipelines that somehow stayed in production for three years.
```python
s3.download_file(bucket, key, "/tmp/input.csv")
process("/tmp/input.csv", "/tmp/output.csv")
s3.upload_file("/tmp/output.csv", bucket, result_key)
```
That pattern worked. It also leaked complexity everywhere: ephemeral storage sizing, retries around half-written output, duplicate object naming, cleanup under /tmp, memory pressure, and extra code just to make a file look like a file again.
With S3 Files mounted into Lambda, the code can become boring in the best way:
```python
process(
    "/mnt/workspace/input/customer-export.csv",
    "/mnt/workspace/output/normalized-customers.csv",
)
```
The bucket is still S3. The function just sees a file system path.
The bigger story is not only file processing. It is stateful serverless work. AWS explicitly calls out AI and machine learning workloads where agents need persistent memory and shared state across pipeline steps. Pair S3 Files with Lambda durable functions, and Lambda starts to look less awkward for a class of workflows I used to push toward containers by default: multi-step work where the control plane checkpoints progress while workers share a real workspace.
That is the architectural shift. S3 is no longer just the object API at the edges of a Lambda pipeline. It can be the shared working directory in the middle.
What Actually Changed
AWS announced S3 Files first on April 7, 2026, then added direct Lambda support on April 21. S3 Files presents objects from a general purpose S3 bucket as files and directories through a file system interface. Lambda can now attach that file system and mount it under /mnt/....
Before this feature, Lambda had three common choices when code wanted file semantics:
| Pattern | What it solved | What hurt |
|---|---|---|
| S3 API calls | Durable object storage, simple triggers, low operational burden | Application code had to translate object operations into file operations |
| `/tmp` storage | Local file path for libraries that expect files | Ephemeral, per-execution-environment, limited by configured size, not shared |
| EFS mount | Shared file system across functions | Separate file system lifecycle, separate storage location, more design work |
S3 Files changes the middle of that table. The authoritative data stays in S3, but Lambda code can use normal reads and writes through a mount path.
That matters for libraries and tools that are not written for object storage. Think about image libraries, PDF processors, shell tools, scanners, repository analyzers, parquet tools, local embedding pipelines, and agents that expect a workspace. You can keep using file paths instead of wrapping every operation in getObject, putObject, multipart upload code, or temporary file cleanup.
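As a concrete sketch of that idea, here is a function using `sqlite3`, which only accepts a real filesystem path, working directly against a mounted workspace. The `input/` and `index/` layout and the idea of passing the mount root as a parameter are my own conventions, not anything the feature prescribes:

```python
import sqlite3
from pathlib import Path

def index_inputs(workspace: Path) -> int:
    """Index files under workspace/input into a SQLite DB in the workspace.

    sqlite3 stands in for any library that demands a real file path: with a
    mounted workspace it just works, with no download/upload adapter code.
    """
    db_path = workspace / "index" / "files.db"
    db_path.parent.mkdir(parents=True, exist_ok=True)
    con = sqlite3.connect(str(db_path))  # needs a local path, not an S3 stream
    try:
        con.execute("CREATE TABLE IF NOT EXISTS files (name TEXT, size INTEGER)")
        rows = [(p.name, p.stat().st_size)
                for p in sorted((workspace / "input").glob("*"))]
        con.executemany("INSERT INTO files VALUES (?, ?)", rows)
        con.commit()
        return len(rows)
    finally:
        con.close()
```

Taking the workspace root as a parameter also means the same function is testable against a temporary directory locally and `/mnt/workspace` in Lambda.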
The Important Mental Model
Do not think of this as “Lambda has infinite /tmp now.” That is the wrong model.
Think of it as an S3-backed shared file system with Lambda as one of the clients. The function connects through VPC networking to S3 Files mount targets. The function’s role needs S3 Files permissions. The file system syncs changes with the linked S3 bucket. Multiple functions can mount the same file system and work against the same namespace.
That last point is where the agent architecture gets interesting.
```
Durable orchestrator
  clones repo -> /mnt/workspace/repos/app
  checkpoints "repo cloned"
  invokes parallel agent functions

Agent function A
  reads /mnt/workspace/repos/app/src
  writes /mnt/workspace/reports/security.json

Agent function B
  reads /mnt/workspace/repos/app/tests
  writes /mnt/workspace/reports/test-risk.json

Agent function C
  reads /mnt/workspace/repos/app/infra
  writes /mnt/workspace/reports/iac-findings.json

Durable orchestrator
  resumes after invokes complete
  reads reports
  writes final review
```
Before S3 Files, that pattern usually meant one of these compromises:
- Put the repo in `/tmp` and make every worker clone it again.
- Put the repo in EFS and build a separate file-system lifecycle around the pipeline.
- Store every intermediate artifact in S3 and write custom coordination code.
- Keep too much state inside the orchestrator payload until the workflow becomes fragile.
S3 Files gives you a different shape. One shared workspace. Object durability underneath. Normal file operations above it. Less ceremony.
Why This Is Especially Useful for AI Agents
AI agents are not just prompt calls. Useful agents read files, write files, run tools, compare outputs, retry failed steps, and leave artifacts behind.
That is why the old Lambda model felt awkward for agents. Lambda was great at short event handlers. It was not naturally good at “work in a project directory for 20 minutes while several specialized workers inspect different parts of the tree.”
S3 Files and Lambda durable functions attack different halves of that problem. One gives the workers a place to stand. The other remembers where the workflow left off.

S3 Files handles shared data. Durable functions handle workflow progress.
Lambda durable functions were announced in December 2025 for multi-step applications and AI workflows. The docs describe a checkpoint and replay model: each durable operation records progress, and resumed executions replay completed operations without running their side effects again. A durable execution can span multiple Lambda invocations and can run for up to one year, while each individual Lambda invocation still has the normal 15 minute maximum.
That gives you a clean split:
| Concern | Better owner |
|---|---|
| Workflow progress | Lambda durable function |
| Shared files | S3 Files |
| Parallel work | Standard or durable Lambda workers |
| Final durable artifacts | S3 bucket behind the file system |
| External integrations | Normal AWS SDK calls inside steps |
The orchestrator should not carry a repository tarball in memory. It should carry identifiers, step state, worker results, and checkpoints. The workspace should carry files.
That is a much saner boundary.
The Old Code Versus the New Code
Here is the old Lambda shape for a CSV cleanup job:
```python
import boto3
import csv
import os

s3 = boto3.client("s3")

def handler(event, context):
    bucket = event["bucket"]
    key = event["key"]
    output_key = key.replace("incoming/", "processed/")

    input_path = "/tmp/input.csv"
    output_path = "/tmp/output.csv"

    s3.download_file(bucket, key, input_path)

    with open(input_path, newline="") as src, open(output_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=["email", "account_id"])
        writer.writeheader()
        for row in reader:
            writer.writerow({
                "email": row["email"].strip().lower(),
                "account_id": row["account_id"].strip(),
            })

    s3.upload_file(output_path, bucket, output_key)

    try:
        os.remove(input_path)
        os.remove(output_path)
    except FileNotFoundError:
        pass

    return {"output": output_key}
```
That code is mostly logistics. The useful part is six lines.
With a mounted S3 Files workspace, the same function can focus on file processing:
```python
import csv
from pathlib import Path

WORKSPACE = Path("/mnt/workspace")

def handler(event, context):
    source = WORKSPACE / event["relative_input_path"]
    target = WORKSPACE / event["relative_output_path"]
    target.parent.mkdir(parents=True, exist_ok=True)

    with source.open(newline="") as src, target.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=["email", "account_id"])
        writer.writeheader()
        for row in reader:
            writer.writerow({
                "email": row["email"].strip().lower(),
                "account_id": row["account_id"].strip(),
            })

    return {"output_path": str(target)}
```
No download_file. No upload_file. No temporary cleanup. No special code path for a library that needs a local filename.
The real win shows up when the processing step is not a small CSV function. If the function shells out to git, ripgrep, trivy, semgrep, python, ffmpeg, imagemagick, or a PDF tool, a mounted workspace is much easier than teaching every tool to speak S3.
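A minimal sketch of that shape: shell out to a CLI tool against the mounted path and drop its output into a reports directory. `wc -l` stands in here for the heavier tools named above, and the `reports/` layout is my own convention:

```python
import subprocess
from pathlib import Path

def line_count_report(workspace: Path, relative_target: str) -> str:
    """Run a CLI tool directly against a mounted path and save its output.

    wc -l is a stand-in for tools like semgrep, trivy, or ffmpeg: none of
    them speak S3, but all of them understand a file path.
    """
    target = workspace / relative_target
    result = subprocess.run(
        ["wc", "-l", str(target)],
        capture_output=True, text=True, check=True,
    )
    report = workspace / "reports" / (target.stem + ".txt")
    report.parent.mkdir(parents=True, exist_ok=True)
    report.write_text(result.stdout)
    return result.stdout
```

The point is that nothing here knows about S3: the tool reads a path, the report lands on a path, and the sync back to the bucket happens underneath.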
The Setup Details You Cannot Ignore
The announcement is simple. The deployment is not just a checkbox. This is the part I would review before letting a team replace working S3 code on a Friday.
AWS Lambda’s S3 Files documentation lists several setup requirements that matter in production:
| Requirement | Why it matters |
|---|---|
| S3 file system and mount targets must exist in the same account and Region as the function | This is a regional file-system attachment, not a global S3 shortcut |
| Lambda must run in the same VPC path as the mount target | The function connects over local networking |
| Security groups must allow NFS traffic on port 2049 | If this is wrong, the function cannot mount the file system |
| Local mount path must start with `/mnt/` | You do not pick arbitrary paths like `/workspace` |
| Execution role needs `s3files:ClientMount` | Required to mount the file system |
| Execution role needs `s3files:ClientWrite` for write access | Skip it for read-only consumers |
| Direct reads from S3 require `s3:GetObject` and `s3:GetObjectVersion` | S3 Files can optimize reads directly from S3 |
| Direct reads require at least 512 MB function memory | A tiny 128 MB Lambda is the wrong target for this pattern |
That 512 MB note is easy to miss. AWS says S3 Files optimizes throughput by reading directly from S3, but direct reads are only supported for Lambda functions configured with 512 MB of memory or more.
For production, I would not start by mounting the whole bucket. Scope the file system to the smallest prefix the workload needs. If the pipeline works under agent-workspaces/prod/, do not expose the entire bucket just because it is convenient during testing.
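A sketch of what that scoping looks like in IAM terms. The `s3files:ClientMount` and `s3files:ClientWrite` action names come from the requirements above; the resource wildcard and the exact ARN shape for an S3 Files file system are assumptions here, so check the S3 Files documentation before using this:

```python
import json

def workspace_policy(bucket_arn: str, prefix: str, read_only: bool = True) -> str:
    """Sketch of a least-privilege policy for one workspace prefix.

    Resource scoping for the s3files actions is an assumption -- confirm
    the real ARN format in the S3 Files docs.
    """
    actions = ["s3files:ClientMount"]
    if not read_only:
        actions.append("s3files:ClientWrite")
    statements = [
        {"Effect": "Allow", "Action": actions, "Resource": "*"},
        {  # direct-read optimization needs plain S3 read on the data prefix
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:GetObjectVersion"],
            "Resource": f"{bucket_arn}/{prefix}*",
        },
    ]
    return json.dumps({"Version": "2012-10-17", "Statement": statements}, indent=2)
```

The useful habit is generating the read-only and read-write variants from the same function, so a read-only consumer can never accidentally pick up write access.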
Cost: Not Free, Just Less Plumbing
The “no extra cost” line needs careful wording.
AWS says the Lambda integration has no additional charge beyond standard Lambda and S3 pricing. That does not mean every S3 Files operation is free. The S3 launch blog says you pay for the portion of data stored in the S3 file system, small file reads, write operations to the file system, and S3 requests used during synchronization.
That distinction matters.
S3 Files is designed so large sequential reads can stream directly from S3, while smaller or latency-sensitive active data can live on high-performance storage behind the file system. The defaults are cost-conscious: the synchronization configuration imports metadata and stores data for files smaller than 128 KB by default, with an expiration window that removes unused data after 30 days.
For Lambda pipelines, the cost risk is usually not one big file. It is many small operations.
| Workload shape | Cost risk | Better design |
|---|---|---|
| Millions of tiny file reads | Per-operation overhead and high-performance storage churn | Batch reads, use larger IO, tune import thresholds |
| Frequent directory renames | S3 has no native directories, so renames become object rewrites/deletes | Avoid renaming large directories; write final paths directly |
| Large one-pass object scans | S3 Files may not add much value over direct S3 reads | Keep using S3 APIs or streaming reads |
| Temporary agent workspaces | Active data can stick around longer than needed | Use short expiration windows like 1-7 days |
| Shared hot project files | Repeated low-latency reads benefit from caching | Use S3 Files and tune prefix import rules |
The boring answer: model the access pattern before migrating. S3 Files reduces code and coordination overhead, but it is still a priced storage path.
Consistency and Synchronization Gotchas
S3 Files gives you a file interface. It does not erase every object-storage reality.
AWS says changes written through the file system are exported back to S3 within minutes. Changes made directly in the S3 bucket are normally visible in the file system within seconds, but can take a minute or longer.
That is fine for a lot of pipelines. It is not fine if your design quietly assumes instant bidirectional sync. If one writer is using the S3 API and another writer is using the mounted file system, slow down and draw the ownership boundary.
The best-practice rule is simple: pick a primary writer for each path.
If a Lambda worker writes /mnt/workspace/reports/security.json, do not have another process write s3://bucket/workspace/reports/security.json directly at the same time. AWS documents the conflict behavior: when a file and the corresponding S3 object are modified concurrently, S3 is treated as the source of truth and the file can be moved to lost and found.
That is a painful surprise if your agent just spent 15 minutes generating a report.
Design the workspace like this:
- One writer per file path.
- Worker outputs go to worker-specific paths.
- The orchestrator writes merge outputs after workers finish.
- Direct S3 writes land under an ingest prefix.
- File-system writers land under a workspace prefix.
- Final outputs are immutable or versioned.
If you need concurrent writes to the same logical result, use a real coordination primitive. A shared file path is not a distributed lock service.
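The one-writer-per-path rule can be made mechanical with a small helper. This is my own convention, not an API; the atomicity of `os.replace` is guaranteed on a local POSIX filesystem, so verify how it behaves on the mount before relying on it:

```python
import json
import os
from pathlib import Path

def publish_worker_result(workspace: Path, request_id: str,
                          worker: str, payload: dict) -> Path:
    """Each worker owns exactly one output path; nothing else writes there.

    Writing to a temp name and renaming within the same directory keeps a
    half-written report from ever appearing at the final path.
    """
    out = workspace / "workspaces" / request_id / "agent-output" / f"{worker}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    tmp = out.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, out)  # atomic locally; confirm semantics on the mount
    return out
```

Because the path is derived from the worker name, two workers cannot collide unless someone deliberately gives them the same identity.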
Durable Functions Change the Orchestration Choice
Before durable functions, the default answer for multi-step serverless workflows was usually Step Functions. That is still a good answer, especially when you want a visual state machine, integrations across many AWS services, and explicit workflow definitions.
Durable functions are different. They keep the workflow inside the Lambda developer experience. You write code with durable operations such as steps, waits, and invokes. The SDK checkpoints progress and uses replay to resume safely.
For agent pipelines, that can feel more natural than a large state machine because agent logic often has loops. If you have ever tried to model “run tool, inspect output, maybe retry with a different prompt, then fan out again” as a giant state machine, you know where the JSON starts to fight back.
```python
def handler(event, context):
    repo = context.step(lambda: clone_repo(event["repo_url"]), name="clone-repo")

    findings = context.parallel([
        lambda: invoke_agent("security-agent", repo["workspace"]),
        lambda: invoke_agent("test-agent", repo["workspace"]),
        lambda: invoke_agent("iac-agent", repo["workspace"]),
    ], name="parallel-analysis")

    report = context.step(lambda: merge_findings(findings), name="merge-report")
    return report
```
Treat that as pseudocode, not a copy-paste SDK example. The important design rule is where side effects live. Durable functions replay code after checkpoints, so non-deterministic work should be inside durable operations. Generating random names, reading the current time, writing files, or invoking workers outside a step is how replay bugs are born.
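A toy model makes the replay rule concrete. This is an illustration of the checkpoint-and-replay idea, not the real durable functions SDK:

```python
import uuid

class ToyDurableContext:
    """Toy checkpoint-and-replay context -- not the real durable SDK.

    Completed steps are recorded; a replay with the same checkpoints
    returns the recorded result instead of re-running the side effect.
    """
    def __init__(self, checkpoints=None):
        self.checkpoints = dict(checkpoints or {})

    def step(self, fn, name):
        if name in self.checkpoints:   # replay: skip the side effect
            return self.checkpoints[name]
        result = fn()                  # first run: execute and record
        self.checkpoints[name] = result
        return result

first = ToyDurableContext()
run_id = first.step(lambda: str(uuid.uuid4()), name="make-run-id")

# A resumed execution replays the step and gets the same id back, which is
# exactly why random values belong inside steps, not outside them.
resumed = ToyDurableContext(first.checkpoints)
assert resumed.step(lambda: str(uuid.uuid4()), name="make-run-id") == run_id
```

Run the same `uuid.uuid4()` call outside a step, and every replay would mint a new id and break the workflow's paths.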
S3 Files makes this easier because the durable function does not need to serialize file state. It only needs to checkpoint the logical progress:
- repository cloned
- agents invoked
- reports written
- final review generated
- cleanup completed
The files stay in the mounted workspace.
Where S3 Files Beats the S3 API
Use S3 Files when file semantics are the point.
Good candidates:
- Code analysis agents that need a repository checkout.
- Media processing tools that expect local input and output paths.
- PDF, image, video, archive, and document pipelines.
- ML inference functions that read shared reference files.
- Batch jobs with many related small files.
- Workflows where multiple Lambda functions collaborate on a shared artifact set.
- Existing applications or CLI tools that are hard to rewrite around S3 APIs.
Weak candidates:
- A function that reads one object and writes one object with simple streaming.
- Event handlers that only transform JSON payloads.
- Extremely latency-sensitive API requests that should avoid VPC mount dependencies.
- Workloads that mutate the same file from many writers.
- Large data lake scans where direct S3 or analytics engines are already the right abstraction.
The S3 API is still the cleanest interface when the unit of work is an object. S3 Files is useful when the unit of work is a workspace.
S3 Files Versus EFS, FSx, Mountpoint, and Step Functions
This launch does not make every AWS storage or orchestration service obsolete.
| If you need… | Use this first | Why |
|---|---|---|
| Simple object reads/writes | S3 API | Lowest conceptual overhead for object-native code |
| Shared file paths backed by S3 data | S3 Files | Keeps S3 as the durable data hub while exposing file semantics |
| General serverless shared POSIX file storage | EFS | Mature Lambda file-system integration, independent of S3 sync concerns |
| Enterprise NAS semantics or Windows/ONTAP/OpenZFS features | FSx | Better fit for migration and protocol-specific file workloads |
| High-throughput file-like access from EC2/EKS to S3 without full file-system semantics | Mountpoint for Amazon S3 | Great for data lake style access patterns |
| Visual cross-service orchestration | Step Functions | Excellent for AWS service integrations and explicit state machines |
| Code-first long-running Lambda workflows | Lambda durable functions | Checkpointed workflow logic in Lambda code |
My default recommendation: if the workload already treats S3 objects as objects, do not migrate just because this feature exists. If the workload keeps rebuilding a working directory from S3 at the start of every invocation, test S3 Files.
A Practical Migration Plan
Do not start by rewriting code. Start by finding the places where this actually removes complexity.
Run a search like this:
```bash
rg "download_file|upload_file|get_object|put_object|/tmp|TemporaryDirectory" src/
```
Then classify each function:
| Function type | Migrate? | Reason |
|---|---|---|
| Object transform with streaming body | Usually no | S3 API is already simple |
| Library requires file path | Yes | S3 Files removes adapter code |
| Worker needs shared workspace | Yes | This is the best use case |
| Writes many intermediate artifacts | Probably yes | The workspace model is cleaner |
| Writes one final object | Maybe no | Direct putObject may still be simpler |
For a real migration, use this sequence. Do not treat this as a bulk search-and-replace from s3.download_file() to open(). The mount changes the failure modes.
- Pick one function with obvious `/tmp` pain.
- Create an S3 Files file system scoped to a narrow prefix.
- Add mount targets in the Availability Zones used by the Lambda subnets.
- Attach the function to the VPC path that can reach those mount targets.
- Allow NFS 2049 between the Lambda security group and the mount target security group.
- Grant the execution role `s3files:ClientMount`.
- Add `s3files:ClientWrite` only if the function writes through the mount.
- Keep `s3:GetObject` and `s3:GetObjectVersion` if you want direct-read optimization.
- Set memory to at least 512 MB for direct reads.
- Rewrite the file handling behind a path abstraction.
- Load test with the real file count, not a tiny sample bucket.
- Watch `PendingExports` and `ExportFailures` in CloudWatch.
- Only then remove the old S3 copy code.
That last step matters. Do not delete the old path until you have watched sync behavior under realistic load.
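A hedged sketch of what watching those metrics might look like. The `PendingExports` and `ExportFailures` names come from the checklist above, but the CloudWatch namespace and dimension name in this sketch are assumptions; confirm both in the S3 Files monitoring docs before building an alarm on this:

```python
def sync_metric_queries(file_system_id: str, period: int = 300) -> list:
    """Build CloudWatch GetMetricData queries for the sync metrics.

    Namespace "AWS/S3Files" and dimension "FileSystemId" are assumptions;
    verify them against the S3 Files CloudWatch documentation.
    """
    queries = []
    for metric in ("PendingExports", "ExportFailures"):
        queries.append({
            "Id": metric.lower(),
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/S3Files",  # assumption
                    "MetricName": metric,
                    "Dimensions": [
                        {"Name": "FileSystemId", "Value": file_system_id},  # assumption
                    ],
                },
                "Period": period,
                "Stat": "Sum",
            },
        })
    return queries

# Pass the result to boto3, e.g.:
# cloudwatch.get_metric_data(MetricDataQueries=sync_metric_queries("fs-123"), ...)
```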
Security Checklist
S3 Files adds another access path to data. Treat that seriously.
Use this checklist before production:
| Control | Recommendation |
|---|---|
| Prefix scoping | Mount only the prefix the function needs |
| IAM | Separate read-only and read-write roles |
| Security groups | Allow port 2049 only between required groups |
| Encryption | Use SSE-S3 or AWS KMS according to your data policy |
| Object access | Keep bucket policies and file-system policies aligned |
| POSIX metadata | Understand UID/GID and permissions behavior |
| Monitoring | Enable CloudWatch metrics and CloudTrail management-event visibility |
| Conflict policy | Document which systems are allowed to write each prefix |
| Cleanup | Use expiration rules for temporary workspaces |
| Secrets | Do not let agents write credentials into shared workspaces |
The agent warning is not theoretical. A coding agent with shell access can easily write logs, environment dumps, dependency caches, patch files, and generated reports. If that workspace is backed by S3, those artifacts now have a durable lifecycle.
The Limits That Will Bite People
Several S3 Files limitations are worth calling out before a team discovers them during an incident.
S3 Files does not support hard links. It does not preserve custom user-defined S3 object metadata after file-system changes. Objects in Glacier Flexible Retrieval, Glacier Deep Archive, or archive tiers of Intelligent-Tiering must be restored before access through the file system. Full S3 object keys still have the 1,024 byte limit. A single file can be as large as 48 TiB, but that does not mean every Lambda workload should push files anywhere near that size.
Directory rename is another trap. S3 does not have native directories. AWS recommends scoping file systems to the smallest prefix your workload needs because moving or renaming a directory can require writing new keys and deleting old keys for every file under that directory.
That is exactly the kind of operation that looks harmless in code:
```python
Path("/mnt/workspace/run-123").rename("/mnt/workspace/complete/run-123")
```
If run-123 has ten files, fine. If it has ten million, you just created a very expensive “rename.”
Write final output paths directly. Use manifests for completion state. Do not use giant directory renames as workflow commits.
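A manifest-based commit can be as small as this. The `manifests/` prefix and the manifest schema are my own conventions for illustration:

```python
import json
import time
from pathlib import Path

def mark_run_complete(workspace: Path, run_id: str, outputs: list) -> Path:
    """Record completion with a one-file manifest instead of a directory rename."""
    manifest = workspace / "manifests" / f"{run_id}.json"
    manifest.parent.mkdir(parents=True, exist_ok=True)
    manifest.write_text(json.dumps({
        "run_id": run_id,
        "completed_at": time.time(),
        "outputs": outputs,
    }))
    return manifest

def is_run_complete(workspace: Path, run_id: str) -> bool:
    return (workspace / "manifests" / f"{run_id}.json").exists()
```

Committing a run now touches one object instead of rewriting every key under the run's directory, no matter how many files the run produced.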
The Architecture I Would Use for Multi-Agent Lambda
For a production multi-agent code-analysis pipeline, I would design the first version like this:
```
s3://agent-workspaces/
  incoming/
    requests/request-123.json
  workspaces/
    request-123/
      repo/
      agent-output/
        security.json
        tests.json
        iac.json
      final/
        report.md
        summary.json
  archive/
```
The Lambda durable function owns orchestration:
- Validate the request.
- Create a workspace ID.
- Clone or unpack the repository into `workspaces/request-123/repo/`.
- Invoke specialized worker functions in parallel.
- Wait for checkpointed results.
- Merge worker reports.
- Write final artifacts.
- Emit an event or notification.
- Apply cleanup or retention policy.
Each worker writes only to its own output file. No worker writes the final report. No worker mutates another worker’s result. The orchestrator is the only process that merges. Boring, yes. That is the point. Boring ownership rules are what keep shared workspaces from turning into shared trash cans.
That structure avoids most of the ugly failure modes:
- A failed worker can retry without deleting another worker’s output.
- The orchestrator can resume after a checkpoint and inspect files that already exist.
- Final output can be made immutable.
- Cleanup can target one workspace prefix.
- Cost attribution can be tied to workspace IDs.
If the same pattern grows beyond Lambda, the storage layout can still survive. ECS, EKS, EC2, and Lambda can all work with S3 Files. That gives you an escape hatch if one agent eventually needs longer runtime, GPU, or a container shape that standard Lambda should not handle.
What This Does Not Solve
This feature removes plumbing. It does not remove architecture. If an outage happens, the incident channel will not care that the code looked cleaner.
It does not make concurrent writes safe. It does not make S3 an instant local disk. It does not mean you can ignore VPC networking. It does not mean every object workload should become a file workload. It does not remove the need for idempotency, retries, metrics, or cleanup.
It also does not kill Step Functions. Durable functions are attractive when you want code-first checkpointed workflows inside Lambda. Step Functions remains excellent when your workflow spans many AWS services, needs visual inspection, or benefits from Amazon States Language as an explicit contract.
The best architecture may combine both. For example, Step Functions can own the business process, while a durable Lambda owns a complex agent subroutine that uses S3 Files as its workspace.
Bottom Line
The important part of Lambda S3 Files is not that a function can read /mnt/workspace/file.txt. The important part is that serverless workflows can now share a durable, S3-backed working directory without turning every file operation into custom object-storage code.
For normal object transforms, keep using the S3 API.
For file-heavy tools, batch processors, AI agents, repository analysis, and workflows that keep rebuilding /tmp from S3, this is a real simplification.
The boring plumbing is not completely gone. You still need VPC networking, IAM, sync monitoring, cost modeling, and sane writer boundaries. But the old ritual of “download, process, upload, clean up” is no longer the only default.
That is a big Lambda change.
Sources
- AWS Lambda functions can now mount Amazon S3 buckets as file systems with S3 Files
- Launching S3 Files, making S3 buckets accessible as file systems
- Configuring Amazon S3 Files access for AWS Lambda
- S3 Files best practices
- Customizing synchronization for S3 Files
- S3 Files unsupported features, limits, and quotas
- AWS Lambda durable functions announcement
- Lambda durable functions basic concepts