Lambda Container Images: Build on GitLab CI, Deploy to ECR
Container images solved my biggest Lambda frustration: the 250MB zip limit.
I had a Python Lambda that processed images using Pillow, NumPy, OpenCV, and a custom model checkpoint. The zip with all dependencies was 312MB compressed. Lambda wouldn’t accept it. I tried Lambda Layers — stacked three of them, ran into the 250MB unzipped ceiling on those too. I tried stripping NumPy down to just the parts I needed. It still didn’t fit.
Then Lambda added container image support and the problem disappeared. My function now ships as a 1.1GB container image. Lambda runs it fine. I get the full dependency set, a reproducible build environment, and the same deployment pipeline that handles every other service in the stack.
That was three years ago. In 2026, container images aren’t a workaround for the size limit — they’re the right choice for a meaningful category of Lambda functions, and the toolchain around them has matured considerably.
When Container Images Beat Zip Deployments
The zip format is not going away. For a Lambda function that’s 500 lines of Python with no exotic dependencies, zip is simpler, deploys faster, and cold starts are marginally quicker. Don’t switch to containers because it sounds modern.
Switch to containers when:
Dependencies don’t fit in zip + layers. The total unzipped size limit is 250MB for zip deployments, including all layers. Container images go up to 10GB. If you’re running ML inference, processing images with complex libraries, or bundling a compiled binary, you’ll hit the zip limit before the container limit.
You need a specific OS or system library. Zip deployments run on whatever Amazon Linux version Lambda uses for your runtime. Container images let you control the base image. If you need a specific version of libpq, libxml2, or a custom font renderer, containers give you that control without fighting the runtime environment.
You want reproducible builds. A zip deployment built locally might differ from one built in CI because the developer’s machine has a slightly different Python environment. The Dockerfile is the single source of truth. Everyone builds the same image.
You’re already containerizing everything else. If your ECS services, API Gateway integrations, and batch jobs all ship as Docker images, adding Lambda to that pattern means one mental model, one set of build tools, one CI pipeline structure. The consistency has real value.
You need the local testing story. Running a zip Lambda locally requires SAM CLI and some ceremony. Running a container Lambda locally is docker run. If your team already knows Docker, that’s zero learning curve.
Lambda Container Image Requirements
Lambda doesn’t run arbitrary Docker images. The image needs to satisfy a few requirements.
The runtime interface client (RIC) must be present. The RIC is what Lambda uses to communicate with your function — it handles the invocation loop, sends events to your handler, receives responses, and reports errors. For Lambda’s managed runtimes (Python, Node, Java, etc.), the RIC is baked into AWS’s base images. If you build from a non-AWS base image like python:3.12-slim, you need to install it explicitly.
The container must talk to Lambda’s Runtime API. The RIC handles this for you — it polls the Runtime API endpoint Lambda provides (via the AWS_LAMBDA_RUNTIME_API environment variable) for events and posts responses back. It means your CMD or ENTRYPOINT needs to launch the RIC, not your handler directly. (Port 8080 only comes into play locally, where the Runtime Interface Emulator exposes an invocation endpoint on it.)
The handler path format is module.function. For Python, if your file is app.py and your handler function is handler, the Lambda handler string is app.handler. The RIC receives this at startup.
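Concretely, for a handler string of app.handler, the module might look like this minimal sketch:

```python
# app.py — the "app" in the handler string "app.handler"
import json


def handler(event, context):
    # The "handler" in "app.handler". Lambda passes the parsed event
    # (a dict for JSON payloads) and a context object with metadata.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```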
The simplest starting point is one of AWS’s base images. They have the RIC pre-installed, use the correct directory layout, and are kept up to date with Lambda runtime patches. For Python 3.12:
FROM public.ecr.aws/lambda/python:3.12
COPY requirements.txt .
RUN pip install -r requirements.txt --no-cache-dir
COPY app.py .
CMD ["app.handler"]
That’s it. CMD is the handler string. The AWS base image sets ENTRYPOINT to the RIC, so your CMD becomes the argument to it — the handler path.
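If you build from a non-AWS base instead, you wire that up yourself. A sketch assuming python:3.12-slim and the awslambdaric package from PyPI (note the AWS base images also bundle the Runtime Interface Emulator for local testing; this image doesn’t):

```dockerfile
# Sketch: Lambda-compatible image from a non-AWS base
FROM python:3.12-slim

# Install the Runtime Interface Client alongside your dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir awslambdaric -r requirements.txt

# Lambda convention: function code lives in /var/task
WORKDIR /var/task
COPY app.py .

# Launch the RIC, which polls the Runtime API and calls your handler
ENTRYPOINT ["python", "-m", "awslambdaric"]
CMD ["app.handler"]
```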
Dockerfile: Python Lambda with Heavy Dependencies
The real value is in functions that wouldn’t fit in a zip. Here’s a realistic example: a Lambda that takes an S3 key, downloads an image, runs it through an OpenCV pipeline to detect objects, and returns bounding boxes.
FROM public.ecr.aws/lambda/python:3.12
# System dependencies for OpenCV
RUN dnf install -y \
mesa-libGL \
libXext \
libSM \
&& dnf clean all
# Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy function code
COPY src/ ${LAMBDA_TASK_ROOT}/
CMD ["handler.process_image"]
# requirements.txt
opencv-python-headless==4.9.0.80
Pillow==10.3.0
numpy==1.26.4
boto3==1.34.0
Note opencv-python-headless instead of opencv-python. The headless variant doesn’t include GUI dependencies. For Lambda, you never need the display components, and the headless version is significantly smaller.
The ${LAMBDA_TASK_ROOT} environment variable is set by the AWS base image to /var/task. That’s where Lambda looks for your function code. Copy your source there.
Multi-Stage Build: Keeping the Image Lean
A single-stage build works, but it leaves build tools in the final image. For compiled dependencies — anything that runs gcc or rustc during pip install — you can shrink the final image with a multi-stage build:
# Build stage: compile dependencies
FROM public.ecr.aws/lambda/python:3.12 AS builder
RUN dnf install -y gcc gcc-c++ make && dnf clean all
COPY requirements.txt .
RUN pip install --no-cache-dir \
--target /install \
-r requirements.txt
# Runtime stage: copy compiled packages only
FROM public.ecr.aws/lambda/python:3.12
COPY --from=builder /install ${LAMBDA_TASK_ROOT}
COPY src/ ${LAMBDA_TASK_ROOT}/
CMD ["handler.process_image"]
The builder stage has gcc and the full compilation toolchain. The runtime stage has none of it — just the compiled .so files and pure Python packages. For functions with cryptography, lxml, or psycopg2, this can cut 200-300MB from the final image.
Layer caching matters here. Put COPY requirements.txt and RUN pip install before COPY src/. That way, if you change only your application code, Docker reuses the cached dependency layer and the rebuild takes seconds instead of minutes.
ARM64 (Graviton2): Up to 34% Better Price-Performance
Lambda Graviton2 functions cost less and run faster on compute-bound workloads. AWS quotes up to 34% better price-performance versus x86. The cold start and warm execution improvements on compute-heavy Python are real — I’ve measured 15-20% faster execution on image processing tasks on the same code.
Building for ARM64 requires either building on an ARM machine or using Docker’s cross-platform build:
FROM public.ecr.aws/lambda/python:3.12
# Everything else is the same — the base image
# handles the architecture difference
To build for ARM64 from an x86 machine:
docker buildx build \
--platform linux/arm64 \
--tag your-account.dkr.ecr.us-east-1.amazonaws.com/my-function:latest \
--push \
.
In GitLab CI, enable QEMU for cross-platform builds:
build_arm64:
image: docker:26
services:
- docker:26-dind
before_script:
- docker run --privileged --rm tonistiigi/binfmt --install all
- docker buildx create --use
script:
- docker buildx build --platform linux/arm64 ...
The trade-off: cross-platform builds are slower than native builds. If your CI runners are already ARM64 (GitLab offers hosted ARM runners), build natively. If they’re x86, the QEMU emulation adds build time but the deployed function runs natively on Graviton2. For most teams, the cost savings justify the slower CI build.
One caveat: some Python packages ship architecture-specific wheels. opencv-python-headless has ARM64 wheels on PyPI. Most popular packages do. If you hit a package with no ARM64 wheel, it’ll compile from source during the build — which works, just slower.
GitLab CI Pipeline: Build, Push ECR, Update Lambda
The full pipeline connects GitLab’s Docker build pattern with Lambda deployment. Here’s a complete .gitlab-ci.yml for a Lambda container function:
stages:
- build
- push
- deploy
variables:
AWS_DEFAULT_REGION: us-east-1
ECR_REGISTRY: "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com"
ECR_REPOSITORY: my-lambda-function
  IMAGE_TAG: "${CI_COMMIT_SHORT_SHA}"  # bash-style ${CI_COMMIT_SHA:0:8} doesn't expand in variables:
# Build the container image
build:
stage: build
image: docker:26
services:
- docker:26-dind
before_script:
- docker info
script:
- docker build
--cache-from "${ECR_REGISTRY}/${ECR_REPOSITORY}:latest"
--tag "${ECR_REGISTRY}/${ECR_REPOSITORY}:${IMAGE_TAG}"
--tag "${ECR_REGISTRY}/${ECR_REPOSITORY}:latest"
.
- docker save "${ECR_REGISTRY}/${ECR_REPOSITORY}:${IMAGE_TAG}" | gzip > image.tar.gz
artifacts:
paths:
- image.tar.gz
expire_in: 1 hour
# Push to ECR using OIDC
push:
  stage: push
  image: docker:26
  services:
    - docker:26-dind
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://gitlab.com
  before_script:
    # docker:26 is Alpine-based; the AWS CLI comes from the Alpine repos
    - apk add --no-cache aws-cli
    - >
      export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s"
      $(aws sts assume-role-with-web-identity
      --role-arn "${AWS_ROLE_ARN}"
      --role-session-name "gitlab-ci-${CI_JOB_ID}"
      --web-identity-token "${GITLAB_OIDC_TOKEN}"
      --duration-seconds 3600
      --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]"
      --output text))
    - aws ecr get-login-password | docker login --username AWS --password-stdin "${ECR_REGISTRY}"
    - docker load < image.tar.gz
script:
- docker push "${ECR_REGISTRY}/${ECR_REPOSITORY}:${IMAGE_TAG}"
- docker push "${ECR_REGISTRY}/${ECR_REPOSITORY}:latest"
# Update Lambda to use the new image
deploy:
stage: deploy
image:
name: amazon/aws-cli:2.15.0
entrypoint: [""]
id_tokens:
GITLAB_OIDC_TOKEN:
aud: https://gitlab.com
before_script:
- >
export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s"
$(aws sts assume-role-with-web-identity
--role-arn "${AWS_ROLE_ARN}"
--role-session-name "gitlab-ci-${CI_JOB_ID}"
--web-identity-token "${GITLAB_OIDC_TOKEN}"
--duration-seconds 3600
--query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]"
--output text))
script:
- |
IMAGE_URI="${ECR_REGISTRY}/${ECR_REPOSITORY}:${IMAGE_TAG}"
aws lambda update-function-code \
--function-name my-lambda-function \
--image-uri "${IMAGE_URI}" \
--architectures arm64
- |
aws lambda wait function-updated \
--function-name my-lambda-function
echo "Lambda updated to ${IMAGE_TAG}"
rules:
- if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
environment:
name: production
The --cache-from flag in the build stage tells Docker to reuse layers from the previously pushed latest tag. One caveat: with BuildKit (the default builder in Docker 26), a registry image is only usable as a cache source if it was built with inline cache metadata — add --build-arg BUILDKIT_INLINE_CACHE=1 to the docker build command, or the cache lookup silently finds nothing. With that in place, a warm build rebuilds only the layers that changed (usually just COPY src/); without it, every CI build starts from scratch.
The aws lambda wait function-updated call after update-function-code is easy to forget. Lambda updates are asynchronous — the CLI returns immediately, but the function keeps serving the old image until the update propagates. The wait command polls until the update is complete. If you kick off an integration test immediately after deploy without waiting, you might be testing the previous version.
OIDC Auth: No Long-Lived AWS Credentials
The pipeline above uses OIDC federation instead of static AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables. GitLab generates a short-lived JWT token (GITLAB_OIDC_TOKEN) for each job. AWS trusts that token based on your IAM role’s trust policy.
The IAM trust policy for the role:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/gitlab.com"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringLike": {
"gitlab.com:sub": "project_path:your-group/your-project:ref_type:branch:ref:main"
}
}
}
]
}
The Condition on gitlab.com:sub is the key. It restricts which GitLab projects and branches can assume this role. Only your specific project, on the main branch, gets the token that can assume the deployment role. A fork or a feature branch gets a different sub claim and cannot assume the role.
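To see exactly what AWS evaluates, you can decode the token’s payload segment — a JWT is just three base64url-encoded parts. A sketch with a fabricated payload that mirrors GitLab’s claim layout (the real token is signed by gitlab.com; nothing here validates signatures):

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    # A JWT is header.payload.signature; the payload is base64url JSON
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


# Fabricated payload shaped like GitLab's OIDC claims — not a real token
claims = {
    "sub": "project_path:your-group/your-project:ref_type:branch:ref:main",
    "aud": "https://gitlab.com",
    "project_path": "your-group/your-project",
    "ref": "main",
}
fake_token = ".".join([
    base64.urlsafe_b64encode(b'{"alg":"RS256"}').decode().rstrip("="),
    base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("="),
    "unsigned",
])

print(decode_jwt_payload(fake_token)["sub"])
```

The sub string printed here is the value the trust policy’s StringLike condition matches against.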
The full ECR GitLab OIDC setup has the OIDC provider registration steps and the minimum IAM permissions needed for ECR push plus Lambda update. The short version: no stored credentials, no rotation, no access key that outlives the CI job.
Cold Start Optimization
Container images have larger cold starts than zip deployments. A 20MB zip cold start might be 200-400ms. A 500MB container image cold start is commonly 1-3 seconds. That gap is real and matters for latency-sensitive functions.
Several things affect it.
Image size. Lambda caches the image after the first pull, but the initial pull on a new execution environment depends on image size. Smaller images cold-start faster. A multi-stage build that shaves 300MB off your image is worth it both for deployment speed and cold start latency. Trim what you don’t need: documentation, test directories, and stray data files that ship inside your dependencies.
RIC startup time. The Lambda Runtime Interface Client initializes before your handler code runs. AWS’s base images optimize this. If you’re building on a non-AWS base, use the official awslambdaric package from PyPI and benchmark it — the startup path is slightly different.
Initialization code placement. Lambda runs your module-level code on every cold start. SDK clients, model loads, and connection pools should be initialized outside the handler function. This is true for zip deployments too, but it matters more with container images because the perception of a slow cold start is already higher.
import boto3
import cv2
import numpy as np
# These run once per cold start, outside the handler
s3_client = boto3.client("s3")
model = cv2.CascadeClassifier("/var/task/models/haarcascade.xml")
def process_image(event, context):
# s3_client and model are already initialized
bucket = event["bucket"]
key = event["key"]
# ...
Lambda SnapStart (zip only). SnapStart takes a snapshot of the initialized execution environment and restores from it instead of cold-starting, cutting multi-second cold starts to well under a second in most cases. It’s available for Java 11+, and since late 2024 for Python 3.12+ and .NET 8 as well — but it does not support container images. If SnapStart is the feature you need, that’s an argument for staying with zip packaging; for container images, the lever you have is keeping initialization code lean.
Provisioned concurrency. If you need consistent sub-100ms latency and can predict traffic patterns, provisioned concurrency eliminates cold starts by keeping execution environments warm. It costs money (you pay for the reserved capacity even when idle), but for latency-critical functions it’s the right answer. Container images are fully compatible with provisioned concurrency.
A rough benchmark from a Python image processing function I maintain: 450MB image, ARM64, us-east-1. Cold start on first invocation after deploy: ~2.1 seconds. Subsequent cold starts (fresh execution environments, once Lambda has the image layers cached): ~1.4 seconds. Warm invocation: ~180ms. For a background processing function triggered by S3 events, those cold start numbers are perfectly acceptable.
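Numbers like these come straight from CloudWatch: every invocation logs a REPORT line, and cold starts include an Init Duration field. A small parser sketch for pulling the durations out (the sample line is fabricated but follows Lambda’s REPORT format):

```python
import re

# Lambda's REPORT log line; "Init Duration" only appears on cold starts
REPORT_RE = re.compile(
    r"Duration: (?P<duration>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory>\d+) MB\s+"
    r"Max Memory Used: (?P<used>\d+) MB"
    r"(?:\s+Init Duration: (?P<init>[\d.]+) ms)?"
)


def parse_report(line: str) -> dict:
    m = REPORT_RE.search(line)
    if not m:
        raise ValueError("not a REPORT line")
    # Drop absent optional groups (warm invocations have no Init Duration)
    return {k: float(v) for k, v in m.groupdict().items() if v is not None}


# Fabricated sample in Lambda's REPORT format
sample = (
    "REPORT RequestId: 8f5a-example Duration: 180.52 ms Billed Duration: 181 ms "
    "Memory Size: 1024 MB Max Memory Used: 512 MB Init Duration: 2104.11 ms"
)
print(parse_report(sample))
```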
Terraform: Lambda Function from ECR Image
Once the image is in ECR, Terraform provisions the Lambda function. The resource uses package_type = "Image" and references the ECR image URI:
resource "aws_ecr_repository" "lambda_function" {
name = "my-lambda-function"
image_tag_mutability = "MUTABLE"
image_scanning_configuration {
scan_on_push = true
}
lifecycle {
prevent_destroy = true
}
}
resource "aws_lambda_function" "image_processor" {
function_name = "image-processor"
role = aws_iam_role.lambda_exec.arn
package_type = "Image"
image_uri = "${aws_ecr_repository.lambda_function.repository_url}:latest"
architectures = ["arm64"]
timeout = 60
memory_size = 1024
image_config {
command = ["handler.process_image"]
working_directory = "/var/task"
}
environment {
variables = {
BUCKET_NAME = var.bucket_name
LOG_LEVEL = "INFO"
}
}
}
The image_config block overrides the CMD instruction from the Dockerfile. This lets you deploy the same image to multiple Lambda functions with different handlers — useful if you have one Docker image that bundles several related functions.
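For instance, a second function could reuse the same image with a different command override — a hypothetical sketch assuming a handler.make_thumbnail function exists in the same codebase:

```hcl
# Hypothetical second function sharing the same ECR image,
# pointed at a different handler via the command override
resource "aws_lambda_function" "thumbnailer" {
  function_name = "thumbnail-generator"
  role          = aws_iam_role.lambda_exec.arn
  package_type  = "Image"
  image_uri     = "${aws_ecr_repository.lambda_function.repository_url}:latest"
  architectures = ["arm64"]
  timeout       = 30
  memory_size   = 512

  image_config {
    # Assumes this handler is bundled in the shared image
    command = ["handler.make_thumbnail"]
  }
}
```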
One Terraform gotcha: image_uri with :latest means Terraform won’t detect when you push a new image tag. If you update the ECR image but don’t change the Terraform config, terraform plan shows no changes. The GitLab CI pipeline handles this correctly by calling aws lambda update-function-code directly after the push — Terraform manages the infrastructure, CI manages the deployment.
For teams that want Terraform to manage both, use the commit SHA as the image tag instead of latest:
variable "image_tag" {
description = "ECR image tag to deploy"
type = string
}
resource "aws_lambda_function" "image_processor" {
image_uri = "${aws_ecr_repository.lambda_function.repository_url}:${var.image_tag}"
# ...
}
Pass -var="image_tag=${CI_COMMIT_SHORT_SHA}" from the CI pipeline. Terraform detects the tag change and updates the function.
Testing Locally with SAM CLI and Docker
Before pushing to ECR, you can run the Lambda container locally. Two approaches.
Direct docker run:
# Build the image
docker build -t my-lambda-local .
# Run it — Lambda listens on port 8080 inside the container
docker run \
-p 9000:8080 \
-e AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" \
-e AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" \
-e AWS_DEFAULT_REGION="us-east-1" \
my-lambda-local
# Invoke it in another terminal
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" \
-d '{"bucket": "my-bucket", "key": "test-image.jpg"}'
The Lambda Runtime Interface Emulator (RIE) is bundled in AWS base images. It exposes the invocation endpoint on port 8080. Your curl call simulates the Lambda service sending an event to your function.
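If you invoke the endpoint a lot during development, a tiny Python wrapper beats retyping the curl command — a sketch assuming the container’s port 8080 is mapped to 9000 as above:

```python
import json
import urllib.request

# The RIE's invocation endpoint, with the container's 8080 mapped to 9000
RIE_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_request(payload: dict, url: str = RIE_URL) -> urllib.request.Request:
    # Pure helper: packages the event the way the Lambda service would
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke_local(payload: dict) -> dict:
    # Requires the container from `docker run` above to be running
    with urllib.request.urlopen(build_request(payload)) as resp:
        return json.load(resp)
```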
SAM CLI:
# template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
ImageProcessor:
Type: AWS::Serverless::Function
Properties:
PackageType: Image
Architectures:
- arm64
Timeout: 60
MemorySize: 1024
Metadata:
DockerTag: latest
DockerContext: .
Dockerfile: Dockerfile
sam build
sam local invoke ImageProcessor --event events/test-event.json
SAM handles building the image and routing the event through the Lambda runtime emulator. It also supports sam local start-lambda to keep the function running and accept repeated invocations — useful for iterative testing without rebuilding the image each time.
For the Pillow and complex image processing case, local testing with docker run is how I catch the libGL dependency issue before it bites in production. Run the container locally with a real test image, look at the logs, and you’ll see the missing shared library error immediately instead of after a deploy.
Putting It Together
Container images aren’t the right Lambda packaging format for every function. Simple functions with standard dependencies are better served by zip deployments — faster cold starts, simpler pipeline, nothing to build.
But for the functions that push against the zip size limit, need a controlled OS environment, or benefit from Docker’s established toolchain, container images remove a category of operational friction that zip deployments can’t address. The 250MB ceiling was a real constraint for data-heavy Python work. That constraint is gone.
The pipeline pattern here — GitLab CI builds the image, OIDC authenticates to AWS without stored credentials, ECR stores the image, Lambda runs it — works cleanly in 2026 and requires no AWS-specific plugins or third-party integrations. It’s the same pipeline you’d use for any containerized service, extended by two CLI calls to push and update the Lambda function.
For the GitLab and ECR side of this, building and pushing Docker images from GitLab CI covers the registry authentication and image tagging patterns in more detail. If you’re managing Lambda dependencies the zip way, Lambda Layers and custom runtimes explains where layers fit and where they don’t. And if the function you’re containerizing does image processing, the Pillow on Lambda guide has the dependency setup that container images make significantly less painful.