GitLab CI Cache
If you run the same pipeline over and over, waiting for npm install or bundle install every time, you start wondering if there’s a better way. There is. GitLab CI has a cache mechanism that lets you skip downloading dependencies on subsequent runs.
This post covers how to set up cache in your .gitlab-ci.yml, how it differs from artifacts, and a few practical examples.
How cache works in GitLab CI
Cache stores files on the runner machine (or in S3 if you have distributed caching enabled). When a job runs, GitLab checks if there’s already a cached version of the files you need. If yes, it restores them instead of downloading everything from scratch.
The catch: cache is stored on the runner, not in GitLab itself. If your pipeline uses different runners each time, cache sharing becomes unpredictable. That’s why runner affinity matters, which I’ll get to below.
The GitLab docs draw the distinction clearly: artifacts pass data between jobs within a pipeline and are stored by GitLab itself. Cache holds reusable files across pipeline runs and is stored wherever the runner is configured to keep it: local disk by default, or object storage (S3, GCS, Azure Blob) when distributed caching is enabled.
Use runner tags to keep cache consistent
If you have multiple runners, assign tags to them and reference those tags in your .gitlab-ci.yml. Otherwise your pipeline might hop between runners that don't share cache.
```yaml
job:
  tags:
    - docker
  script:
    - npm install
```
Without consistent runner assignment, you won’t see much benefit from caching because each runner maintains its own cache store.
Cache vs. artifacts
These two are often confused, so let’s be clear:
Artifacts pass build results between jobs in a pipeline. They’re stored in GitLab and downloadable. Artifacts persist for a configurable time (default is 30 days), and you can control which subsequent jobs can download them using the dependencies keyword.
Cache stores dependencies like npm packages, gem files, or Python modules. It’s meant to speed up jobs by avoiding repeated downloads. Cache lives on the runner, not in GitLab, and is not designed for passing data between jobs.
Cache cannot be shared between projects, even on the same runner. Artifacts are also project-scoped by default, though a job can fetch artifacts from another project with needs:project (Premium).
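To make the split concrete, here is a minimal sketch (job and path names are illustrative): the build job publishes dist/ as an artifact for the test job, while both jobs share a node_modules cache across pipeline runs.

```yaml
cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - node_modules/   # dependencies: cache, reused across pipeline runs

build:
  stage: build
  script:
    - npm ci
    - npm run build
  artifacts:
    paths:
      - dist/         # build output: artifact, passed to later jobs
    expire_in: 1 week

test:
  stage: test
  script:
    - npm test        # dist/ from build is downloaded automatically
```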
Cache Policy: Pull, Push, or Both
The cache:policy keyword controls whether a job restores the cache, saves it, or both. The valid values are pull, push, and pull-push (the default); there is no pull-if-present policy.
```yaml
# Default: pull-push (restore at start, save at end)
cache:
  key: "${CI_COMMIT_REF_SLUG}"
  paths:
    - node_modules/
  policy: pull-push

# push: upload only, never download (for a job dedicated to building the cache)
build-deps:
  cache:
    key: "deps-${CI_COMMIT_REF_SLUG}"
    paths:
      - node_modules/
    policy: push
  script:
    - npm ci

# pull: download only, never upload (for jobs that consume a shared cache)
test-unit:
  cache:
    key: "deps-${CI_COMMIT_REF_SLUG}"
    paths:
      - node_modules/
    policy: pull
  script:
    - pytest tests/unit/
```
Use push for a single job that builds the cache, and pull for jobs that should use it without modifying it. That way parallel test jobs never race to upload the same archive.
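Since GitLab 16.1, cache:policy can also be set from a CI/CD variable, so one job definition can switch behavior per run (the $CACHE_POLICY variable name here is just an example):

```yaml
test:
  cache:
    key: "deps-${CI_COMMIT_REF_SLUG}"
    paths:
      - node_modules/
    policy: $CACHE_POLICY   # set to pull, push, or pull-push when triggering
  script:
    - npm test
```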
Sharing cache between stages
You have two options if you need files in a later stage:
- Use artifacts if the files are build outputs your next job needs.
- Use cache if the files are dependencies that don’t change between stages.
For dependencies that stay the same across the pipeline, cache is usually the simpler choice.
Practical examples
Node.js project (cache by branch)
```yaml
image: node:latest

cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - node_modules/

build:
  script:
    - npm install
```
$CI_COMMIT_REF_SLUG is a GitLab predefined variable that converts your branch name into a safe string for use in filenames. Each branch gets its own cache, so you won’t have conflicts between feature branches and main.
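A common variant: instead of caching node_modules itself, cache npm's download cache and install with npm ci, which always rebuilds node_modules but skips re-downloading packages. A sketch (the .npm directory name is a convention, not a requirement):

```yaml
cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - .npm/   # npm's download cache, kept inside the project dir

build:
  script:
    - npm ci --cache .npm --prefer-offline
```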
Cache with fallback keys (GitLab 16+)
```yaml
cache:
  key: "deps-${CI_COMMIT_REF_SLUG}"
  paths:
    - node_modules/
  fallback_keys:
    - "deps-main"      # falls back to the main branch cache
    - "deps-default"   # then to a default key
  policy: pull-push
```
This is the most useful pattern for feature branches: they get their own cache, but fall back to the main branch cache if they haven’t built anything yet. This means new feature branches get the benefit of the cached dependencies immediately without downloading everything.
Cache by job name
```yaml
cache:
  key: $CI_JOB_NAME-$CI_COMMIT_REF_SLUG
  paths:
    - .cache/pip/
```
This gives you separate cache per job and branch. Useful when multiple jobs in the same pipeline have different dependency needs.
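For the .cache/pip path to actually be used, point pip at it; PIP_CACHE_DIR is pip's standard environment variable for this:

```yaml
test:
  variables:
    PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  cache:
    key: $CI_JOB_NAME-$CI_COMMIT_REF_SLUG
    paths:
      - .cache/pip/
  script:
    - pip install -r requirements.txt
    - pytest
```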
Global cache shared across all jobs
```yaml
default:
  cache:
    key: global-cache
    paths:
      - vendor/bundle/
      - .npm/
```
Setting cache under default: applies it to all jobs unless a job overrides it.
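Individual jobs can also opt out entirely with cache: [], which is useful for quick jobs where downloading and uploading the cache would cost more time than it saves:

```yaml
lint:
  cache: []   # skip both the download and the upload of the global cache
  script:
    - yamllint .gitlab-ci.yml
```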
Cache based on dependency files
GitLab can automatically invalidate cache when specific files change:
```yaml
cache:
  - key:
      files:
        - Gemfile.lock
        - yarn.lock
    paths:
      - vendor/ruby
      - .yarn-cache/
```
This generates a cache key from the hash of the listed files. When the lock files change, the cache key changes and GitLab creates a fresh cache.
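The generated hash keys are opaque; cache:key:prefix prepends a readable label to them (note that GitLab allows at most two files per key):

```yaml
cache:
  key:
    files:
      - yarn.lock
    prefix: "yarn-${CI_COMMIT_REF_SLUG}"
  paths:
    - .yarn-cache/
```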
Docker layer caching
The cache: keyword can't help here directly: cache paths must live inside the project directory, and Docker's layer store at /var/lib/docker/ does not. The usual workaround is registry-based layer caching with --cache-from:
```yaml
docker-build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker pull myapp:latest || true
    - docker build --cache-from myapp:latest -t myapp:$CI_COMMIT_SHA .
    - docker push myapp:$CI_COMMIT_SHA
```
Note: Docker-in-Docker requires privileged runners. Consider Kaniko or Buildah for builds on unprivileged runners.
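For unprivileged runners, here is a Kaniko sketch of the same build. The registry host, image names, and cache repo are placeholders; Kaniko pushes layer cache to a registry repo instead of local disk:

```yaml
build-kaniko:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - /kaniko/executor
        --context "$CI_PROJECT_DIR"
        --dockerfile "$CI_PROJECT_DIR/Dockerfile"
        --destination "registry.example.com/myapp:$CI_COMMIT_SHA"
        --cache=true
        --cache-repo "registry.example.com/myapp/cache"
```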
Go modules caching
```yaml
go-build:
  image: golang:1.22
  variables:
    GOPATH: $CI_PROJECT_DIR/.go   # keep the module cache inside the project dir
  cache:
    key: "go-${CI_COMMIT_REF_SLUG}"
    paths:
      - .go/pkg/mod/
  before_script:
    - go mod download
  script:
    - go build -o myapp ./cmd/myapp
```
GOPATH is moved into the project tree because cache paths must sit inside $CI_PROJECT_DIR; the default /go/pkg/mod can't be cached from .gitlab-ci.yml.
Distributed caching with S3
For fleets with multiple runners, configure a shared object-storage cache. This lives in the runner's config.toml, not in .gitlab-ci.yml — there is no cache backend keyword in the pipeline config:
```toml
[runners.cache]
  Type = "s3"
  Shared = true
  [runners.cache.s3]
    ServerAddress = "s3.amazonaws.com"
    BucketName = "gitlab-cache-bucket"
    BucketLocation = "us-east-1"
    AuthenticationType = "iam"   # use the instance role instead of static keys
```
With AuthenticationType = "iam" the runner picks up credentials from its IAM instance role, so no AccessKey/SecretKey needs to be stored or rotated. GitLab does not expire distributed cache objects for you, so set lifecycle rules on the bucket to delete old cache objects.
Secure Files for CI/CD credentials
GitLab's Secure Files feature (generally available since GitLab 15.7) distributes certificates and credentials to jobs without baking them into the repository, cache, or artifacts. Upload files under Project > Settings > CI/CD > Secure Files, then fetch them in a job with the download-secure-files tool, which places them in .secure_files/ inside the project directory:
```yaml
deploy:
  stage: deploy
  script:
    - curl --silent "https://gitlab.com/gitlab-org/incubation-engineering/mobile-devops/download-secure-files/-/raw/main/installer" | bash
    - ls -la .secure_files/
    # Use certificate from .secure_files/client.crt
    - deploy-script.sh
```
Files are encrypted at rest and never pass through the cache layer, which replaces the common pattern of baking credentials into environment variables or cached files.
Runner cache and Docker images
Every pipeline runs inside a runner, which means Docker images are also pulled on the runner. If you’re pulling from AWS ECR, Docker Hub, or any private registry, the image pull time adds to your pipeline duration.
You can cache Docker images at the runner level, but that's runner configuration, not something you control from .gitlab-ci.yml. If image pull time is a bottleneck, look into the runner's image pull policy or a pull-through registry mirror.
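For the Docker executor, that runner-side setting is pull_policy in config.toml (a sketch for a self-managed runner):

```toml
[[runners]]
  executor = "docker"
  [runners.docker]
    image = "alpine:latest"
    # Reuse images already present on the host; avoids re-pulling
    # on every job, at the cost of potentially stale tags.
    pull_policy = "if-not-present"
```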
What Changed Recently
- fallback_keys for ordered cache-key fallbacks (GitLab 16.0)
- cache:policy can be set from a CI/CD variable (GitLab 16.1)
- Secure Files for credential distribution reached general availability (GitLab 15.7)
- Runner S3 cache can authenticate via IAM instance roles (AuthenticationType = "iam"), removing the need for static access keys
Common Gotchas
Cache is not guaranteed. GitLab cache has no SLA — it can be evicted at any time. Never store critical data only in cache. Use artifacts for data that must persist.
Cache key collisions. If two branches have the same key, they share the same cache. Use ${CI_COMMIT_REF_SLUG} as a suffix to prevent conflicts between branches.
Cache corruption. If an upload is interrupted, subsequent pipelines may restore a stale or broken cache. When that happens, bump the cache key or use Clear runner caches on the project's pipelines page to force a fresh archive.
DIND + cache requires privileged runners. The /var/lib/docker/ path requires --privileged on the runner. Kaniko and Buildah are safer alternatives for unprivileged environments.
Unbounded S3 bucket growth. Without lifecycle policies, the cache bucket grows indefinitely. Set a bucket rule to expire objects after 7-30 days.
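A minimal S3 lifecycle rule for the cache bucket, applied with aws s3api put-bucket-lifecycle-configuration (the 14-day window is an arbitrary example):

```json
{
  "Rules": [
    {
      "ID": "expire-ci-cache",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 14 }
    }
  ]
}
```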
fallback_keys are exact keys, not prefixes. Don't confuse GitLab's fallback_keys with GitHub Actions' restore-keys: restore-keys does prefix matching, but each fallback_keys entry is a complete key tried in order. List every fallback you want explicitly.
Wrapping up
Cache is one of those things that’s easy to set up wrong and hard to tell if it’s working. Run your pipeline twice on the same branch and watch the time difference on your dependency installation step. If it’s not faster on the second run, check your runner tags and make sure your jobs are hitting the same runner.
The key points: use key to isolate cache per branch or job, use paths to list what to cache, use fallback_keys to get shared cache on new branches, and make sure your runners are consistent. For passing build outputs between stages, use artifacts instead.
For more on GitLab CI, the posts on GitLab CI best practices and CI/CD optimization cover the surrounding pipeline design. For building Docker images in GitLab CI, the guide to building Docker images and pushing to ECR covers the full container workflow alongside caching strategies.