GitLab CI Cache

Written by Bits Lovers

If you run the same pipeline over and over, waiting for npm install or bundle install every time, you start wondering if there’s a better way. There is. GitLab CI has a cache mechanism that lets you skip downloading dependencies on subsequent runs.

This post covers how to set up cache in your gitlab-ci.yml, how it differs from artifacts, and a few practical examples.

How cache works in GitLab CI

Cache stores files on the runner machine (or in S3 if you have distributed caching enabled). When a job runs, GitLab checks if there’s already a cached version of the files you need. If yes, it restores them instead of downloading everything from scratch.

The catch: cache is stored on the runner, not in GitLab itself. If your pipeline uses different runners each time, cache sharing becomes unpredictable. That’s why runner affinity matters, which I’ll get to below.

To keep the two features straight: artifacts pass data between pipeline stages, persist beyond the pipeline run, and are stored by GitLab itself. Cache holds reusable files across pipeline runs and lives wherever the runner keeps it: local disk, or a distributed backend such as S3, GCS, or Azure Blob storage.

Use runner tags to keep cache consistent

If you have multiple runners, assign tags to them and reference those tags in your gitlab-ci.yml. Otherwise your pipeline might hop between runners that don’t share cache.

job:
  tags:
    - docker
  script:
    - npm install

Without consistent runner assignment, you won’t see much benefit from caching because each runner maintains its own cache store.

Cache vs. artifacts

These two are often confused, so let’s be clear:

Artifacts pass build results between jobs in a pipeline. They’re stored in GitLab and downloadable. Artifacts persist for a configurable time (default is 30 days), and you can control which subsequent jobs can download them using the dependencies keyword.

Cache stores dependencies like npm packages, gem files, or Python modules. It’s meant to speed up jobs by avoiding repeated downloads. Cache lives on the runner, not in GitLab, and is not designed for passing data between jobs.

Neither cache nor artifacts can be shared between different projects, even when their jobs run on the same runner.
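To make the artifacts side concrete, here is a minimal two-job sketch (job names and paths are hypothetical): the build job publishes its output as an artifact with an explicit retention period, and the deploy job fetches it via the dependencies keyword.

```yaml
stages:
  - build
  - deploy

build:
  stage: build
  script:
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 1 week   # override the default retention period

deploy:
  stage: deploy
  dependencies:
    - build             # download only the build job's artifacts
  script:
    - ./deploy.sh dist/
```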

Cache Policy: Pull, Push, or Both

The cache:policy keyword controls whether a job downloads the cache, uploads it, or both:

# Default: pull-push (restore on start, save at end)
cache:
  key: "${CI_COMMIT_REF_SLUG}"
  paths:
    - node_modules/
  policy: pull-push

# push: never restore, only save (useful for a cache warm-up job)
prepare-deps:
  cache:
    key: "deps-${CI_COMMIT_REF_SLUG}"
    paths:
      - node_modules/
    policy: push
  script:
    - npm ci

# pull: only restore, never push (for jobs that just consume the cache)
test-integration:
  cache:
    key: "deps-${CI_COMMIT_REF_SLUG}"
    paths:
      - node_modules/
    policy: pull
  script:
    - npm test

Use push for a dedicated job that builds the cache once, and pull for jobs that should only consume it and never modify the shared cache. Since GitLab 16.1, policy can also be set from a CI/CD variable, so one job definition can switch behavior per pipeline.

Sharing cache between stages

You have two options if you need files in a later stage:

  1. Use artifacts if the files are build outputs your next job needs.
  2. Use cache if the files are dependencies that don’t change between stages.

For dependencies that stay the same across the pipeline, cache is usually the simpler choice.
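A minimal sketch of the cache option (job names are hypothetical): both stages reference the same key, so the test job reuses the node_modules/ directory that the build job populated.

```yaml
stages:
  - build
  - test

cache:
  key: "${CI_COMMIT_REF_SLUG}"
  paths:
    - node_modules/

build:
  stage: build
  script:
    - npm ci

test:
  stage: test
  script:
    - npm test
```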

Practical examples

Node.js project (cache by branch)

image: node:latest
cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - node_modules/
build:
  script:
    - npm install

$CI_COMMIT_REF_SLUG is a GitLab predefined variable that converts your branch name into a safe string for use in filenames. Each branch gets its own cache, so you won’t have conflicts between feature branches and main.
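A common variant, sketched here with assumed paths, caches npm's own download cache instead of node_modules/. npm ci then produces a clean node_modules/ tree on every run while still installing quickly from the local package cache.

```yaml
cache:
  key: "npm-${CI_COMMIT_REF_SLUG}"
  paths:
    - .npm/          # npm's download cache, kept inside the project dir

build:
  script:
    - npm ci --cache .npm --prefer-offline
```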

Cache with fallback keys (GitLab 16+)

cache:
  key: "deps-${CI_COMMIT_REF_SLUG}"
  paths:
    - node_modules/
  fallback_keys:
    - "deps-main"        # Falls back to main branch cache
    - "deps-default"      # Then to default key
  policy: pull-push

This is the most useful pattern for feature branches: they get their own cache, but fall back to the main branch cache if they haven’t built anything yet. This means new feature branches get the benefit of the cached dependencies immediately without downloading everything.

Cache by job name

cache:
  key: $CI_JOB_NAME-$CI_COMMIT_REF_SLUG
  paths:
    - .cache/pip/

This gives you separate cache per job and branch. Useful when multiple jobs in the same pipeline have different dependency needs.

Global cache shared across all jobs

default:
  cache:
    key: global-cache
    paths:
      - vendor/bundle/
      - .npm/

Setting cache under default: applies it to all jobs unless a job overrides it.
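An individual job can override the default cache with its own configuration, or opt out entirely by setting cache to an empty list (the lint job here is a hypothetical example):

```yaml
lint:
  cache: []          # disable the inherited default cache for this job
  script:
    - npm run lint
```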

Cache based on dependency files

GitLab can automatically invalidate cache when specific files change:

cache:
  - key:
      files:
        - Gemfile.lock
        - yarn.lock
    paths:
      - vendor/ruby
      - .yarn-cache/

This generates a cache key from the hash of the listed files. When the lock files change, the cache key changes and GitLab creates a fresh cache.
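You can combine the file hash with a readable label using the prefix keyword, which produces keys like ruby-&lt;hash&gt; (the label "ruby" here is just an example):

```yaml
cache:
  key:
    files:
      - Gemfile.lock
    prefix: ruby     # key becomes "ruby-" plus the hash of Gemfile.lock
  paths:
    - vendor/ruby
```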

Docker layer caching

docker-build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker pull myapp:latest || true
    - docker build --cache-from myapp:latest -t myapp:$CI_COMMIT_SHA .
    - docker push myapp:$CI_COMMIT_SHA

Note: you can't cache /var/lib/docker/ with the cache: keyword, because cache paths must live inside the project directory ($CI_PROJECT_DIR). Use registry-based layer caching with --cache-from as above, or consider Kaniko or Buildah, which build images without a privileged Docker-in-Docker service.

Go modules caching

go-build:
  image: golang:1.22
  variables:
    GOPATH: $CI_PROJECT_DIR/.go   # keep the module cache inside the project dir
  cache:
    key: "go-${CI_COMMIT_REF_SLUG}"
    paths:
      - .go/pkg/mod/
  before_script:
    - go mod download
  script:
    - go build -o myapp ./cmd/myapp

Distributed caching with S3

When you run multiple (or autoscaling) runners, the cache backend is configured on the runner itself in config.toml, not in gitlab-ci.yml. Distributed caching is a runner feature, available on every tier:

[runners.cache]
  Type = "s3"
  Shared = true
  [runners.cache.s3]
    ServerAddress = "s3.amazonaws.com"
    BucketName = "gitlab-cache-bucket"
    BucketLocation = "us-east-1"

If you leave AccessKey and SecretKey unset, the runner falls back to the instance's IAM role, which is preferable to static credentials and needs no rotation. GitLab never deletes cache objects on its own, so set lifecycle rules on the bucket to expire old cache objects and prevent unbounded growth.
Secure Files for CI/CD credentials

GitLab's Secure Files feature stores certificates, keystores, and other credentials at the project level and lets jobs download them on demand, so they never end up baked into the cache or artifacts:

# Upload files in Project > Settings > CI/CD > Secure Files,
# then fetch them in a job with the download-secure-files tool:

deploy:
  stage: deploy
  variables:
    SECURE_FILES_DOWNLOAD_PATH: './secrets/'
  script:
    - curl --silent "https://gitlab.com/gitlab-org/incubation-engineering/mobile-devops/download-secure-files/-/raw/main/installer" | bash
    - ls -la ./secrets/
    - ./deploy-script.sh

Files are encrypted at rest and downloaded only into the job's workspace (default path .secure_files, overridable with SECURE_FILES_DOWNLOAD_PATH).

This replaces the common pattern of baking credentials into environment variables or storing them in the cache layer.

Runner cache and Docker images

Every pipeline runs inside a runner, which means Docker images are also pulled on the runner. If you’re pulling from AWS ECR, Docker Hub, or any private registry, the image pull time adds to your pipeline duration.

You can cache Docker images at the runner level, but that’s a runner configuration, not something you control from gitlab-ci.yml. If image pull time is a bottleneck, look into configuring your runner with a Docker cache storage backend.
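A runner-side sketch for the Docker executor (config.toml, not gitlab-ci.yml): the pull_policy setting tells the runner to reuse locally cached images instead of pulling from the registry on every job.

```toml
[[runners]]
  [runners.docker]
    # reuse a locally cached image if it exists; pull only when missing
    pull_policy = ["if-not-present"]
```

Be aware that if-not-present lets any project on that runner use locally cached images, so it's best reserved for trusted, non-shared runners.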

What Changed Recently

  • GitLab 16.0 added fallback_keys, so new branches can reuse another branch's cache until they build their own
  • GitLab 16.1 allowed cache:policy to be set from a CI/CD variable, so one job definition can pull or push depending on the pipeline
  • Jobs can define up to four separate cache entries, each with its own key and paths
  • Secure Files became generally available for distributing certificates and credentials to jobs without using variables or cache

Common Gotchas

Cache is not guaranteed. GitLab cache has no SLA — it can be evicted at any time. Never store critical data only in cache. Use artifacts for data that must persist.

Cache key collisions. If two branches have the same key, they share the same cache. Use ${CI_COMMIT_REF_SLUG} as a suffix to prevent conflicts between branches.

Cache corruption. If an upload is interrupted, subsequent pipelines may restore a stale or broken cache. When you suspect corruption, use the Clear runner caches button on the project's Pipelines page, which bumps the cache version so old entries are ignored.

DIND + cache requires privileged runners. The /var/lib/docker/ path requires --privileged on the runner. Kaniko and Buildah are safer alternatives for unprivileged environments.

Unbounded S3 bucket growth. Without lifecycle policies, the cache bucket grows indefinitely. Set a bucket rule to expire objects after 7-30 days.
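One way to enforce that is an S3 lifecycle configuration (a sketch; bucket name and retention period are up to you), applied with aws s3api put-bucket-lifecycle-configuration:

```json
{
  "Rules": [
    {
      "ID": "expire-ci-cache",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 14 }
    }
  ]
}
```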

fallback_keys are exact keys, not prefixes. Unlike GitHub Actions' restore-keys, GitLab's fallback_keys does no prefix matching: each entry must be a complete cache key, and they are tried in order until one exists. Don't expect deps- to match every key starting with deps-.

Wrapping up

Cache is one of those things that’s easy to set up wrong and hard to tell if it’s working. Run your pipeline twice on the same branch and watch the time difference on your dependency installation step. If it’s not faster on the second run, check your runner tags and make sure your jobs are hitting the same runner.

The key points: use key to isolate cache per branch or job, use paths to list what to cache, use fallback_keys to get shared cache on new branches, and make sure your runners are consistent. For passing build outputs between stages, use artifacts instead.

For more on GitLab CI, the posts on GitLab CI best practices and CI/CD optimization cover the surrounding pipeline design. For building Docker images in GitLab CI, the guide to building Docker images and pushing to ECR covers the full container workflow alongside caching strategies.
