Hello BitsLovers! Today we will talk about GitLab CI Cache. Knowing how to work with the GitLab cache is crucial to improve pipeline performance and save a lot of time. For example, we can use caches to avoid downloading content such as dependencies or libraries each time we run a job: Java libraries, Node.js modules, Ruby gems, PHP packages, and Python modules.
First, I would like to clarify the term cache in this article, because when we talk about cache on GitLab, we may be talking about different areas or topics.
Let me explain why:
GitLab provides a keyword called cache that we can use in the .gitlab-ci.yml file. We will learn how to use it to cache files or directories.
It’s also crucial to understand the difference between cache and artifacts, so we will explain it to help you decide which one to use.
Also, when we talk about cache, there are mechanisms on the GitLab Runner that we can tune to speed up our pipeline, for example, caching the Docker images.
We will cover both mechanisms. So keep reading, and let’s get started.
How the cache works
One thing to keep in mind: all cache files and Docker images are cached by the GitLab Runner. So, if your project and Runner are configured properly, you will take full advantage of the cache.
Take advantage of Runner Tags
The first action you should take, if you have more than one Runner on your GitLab, is to make sure that your project uses the same Runner as often as possible. To achieve that, we can assign tags to the Runner and reference them from the jobs in your .gitlab-ci.yml.
Why?
As we mentioned before, all our cache lives within the Runner, so if a different Runner picks up every pipeline or job, the cache is rarely there to speed things up.
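As a minimal sketch of the idea, the job below pins itself to a tagged Runner. The tag name docker-cache is hypothetical; replace it with a tag that is actually assigned to your own Runner.

```yaml
# "docker-cache" is an assumed tag name, not a GitLab default.
# Only Runners that carry this tag will pick up the job.
build:
  tags:
    - docker-cache
  cache:
    key: $CI_COMMIT_REF_SLUG
    paths:
      - node_modules/
  script:
    - npm install
```

If every job in the project references the same tag, the jobs tend to land on the same Runner, where the cache from previous runs is already stored.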
GitLab CI: Cache Docker Image
We already know that all pipeline jobs run on the Runner, so if you pull images from AWS ECR (see how to use ECR on GitLab CI) or Docker Hub, or even build your own image, all of them are cached inside the Runner.
We cover a lot of details in our article about How to Build Docker Images on GitLab and Cache them.
GitLab CI: Cache vs. Artifacts
Artifacts
Let’s analyze the main ideas and behavior of artifacts:
- We can choose whether to keep the latest artifact, in other words, whether the artifact will expire or not.
- Within the same pipeline, jobs in later stages can use the artifacts created by jobs in earlier stages.
- We can adjust the expiration time, which controls how long GitLab keeps the artifact available to download.
- Artifacts are per job.
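To make the expiration point concrete, here is a small sketch of a job that publishes an artifact with an expiration time. The dist/ directory and the npm run build command are assumptions for illustration; use whatever your build actually produces.

```yaml
build:
  script:
    - npm run build        # assumed build command
  artifacts:
    paths:
      - dist/              # assumed output directory
    expire_in: 1 week      # GitLab removes the artifact after this period
```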
Ok, now let’s see the difference compared with GitLab CI Cache.
Cache
- It’s possible to share the cache between subsequent jobs in the same pipeline.
- Also, we can share cache between pipelines from the same project.
- The file or directory cache only works if we explicitly use the keyword cache. So, if you don’t specify it, there is no cache.
Artifacts and cache have one characteristic in common: they can’t be shared between different projects, regardless of whether the projects use the same Runner.
GitLab CI: Cache Between Stages
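So, if you need to share files between stages or jobs, you can use either the cache or the artifacts keyword. As a rule of thumb, use artifacts for files that a later job must have, and cache for dependencies that could be downloaded again if the cache is missing.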
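The sketch below shows both keywords side by side in a two-stage pipeline. It assumes a Node project whose build writes to dist/; the job names, the directory, and the npm scripts are illustrative only.

```yaml
stages:
  - build
  - test

# Cache: dependencies that are nice to have but can be downloaded again.
cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - node_modules/

build:
  stage: build
  script:
    - npm install
    - npm run build        # assumed to write its output to dist/
  artifacts:
    paths:
      - dist/              # handed over to the jobs in later stages

test:
  stage: test
  script:
    - npm install          # fast when the cache was restored, still works when it was not
    - npm test
```

The test job runs npm install again on purpose: the job should not fail just because the cache was not available on the Runner that picked it up.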
Now that we have learned the main idea about how the cache works, let’s see how to declare and use cache in the gitlab-ci.yml file.
GitLab CI Cache Example
Let’s see one example of GitLab CI using cache for a Node application. To build a Node application, we usually run npm install, which creates a node_modules directory. Depending on the project size, this directory holds a lot of files that take a long time to download, so it makes sense to cache it.
image: node:latest
cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - node_modules/
build:
  script:
    - npm install
In the example above, we used the keyword key with the variable CI_COMMIT_REF_SLUG. With this key, jobs running on the same branch use and share the same cache.
What is GitLab Cache: Key
The keyword key is crucial to define where and how the cache will persist. It also works as a unique “id” for the cache in GitLab CI.
So, in other words, the jobs that use the same cache key will also get the same cache, even in different pipelines.
But what happens if we don’t specify the key?
If we don’t specify it, the key takes the value default. This means that all jobs that use the cache keyword without specifying a key will share the same cache.
Let’s see more examples of how we can manage the cache.
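As a small illustration, neither of the two jobs below sets a key, so both read from and write to the same cache. The job names and scripts are made up for the example.

```yaml
# No key on either job, so both fall back to the same default key
# and therefore share one cache.
job_a:
  cache:
    paths:
      - node_modules/
  script:
    - npm install

job_b:
  cache:
    paths:
      - node_modules/
  script:
    - npm test
```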
How to share the Cache by Stage and Branch:
cache:
  key: "$CI_JOB_STAGE-$CI_COMMIT_REF_SLUG"
Share the Cache by Job and by branch:
cache:
  key: "$CI_JOB_NAME-$CI_COMMIT_REF_SLUG"
How to share cache in different branches and any Job
We can define a cache that every branch and job of the project shares by using a fixed key for the whole .gitlab-ci.yml.
cache:
  key: "global-key-to-share-cache-all"
There is also an approach to share a cache across all branches for one specific job. To achieve that, we can use the predefined variable CI_JOB_NAME as the key.
Example:
job:
  cache:
    key: $CI_JOB_NAME
Conclusion
GitLab CI Cache is a crucial topic because our lives keep getting busier, and saving time is always a good idea. No one likes to wait for a long pipeline to finish, so a good cache solution makes a big difference.
Take advantage of the cache keys! And review your projects to make sure that you are using them correctly.
We have learned a lot! I hope that you liked this article. Let me know your opinion. It’s crucial to me!