Optimizing Rails Docker Image Size

See how we slimmed down a bloated 1.95GB Rails Docker image to a sleek 345MB! Learn about using multi-stage builds, minimal base images, and efficient caching strategies to save on storage costs and reduce attack surface.


Tech

Author: Mikołaj Bogucki

A week ago, I was tasked with optimizing our Docker image size. In this post, I’ll briefly explain why we decided to do that and discuss how I approached this puzzle.

Why?

The main advantages of smaller Docker images are cost reduction and a smaller attack surface. The size of the image influences storage and data transfer fees, and a smaller attack surface helps reduce the chance of a data breach.

How?

Let’s read an example Dockerfile and edit it together.

Below is the Dockerfile version I found at the beginning. This Dockerfile results in an image weighing ≈1.95GB uncompressed.
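A representative sketch of that Dockerfile, reconstructed from what the rest of this post describes; the package list, the Node setup script, and the server command are assumptions:

```dockerfile
FROM ruby:2.7.6

# system packages (assuming a typical Rails app backed by PostgreSQL)
RUN apt-get update -qq && \
    apt-get install -y build-essential curl git libpq-dev

# Node is installed for asset bundling
RUN curl -fsSL https://deb.nodesource.com/setup_16.x | bash - && \
    apt-get install -y nodejs

WORKDIR /app

COPY Gemfile Gemfile.lock ./
RUN bundle install

COPY . .

RUN bundle exec rails assets:precompile

CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]
```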

Base image

The file begins with FROM ruby:2.7.6. This instruction creates a new stage with Ruby version 2.7.6 as its base. The base image is the first opportunity for optimization. If we look at the images available on Docker Hub, we will find multiple variants of the ruby:2.7.6 image. Right now, we are using ruby:2.7.6, which weighs 325.6MB, but there are other base images with this Ruby version that are smaller, most notably ruby:2.7.6-slim and ruby:2.7.6-alpine.

We’ll go with Slim because it’s just the default image without all the unnecessary packages.

Why not Alpine? It would be a good choice if we were migrating a newer project that is still in development. However, we must consider that Alpine uses musl libc instead of glibc, so we would have to adjust our gems. Another issue is that Alpine packages are not available indefinitely: if you need a specific version of Node, you might be forced to compile it yourself after a few years, once it is no longer hosted in the official package repository.

Packages

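The relevant line is the package installation step. A plausible version, assuming a typical Rails app backed by PostgreSQL, looks like this:

```dockerfile
RUN apt-get update -qq && \
    apt-get install -y build-essential curl git libpq-dev
```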
There is nothing wrong with this line; we need all those things.

Node

In the third line, we see that Node is being installed. This tells us that some gems require Node, or that the project’s JavaScript is bundled using webpack, Vite, or a similar tool, and possibly that there is a more complex frontend that uses node modules. After taking a look, there is indeed quite a bit of JavaScript that needs processing. Luckily, Rails has a great tool on its belt called asset precompilation, and we are going to leverage it to remove NodeJS from the final image.

We can find out which gems use NodeJS by checking the Gemfile.lock and looking for gems that have execjs as a dependency.
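In a Gemfile.lock, such gems show up with execjs listed under their dependencies. The entries below are illustrative examples, not gems taken from the actual project:

```
GEM
  specs:
    autoprefixer-rails (10.4.7.0)
      execjs (~> 2)
    uglifier (4.2.0)
      execjs (>= 0.3.0, < 3)
```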

So now we know which gems use NodeJS.

It looks like both are used only during asset compilation. We will take advantage of this later. For now, let’s assign them to a new group.
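Using the illustrative gems from above and an assumed group name of :assets, the Gemfile change might look like this:

```ruby
# Gemfile: Node-dependent gems move into their own group so that
# later build stages can bundle without them
group :assets do
  gem "uglifier"
  gem "autoprefixer-rails"
end
```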

The rest

The rest of the Dockerfile looks fine; we will make a few more optimizations to it later on too.

Let’s get to work

Changing base

We can start with the most obvious optimization: changing the base image. However, the slim image does not include curl and git, which we use, so we have to install them ourselves.
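A sketch of the change; the extra packages beyond curl and git are assumptions carried over from the earlier sketch:

```dockerfile
FROM ruby:2.7.6-slim

# slim ships without curl and git, so we install them explicitly
RUN apt-get update -qq && \
    apt-get install -y build-essential curl git libpq-dev
```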

The base change helped us reduce image size by ≈600MB; that’s quite a good start.

Bundle without test gems

We should bundle without the gems from the test and development groups.
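With Bundler 2 this can be done by setting the without config before installing, for example:

```dockerfile
# skip development and test gems inside the image
RUN bundle config set --local without "development test" && \
    bundle install
```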

This small change saves us another ≈80MB.

Multi-stage build

The next possibility for decreasing size is removing NodeJS from the final image. To achieve that, we’ll introduce a multi-stage build process.


About multi-stage build

By using multiple FROM statements in a single Dockerfile, you can define distinct build stages, each with its own base image and dependencies. This allows you to selectively copy only the necessary artifacts from earlier stages into the final image, significantly reducing its size.

We will begin by configuring the base of our image, where we define the configuration shared by the other stages.
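A minimal sketch of the shared base stage (the RAILS_ENV setting and working directory are assumptions):

```dockerfile
FROM ruby:2.7.6-slim AS base

WORKDIR /app

ENV RAILS_ENV=production
```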

Now, let’s create the first stage: asset compilation.
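A sketch of the assets stage; it is the only stage that installs Node, and the NodeSource setup script, Node version, and dummy SECRET_KEY_BASE workaround are assumptions:

```dockerfile
FROM base AS assets

# build tools plus Node, needed only while compiling assets
RUN apt-get update -qq && \
    apt-get install -y build-essential curl git libpq-dev && \
    curl -fsSL https://deb.nodesource.com/setup_16.x | bash - && \
    apt-get install -y nodejs

COPY Gemfile Gemfile.lock ./
RUN bundle config set --local without "development test" && \
    bundle install

COPY . .

# a dummy secret is a common workaround for precompiling assets
# in the production environment
RUN SECRET_KEY_BASE=dummy bundle exec rails assets:precompile
```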

And the gem build stage.
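A sketch of the gem stage; native extensions are compiled here, and the build tools never leave this stage:

```dockerfile
FROM base AS gems

RUN apt-get update -qq && \
    apt-get install -y build-essential git libpq-dev

COPY Gemfile Gemfile.lock ./

# the assets group from earlier is excluded as well; those gems
# are only needed while compiling assets
RUN bundle config set --local without "development test assets" && \
    bundle install
```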

And the final stage.
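A sketch of the final stage; it copies only the installed gems and the compiled assets, so neither Node nor the build tools end up in the image (libpq5 as the PostgreSQL runtime library is an assumption):

```dockerfile
FROM base AS final

# runtime dependencies only
RUN apt-get update -qq && \
    apt-get install -y --no-install-recommends curl libpq5 && \
    rm -rf /var/lib/apt/lists/*

COPY --from=gems /usr/local/bundle /usr/local/bundle
COPY . .
COPY --from=assets /app/public/assets ./public/assets

# keep Bundler from complaining about uninstalled groups at runtime
RUN bundle config set --local without "development test assets"

CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]
```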

Those changes helped us save another ≈900MB.

Learning Resources

Multi-stage

Understanding layers

Docker images are built layer by layer, each layer representing a specific instruction in the Dockerfile. When you run a command like RUN rm -rf, it doesn’t actually delete files from the previous image layers. Instead, it creates a new layer with the changes applied on top. To minimize image size, we can leverage multi-stage builds or run the rm command in the same layer as the command creating the files we want to remove.
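A small illustration of the difference (the archive URL is a placeholder):

```dockerfile
# BAD: the archive is baked into the first layer; the second layer
# only masks it, so the image stays large
RUN curl -fsSLO https://example.com/archive.tar.gz && tar -xzf archive.tar.gz
RUN rm archive.tar.gz

# GOOD: the file is created and removed within a single layer and
# never ends up in the image
RUN curl -fsSLO https://example.com/archive.tar.gz && \
    tar -xzf archive.tar.gz && \
    rm archive.tar.gz
```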

Learning Resources

Understanding the image layers

Leveraging Github Actions cache

Now that our Docker image is small, we can work on making it build faster in our CI pipeline. At Lunar we are using GitHub Actions, so we’ll leverage the GitHub Actions cache to speed up our build process. Because the default Docker build driver doesn’t support exporting cache to external backends, we’ll switch to the docker-container build driver.
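Switching drivers is a one-time setup step; the builder name below is arbitrary:

```bash
docker buildx create --name ci-builder --driver docker-container --use
```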

We’ve had to modify our basic docker build -t name:tag . command too. So what are all those new parameters?
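A sketch of the adjusted command; name:tag and the production scope are placeholders, and the gha backend options are the ones explained in the next paragraph (--load brings the built image back into the local Docker engine, which the container driver doesn’t do on its own):

```bash
docker buildx build \
  --cache-from "type=gha,scope=production,ghtoken=$GITHUB_TOKEN" \
  --cache-to "type=gha,scope=production,mode=max,timeout=30s" \
  --load \
  -t name:tag .
```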

We configure Docker to use the GitHub Actions cache and save the cache under a scope based on the stage (staging / production), with a timeout of 30 seconds. We want to cache all layers, which is why we set mode=max. The ghtoken has to be set to avoid throttling of GitHub Actions cache reads. If you are interested in the details, please see below.

Learning Resources

GitHub Actions cache

docker buildx build

Build drivers

Is it worth it?

That depends on the project. Writing the cache takes time, so each cache miss will be costly.

Let’s look at our example project.

Cache hit: -20sec

Cache miss: +40sec

Because we copy the Gemfile and Gemfile.lock in a separate step, we will miss the cache only when we modify our bundle. Let’s assume we will miss the cache 1 out of 50 times:

$$
E[X] = \frac{49}{50}\cdot(-20\,\text{sec}) + \frac{1}{50}\cdot 40\,\text{sec} = -18.8\,\text{sec}
$$

So, in the long run, we should save time with this change. However, depending on the project and the shape of the Dockerfile, results will differ; build caching won’t always be worth it.

Closing words

In this post, we explored several techniques to optimize Docker image size, including using a minimal base image, leveraging multi-stage builds, and minimizing the number of layers. By implementing these strategies, we were able to reduce the size of our example image from a hefty 1.95GB to a much more manageable 345MB. This translates to cost savings on storage and data transfer fees, especially for frequently deployed applications.
