Docker Multistage Build: Build lightweight and secure images

Optimizing images is extremely important when working with Docker containers: it improves performance and reduces costs. This is where Docker Multistage Build comes into its own, allowing you to create lighter, more secure images. This technique, one of the best practices in Docker development, simplifies building and delivering applications in modern environments.

In this article, we’ll look at how Docker Multistage Build helps in everyday work by reducing image size and making the docker build process more efficient. We’ll go through the benefits, practical examples and how to implement it correctly.

What is Docker Multistage Build?

Docker Multistage Build is a feature that lets you use multiple stages in the build process, separating the build environment from the runtime environment. This allows teams to produce leaner, more optimized Docker images, cutting both image size and build time.

This approach, introduced in Docker 17.05, has revolutionized the creation of efficient containers.

Benefits of Multistage Build

  • Significant reduction in final image size
  • Clear separation between development and production dependencies
  • Better security by not including compilation tools in the final container
  • Simplifying the docker build process
  • Layer cache optimization

How does Docker Multistage Build work?

It works through multiple FROM statements within the same Dockerfile. Each FROM starts a new build stage, and you can copy artifacts from one stage to another using COPY --from.

Basic Docker Multistage Build example

Here is a simple example demonstrating its use in a Go application:

# Stage 1: Build
FROM golang:1.20 AS build
WORKDIR /app
COPY . .
# CGO_ENABLED=0 yields a statically linked binary that also runs on Alpine (musl libc)
RUN CGO_ENABLED=0 go build -o main .

# Stage 2: Production
FROM alpine:3.18
WORKDIR /root/
COPY --from=build /app/main .
ENTRYPOINT ["./main"]
What’s going on here?
  1. In the first stage, we compile the code with all the dependencies.
  2. In the second stage, we use a lightweight image (Alpine) and copy only the necessary executable.

This approach eliminates tools and dependencies from the build stage, resulting in an extremely lightweight final image.
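To try this out locally, the build and run could look like the following; the image name go-multistage is just an illustration:

docker build -t go-multistage .
docker run --rm go-multistage

Note how the final image contains only the compiled binary on top of Alpine, not the Go toolchain.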

Traditional build

In the traditional way of building a Docker image, you usually start from a standard image published by the language maintainer or another vendor. These images carry a lot of extra components that are often unnecessary, producing a very heavy final image.

To illustrate, I’ll use an application that I have in a personal repository of mine.

It’s a Node.js application created for a Docker challenge in the KubeDev course (now the DevOps Pro training).

The regular Dockerfile, without Multistage applied, is available in the repository; its code is below:

FROM node:14.17.5
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 8080
CMD ["node", "server.js"]

It’s a Dockerfile that uses the official node image and performs the few remaining steps needed for the build. There is only one FROM statement in the entire file, with no stage separation.

To build locally, I go into the src folder and run the command below:

docker image build -f Dockerfile-normal -t fernandomj90/conversao-temperatura-nodejs-devops-mind:v1 .

Then the build is started and the final Docker image is generated:

[Image: docker image build output]

As you can see, a 999MB image is generated. That’s quite heavy: it takes a long time to build, locally or in a pipeline, and it demands more storage space, whether in the cloud or on-premises. In short, this is an image that falls short on performance and cost, two highly relevant points these days, since cost is increasingly on the table when working with the cloud.
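If you want to inspect this yourself, docker image ls shows the size and docker history shows the layers (using the tag from the build command above):

docker image ls fernandomj90/conversao-temperatura-nodejs-devops-mind:v1
docker history fernandomj90/conversao-temperatura-nodejs-devops-mind:v1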

Implementing Docker Multistage

By applying the best practices of the Docker multistage build, the difference in final image size is enormous. Using the project mentioned above, we get an image that weighs just 211MB, with fewer layers created during the process and better security overall:

[Image: Docker multistage build output]

Depending on the project structure, base image and other choices, it is possible to get images under 100MB. With Multistage and some additional optimizations, this same application can be shipped as an image of just 34MB, with everything it needs to run successfully.

Structure of a Multistage Dockerfile

FROM node:current-alpine3.15 AS base
WORKDIR /app
COPY package*.json ./

FROM base AS dependencies
RUN npm set progress=false && npm config set depth 0
RUN npm install
RUN cp -R node_modules prod_node_modules

FROM base AS release
COPY --from=dependencies /app/prod_node_modules ./node_modules
COPY . .
EXPOSE 8080
CMD ["node", "server.js"]

In this Docker build, we have a simple Node.js application running on an Alpine Linux image. The Dockerfile is divided into three stages: the base stage, the dependencies stage and the release stage. Each stage has its own commands and base image.

They all have something in common: the use of the FROM statement at the beginning of each stage.

Each stage performs a specific function: setting up the base environment, optimizing dependencies and building the final image, resulting in a lightweight, efficient and secure container.

Anatomy of the Multistage Dockerfile for Node.js

Base stage

FROM node:current-alpine3.15 AS base
WORKDIR /app
COPY package*.json ./

This first stage, called “base”, lays the foundation for our image. Alpine Linux, known for its small footprint, is an ideal choice for Docker containers, as recommended by the official Node.js documentation.

In this stage, the working directory is set to /app and the package*.json files are copied into it.

Layer reuse: sharing a base stage lets the later stages reuse its cached layers.

Dependencies stage

FROM base AS dependencies
RUN npm set progress=false && npm config set depth 0
RUN npm install
RUN cp -R node_modules prod_node_modules

The dependencies stage is responsible for installing the application’s dependencies. It starts from the base stage and installs the dependencies with the npm install command. It then copies the node_modules directory to prod_node_modules, to be used later in the release stage.

Tip: Make sure the package.json file is configured correctly, with packages categorized under dependencies and devDependencies. This is what makes a production-only install possible.
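Note that a plain npm install also pulls in devDependencies, so the prod_node_modules snapshot above still contains them. A common variation, sketched below (this is not the repository’s exact Dockerfile), installs production dependencies first, snapshots them, and only then installs the dev tooling:

FROM base AS dependencies
# Install only production dependencies and snapshot them
# (npm ci needs the package-lock.json copied in the base stage;
# on older npm versions, use --only=production instead of --omit=dev)
RUN npm ci --omit=dev && cp -R node_modules prod_node_modules
# Now install everything, including devDependencies, for tests and builds
RUN npm ci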

Release stage

FROM base AS release
COPY --from=dependencies /app/prod_node_modules ./node_modules
COPY . .
EXPOSE 8080
CMD ["node", "server.js"]

The release stage defines how the application starts. It builds on the base stage, copies the dependencies installed in the dependencies stage and exposes port 8080. Finally, it sets the application’s start command to node server.js.

Tip: If the application includes static files (e.g. assets or build output), make sure you copy them into the final stage.
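For example, if an earlier stage had produced a dist directory, the final stage could pull it in like this (the stage name build and the dist path are hypothetical):

COPY --from=build /app/dist ./dist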

Benefits of Multistage Build in this Dockerfile

How each benefit is achieved:

  • Size reduction: only the necessary files and dependencies end up in the final image.
  • Security: development dependencies and build tools are not included.
  • Build performance: the base stage takes advantage of cached layers in future builds.
  • Easier maintenance: responsibilities are cleanly separated between stages.

The Dockerfile in question is an example of how to use good practices in Docker for a specific use case: a Node.js application running on a Docker image with Alpine Linux. By dividing the build process into stages, it is possible to have a more organized and easy-to-read Dockerfile, as well as reducing the size of the final Docker image and increasing the security of the application.

Best practices in the use of Multistage Build

1. Use optimized base images

Choose minimalist base images for the production stage, such as Alpine Linux; this reduces the final image size. There are also slim images, leaner versions of well-known distributions that are not quite as bare-bones as Alpine.
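A quick way to see the difference between variants of the same base image is to pull them and compare their sizes locally; the tags below are just illustrative:

docker pull node:20
docker pull node:20-slim
docker pull node:20-alpine
docker image ls node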

2. Remove temporary files

Make sure to delete caches and temporary files during the build process, so the final image doesn’t carry unnecessary weight.
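Two common patterns, assuming an Alpine-based Node.js image (the curl package is just an example):

# On Alpine, --no-cache installs packages without keeping the apk index
RUN apk add --no-cache curl
# With npm, clear the cache in the same layer that populated it
RUN npm ci && npm cache clean --force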

3. Combine instructions whenever possible

Group commands into a single RUN statement to minimize unnecessary layers. Example:

RUN apt-get update && apt-get install -y \
    curl \
    && rm -rf /var/lib/apt/lists/*

Creating Docker images properly takes effort and attention, but the end result brings great benefits for the speed and security of your application delivery. Larger images often carry a high number of security vulnerabilities, which should not be ignored in the name of agility.

Although not a foolproof solution, Docker Multistage Build has made it much easier to create optimized images, making it simpler and safer to use them in production environments.

FAQs

Q: Does docker multistage affect build performance?
A: Not significantly. The process may even be faster due to layer optimization.

Q: Can I use more than two stages?
A: Yes, you can use multiple stages as required.

Q: How do I debug multistage builds?
A: Use the docker build --target <stage> command to build specific stages.
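For example, to stop at the dependencies stage of the Dockerfile above (the tag name is illustrative):

docker build --target dependencies -t conversao-temperatura:deps .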

Q: Why use Alpine instead of other base images?
A: Alpine Linux offers an extremely lightweight base image (around 5MB), making it ideal for the production stage of a multistage Docker build.

Q: What is the function of the ‘dependencies’ stage?
A: This stage isolates the process of installing dependencies, ensuring that only necessary modules are copied to the final container.

Q: How can I further reduce the size of the image?
A: Consider using Docker layer caching and removing temporary files in a single layer.

Q: Why use Multistage Build in Node.js projects?
A: It helps to create smaller, more secure images by separating production and development dependencies. This reduces the risk of exposing unnecessary information or files.

Cover image from freepik

Now that you know how to optimize your Docker images, how about setting up your Kubernetes cluster via Terraform? Check out our post on how to create an EKS cluster via Terraform using AWS Blueprint: https://devopsmind.com.br/aws-pt-br/como-criar-cluster-eks/
