Optimizing images is essential when working with Docker containers, both to improve performance and to reduce costs. This is where multistage Docker builds come into their own, allowing you to create lighter, more secure images. The technique, one of the best practices for writing Dockerfiles, simplifies building and delivering applications in modern environments.
In this article, we’ll see how multistage builds help in day-to-day work: they reduce image size and make the Docker build process more efficient. We’ll also cover the benefits, walk through practical examples, and show how to implement the technique correctly.
Docker Multistage is a feature that lets you use multiple stages in a single build, separating the build environment from the runtime environment. Teams can then ship smaller, better-optimized Docker images, because compilers, package managers, and other build-only tools never reach the final image.
This approach, introduced in Docker 17.05, has revolutionized the creation of efficient containers.
It works through multiple FROM statements within the same Dockerfile. Each FROM starts a new build stage. You can copy artifacts from one stage to the next using COPY --from.
Here is a simple example that uses it to build a Docker image for a Go application from a custom Dockerfile:
# Stage 1: Build
FROM golang:1.20 AS build
WORKDIR /app
COPY . .
# Disable cgo so the binary is statically linked and can run on Alpine (musl libc)
RUN CGO_ENABLED=0 go build -o main .
# Stage 2: Production
FROM alpine:3.18
WORKDIR /root/
COPY --from=build /app/main .
ENTRYPOINT ["./main"]
This approach leaves the build stage’s tools and dependencies out of the final image, which ends up extremely lightweight.
In the traditional build method, a Docker image is usually based on a standard image published for the language, the vendor, and so on. These images contain many extra components that are often unnecessary, which makes the final image very heavy.
To illustrate, I’ll use an application from this personal repository:
This is a Node.js application created for a Docker challenge in the KubeDev course.
The regular Dockerfile, without Multistage, is available in the repository. Its code is reproduced below:
FROM node:14.17.5
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 8080
CMD ["node", "server.js"]
It’s a Dockerfile that uses the official node image and performs the few additional steps the build needs. There is only one FROM statement in the entire file, so the build is not split into stages.
To build locally, I change to the src folder and run the docker image build command below:
docker image build -f Dockerfile-normal -t fernandomj90/conversao-temperatura-nodejs-devops-mind:v1 .
Then the build is started and the final Docker image is generated:
As you can see, the result is a 999MB image. An image this heavy takes a long time to build and transfer, locally or in a pipeline, and requires more storage space, whether in the cloud or on your own machine. It is a poor result in terms of both performance and cost, two points that matter more and more as cloud spending comes under scrutiny.
Applying Multistage Docker Build best practices makes an enormous difference to the final image size. For the project mentioned above, the image weighs just 211MB, has fewer layers created during the build, and is more secure overall:
Depending on the project structure, the base image, and other choices, it is possible to get images below 100MB. With Multistage and some additional optimizations, it is possible to reach an image of just 34MB that still contains everything the application needs to run successfully.
FROM node:current-alpine3.15 AS base
WORKDIR /app
COPY package*.json ./
FROM base AS dependencies
RUN npm set progress=false && npm config set depth 0
RUN npm install
RUN cp -R node_modules prod_node_modules
FROM base AS release
COPY --from=dependencies /app/prod_node_modules ./node_modules
COPY . .
EXPOSE 8080
CMD ["node", "server.js"]
In this Multistage Docker Build, we have a simple Node.js application that runs on an Alpine Linux image. The Dockerfile is divided into three stages: the base stage, the dependencies stage, and the release stage. Each stage has its own build commands and its own starting image.
They all have something in common: the use of the FROM statement at the beginning of each stage.
Each stage performs a specific function: setting up the base environment, installing dependencies, and assembling the final image. The result is a lightweight, efficient, and secure container that follows best practices for writing Dockerfiles.
FROM node:current-alpine3.15 AS base
WORKDIR /app
COPY package*.json ./
This first stage, called “base”, lays the foundations for our image. Alpine Linux, known for its small footprint, is a good fit for Docker containers; the official Node.js image documentation suggests the Alpine variant when keeping the final image small is the main concern.
In this base stage, the working directory is set to /app. The package*.json files are then copied into that directory.
Layer reuse: Both later stages start from the base stage, so its layers are built once and shared instead of being rebuilt for each stage.
FROM base AS dependencies
RUN npm set progress=false && npm config set depth 0
RUN npm install
RUN cp -R node_modules prod_node_modules
The dependencies stage is responsible for installing the application’s dependencies. It starts from the base stage and installs them with the npm install command. It then copies the node_modules directory to prod_node_modules so the release stage can pick it up.
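Tip: Make sure that the package.json file categorizes packages correctly under dependencies and devDependencies. Keep in mind that a plain npm install installs both groups; to keep devDependencies out of the final image, run npm install --omit=dev (npm 7 or later) in this stage. This ensures that the installation is suitable for production.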
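The tip above can be sketched as a variant of the dependencies stage. This is one possible arrangement under stated assumptions, not necessarily what the original project does: it assumes npm 7 or later in the image, and it assumes a second, full install is wanted in this stage for tasks such as tests or linting.

```dockerfile
FROM node:current-alpine3.15 AS base
WORKDIR /app
COPY package*.json ./

FROM base AS dependencies
# Install only the packages listed under "dependencies"
RUN npm install --omit=dev
# Set the production-only tree aside before anything else is added to it
RUN cp -R node_modules prod_node_modules
# Assumption: a full install (including devDependencies) is useful in this
# stage only, for example to run tests; it never reaches the release stage
RUN npm install

FROM base AS release
# Only the production-only copy crosses into the final image
COPY --from=dependencies /app/prod_node_modules ./node_modules
COPY . .
EXPOSE 8080
CMD ["node", "server.js"]
```

With this ordering, the prod_node_modules copy is what keeps development packages out of the release stage, which is why the directory is set aside before the second install.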
FROM base AS release
COPY --from=dependencies /app/prod_node_modules ./node_modules
COPY . .
EXPOSE 8080
CMD ["node", "server.js"]
The release stage defines how the application is launched. It starts from the base stage, copies in the dependencies installed in the dependencies stage, copies the application code, and declares 8080 as the exposed port. Finally, it sets node server.js as the application’s startup command.
Tip: If the application includes static files (e.g. assets or builds), make sure you copy them at the final stage.
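For applications that generate static files during the build, a separate build stage can produce them and the final stage can copy only the output. The sketch below is illustrative: the build script in package.json and the /app/dist output directory are assumptions, not part of the original project.

```dockerfile
FROM node:current-alpine3.15 AS base
WORKDIR /app
COPY package*.json ./

# Build stage: needs devDependencies (bundlers, compilers) to produce the assets
FROM base AS build
RUN npm install
COPY . .
# Hypothetical script: assumes package.json defines "build" and writes to /app/dist
RUN npm run build

FROM base AS release
# Production dependencies only (npm 7 or later)
RUN npm install --omit=dev
# Bring across the generated assets, not the sources or the build tooling
COPY --from=build /app/dist ./dist
COPY server.js ./
EXPOSE 8080
CMD ["node", "server.js"]
```

Copying individual paths in the final stage, rather than the whole context, keeps source files and build configuration out of the image.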
Benefits | How is it achieved? |
---|---|
Size reduction | Only the files needed at runtime are copied into the final image; debug tools, extraneous files, and peripheral dependencies stay behind in earlier stages. |
Security | Build tools stay in earlier stages, and development dependencies are left out when the dependencies stage installs production packages only. |
Build Performance | Using the base stage takes advantage of cache layers for future builds. |
Maintenance made easy | Separation of responsibilities between stages. |
The Dockerfile in question is an example of how to use good practices in Docker for a specific use case: a Node.js application running on a Docker image with Alpine Linux. By dividing the build process into stages, it is possible to have a more organized and easy-to-read Dockerfile, as well as reducing the size of the final Docker image and increasing the security of the application.
Choose minimalist base images for the production stage, such as Alpine Linux. The result is a lean image that occupies less storage space and deploys more efficiently: smaller images mean faster network transfers, quicker container startup, and more economical resource use across development and production environments. Slim images are another option: leaner versions of well-known distributions that are not as bare-bones as Alpine.
Delete caches and temporary files during the build so they do not end up as dead weight in the final image.
Group commands into a single RUN statement to minimize unnecessary layers. Example:
RUN apt-get update && apt-get install -y \
    curl \
    && rm -rf /var/lib/apt/lists/*
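On Alpine-based images the same cleanup idea looks slightly different. A minimal sketch, assuming an alpine:3.18 base and curl as the example package; the scratch directory /tmp/work is purely illustrative:

```dockerfile
FROM alpine:3.18

# --no-cache makes apk fetch the package index on the fly instead of
# storing it in the layer, so there is nothing to delete afterwards
RUN apk add --no-cache curl

# Create and remove scratch files inside the same RUN. Deleting them in a
# later RUN would not shrink the layer in which they were written.
RUN mkdir -p /tmp/work \
    && echo "intermediate data" > /tmp/work/scratch.txt \
    && rm -rf /tmp/work
```

Each RUN produces one layer, so whatever exists when the instruction finishes is what the layer stores.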
Creating Docker images properly requires effort and attention, but the end result brings great benefits for the speed and security of your application delivery. Larger images often have a high number of security vulnerabilities, which should not be ignored in the name of agility. The reality is that well-made images require care and dedication.
Although not a foolproof solution, Multistage Docker Build has made it much easier to create optimized images, making it simpler and safer to use them in production environments.
Q: Does Docker Multistage affect build performance?
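A: Not significantly. The build may even be faster: with BuildKit, independent stages can build in parallel, stages the target does not depend on are skipped, and unchanged layers are reused from the cache.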
Q: Can I use more than two stages?
A: Yes, you can use multiple stages as required.
Q: How do I debug multistage builds?
A: Use the docker build --target <stage> command to build specific stages.
Q: Why use Alpine instead of other base images?
A: Alpine Linux offers an extremely lightweight base image (around 5MB), which makes it a good fit for the final stage of a multistage build in production.
Q: What is the function of the ‘dependencies’ stage?
A: This stage isolates installing dependencies, ensuring that only necessary modules are copied to the final container.
Q: How can I further reduce the size of the image?
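A: Start from a smaller base image, install only production dependencies, remove temporary files in the same RUN layer that creates them, and use a .dockerignore file so that COPY . . does not pull unneeded files into the image.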
Q: Why use Multistage Build in Node.js projects?
A: It helps to create smaller, more secure images by separating production and development dependencies. This reduces the risk of exposing unnecessary information or files.
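As a small illustration of the --target answer above, here is a two-stage Dockerfile that needs no source files, with the debugging commands noted as comments. The stage names and the demo:build tag are placeholders chosen for this sketch.

```dockerfile
# Build only the first stage; the release stage is not built:
#   docker build --target build -t demo:build .
# Inspect what that stage produced:
#   docker run --rm demo:build cat /artifact.txt

FROM alpine:3.18 AS build
# Stand-in for a real compile step
RUN echo "built artifact" > /artifact.txt

FROM alpine:3.18 AS release
COPY --from=build /artifact.txt /artifact.txt
CMD ["cat", "/artifact.txt"]
```

Stopping at an intermediate stage makes it possible to check its contents before the final stage discards everything that is not explicitly copied across.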