Reducing image size

Posted on Apr 04, 2019   ∣  4 min read  ∣  Docker

Can’t we remove superfluous files with RUN?

What happens if we add an instruction such as RUN rm or RUN apt-get remove at the end of our Dockerfile?

This adds a layer which removes a bunch of files.

But the previous layers (which added the files) still exist.

Removing files with an extra layer

When downloading an image, all the layers must be downloaded.

| Dockerfile instruction | Layer size | Image size |
|---|---|---|
| FROM ubuntu | Size of base image | Size of base image |
| ... | ... | Sum of this layer + all previous ones |
| RUN apt-get install somepackage | Size of files added (e.g. a few MB) | Sum of this layer + all previous ones |
| ... | ... | Sum of this layer + all previous ones |
| RUN apt-get remove somepackage | Almost zero (just metadata) | Same as previous one |

Therefore, RUN rm does not reduce the size of the image or free up disk space.
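The effect can be reproduced without Docker. The following is a simulation under simplifying assumptions, not Docker's actual storage code: each layer is modeled as a tar archive, and the removal is recorded as an empty whiteout marker (the .wh. prefix follows the OCI image layer convention; the file name somepackage.bin is made up for illustration).

```shell
#!/bin/sh
# Simulation only: layers are modeled as tar archives in a temp directory.
work=$(mktemp -d)
mkdir "$work/layer1" "$work/layer2"

# Layer 1 stands in for "RUN apt-get install somepackage": it adds 1 MiB.
dd if=/dev/zero of="$work/layer1/somepackage.bin" bs=1024 count=1024 2>/dev/null

# Layer 2 stands in for "RUN apt-get remove somepackage": it holds no data,
# only an empty whiteout marker saying that the file is gone.
: > "$work/layer2/.wh.somepackage.bin"

tar -cf "$work/layer1.tar" -C "$work/layer1" .
tar -cf "$work/layer2.tar" -C "$work/layer2" .

# $(( ... )) strips any padding that wc may add to its output.
size1=$(( $(wc -c < "$work/layer1.tar") ))
size2=$(( $(wc -c < "$work/layer2.tar") ))
total=$(( size1 + size2 ))

echo "layer 1 (install): $size1 bytes"
echo "layer 2 (remove):  $size2 bytes"
echo "to download:       $total bytes"
rm -rf "$work"
```

The second archive is tiny, but the total to download is still larger than the first layer alone: the removal only adds to it.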

Removing unnecessary files

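Various techniques are available to obtain smaller images: collapsing layers, building binaries outside of the Dockerfile, squashing the final image, and multi-stage builds.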

Let’s review them quickly.

Collapsing layers

You will frequently see Dockerfiles like this:

FROM ubuntu
RUN apt-get update && apt-get install xxx && ... && apt-get remove xxx && ...

Or the (more readable) variant:

FROM ubuntu
RUN apt-get update \
 && apt-get install xxx \
 && ... \
 && apt-get remove xxx \
 && ...

This RUN command gives us a single layer.

The files that are added, then removed in the same layer, do not grow the layer size.
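To check how many layers an image has and how much each one weighs, docker history can print one line per layer. A minimal sketch, assuming the docker CLI is installed; the function name layer_sizes is our own, and myimage in the usage line is a placeholder.

```shell
#!/bin/sh
# Print one line per layer: its size, then the instruction that created it.
# --no-trunc keeps long RUN commands from being cut off.
layer_sizes() {
    docker history --no-trunc --format '{{.Size}}  {{.CreatedBy}}' "$1"
}

# Usage (placeholder image name):
#   layer_sizes myimage
```

With a collapsed RUN instruction, the install and the removal should show up as a single line rather than as two separate layers.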

Collapsing layers: pros and cons



Building binaries outside of the Dockerfile

This results in a Dockerfile looking like this:

FROM ubuntu
COPY xxx /usr/local/bin

Of course, this implies that the file xxx exists in the build context.

That file has to exist before you can run docker build.

For instance, it can be checked into the code repository.

See for instance the busybox official image or this older busybox image.

Building binaries outside: pros and cons



Cons, if binary is added to code repository:

Squashing the final image

The idea is to transform the final image into a single-layer image.

This can be done in (at least) two ways.
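One commonly used approach is to export the filesystem of a container created from the image, then import it back as a new image. The sketch below assumes the docker CLI is available and that the image defines a default command (docker create needs one); the function name squash_image and the DOCKER override are introduced here for illustration. Note that docker import keeps only the filesystem, so metadata such as CMD or ENV has to be set again on the new image.

```shell
#!/bin/sh
# DOCKER can be overridden (for example DOCKER=echo) to do a dry run.
DOCKER=${DOCKER:-docker}

# Flatten image $1 into a new single-layer image named $2.
squash_image() {
    src=$1   # image to flatten
    dst=$2   # name for the resulting single-layer image
    # Create (but do not start) a container, so its filesystem can be exported.
    cid=$($DOCKER create "$src") || return 1
    # Stream the container filesystem as a tarball straight into a new image.
    $DOCKER export "$cid" | $DOCKER import - "$dst"
    status=$?
    # Remove the temporary container whether or not the import succeeded.
    $DOCKER rm "$cid" >/dev/null
    return $status
}

# Usage (placeholder image names):
#   squash_image myimage myimage:flat
```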

Squashing the image: pros and cons



Multi-stage builds

Multi-stage builds allow us to define multiple build stages in a single Dockerfile.

Each stage is a separate image, and can copy files from previous stages.

We’re going to see how they work in more detail.

Multi-stage builds in practice

Multi-stage builds for our C program

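We will change our Dockerfile to give a name (compiler) to the first stage, add a second stage that also starts from ubuntu, and copy the hello binary from the first stage into the second.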

Multi-stage build Dockerfile

Here is the final Dockerfile:

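# Stage 1: compile the program with the full toolchain
FROM ubuntu AS compiler
RUN apt-get update
RUN apt-get install -y build-essential
COPY hello.c /
RUN make hello

# Stage 2: start from a clean base image and copy only the binary
FROM ubuntu
COPY --from=compiler /hello /hello
CMD /hello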

Let’s build it, and check that it works correctly:

docker build -t hellomultistage .
docker run hellomultistage

Comparing single/multi-stage build image sizes

List our images with docker images, and compare the size of the single-stage image with that of the multi-stage hellomultistage image.
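For a byte-exact comparison, docker image inspect can report the size of each image. A sketch, assuming the docker CLI is installed; the helper names image_size and size_saving are ours, and the single-stage image name in the usage line is a placeholder.

```shell
#!/bin/sh
# Print the size of image $1 in bytes.
image_size() {
    docker image inspect --format '{{.Size}}' "$1"
}

# Print how many bytes smaller image $2 is than image $1.
size_saving() {
    big=$(image_size "$1") || return 1
    small=$(image_size "$2") || return 1
    echo $(( big - small ))
}

# Usage (replace the first argument with the single-stage image name):
#   size_saving <single-stage-image> hellomultistage
```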

We can achieve even smaller images if we use smaller base images.

However, if we use common base images (e.g. if we standardize on ubuntu), these common images will be pulled only once per node, so they are virtually “free.”