How to Keep Docker Images Lean (Part 2 of 3)

Posted by Ken Hayes on September 29, 2018

For this second part of the series, let’s discuss minimizing each layer of your Docker image. Each Docker image is built by stacking layers, starting with the ‘FROM’ statement. The key word here is adding: each RUN statement can only add new data on top of the layers below it. If you remove something from your image, you must remove it in the same layer in which it was added for the removal to affect image size.

Here is an (overly simplistic) example:

$ cat Dockerfile-original 
FROM alpine:latest
RUN apk update
RUN apk add gcc git
RUN echo "Do Stuff"
RUN apk del gcc git
RUN rm -rf /var/cache/apk/*

$ docker build -t original -f Dockerfile-original .

$ cat Dockerfile-optimized 
FROM alpine:latest
RUN apk update && \
apk add \
  gcc \
  git && \
echo "Do Stuff" && \
apk del \
  gcc \
  git && \
rm -rf /var/cache/apk/*

$ docker build -t optimized -f Dockerfile-optimized .

The two Dockerfiles run essentially the same commands. Yet the original produces six layers (the base image plus one per RUN statement), while the optimized file produces only two. Containers started from either image behave identically, but look at the image sizes:

~/Test/blog$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
optimized    latest   9024f74f2b12   2 minutes ago   4.17MB
original     latest   8556fbfb3642   3 minutes ago   107MB
alpine       latest   3fd9065eaf02   5 months ago    4.15MB

Clearly, the optimized image wins! But why?
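One way to build intuition is a toy model in plain shell, with no Docker involved. It assumes a simplified picture in which each layer is a directory holding only what one build step changed, and a deletion is recorded as a small marker file (loosely modeled on the whiteout entries used in image layer archives). The file names, the 1 MiB size, and the layers_size helper are all made up for illustration.

```shell
#!/bin/sh
# Toy model of image layers: each "layer" is a directory holding only the
# files that one build step changed. This does not build a real image; it
# only illustrates why a later deletion cannot shrink an earlier layer.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Layer 1: a step that adds a 1 MiB file (stand-in for "apk add gcc git").
mkdir "$work/layer1"
dd if=/dev/zero of="$work/layer1/toolchain.bin" bs=1024 count=1024 2>/dev/null

# Layer 2: the step that "deletes" it. This layer only records an empty
# marker file; layer 1 is already saved and still ships with the image.
mkdir "$work/layer2"
: > "$work/layer2/.wh.toolchain.bin"

# Hypothetical helper: sum the bytes of every file in the given layers.
layers_size() {
    total=0
    for f in $(find "$@" -type f); do
        total=$((total + $(wc -c < "$f")))
    done
    echo "$total"
}

two_step=$(layers_size "$work/layer1" "$work/layer2")

# Combined step: the file is added and deleted before the layer is saved,
# so the saved layer contains nothing.
mkdir "$work/combined"
one_step=$(layers_size "$work/combined")

echo "add and delete in separate layers: $two_step bytes"
echo "add and delete in one layer:       $one_step bytes"
```

In this model the two-step total stays at the full 1048576 bytes even though the file has been "deleted", while the combined step stores 0 bytes.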

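Each statement in a Dockerfile can only add information to the image. If I add files in one statement and “remove” them in another, the removal will NOT decrease the size of the image! (Yes, the files are gone from the final filesystem, but the earlier layer that contains them still ships with the image.) That is why the original image is roughly 103MB larger than its base even though gcc and git were deleted. In the optimized image, all the commands were combined into a single RUN statement, so the packages were added and removed before the layer was saved. Only when additions and removals happen within the same layer do removals actually reduce image size.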
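As a further sketch along the same lines, apk's --no-cache and --virtual options can make single-layer cleanup a little harder to get wrong. This variant was not part of the build measured above, so no size is claimed for it, and the group name .build-deps is just an illustrative label.

```dockerfile
FROM alpine:latest
# --no-cache fetches the package index on the fly and stores nothing in
# /var/cache/apk, so no separate "apk update" or "rm -rf" step is needed.
# --virtual groups the listed packages under one name (.build-deps is an
# arbitrary, illustrative label) so they can be removed together.
RUN apk add --no-cache --virtual .build-deps \
      gcc \
      git && \
    echo "Do Stuff" && \
    apk del .build-deps
```

Everything still happens in one RUN statement, so the rule is unchanged: whatever is added and removed within the same layer never reaches the image.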

In short, if you add and clean up within the same layer, you control your image size. In our next blog post, we’ll cover another optimization technique: using multi-stage builds.