How to Keep Docker Images Lean (Part 2 of 3)
In this second part of the series, let’s look at minimizing each layer of your Docker image. A Docker image is built up in layers, starting from the base image named in the ‘FROM’ statement. The key word here is adding: each RUN statement can add new data on top of the layers before it, and a later statement cannot shrink an earlier layer. If you want removing something to reduce the image size, you must remove it in the same layer that added it.
Here is an (overly simplistic) example:
$ cat Dockerfile-original
FROM alpine:latest
RUN apk update
RUN apk add gcc git
RUN echo "Do Stuff"
RUN apk del gcc git
RUN rm -rf /var/cache/apk/*
$ docker build -t original -f Dockerfile-original .

$ cat Dockerfile-optimized
FROM alpine:latest
RUN apk update && \
    apk add \
      gcc \
      git && \
    echo "Do Stuff" && \
    apk del \
      gcc \
      git && \
    rm -rf /var/cache/apk/*
$ docker build -t optimized -f Dockerfile-optimized .
The two Dockerfiles do essentially the same thing. Yet the original produces an image with six layers (the base image plus one per RUN statement), while the optimized one has only two. Even though containers started from either image behave identically, let’s look at the image sizes:
~/Test/blog$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
optimized    latest   9024f74f2b12   2 minutes ago   4.17MB
original     latest   8556fbfb3642   3 minutes ago   107MB
alpine       latest   3fd9065eaf02   5 months ago    4.15MB
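The totals above can also be broken down layer by layer. The sketch below wraps `docker history`, which lists the instruction behind each layer together with that layer's size. It assumes the two images were built with the tags `original` and `optimized` as shown earlier; the helper name `show_layers` is made up for this post.

```shell
#!/bin/sh
# Print the per-layer breakdown of an image: one row per instruction, with its size.
# show_layers is a hypothetical helper name, not part of the Docker CLI.
show_layers() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker CLI not found; skipping $1" >&2
        return 1
    fi
    echo "== $1 =="
    docker history "$1"
}

# Image names are assumed to match the tags built above.
# A missing image (or a stopped daemon) is reported and skipped.
for image in original optimized; do
    show_layers "$image" || echo "could not inspect $image" >&2
done
```

Comparing the SIZE column of the two listings is a quick way to see which instruction each megabyte belongs to.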
Clearly, the optimized image wins! But why?
Each statement in a Dockerfile can only add information to the image. If I add files in one statement and “remove” them in a later one, the removal will NOT decrease the size of the image! (Yes, the files are gone from the final filesystem, but the earlier layer that contains them is still stored as part of the image.) In the optimized image, all the commands were combined into a single RUN statement, so the packages were installed and deleted before that layer was saved. Only when additions and removals happen within the same layer do removals actually reduce image size.
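If you would like to see the rule without building anything, here is a toy model in plain shell. It is only a sketch and not how Docker stores images internally: each "layer" is a directory that holds just that step's changes, and the image size is modeled as the sum of the bytes across all layers. The directory layout and the file name `toolchain.bin` are invented for illustration; the `.wh.` prefix mirrors the whiteout marker that the OCI image layer format uses to record a deletion.

```shell
#!/bin/sh
# Toy model of image layers: each "layer" is a directory holding only that step's changes.
# Image size is modeled as the total bytes stored across all layers.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Sum the size in bytes of every regular file under the given directory.
layer_bytes() {
    find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

# Split build: step 1 adds a 1 MiB file, step 2 "removes" it.
mkdir -p "$work/split/layer1" "$work/split/layer2"
dd if=/dev/zero of="$work/split/layer1/toolchain.bin" bs=1024 count=1024 2>/dev/null
# A later layer cannot edit layer1; it only records a zero-byte whiteout marker.
: > "$work/split/layer2/.wh.toolchain.bin"

# Combined build: the add and the remove both happen before the layer is saved.
mkdir -p "$work/combined/layer1"
dd if=/dev/zero of="$work/combined/layer1/toolchain.bin" bs=1024 count=1024 2>/dev/null
rm "$work/combined/layer1/toolchain.bin"

split_bytes=$(layer_bytes "$work/split")
combined_bytes=$(layer_bytes "$work/combined")
echo "split:    $split_bytes bytes"
echo "combined: $combined_bytes bytes"
```

Run as written, it reports 1048576 bytes for the split build and 0 bytes for the combined one: the file deleted in a separate step is still paid for, while the file deleted in the same step never reaches storage.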
In short, if you are careful about what each layer adds and removes, you can keep your image size under control. In our next blog post, we’ll cover another optimization technique: using multi-stage builds.