Docker RUN Command: When To Group Commands, When Not To?

I’ve seen two distinct methodologies of using the RUN command in a Dockerfile, which I will name v1 and v2.

v1

One command per line

    FROM ubuntu:latest
    ENV DEBIAN_FRONTEND=noninteractive

    RUN apt-get update
    RUN apt-get -y install php5-dev
    RUN apt-get -y install libcurl4-openssl-dev
    ...
    

    v2

    Multiple commands per line

    FROM ubuntu:latest
    ENV DEBIAN_FRONTEND=noninteractive
    RUN apt-get update && \
        apt-get -y install \
            php5-dev \
            libcurl4-openssl-dev
    ...
    

    Both approaches have their advantages; the most obvious difference is how they interact with Docker’s build cache. What other reasons are there to prefer one approach over the other?

    N.B. I bow to the community’s wishes if this question be considered too vague or open to opinion; however, I post it here because I expect that there are good situations to group commands, and good situations not to – and I want to know what they are.

  • One solution collected from the web for “Docker RUN Command: When To Group Commands, When Not To?”

    To answer this question, you first need to understand two concepts: “commits” and Docker’s build cache. A rule of thumb follows at the end.

    Commits

    Here’s an example:

    # Dockerfile
    FROM ubuntu:latest
    RUN touch /commit1
    RUN touch /commit2
    

    When you run docker build ., Docker does the following:

    1. It launches a container from the ubuntu:latest image.
    2. It runs the first command (touch /commit1) in that container, and commits the result as a new image.
    3. It launches a new container from the image created in step 2.
    4. It runs the second command (touch /commit2) in the second container, and commits the result as another new image.

    What you need to understand here is that if you group commands in a single RUN statement, then they will all execute in the same container, and will correspond to a single commit.

    Conversely, if you break the commands up into individual RUN statements, they won’t run in the same container; instead, each later command runs in a new container launched from the image committed by the previous one.
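
One visible consequence of this is that shell state does not carry over between RUN statements, while filesystem changes do. A minimal sketch, assuming `ubuntu:latest` as the base image and using made-up paths (`/work`, `commit1`, `commit2`) purely for illustration:

```dockerfile
# Sketch only: the base image and all paths are assumptions for illustration.
FROM ubuntu:latest

# Separate RUN statements: each one runs in a fresh container.
# The directory /work is committed, but the `cd` is forgotten afterwards.
RUN mkdir -p /work && cd /work
# This runs from the image's working directory (/ unless WORKDIR is set),
# so the file ends up as /commit1, not /work/commit1.
RUN touch commit1

# Grouped RUN statement: both commands share one shell in one container,
# and the result is recorded as a single commit.
# The file ends up as /work/commit2.
RUN cd /work && touch commit2
```

If a later step needs to run in a particular directory, grouping the commands as in the last statement (or setting WORKDIR) avoids the surprise.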

    Caching

    When you run docker build ., Docker reuses images created by earlier builds, as long as the instructions that produced them have not changed. In other words, if you edited the aforementioned Dockerfile to add RUN touch /commit3 at the end and ran docker build . again, Docker would reuse the image created in step 4 and run only the new command.

    This matters because when you include RUN apt-get update in your Dockerfile, there is no guarantee that it ran seconds before RUN apt-get install php5.

    For all you know, the commit for RUN apt-get update could have been created a month ago. Its APT package lists are then out of date, but Docker still reuses that commit.
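
To make the stale-cache problem concrete, here is a hedged sketch that contrasts the two layouts. The base image is assumed to be `ubuntu:latest`, and the package name is taken from the question:

```dockerfile
# Sketch only: ubuntu:latest is an assumed base image.
FROM ubuntu:latest

# Risky layout (left commented out): if only the install line is edited,
# Docker may reuse the cached commit for the update line, however old
# its package lists are.
# RUN apt-get update
# RUN apt-get -y install libcurl4-openssl-dev

# Safer layout: editing the package list changes the text of this single
# RUN statement, so the update and the install are re-run together.
RUN apt-get update && \
    apt-get -y install libcurl4-openssl-dev
```

If you suspect a cached commit is stale, running docker build --no-cache . makes Docker ignore the cache and re-run every statement.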

    Rule of Thumb

    It’s usually easier to group everything in a single RUN command, and to start breaking it up only when you want to take advantage of caching (e.g. to speed up the build process).

    When you do that, just make sure you don’t separate commands that must run within a certain time interval of one another (e.g. an update and an upgrade).
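
As a rough illustration of that split, the sketch below keeps the update and the install together in one slow, rarely edited statement, and puts a faster-changing step after it. The base image and the `setup.sh` script are hypothetical, not something from the question:

```dockerfile
# Sketch only: ubuntu:latest and setup.sh are assumptions for illustration.
FROM ubuntu:latest

# Slow, rarely edited step: one RUN, so the update and the install
# always execute within moments of each other.
RUN apt-get update && \
    apt-get -y install libcurl4-openssl-dev

# Frequently edited step (hypothetical script in the build context).
# Changing setup.sh re-runs only these two statements; the commit
# created by the statement above is still served from the cache.
COPY setup.sh /tmp/setup.sh
RUN sh /tmp/setup.sh
```

Ordering statements from least to most frequently changed is what lets the cache save time here.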

    It is also good practice to avoid leaving side effects behind (e.g. clean the APT cache after you’ve installed the packages you need), ideally in the same RUN statement so that the leftovers are never committed.
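
A minimal sketch of such a cleanup, again assuming `ubuntu:latest` as the base image and a package name borrowed from the question:

```dockerfile
# Sketch only: ubuntu:latest is an assumed base image.
FROM ubuntu:latest

# Update, install, and clean up in the same RUN statement, so the
# downloaded package lists never become part of the commit.
RUN apt-get update && \
    apt-get -y install libcurl4-openssl-dev && \
    rm -rf /var/lib/apt/lists/*

# By contrast, a separate `RUN rm -rf /var/lib/apt/lists/*` further down
# would hide the files in the final filesystem without shrinking the
# earlier commit that already contains them.
```

This is another case where grouping matters: the cleanup only saves space when it shares a commit with the commands that created the files.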

    Conclusion

    In your example, v2 is the right approach and v1 is the wrong one, because caching apt-get update separately from the install means packages may later be installed from a stale package index.
