Controlling where Docker starts incremental builds (use case: git clone inside Dockerfile)
From what I understand, docker build is smart about building images incrementally, i.e. it rebuilds only those layers affected by a change. For instance, if the source file for a COPY statement in the Dockerfile changed, and everything else stayed the same, Docker will only execute statements starting from that COPY and otherwise reuse previously built layers.
I have a scenario where I RUN git clone inside the Docker image at build time and would like docker build to start its incremental build from that statement (if any source file changed).
I guess I could enforce this by placing a COPY dummy / just before that statement and telling Docker about changes to source files with touch dummy. Is there a better way to do this?
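One caveat on the dummy-file idea: as far as I can tell from Docker's build-cache documentation, the cache check for COPY is based on a checksum of the file and does not consider its last-modified time, so touch alone would probably not invalidate the layer, whereas writing new content into the file would. The sketch below only illustrates that difference. It uses cksum as a stand-in for a content checksum (it is not the algorithm Docker uses), and the temporary directory and file contents are made up for the example.

```shell
# Work in a throwaway directory so nothing in the current tree is touched.
workdir=$(mktemp -d)
dummy="$workdir/dummy"

printf 'build 1\n' > "$dummy"
before=$(cksum < "$dummy")

# touch only updates timestamps; the bytes (and so the checksum) stay the same.
touch "$dummy"
after_touch=$(cksum < "$dummy")

# Writing new content changes the bytes, which is what a content checksum sees.
printf 'build 2\n' > "$dummy"
after_write=$(cksum < "$dummy")

[ "$before" = "$after_touch" ] && echo "touch: checksum unchanged"
[ "$before" != "$after_write" ] && echo "rewrite: checksum changed"

rm -rf "$workdir"
```

So if one went the dummy-file route, something that changes the file's content (a timestamp or a commit hash written into it, say) would be the safer trigger.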
2 answers:
Take a look at the ARG instruction in Dockerfiles, specifically the section of the Dockerfile reference on its impact on build caching.
I have been able to solve this by following @JHarris’ lead. My Dockerfile now looks like this:
    FROM ...
    ARG ...
    ENV ...

    # run lengthy installs
    RUN apt-get update
    RUN apt-get install -y ...
    # ...

    ARG HEAD
    RUN TMP_DIR=$(mktemp -d) && \
        cd $TMP_DIR && \
        git clone $GIT_REPOSITORY && \
        # compile source code
        # install from compile
        cd $TMP_DIR && \
        rm -fr $TMP_DIR
    # ...
And I start the build process with:
    docker build --build-arg HEAD=$(git ls-remote $GIT_REPOSITORY refs/heads/master | \
        cut -f1) .
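To make the command substitution in that build command a bit more concrete: git ls-remote prints one line per matching ref, consisting of the commit hash, a tab, and the ref name, and cut -f1 keeps only the hash. The sketch below runs the same cut on a made-up sample line instead of a real repository; the hash and the variable names sample_line and head_hash are purely illustrative.

```shell
# Made-up sample of one line of `git ls-remote <repo> refs/heads/master` output:
# <commit hash><TAB><ref name>. The hash is illustrative, not a real commit.
sample_line=$(printf '%s\t%s' '0123456789abcdef0123456789abcdef01234567' 'refs/heads/master')

# cut splits on tabs by default, so -f1 keeps only the hash.
head_hash=$(printf '%s\n' "$sample_line" | cut -f1)

echo "$head_hash"
```

If the ref does not exist, git ls-remote prints nothing, so HEAD would be passed as an empty string; a wrapper script might want to check for that before calling docker build.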
HEAD receives a new (hash) value whenever a new push to the master branch of $GIT_REPOSITORY has occurred. When that happens, Docker starts an “incremental” build from the line after ARG HEAD. The key fact was this passage from the Dockerfile reference (section ARG, subsection Impact on build caching):
If a Dockerfile defines an ARG variable whose value is different from
a previous build, then a “cache miss” occurs upon its first usage, not
its definition. In particular, all RUN instructions following an ARG
instruction use the ARG variable implicitly (as an environment
variable), thus can cause a cache miss.
This indicates that ARG HEAD must be placed as far down in the Dockerfile as possible. Even though it is only a definition, and could by itself be placed further up, all RUN statements following it already count as uses of HEAD. So in my example it is important to place it after the RUN apt-get statements for the lengthy installs, in particular.