Building Efficient Dockerfiles - Node.js

Posted by David Weinstein

Mar 13th, 2014

TL;DR

Put the following snippet (or a variation) in your Dockerfile after your app’s dependencies are installed but before you ADD your app code to the container. This way your modules are not rebuilt each time you rebuild the container; they are rebuilt only when your package.json file changes. See this gist for a full example.

Add this to your Dockerfile, after your deps, but before your app code (gist)

ADD package.json /tmp/package.json
RUN cd /tmp && npm install
RUN mkdir -p /opt/app && cp -a /tmp/node_modules /opt/app/

Using cached layers for modules

This article is about making efficient use of Docker layers. As a side effect we’ll see how to reduce development and debugging time for Node.js applications hosted in Docker containers. As you migrate from developing everything on your development host system to Docker, there are some growing pains, mainly around interactive modify-and-test workflows.

There was a time when, whenever I made a slight modification to an application, I’d spend time waiting for Docker containers to rebuild. Usually I was waiting for the modules to be reinstalled. I spent more time waiting for the dependencies to build than actually fixing the problem. I hope this article helps others get out of that cycle.

Writing an efficient Dockerfile is part of the fun of working with Docker as a new technology. Docker forces you to think differently. Once you get in the right mindset you’ll find yourself inventing new tricks.

One key is to understand how Docker layers work. For now, visit the documentation to see a graphic showing the various layers involved with Docker. Each command in your Dockerfile creates a new layer, and Docker reuses an existing cached layer whenever it can. You should take advantage of this by organizing your commands in a specific order. We’ll get to that order for Node modules in a second.

First here’s an example dependency file for Node:

Example - package.json

{
  "name": "myApp",
  "description": "This is my awesome app...",
  "version": "0.0.1",
  "private": true,
  "scripts": {
    "start": "node server.js"
  },
  "dependencies": {
    "docker.io": "*",
    "redis": "*",
    "restify": "*"
  }
}

And here’s what we’re going to insert into our old Dockerfile:

Add this to your Dockerfile, after your deps, but before your app code (gist)

ADD package.json /tmp/package.json
RUN cd /tmp && npm install
RUN mkdir -p /opt/app && cp -a /tmp/node_modules /opt/app/

This snippet should generally go after all dependencies of your application are installed, but just before you add your application’s code to the container.

A bad Dockerfile could look like this:

FROM ubuntu

RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get update
RUN apt-get -y install python-software-properties git build-essential
RUN add-apt-repository -y ppa:chris-lea/node.js
RUN apt-get update
RUN apt-get -y install nodejs

WORKDIR /opt/app

ADD . /opt/app
RUN npm install

EXPOSE 3001
CMD ["node", "server.js"]

This is bad because the ADD . /opt/app instruction copies the app’s working directory (.), which contains our package.json, into the container, and only then do we build the modules. As a result, the modules are rebuilt every time we change any file in that directory.

Here’s a full example of a better implementation:

FROM ubuntu
MAINTAINER David Weinstein <david@bitjudo.com>

# install our dependencies and nodejs
RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get update
RUN apt-get -y install python-software-properties git build-essential
RUN add-apt-repository -y ppa:chris-lea/node.js
RUN apt-get update
RUN apt-get -y install nodejs

# use changes to package.json to force Docker not to use the cache
# when we change our application's nodejs dependencies:
ADD package.json /tmp/package.json
RUN cd /tmp && npm install
RUN mkdir -p /opt/app && cp -a /tmp/node_modules /opt/app/

# From here we load our application's code in, therefore the previous docker
# "layer" thats been cached will be used if possible
WORKDIR /opt/app
ADD . /opt/app

EXPOSE 3000
CMD ["node", "server.js"]

The idea here is that if the package.json file changes (the ADD package.json /tmp/package.json instruction), Docker will re-run the npm install sequence (RUN cd /tmp && npm install and the steps after it). Otherwise Docker will use its cache and skip that part.
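To make the ordering argument concrete, here is a toy model of the build cache in Node.js. It is a sketch under stated assumptions, not Docker’s actual implementation: it assumes a step can be reused only when its parent layer, its instruction text and, for ADD, the contents of the files it copies are all unchanged. The build and rebuildAfterEdit functions, the file contents, and the shortened step lists are hypothetical stand-ins for the two Dockerfiles above.

```javascript
// Toy model of a layered build cache (illustration only, not Docker's code).
const crypto = require("crypto");

const sha = (text) => crypto.createHash("sha256").update(text).digest("hex");

// steps: [{ instruction, adds }] where adds lists the context files an ADD copies
// files: { path: content } standing in for the build context
// cache: Set of layer keys left behind by earlier builds (mutated in place)
function build(steps, files, cache) {
  let parentKey = "base-image";
  return steps.map(({ instruction, adds = [] }) => {
    const contentDigest = adds.map((path) => sha(files[path])).join(",");
    const key = sha([parentKey, instruction, contentDigest].join("|"));
    const cached = cache.has(key);
    cache.add(key);
    parentKey = key; // a changed layer changes the key of every later layer
    return { instruction, cached };
  });
}

// Shortened version of the "bad" ordering: app code first, then npm install.
const bad = [
  { instruction: "RUN apt-get -y install nodejs" },
  { instruction: "ADD . /opt/app", adds: ["package.json", "server.js"] },
  { instruction: "RUN npm install" },
];

// Shortened version of the "better" ordering: package.json first, app code last.
const good = [
  { instruction: "RUN apt-get -y install nodejs" },
  { instruction: "ADD package.json /tmp/package.json", adds: ["package.json"] },
  { instruction: "RUN cd /tmp && npm install" },
  { instruction: "ADD . /opt/app", adds: ["package.json", "server.js"] },
];

// Build once, edit server.js, build again, and report the second build.
function rebuildAfterEdit(steps) {
  const cache = new Set();
  const files = { "package.json": '{"dependencies":{}}', "server.js": "// v1" };
  build(steps, files, cache);
  files["server.js"] = "// v2"; // simulate a change to the app's logic
  return build(steps, files, cache);
}

for (const [name, steps] of [["bad", bad], ["good", good]]) {
  console.log(name);
  for (const { instruction, cached } of rebuildAfterEdit(steps)) {
    console.log(`  ${cached ? "cached " : "rebuilt"}  ${instruction}`);
  }
}
```

In this model the bad ordering rebuilds both ADD . /opt/app and RUN npm install after a change to server.js, while the better ordering rebuilds only the final ADD . /opt/app step.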

Here’s a build log for the Dockerfile shown earlier. It shows that Docker now uses the cache for the module dependency steps.

Uploading context 4.608 kB
Uploading context
Step 0 : FROM ubuntu
 ---> 9cd978db300e
Step 1 : MAINTAINER David Weinstein <david@bitjudo.com>
 ---> Using cache
 ---> 67aeca8f12ae
Step 2 : RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
 ---> Using cache
 ---> be8f73b1204f
Step 3 : RUN apt-get update
 ---> Using cache
 ---> 70395f80789a
Step 4 : RUN apt-get -y install python-software-properties git build-essential
 ---> Using cache
 ---> 58821e45ea25
Step 5 : RUN add-apt-repository -y ppa:chris-lea/node.js
 ---> Using cache
 ---> 79afb0c0539a
Step 6 : RUN apt-get update
 ---> Using cache
 ---> 18fc6aa866d8
Step 7 : RUN apt-get -y install nodejs
 ---> Using cache
 ---> 1f1f41f47329
Step 8 : ADD package.json /tmp/package.json
 ---> Using cache
 ---> 0331fd81b4c8
Step 9 : RUN cd /tmp && npm install
 ---> Using cache
 ---> 95ee8b27b72b
Step 10 : RUN mkdir -p /opt/app && cp -a /tmp/node_modules /opt/app/
 ---> Using cache
 ---> 40102f5ce4f1
Step 11 : WORKDIR /opt/app
 ---> Using cache
 ---> 6a1ad0dca915
Step 12 : ADD . /opt/app
 ---> 6b9bdaa0e7a2
Step 13 : EXPOSE 3000
 ---> Running in 722c8f0b88e2
 ---> d97a3d372bda
Step 14 : CMD ["node", "server.js"]
 ---> Running in 3309a2dab1cc
 ---> a0b19d7625d3
Successfully built a0b19d7625d3

This whole example is contained in this gist so that you can repeat it exactly as I have.

Assume you’ve built the container once before (i.e., docker build -t testProject .) and then uncommented line 7 in our example server.js, simulating a change to our app’s logic. The log above shows what happens when we rebuild the container: at Step 10 the cache was used, but at Step 12 (ADD . /opt/app) it was not, because the app’s files had changed.

Conclusion

Now our modules are cached, so we aren’t rebuilding them every time we change our app’s source code! This will speed up testing and debugging Node.js apps. This caching technique can also work for Ruby gems, which we’ll talk about in another post.
