Creating a Docker Workflow with Node.js

Note: These are my findings after working with Docker, Jenkins, and AWS for only 2-3 months. This post documents my progress and the thought process behind the workflow I have set up with Docker, but it may also be useful to other beginners interested in setting up a better development workflow with Node.js and Docker.

The Problem

Our development team occasionally has to switch between different projects. Each project has its own set of dependencies and requires a specific version of Node.js, npm, and/or Ruby to run. Using nvm and rvm can mitigate the issue, but constantly switching between versions is a hassle, and it is easy to lose track of which version you’re currently using. Time is also wasted debugging environment inconsistencies between local development machines, and even more time is lost solving cross-platform issues when deploying from OSX to the Linux dev/qa/prod servers. Many hours were lost for both the development and system administration teams debugging these issues; hours that could instead have been spent improving the current project or working on other projects.

Proposed Solution

We needed a standardized environment, reproducible across our developers’ machines, Jenkins servers, and production servers. The two most popular technologies that solve this problem are Vagrant and Docker. Both also help us onboard new developers much more quickly. Before we started using Docker, new developers had to follow a lengthy readme, download every necessary dependency, and configure their installations. Even when the readme was followed exactly, issues could still crop up from leftover setups from previous projects, and additional time was spent troubleshooting. With Vagrant or Docker, the environment is already preconfigured and isolated, allowing a new developer to get started with much less hassle.

I chose Docker for our workflow primarily because of how lightweight it is: running an entire virtual machine uses more system resources than running containers. Also, our front-end projects all require Node.js, npm, and compass, so creating one image and using it as the base for all projects makes more sense than using Vagrant to run a completely isolated virtual machine for each one. Switching between projects is much faster, and maintaining a virtual machine per project when the environments are nearly identical seems redundant. Furthermore, our Jenkins servers run on small AWS EC2 instances, and the overhead of multiple virtual machines on one instance is far greater than that of containers spun up from Docker images that share the same base.

Setting up Docker

Since our company is in the beginning stages of embracing the DevOps philosophy, I’ve decided to keep our setup simple for now. As we grow more comfortable with Docker, I’ll set up private repos on Docker Hub, Quay, or AWS ECR. At the moment, I have an image on a public repo serving as the project base. This image contains everything the application needs to run and to compile assets with gulp and compass.

Base Dockerfile:

# Base image for all of our front-end projects: Node.js, npm, and compass.
FROM debian:8.3
MAINTAINER Brian Choy <bchoy@barbariangroup.com>

# Node.js and npm versions baked into the base (later moved into each
# project's own Dockerfile; see Future Changes below).
ENV NODE_VERSION=4.3.1 \
    NPM_VERSION=3.7.3

# Point /bin/sh at bash so the nvm install script runs correctly, install
# the system packages, then install nvm and use a temporary script to
# install Node.js and symlink the node and npm binaries onto the PATH.
# Finally, pin npm and disable its progress bar (see below for why).
RUN ln -snf /bin/bash /bin/sh; \
    apt-get update && apt-get install -y --no-install-recommends \
      curl \
      git \
      ca-certificates \
      libpng12-dev \
      pngquant \
      ruby-compass; \
    curl https://raw.githubusercontent.com/creationix/nvm/master/install.sh | sh; \
    cp -f ~/.nvm/nvm.sh ~/.nvm/nvm-tmp.sh; \
    echo "nvm install ${NODE_VERSION}; nvm alias default ${NODE_VERSION}; ln -s ~/.nvm/versions/node/v${NODE_VERSION}/bin/node /usr/bin/node; ln -s ~/.nvm/versions/node/v${NODE_VERSION}/bin/npm /usr/bin/npm" >> ~/.nvm/nvm-tmp.sh; \
    sh ~/.nvm/nvm-tmp.sh; \
    rm ~/.nvm/nvm-tmp.sh; \
    npm config set registry https://registry.npmjs.org/; \
    npm install -g npm@${NPM_VERSION}; \
    npm set progress=false

CMD ["/bin/bash"]
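
Building the base image and sanity-checking the tools inside it looks roughly like this (a minimal sketch; the image name matches the public repo used later in this post):

# Build the base image and confirm node, npm, and compass are available.
docker build -t thebarbariangroup/node-compass .
docker run --rm thebarbariangroup/node-compass node --version    # expect v4.3.1
docker run --rm thebarbariangroup/node-compass npm --version     # expect 3.7.3
docker run --rm thebarbariangroup/node-compass compass version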

I am using the Debian 8.3 base image because it’s small enough, comes with the necessary packages out of the box, and has the correct version of ruby-compass. The scope of our current work lets me get away with the extra bloat, and we have only just started using Docker. Optimizing and tweaking the base image can be done later if we run into issues or decide to favor Vagrant or another technology. The additional bloat is well worth the time I would have spent building our project base image from a smaller base and figuring out exactly which dependencies are needed.

Bootstrapping and Dealing with Node Modules

I have written an init.sh script to quickly create the containers needed and install the Node modules.

#!/bin/bash

# Create a data-only container whose /tmp/app volume will hold node_modules.
echo "Creating node_modules container..."
docker create -v /tmp/app --name node-modules thebarbariangroup/node-compass /bin/true
# Mount the project's package.json into the volume (read-only) and run npm install there.
echo "Installing node_modules to container..."
docker run --rm --volumes-from node-modules -v $PWD/package.json:/tmp/app/package.json:ro thebarbariangroup/node-compass /bin/bash -c "cd /tmp/app; npm install"
echo "Done!"

The script creates a node-modules container that holds only the node_modules folder. Node modules are installed to /tmp/app, which is later mounted into the gulp container. When I first started working with Docker a few months ago, I saw many tutorials suggesting that Node modules be installed in the Dockerfile:

# The typical approach: bake the modules into the image and rely on Docker's build cache.
COPY package.json package.json
RUN npm install
COPY . .
CMD ["gulp"]

The idea behind this approach is to cache the Node modules by reinstalling them only when package.json changes. This makes sense; however, if I want to add or remove a single module, every module has to be reinstalled. Waiting a few minutes whenever a package needs to be installed disrupts the workflow for every developer, and a lot of time is lost. By setting up a separate container, I avoid the need to cache Node modules in the image: package.json is copied to /tmp/app as read-only and $ npm install is run against the new package.json in that container. The only change in workflow is remembering to run the init script instead of $ npm install; I was unable to override npm install’s default behavior to run my script instead. More info on the solution I used can be found in this GitHub issue.
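
In practice, adding a dependency becomes a matter of editing package.json and re-running the script. A sketch, assuming the script above is saved as init.sh:

# Hypothetical example: pull in a new dependency without a full reinstall.
# Edit package.json to add the module, then refresh the volume:
./init.sh
# The "create" step will complain that the node-modules container already
# exists, but the install step still runs against the existing volume, so
# npm only fetches the modules that are missing.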

In addition to this, npm’s progress bar was adding an incredible amount of build time. The issue can be viewed here and is resolved by turning the progress bar off: $ npm set progress=false. The underlying slowdown was fixed in npm v3.7.0.

Tackling File Watching

Gulp’s watch task is essential to our workflow. Unfortunately, due to a limitation in VirtualBox shared folders, inotify-based file watchers do not work. Polling can be enabled, but it is slow, and we’re used to seeing our changes almost instantly. The best solution I’ve found is using rsync to send files to the Docker-Machine. The docker-osx-dev project packages the rsync setup into a nice, easy-to-use script installable with brew.

Once installed, any developer working on the project just has to run the docker-osx-dev script and file watching will be enabled. One nice feature is that directories and files listed in the .dockerignore file are automatically excluded from syncing. I was facing issues with my changes not appearing in the Docker container; simply adding the generated static assets to the .dockerignore file fixed the problem.
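
A minimal sketch of what that .dockerignore might look like (app/dist is a hypothetical output path; use wherever your compiled assets are written):

# .dockerignore: keep these out of the image and out of the rsync.
node_modules
app/dist
.git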

The Problem with docker-osx-dev

docker-osx-dev is great when all of your developers are on OSX. Very recently, one of our projects required us to support development on Windows, and the docker-osx-dev script was no longer a valid solution. This is where Vagrant came into play for me: I used Vagrant to provision the environment and set up shared folders using rsync for both Windows and OSX (Linux untested). Unfortunately, setup on Windows still requires additional steps because rsync is not installed by default: Cygwin, MinGW, or cwRsync has to be installed, and the latest Vagrant (1.8.1) has a bug where paths are not read correctly. The solutions from these two GitHub issues fixed my rsync issues and allowed me to work in my Windows environment using cwRsync:
https://github.com/mitchellh/vagrant/issues/6702#issuecomment-166503021
https://github.com/mitchellh/vagrant/issues/3230#issuecomment-37757086
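
For reference, the relevant piece of the Vagrantfile uses rsync-backed shared folders, roughly like this (a sketch; the box name and exclude list are assumptions, not our exact configuration):

# Vagrantfile (excerpt): rsync-based shared folder for OSX and Windows hosts.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.synced_folder ".", "/src/app",
    type: "rsync",
    rsync__exclude: [".git/", "node_modules/"]
end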

Running the Project

Since I’m building on a prebuilt base image, the actual Dockerfile in this project is very short.

FROM thebarbariangroup/node-compass
MAINTAINER Brian Choy <bchoy@barbariangroup.com>

WORKDIR /src/app
# Symlink to the node_modules volume mounted from the node-modules container.
RUN ln -s /tmp/app/node_modules node_modules

COPY . .
EXPOSE 3000
CMD ["npm", "run", "dockerdev"]

This symlinks node_modules to the volume provided by the node-modules container (mounted with --volumes-from), allowing the application to use the modules installed there.

To run gulp, I’ve modified $ npm start to run this shell script:

#!/bin/bash

# Kill and remove the container from the previous run, if any.
docker kill node
docker rm node
# Remove dangling images (blank image names) to free up disk space.
docker rmi $(docker images -f "dangling=true" -q)
# Rebuild the project image and run it interactively.
docker build -t thebarbariangroup/projectname .
docker run -it \
        --name node \
        --volumes-from node-modules \
        -v $(pwd)/app:/src/app/app \
        -p 3000:3000 \
        thebarbariangroup/projectname

Note: The names I’m using for my containers (node-modules and node) are placeholders. More specific names should be used to avoid confusion with multiple projects.

When the project is built, any lingering container from the previous run is killed (if running) and removed. Dangling images (those with blank image names) are also removed to keep your image list clean and free up disk space, since Docker does not clean these up by default.

The new node image is then built, and a container is run in interactive mode, allowing you to see gulp’s output. The node-modules container is mounted as a volume, and my app folder (where my HTML, JS, and Sass live) is mounted onto the new node container, letting me see my changes while developing. Port 3000 on the host is mapped to port 3000 on the container, and $ npm run dockerdev (a custom npm script to run gulp) is executed. For some reason I’m unable to run gulp directly due to a gulp not found error, despite it being installed; my guess is that gulp is only installed locally, and npm prepends node_modules/.bin to the PATH when running scripts, so the binary only resolves through npm.
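
For context, the npm scripts tie this together roughly as follows (a sketch; I’m assuming the run script above is saved as run.sh, and that dockerdev simply invokes gulp):

{
  "scripts": {
    "start": "./run.sh",
    "dockerdev": "gulp"
  }
}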

Your project is now visible on your Docker-Machine’s IP on port 3000. To see your Docker-Machine IP, run $ docker-machine ip [vm name]. In my hosts file, I pointed projectname to my Docker-Machine IP so I can visit http://projectname:3000 to view my app.
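
For example (the machine name default and the IP shown are assumptions; yours will differ):

$ docker-machine ip default
192.168.99.100

# /etc/hosts
192.168.99.100  projectname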

Using Jenkins

Jenkins is the final step in our workflow. After a change is pushed to GitHub, Jenkins builds the image and runs $ gulp build:production so the static assets are compiled inside the container. To retrieve these assets, the files need to be copied from the container to a directory on the Jenkins server’s filesystem: $ docker cp projectname:/file/path/within/container /host/path/target
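
Put together, the Jenkins build step is a short shell script along these lines (a sketch; the container name, the dist asset path, and the workspace target are assumptions, and gulp is invoked through its local binary since it isn’t on the PATH directly):

#!/bin/bash
# Build the image and compile production assets inside a container.
docker build -t thebarbariangroup/projectname .
docker run --name projectname --volumes-from node-modules \
    thebarbariangroup/projectname ./node_modules/.bin/gulp build:production
# Copy the compiled assets out to the Jenkins workspace, then clean up.
docker cp projectname:/src/app/dist ./dist
docker rm projectname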

Jenkins will then take those compiled assets and upload them to an EC2 instance running Apache. Apache will then serve the newly compiled assets.

Note: I set up Docker manually in Jenkins when I started this project, but I’m looking forward to trying out the Docker Build Step plugin on my next project.

Future Changes

Revising the base image and updating all projects is easy. The Dockerfile for my base image shown earlier installs specific Node.js and npm versions. In hindsight, I realized that those versions should be specified in each project’s individual Dockerfile instead of in the base; that way each project starts out with an installation of nvm but installs its own Node.js version. I did not notice my error until after I had introduced the Docker workflow to all of the developers. Fortunately, updating the base image was not a problem. This is how I handled my mistake.

After removing the Node.js and npm installations from the base image’s Dockerfile, I pushed the image up with a 1.0.0 tag. I reference this version number in my project’s Dockerfile:

FROM thebarbariangroup/node-compass:1.0.0
MAINTAINER Brian Choy <bchoy@barbariangroup.com>

Going forward, I will be tagging my base builds with a version number. Versioning the base image allows for more versatile updates: all developers get the latest changes with a git pull, because the Dockerfile is checked into the GitHub repo, and Docker handles pulling the referenced image at build time. Furthermore, different projects can reference different versions of the base image. If I update my base image with more packages, future projects can point to thebarbariangroup/node-compass:2.0.0 while older projects that do not need those packages can still reference 1.0.0.
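
Publishing a new base version is then just a build with a tag and a push (a sketch, using the hypothetical 2.0.0 bump described above):

# Build and publish the next version of the base image.
docker build -t thebarbariangroup/node-compass:2.0.0 .
docker push thebarbariangroup/node-compass:2.0.0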