Parallel, isolated CI jobs with Docker-in-Docker, Fig and Jenkins

At ServeBox we've been working for the past year on a reasonably large
app, composed of the following components:

  • an API powered by Rails, MongoDB and Redis
  • a library of Web components and front-end services based on the AngularJS toolset
  • several single-page apps (HTML5 & JS) using these Web components and the API as their backend

In addition to these core components, we also had to build tools and microservices: background job processing, Data Management, backups, Continuous Delivery, single sign-on, and so on. This resulted in a pretty complex setup for developers, and an increasingly slow test suite and provisioning process.

At the very beginning of the project, I decided the provisioning of both the CI and development environments should be 100% automated to improve developers' happiness, productivity and quality. We chose Vagrant, first using VirtualBox as the virtualization layer, then moving to Vagrant LXC with performance in mind.

As the complexity of the app increased, provisioning soon became a serious issue. We used Chef as the provisioner because we were already using it to configure the production and staging environments. I've been a proponent of Chef for years, but as soon as we started to see evidence that devs and ops had added inline shell scripts to the Vagrantfiles and to the custom cookbooks used to configure standard services such as MongoDB or Redis, I had to admit something was rotten in the state of Denmark. It was time to challenge our "all in one container" approach to development environments and to introduce a bit of segregation of responsibilities.

Enter Docker

Over the last few months, I had often thought Docker could be an interesting way to simplify our deployment, operations and scaling, and to make the various environments (development, CI, delivery, data management, QA, staging and production) more consistent. Eric - one of my coworkers - drew my attention to Fig (more on that later) and we decided to give Docker a try for development environments.

For the core services, we ended up with four Dockerfiles, each provisioning a container dedicated to running a single type of process:

  • Ruby processes (Rails API, Resque and Resque Scheduler, Data Management): 13 commands in the Dockerfile. Extra bonus: the gems are pre-bundled in the image.
  • Nginx to serve static assets and expose the Rails API: 9 commands in the Dockerfile.
  • MongoDB with a few customizations (text search, disk usage, etc.): 9 commands in the Dockerfile.
  • Redis: 14 commands in the Dockerfile.

If I omit the FROM and MAINTAINER commands of Docker's DSL, we've been able to meet our requirements with a total of 37 clean and readable commands. Weeks later, I'm still impressed by the simplicity of Docker's DSL compared to Chef cookbooks. The images are now built on the Jenkins server and pushed to an internal Docker registry so that they're available to the developers.
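
Building and publishing one of these images from Jenkins boils down to two commands. Here is a sketch for the Redis image (the docker/redis build context path is illustrative, not our exact layout):

# Hypothetical build-and-push step on the Jenkins server;
# the build context path (docker/redis) is illustrative.
docker build -t registry.golgotha.io/bathypelagic/redis:latest docker/redis
docker push registry.golgotha.io/bathypelagic/redis:latest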

Enter Fig

After moving to Docker, the core runtime environment was composed of 8 individual containers. None of the developers on the team was familiar with Docker, and its command-line interface provides a lot of commands and options. We therefore searched for a way to make using the containers straightforward.

Fig is a pretty nice piece of software that lets one describe individual services based on Docker images in a YAML file. Fig uses Docker links to provide network connectivity between containers, along with an easy-to-use command-line interface. Here is our fig.yml file (I replaced the actual name of the app with bathypelagic):

# fig.yml
mongodb:  
  image: "registry.golgotha.io/bathypelagic/mongodb:latest"
  ports:
    - "27017:27017"
  volumes:
    - "tmp/data:/data/db"
redis:  
  image: "registry.golgotha.io/bathypelagic/redis:latest"
  ports:
    - "6379:6379"
apidev:  
  image: "registry.golgotha.io/bathypelagic/app:latest"
  environment:
    RAILS_ENV: "development"
  entrypoint: ["script/launch"]
  command: "rails s"
  volumes:
    - ".:/bathypelagic"
  ports:
    - 3000
  links:
    - "mongodb"
    - "redis"
apitest:  
  image: "registry.golgotha.io/bathypelagic/app:latest"
  environment:
    RAILS_ENV: "test"
  entrypoint: ["script/launch"]
  command: "rails s -e test --pid /tmp/server-test.pid"
  volumes:
    - ".:/bathypelagic"
  ports:
    - 3000
  links:
    - "mongodb"
    - "redis"
bash:  
  image: "registry.golgotha.io/bathypelagic/app:latest"
  entrypoint: ["bash"]
  volumes:
    - ".:/bathypelagic"
  links:
    - "mongodb"
    - "redis"
resque:  
  image: "registry.golgotha.io/bathypelagic/app:latest"
  environment:
    QUEUE: "critical,high,normal,low"
  entrypoint: ["script/launch"]
  command: "rake environment resque:work"
  volumes:
    - ".:/bathypelagic"
  links:
    - "mongodb"
scheduler:  
  image: "registry.golgotha.io/bathypelagic/app:latest"
  entrypoint: ["script/launch"]
  command: "rake resque:scheduler"
  volumes:
    - ".:/bathypelagic"
  links:
    - "redis"
nginx:  
  image: "registry.golgotha.io/bathypelagic/nginx:latest"
  ports:
    - "80:80"
    - "81:81"
  volumes:
    - "./public:/bathypelagic"
    - "./log/nginx:/var/log/nginx"
  links:
    - "apidev"
    - "apitest"

Fig has made configuring Docker-based services, network links and persistent volume mappings a breeze. It allows devs and ops to spin up a fully operational runtime environment with a single fig up -d command.
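
Day to day, a handful of Fig commands cover most needs; for instance:

fig up -d          # start every service in the background
fig ps             # list the running containers and their ports
fig logs apidev    # follow the development API output
fig run bash       # one-off interactive shell in the app container
fig stop           # stop everything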

With the Vagrant approach to environment management, developers had to open multiple SSH sessions on the VM to launch the servers, seed the databases, compile the assets, and so on. With Docker and Fig, only the runtime part of the environment is handled by the containers. Since the MongoDB, Nginx and Redis containers bind their ports to the host machine, developers can seed the databases, execute the unit tests or launch the Cucumber features directly on the host (outside the containers). According to the developers' feedback, this makes a huge difference.
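
A typical host-side session then looks something like this (the exact Rake tasks vary by project; these are illustrative):

# Run on the host, against the ports bound by the containers
fig up -d                      # containers handle the runtime
bundle exec rake db:test:setup # seed MongoDB through localhost:27017
bundle exec cucumber           # features hit the API through Nginx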

Docker-in-Docker in Jenkins

We initially created a single Jenkins job which executed our test suites in sequence. As the surface of the app and its tests grew, we hit a point where the whole thing took almost 15 minutes to complete. We had also set up additional jobs using the same codebase for data management, deployment to staging, etc. We saw the migration to Docker as an opportunity to split the monolithic jobs into individual jobs and to execute all of them in parallel, improving both the speed and the quality of the feedback.

The first problem we had to solve in order to launch the same containers multiple times was port conflicts. Indeed, multiple MongoDB containers would try to bind the same host port 27017, as Nginx would with ports 80 and 81 and Redis with port 6379. We could have used a template to generate variants of our fig.yml file, or tried to play with environment variables, but since I knew LXC allows containers to be nested, I wondered whether we could wrap our containers inside a parent Docker container. This way we could create an isolated environment for each Jenkins executor. Fortunately, I found this article (thanks, Jérôme) that explains how, in "privileged" mode, Docker is able to run within Docker. The idea is to launch the parent container, then wrap the command passed to it inside a helper script that configures and launches a nested Docker daemon before executing the actual command. We use a slightly modified version of Jérôme's work, allowing the execution of any command instead of launching a bash shell. Our Jenkins jobs also mount the /var/lib/docker volume inside the parent container so that the nested daemon can use the images without having to pull them from the registry:

# Make the container (almost) omnipotent
DOCKER_OPTS="--privileged"  
# Docker directory
DOCKER_OPTS="$DOCKER_OPTS -v /var/lib/docker:/var/lib/docker"  
# App directory
DOCKER_OPTS="$DOCKER_OPTS -v `pwd`:/bathypelagic"  
# Spin up the parent container
docker run $DOCKER_OPTS registry.golgotha.io/bathypelagic/ci "cukes account"  
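
For reference, the helper script baked into our ci image does roughly the following (a simplified sketch in the spirit of Jérôme's wrapdocker, not our exact version; the /ci/scripts path is hypothetical):

#!/bin/bash
# Simplified sketch of the helper script inside the parent container
set -e

# Start the nested Docker daemon in the background
docker -d > /var/log/docker.log 2>&1 &

# Wait until the nested daemon is ready to accept commands
while ! docker info > /dev/null 2>&1; do
  sleep 1
done

# Execute the requested CI script with its arguments
# (word splitting turns "cukes account" into script name + argument)
exec /ci/scripts/$@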

The cukes argument we pass to the helper script is the name of the shell script to execute inside the parent container. The second argument, account, corresponds to the Cucumber features to run in this particular job:

#!/bin/bash
set -e

# Startup the nested containers
cd /bathypelagic/api  
fig up -d

# Disclaimer: the script here is a bit simplified
# for clarity, the real version includes checks
# to ensure the services are available.

# Seed the database with test data
bundle exec rake db:test:setup

# Install NodeJS packages
cd /bathypelagic/web  
npm install

# Compile static assets
time grunt all

# Execute Integration tests
result=0  
if [ $# -eq 0 ]  
then  
  echo "No arguments supplied: executing the whole feature suite"
  bundle exec cucumber --tags ~@pending || result=$?
else  
  echo "Argument supplied: executing only features in subdirectory $1"
  bundle exec cucumber -r features --tags ~@pending features/$1 || result=$?
fi

# Cleanup
echo "Cleaning up the mess!"  
cd /bathypelagic/api  
fig stop && fig rm --force

exit $result  

This script - one of our various CI tasks - starts the nested containers using Fig, seeds the test database, updates the NodeJS packages, compiles the static assets using Grunt, executes the integration/Cucumber tests, then cleans things up before exiting.

All was well except for one thing: database locking. Multiple Docker daemons - one on the host and one for each executor - shared the /var/lib/docker directory. When a Docker daemon starts, it creates and holds an exclusive lock on the SQLite database it uses to store links. Everything goes well if you launch a single Docker-in-Docker container because the daemons boot sequentially: the lock taken by the parent Docker daemon has been released by the time the nested daemon starts. We, on the other hand, needed to start multiple daemons in parallel - one per Jenkins executor:

The database file is locked  

We could have looked for a more subtle workaround, but we had tons of free disk space. On the master and on each Jenkins slave, we simply copied the /var/lib/docker directory to /var/lib/docker_0, /var/lib/docker_1 and /var/lib/docker_2 - one per executor - and modified the Jenkins jobs so that they use the EXECUTOR_NUMBER environment variable:

DOCKER_OPTS="--privileged"  
# Use a specific Docker directory for this executor
DOCKER_OPTS="$DOCKER_OPTS -v /var/lib/docker_${EXECUTOR_NUMBER}:/var/lib/docker"  
DOCKER_OPTS="$DOCKER_OPTS -v `pwd`:/bathypelagic"  
docker run $DOCKER_OPTS registry.golgotha.io/bathypelagic/ci "cukes account"  
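
The one-time copy itself is straightforward; here is a sketch, assuming three executors per node and the Docker daemon stopped during the copy:

# Run once on the master and on each slave, with Docker stopped
service docker stop
for i in 0 1 2; do
  cp -a /var/lib/docker "/var/lib/docker_${i}"
done
service docker start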

Conclusion

With Docker and Fig, we've been able to provide a simpler environment to our team, to rationalize and simplify the Continuous Integration and Delivery process, and to deliver feedback from the CI to the team four times faster. Pretty cool, isn't it? If you're interested in the nitty-gritty details, feel free to drop me an email.