Our customer wanted to drastically cut time to market for the new version of their application. Large quarterly releases were to be replaced by small changes that can be rolled out to production multiple times a day. Below we explain how to use Docker and Ansible to support this strategy, or, in our customer’s words, how to ‘develop software at the speed of thought’.
To facilitate development at the speed of thought we needed the following:
- A platform to deploy Docker images to
- Logging, monitoring and alerting
- Application versioning
- Zero downtime deployment
We’ll discuss each of these below.
A platform to deploy Docker images to
Our Docker images run on an Ubuntu host because we needed a well-known, supported Linux version. In our case we install the OS using an image and run all other software in containers. Each Docker container hosts exactly one process, so it is easy to see what a container is supposed to do. Examples of containers include:
- Java VMs to run our Scala services
- HA Proxy
- A utility to rotate log files
- And even an Oracle database (not on acceptance and production because we expected support issues with that setup, but for development it works fine)
Most of the software running in containers is started with a bash script, but recently we started experimenting with Go, in which case a container may need no more than a single executable.
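As a minimal sketch of this one-process-per-container setup (image and container names are ours, not the customer's), starting the stack amounts to one `docker run` per role. The helper below only prints the commands, so the sketch stays runnable without a Docker daemon:

```shell
# One process per container: each role gets its own 'docker run'.
# We echo the commands instead of executing them; remove the echo
# to actually start the containers.
start_container() {
  echo docker run -d --name "$1" "$2"
}

start_container proxy     haproxy:1.5
start_container service   myregistry/scala-service:20150619_1504
start_container logrotate myregistry/logrotate:20150619_1504
```

Because every container has exactly one job, `docker ps` doubles as a readable inventory of what is running on a host.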
Logging, monitoring and alerting
To save time we decided to offload the development effort for monitoring and alerting to hosted services where possible. This resulted in contracts with Loggly to store application log files, Librato to collect system metrics and OpsGenie to alert Ops based on rules defined in Loggly. Log files are shipped to Loggly using their syslog-ng plugin. Our application was already relying on statsd, so to avoid rewriting code we created a statsd emulator that pushes the metrics to Librato. This may change in the future if we find the time, but for now it works fine. We’re using the Docker stats API to collect information at the container level.
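The emulator itself is not shown in the talk, but its core job is simple: parse statsd datagrams of the form `name:value|type` and forward the metrics to Librato. A hedged sketch of just the parsing step (the forwarding is omitted):

```shell
# Parse a statsd datagram such as "requests:1|c" (a counter) into its
# name, value and type fields using shell parameter expansion.
parse_statsd() {
  local line=$1
  local name=${line%%:*}     # text before the first ':'
  local rest=${line#*:}      # text after the first ':'
  local value=${rest%%|*}    # text before the '|'
  local type=${rest#*|}      # metric type: c, g, ms, ...
  echo "$name $value $type"
}

parse_statsd "requests:1|c"          # -> requests 1 c
parse_statsd "response_time:320|ms"  # -> response_time 320 ms
```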
Application versioning
In the Java world the deliverable would be a jar file published to a repository like Artifactory or Nexus. This is still possible when working with Docker, but it makes more sense to use Docker images as deliverables. The images contain everything needed to run the service, including the jar file. Like jar files, Docker images are published, in this case to a Docker registry. We started with Docker Hub online, but we wanted faster delivery and more control over who can access the images, so we introduced a private Docker registry on premises. This works great and we are pushing around 30 to 50 images a day.
The version tag we use for a container is the date and time it was built. When the build starts we tag the sources in Git with a name based on the date and time, e.g. 20150619_1504. Components that pass their tests are assembled into a release based on a text file, the composition, which lists all components that should be part of the release. The composition is tagged with a c_ prefix and the date/time stamp, and is deployed to the integration test environment. Then a new test run determines whether the assembly still works. If so, the composition is labeled with a new rc tag, rc_20150619_1504 in our example. Releases that pass the integration test are deployed to acceptance and eventually production, but not automatically: we decided to make deployment a management decision, executed by a Jenkins job.
This strategy allows us to recreate any version of the software: from source, by checking out the tags that make up a release and rebuilding, or from the Docker registry, by deploying the component versions listed in the composition file.
Third-party components are tagged using the version number of the supplier.
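The relationship between these tags can be sketched as follows (variable names are ours; in reality the tagging is done by the build server):

```shell
# One build stamp drives all three tag names.
stamp=$(date +%Y%m%d_%H%M)    # e.g. 20150619_1504
component_tag="$stamp"        # Git tag on each component's sources
composition_tag="c_$stamp"    # tag on the composition (list of components)
rc_tag="rc_$stamp"            # added once the integration test passes

# git tag "$component_tag" && git push origin "$component_tag"
echo "$component_tag $composition_tag $rc_tag"
```

Deriving every tag from the same stamp is what makes a release reproducible: the stamp alone identifies both the sources and the images.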
Zero downtime deployment
To achieve high availability, we chose Ansible to deploy Docker containers based on the composition file mentioned above. Ansible connects to a host and then uses the docker command to do the following:
- Check if the running container version differs from the one we want to deploy
- If the version is different, stop the old container and start the new one
- If the version is the same, don’t do anything
This saves a lot of time because Ansible will only change containers that need to be changed and leave all others alone.
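The check Ansible performs can be reduced to a shell sketch (container and registry names are illustrative; in reality this is an Ansible task, not a script):

```shell
# Redeploy a container only when the desired version differs from the
# running one; otherwise leave it alone.
deploy_if_changed() {
  local name=$1 running=$2 desired=$3
  if [ "$running" = "$desired" ]; then
    echo "$name: up to date, nothing to do"
  else
    echo "$name: updating $running -> $desired"
    # docker stop "$name" && docker rm "$name"
    # docker run -d --name "$name" "registry.local/$name:$desired"
  fi
}

deploy_if_changed orders 20150619_1504 20150619_1504
deploy_if_changed orders 20150619_1504 20150620_0930
```

Because the date/time tag identifies a version exactly, a string comparison is all the idempotency check needs.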
Using Ansible we can also implement Zero Downtime Deployment:
- First shut down the health container on one node
- The load balancer notices and removes the node from the list of active nodes
- Update the first node
- Restart the health container
- Run the update in parallel on all other nodes
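The order of those steps matters, so here is the same procedure as a sketch that only prints its plan (host names are our assumption; the real work is done by Ansible plays, not this script):

```shell
# Print the zero-downtime rollout plan: drain and update one node first,
# then update the remaining nodes in parallel.
rolling_update() {
  local first=$1; shift
  echo "stop health container on $first"   # load balancer drops the node
  echo "update containers on $first"
  echo "start health container on $first"  # node rejoins the pool
  for node in "$@"; do
    echo "update containers on $node (in parallel)"
  done
}

rolling_update web1 web2 web3
```

Updating one node first acts as a canary: if the new release is broken, the load balancer still has the untouched nodes to serve traffic.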
The problem with the Docker API is that you are either in or out, with no levels in between. This means, for example, that if you mount the Docker socket into a container to look at Docker stats, you also allow that container to start and stop other containers. And if you allow access to the Docker executable, you also grant access to configuration information like passwords passed to containers at deployment time. To fix this problem we created a Docker wrapper. This wrapper forbids starting privileged containers and hides some of the information returned by docker inspect.
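The wrapper's policy check looks roughly like this (a reconstruction, not the customer's actual code; the real wrapper also filters `docker inspect` output):

```shell
# Wrapper around the docker binary: refuse privileged containers,
# otherwise delegate to the real executable (path is an assumption).
REAL_DOCKER=${REAL_DOCKER:-/usr/bin/docker}

docker_wrapper() {
  for arg in "$@"; do
    case $arg in
      --privileged|--privileged=true)
        echo "privileged containers are not allowed" >&2
        return 1
        ;;
    esac
  done
  "$REAL_DOCKER" "$@"
}
```

Installed in place of the docker executable on the PATH, this rejects `docker run --privileged …` before the request ever reaches the daemon.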
One simple security rule is that software that is not installed or not running can’t be exploited. Applied to Docker images this means we removed everything we don’t need and made the images as small as possible. Teams extend the base Linux image only by adding the jar file for their application. Recently we started experimenting with Go for utilities, because a Go executable needs no extra software to run. We’re also testing smaller container images.
Finally, remember not to run as root and carefully consider what file systems to share between container and host.
In summary, we found a way to package software, both standard utilities and Scala components, in containers, and to create a tagged, versioned composition that is tested and moves from one environment to the next as a unit. Using Ansible we orchestrate the deployment of new releases while always keeping at least one server running.
In the future we plan to reduce image size further by stripping the base OS and deploying more utilities as Go containers. We will also continue work on our security wrapper, and plan to investigate Consul as a replacement for our home-made service registry.
This blog was based on a talk by Armin Čoralić at XebiCon 2015. Watch Armin’s presentation here.