Providing high availability for stateless applications is fairly trivial, as shown in the previous blog posts A High Available Docker Container Platform and Rolling upgrade of Docker applications using CoreOS and Consul. But how does this work when you have a persistent service like Redis?
In this blog post we will show you how a persistent service like Redis can be moved between machines in the cluster while preserving its state. The key is to deploy a fleet mount configuration into the cluster and mount the storage into the Docker container that holds the persistent data.
To support persistence we have added NAS storage to our platform architecture in the form of three independent NFS servers, as shown in the picture below.
The applications are still deployed on the CoreOS cluster as Docker containers; even our Redis instance runs in a Docker container. Our application is configured using the following three fleet unit files:
- app-hellodb@.service – the template unit file for the web application
- app-redis.service – the unit file of the Redis server
- mnt-data.mount – the unit file for the mount required by the Redis server
The unit file of the Redis server is the most interesting one, because it is our persistent service. In the [Unit] section of the file, it first declares that it requires a mount for ‘/mnt/data’, on which it will persist its data.
[code]
[Unit]
Description=app-redis
Requires=mnt-data.mount
After=mnt-data.mount
RequiresMountsFor=/mnt/data
[/code]
In the start clause of the redis service, a specific subdirectory of /mnt/data is mounted into the container.
[code]
…
ExecStart=/usr/bin/docker run --rm \
--name app-redis \
-v /mnt/data/app-redis-data:/data \
-p 6379:6379 \
redis
…
[/code]
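Putting the two fragments together, a minimal sketch of the complete app-redis.service unit file looks like this. The ExecStartPre and ExecStop lines are illustrative additions on our part; the actual file in the repository may differ.
[code]
[Unit]
Description=app-redis
Requires=mnt-data.mount
After=mnt-data.mount
RequiresMountsFor=/mnt/data

[Service]
# remove any left-over container before starting a new one (illustrative)
ExecStartPre=-/usr/bin/docker rm -f app-redis
ExecStart=/usr/bin/docker run --rm \
  --name app-redis \
  -v /mnt/data/app-redis-data:/data \
  -p 6379:6379 \
  redis
ExecStop=/usr/bin/docker stop app-redis
[/code]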
The mnt-data.mount unit file is quite simple: it defines an NFS mount with the option ‘noauto’, indicating that the device should not be automatically mounted at boot time. The unit file has the option ‘Global=true’ so that the mount is distributed to all the nodes in the cluster. The mount is only activated when another unit requests it.
[code]
[Mount]
What=172.17.8.200:/mnt/default/data
Where=/mnt/data
Type=nfs
Options=vers=3,sec=sys,noauto
[X-Fleet]
Global=true
[/code]
Please note that the NFS mount specifies system security (sec=sys) and the NFS version 3 protocol, to avoid all sorts of errors caused by mismatches in user and group ids between the client and the server.
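For completeness, a matching export on the NFS server side could look roughly like this. The path matches the mount unit above, but the client address range and options are assumptions based on the addresses used in this setup:
[code]
# /etc/exports on the NFS server (illustrative)
/mnt/default/data 172.17.8.0/24(rw,sync,no_subtree_check)
[/code]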
Preparing the application
To see the failover in action, you need to start the platform and deploy the application:
[bash]
git clone https://github.com/mvanholsteijn/coreos-container-platform-as-a-service.git
cd coreos-container-platform-as-a-service/vagrant
vagrant up
./is_platform_ready.sh
[/bash]
This will start the 3 NFS servers and our 3-node CoreOS cluster. After that is done, you can deploy the application, first by submitting the mount unit file:
[bash]
export FLEETCTL_TUNNEL=127.0.0.1:2222
cd ../fleet-units/app
fleetctl load mnt-data.mount
[/bash]
then starting the Redis service:
[bash]
fleetctl start app-redis.service
[/bash]
and finally starting a number of instances of the application:
[bash]
fleetctl submit app-hellodb@.service
fleetctl load app-hellodb@{1..3}.service
fleetctl start app-hellodb@{1..3}.service
[/bash]
You can check that everything is running by issuing the fleetctl list-units command. It should show something like this:
[bash]
fleetctl list-units
UNIT MACHINE ACTIVE SUB
app-hellodb@1.service 8f7472a6…/172.17.8.102 active running
app-hellodb@2.service b44a7261…/172.17.8.103 active running
app-hellodb@3.service 2c19d884…/172.17.8.101 active running
app-redis.service 2c19d884…/172.17.8.101 active running
mnt-data.mount 2c19d884…/172.17.8.101 active mounted
mnt-data.mount 8f7472a6…/172.17.8.102 inactive dead
mnt-data.mount b44a7261…/172.17.8.103 inactive dead
[/bash]
As you can see, three app-hellodb instances are running and the Redis service is running on 172.17.8.101, which is the only host that has /mnt/data mounted. The other two machines have this mount in the status ‘dead’, which is an unfriendly name for stopped.
Now you can access the application:
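You can verify this on the hosts themselves. Assuming the Vagrant machine names follow the pattern core-01 to core-03 (an assumption on our part; check your Vagrantfile), something like:
[bash]
# on the machine running Redis, the NFS mount should show up
vagrant ssh core-01 -c "mount | grep /mnt/data"
# on the other two machines this command returns nothing
[/bash]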
[bash]
yes 'curl hellodb.127.0.0.1.xip.io:8080; echo ' | head -10 | bash
..
Hello World! I have been seen 20 times.
Hello World! I have been seen 21 times.
Hello World! I have been seen 22 times.
Hello World! I have been seen 23 times.
Hello World! I have been seen 24 times.
Hello World! I have been seen 25 times.
Hello World! I have been seen 26 times.
Hello World! I have been seen 27 times.
Hello World! I have been seen 28 times.
Hello World! I have been seen 29 times.
[/bash]
Redis Fail-over in Action
To see the fail-over in action, start a monitor on a machine that is not running Redis; in our case, the machine running app-hellodb@1.
[bash]
vagrant ssh -c \
  "yes 'curl --max-time 2 hellodb.127.0.0.1.xip.io; sleep 1' | \
  bash" \
  app-hellodb@1.service
[/bash]
Now restart the machine running Redis:
[bash]
vagrant ssh -c "sudo shutdown -r now" app-redis.service
[/bash]
After the machine running Redis has restarted, the output should look something like this:
[bash]
…
Hello World! I have been seen 1442 times.
Hello World! I have been seen 1443 times.
Hello World! I have been seen 1444 times.
Hello World! Cannot tell you how many times I have been seen.
(Error 111 connecting to redis:6379. Connection refused.)
curl: (28) Operation timed out after 2004 milliseconds with 0 out of -1 bytes received
curl: (28) Operation timed out after 2007 milliseconds with 0 out of -1 bytes received
Hello World! I have been seen 1445 times.
Hello World! I have been seen 1446 times.
curl: (28) Operation timed out after 2004 milliseconds with 0 out of -1 bytes received
curl: (28) Operation timed out after 2004 milliseconds with 0 out of -1 bytes received
Hello World! I have been seen 1447 times.
Hello World! I have been seen 1448 times.
..
[/bash]
Notice that the distribution of your units has changed after the reboot.
[bash]
fleetctl list-units
…
UNIT MACHINE ACTIVE SUB
app-hellodb@1.service 3376bf5c…/172.17.8.103 active running
app-hellodb@2.service ff0e7fd5…/172.17.8.102 active running
app-hellodb@3.service 3376bf5c…/172.17.8.103 active running
app-redis.service ff0e7fd5…/172.17.8.102 active running
mnt-data.mount 309daa5a…/172.17.8.101 inactive dead
mnt-data.mount 3376bf5c…/172.17.8.103 inactive dead
mnt-data.mount ff0e7fd5…/172.17.8.102 active mounted
[/bash]
Conclusion
We now have the basis for a truly immutable infrastructure setup: the entire CoreOS cluster, including the application, can be destroyed and a completely identical environment can be resurrected within a few minutes!
- Once you have a reliable external persistent store, CoreOS can help you migrate persistent services just as easily as stateless services. We chose an NFS server for ease of use in this setup, but nothing prevents you from mounting other kinds of storage systems for your application.
- Consul excels at providing fast and dynamic service discovery, allowing the Redis service to migrate to a different machine and the application instances to find the new address of the Redis service through a simple DNS lookup!
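For example, assuming Consul serves DNS on its default port 8600 and the Redis service is registered under the name ‘redis’ (both assumptions on our part), such a lookup could look like this:
[bash]
# query Consul's DNS interface for the current address of the Redis service
dig @172.17.8.101 -p 8600 redis.service.consul +short
[/bash]
After the fail-over, the same query simply returns the address of the new machine running Redis.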