Mark van Holsteijn 06 Apr, 2015
Providing high availability to stateless applications is fairly straightforward, as was shown in the previous blog posts A High Available Docker Container Platform and Rolling upgrade of Docker applications using CoreOS and Consul. But how does this work when you have a persistent service like Redis? In this blog post we will show you how a persistent service like Redis can be moved between machines in the cluster while preserving its state. The key is to deploy a fleet mount configuration into the cluster and mount the storage in the Docker container that holds the persistent data.
To support persistence we have added a NAS to our platform architecture, in the form of three independent NFS servers, as shown in the picture below.
The application consists of the following fleet unit files:
- app-hellodb@.service - the template unit file for the web application
- app-redis.service - the unit file of the Redis server
- mnt-data.mount - the unit file for the mount required by the Redis server
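To make the role of the mount unit more concrete, the sketch below prints what a unit such as mnt-data.mount might contain. It is not the file from the repository: the NFS server address `172.17.8.200`, the export path `/mnt/default/data`, and the `[X-Fleet] Global=true` setting are assumptions chosen for illustration. What systemd does require is that the unit name matches the mount point, which is why `/mnt/data` is described by a unit called `mnt-data.mount`.

```shell
# Sketch only: prints a hypothetical systemd mount unit for an NFS share.

# Derive the unit name systemd expects for a mount point.
# Simplified: handles plain paths such as /mnt/data. Paths that contain
# dashes or other special characters need `systemd-escape --path` instead.
mount_unit_name() {
    local path
    path="${1#/}"        # drop the leading slash
    path="${path%/}"     # drop a trailing slash, if any
    printf '%s.mount\n' "$(printf '%s' "$path" | tr '/' '-')"
}

# Print a unit file for an NFS export mounted at the given path.
# $1 = NFS server (assumed), $2 = export path (assumed), $3 = mount point
render_mount_unit() {
    local server
    local export_path
    local where
    server="$1"
    export_path="$2"
    where="$3"
    cat <<EOF
[Unit]
Description=NFS mount for persistent data on ${where}

[Mount]
What=${server}:${export_path}
Where=${where}
Type=nfs

[X-Fleet]
# Assumption: offer the mount unit on every machine in the cluster
Global=true
EOF
}

# Hypothetical values; replace them with your own NFS server and export.
echo "# $(mount_unit_name /mnt/data)"
render_mount_unit 172.17.8.200 /mnt/default/data /mnt/data
```

A service that needs the data could then declare `Requires=mnt-data.mount` and `After=mnt-data.mount` in its own unit file, so that systemd mounts the storage before the container starts.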
Preparing the application
To see the fail-over in action, you need to start the platform and deploy the application:

[bash]
git clone https://github.com/mvanholsteijn/coreos-container-platform-as-a-service.git
cd coreos-container-platform-as-a-service/vagrant
vagrant up
./is_platform_ready.sh
[/bash]

This will start 3 NFS servers and our 3 node CoreOS cluster. After that is done, you can deploy the application by first loading the mount unit file:

[bash]
export FLEETCTL_TUNNEL=127.0.0.1:2222
cd ../fleet-units/app
fleetctl load mnt-data.mount
[/bash]

then starting the Redis service:

[bash]
fleetctl start app-redis.service
[/bash]

and finally starting a number of instances of the application:

[bash]
fleetctl submit app-hellodb@.service
fleetctl load app-hellodb@{1..3}.service
fleetctl start app-hellodb@{1..3}.service
[/bash]

You can check that everything is running by issuing the fleetctl list-units command. It should show something like this:

[bash]
fleetctl list-units
UNIT                    MACHINE                     ACTIVE      SUB
app-hellodb@1.service   8f7472a6.../172.17.8.102    active      running
app-hellodb@2.service   b44a7261.../172.17.8.103    active      running
app-hellodb@3.service   2c19d884.../172.17.8.101    active      running
app-redis.service       2c19d884.../172.17.8.101    active      running
mnt-data.mount          2c19d884.../172.17.8.101    active      mounted
mnt-data.mount          8f7472a6.../172.17.8.102    inactive    dead
mnt-data.mount          b44a7261.../172.17.8.103    inactive    dead
[/bash]

As you can see, three app-hellodb instances are running and the Redis service is running on 172.17.8.101, which is the only host that has /mnt/data mounted. The other two machines have this mount in the status 'dead', which is an unfriendly name for stopped. Now you can access the app:

[bash]
yes 'curl hellodb.127.0.0.1.xip.io:8080; echo ' | head -10 | bash
..
Hello World! I have been seen 20 times.
Hello World! I have been seen 21 times.
Hello World! I have been seen 22 times.
Hello World! I have been seen 23 times.
Hello World! I have been seen 24 times.
Hello World! I have been seen 25 times.
Hello World! I have been seen 26 times.
Hello World! I have been seen 27 times.
Hello World! I have been seen 28 times.
Hello World! I have been seen 29 times.
[/bash]

Redis Fail-over in Action
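The fail-over test depends on knowing which machine currently hosts Redis, both to restart that machine and to place the monitor somewhere else. Here is a minimal sketch of how to look that up, assuming the four-column `fleetctl list-units` layout shown earlier (UNIT, MACHINE, ACTIVE, SUB). The helper name `machine_ip_of` is hypothetical and not part of fleetctl.

```shell
# Print the IP address of the machine on which the given unit is active.
# Reads `fleetctl list-units` output on stdin; $1 is the unit name.
machine_ip_of() {
    awk -v unit="$1" '
        $1 == unit && $3 == "active" {
            split($2, parts, "/")   # MACHINE looks like 2c19d884.../172.17.8.101
            print parts[2]
            exit
        }'
}

# Usage against a live cluster (requires fleetctl and FLEETCTL_TUNNEL):
#   fleetctl list-units | machine_ip_of app-redis.service

# Self-contained demonstration using lines from the listing in this post.
sample='UNIT MACHINE ACTIVE SUB
app-hellodb@1.service 8f7472a6.../172.17.8.102 active running
app-redis.service 2c19d884.../172.17.8.101 active running
mnt-data.mount 2c19d884.../172.17.8.101 active mounted
mnt-data.mount 8f7472a6.../172.17.8.102 inactive dead'

printf '%s\n' "$sample" | machine_ip_of app-redis.service   # prints 172.17.8.101
```

Running the same lookup again after the reboot shows whether Redis has moved to another machine.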
To see the fail-over in action, you start a monitor on a machine that is not running Redis. In our case, that is the machine running app-hellodb@1.

[bash]
vagrant ssh -c \
  "yes 'curl --max-time 2 hellodb.127.0.0.1.xip.io; sleep 1 ' | \
   bash" \
  app-hellodb@1.service
[/bash]

Now restart the Redis machine:

[bash]
vagrant ssh -c "sudo shutdown -r now" app-redis.service
[/bash]

After you have restarted the machine running Redis, the output should look something like this:

[bash]
...
Hello World! I have been seen 1442 times.
Hello World! I have been seen 1443 times.
Hello World! I have been seen 1444 times.
Hello World! Cannot tell you how many times I have been seen. (Error 111 connecting to redis:6379. Connection refused.)
curl: (28) Operation timed out after 2004 milliseconds with 0 out of -1 bytes received
curl: (28) Operation timed out after 2007 milliseconds with 0 out of -1 bytes received
Hello World! I have been seen 1445 times.
Hello World! I have been seen 1446 times.
curl: (28) Operation timed out after 2004 milliseconds with 0 out of -1 bytes received
curl: (28) Operation timed out after 2004 milliseconds with 0 out of -1 bytes received
Hello World! I have been seen 1447 times.
Hello World! I have been seen 1448 times.
..
[/bash]

Notice that the counter continues from 1444 once Redis is back, so the state survived the move, and that the distribution of your units has changed after the reboot:

[bash]
fleetctl list-units
...
UNIT                    MACHINE                     ACTIVE      SUB
app-hellodb@1.service   3376bf5c.../172.17.8.103    active      running
app-hellodb@2.service   ff0e7fd5.../172.17.8.102    active      running
app-hellodb@3.service   3376bf5c.../172.17.8.103    active      running
app-redis.service       ff0e7fd5.../172.17.8.102    active      running
mnt-data.mount          309daa5a.../172.17.8.101    inactive    dead
mnt-data.mount          3376bf5c.../172.17.8.103    inactive    dead
mnt-data.mount          ff0e7fd5.../172.17.8.102    active      mounted
[/bash]

Conclusion
We now have the basis for a truly immutable infrastructure setup: the entire CoreOS cluster including the application can be destroyed and a completely identical environment can be resurrected within a few minutes!
- Once you have a reliable external persistent store, CoreOS can help you migrate persistent services just as easily as stateless services. We chose an NFS server for ease of use in this setup, but nothing prevents you from mounting other kinds of storage systems for your application.
- Consul excels in providing fast and dynamic service discovery, allowing the Redis service to migrate to a different machine and the application instances to find the new address of the Redis service through a simple DNS lookup!
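The DNS lookup mentioned in the last point can be sketched as follows. Consul publishes each registered service under `<name>.service.<domain>`, with `consul` as the default domain. The service name `redis` is an assumption based on the host name in the error message above, and `resolve_service` assumes that the resolver on the node forwards the Consul domain to a Consul agent.

```shell
# Build the DNS name under which Consul publishes a service.
# $1 = service name, $2 = Consul DNS domain (defaults to "consul")
consul_dns_name() {
    printf '%s.service.%s\n' "$1" "${2:-consul}"
}

# Resolve the service to its current address(es), one per line.
# Assumption: the host resolver forwards the Consul domain to a Consul agent.
resolve_service() {
    getent hosts "$(consul_dns_name "$1")" | awk '{ print $1 }'
}

consul_dns_name redis     # prints redis.service.consul
# resolve_service redis   # run on a cluster node, before and after the fail-over
```

Because the application looks up the name on every new connection instead of caching an IP address, it picks up the new location of Redis without being reconfigured.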
Mark van Holsteijn is a senior software systems architect at Xebia Cloud-native solutions. He is passionate about removing waste in the software delivery process and keeping things clear and simple.