Expect it to break

In a Continuous Delivery environment it is important to keep the system stable, so that new features can flow into production whenever they are ready. A broken system forces developers to spend their time diagnosing problems and applying fixes rather than building new features, which makes it essential to build the system to be as robust as possible.

Many systems are built around the happy flow, with little thought given to handling error conditions. However, modern systems run 24 hours a day, 7 days a week and depend on other systems, which makes errors inevitable.

So expect the system to break: anticipate errors, and build the system to cope with them gracefully and to resume normal operation once the error condition is resolved.

You might think that measures to increase robustness cost additional development time, but in fact they drastically reduce the time needed to keep the system stable. As shown in the figure below, fixing errors in production costs an order of magnitude more than preventing them in the development stages.

[Figure: cost of fixing errors in production compared with preventing them in the development stages]

How to apply the principle

When designing and developing a system, build it in such a manner that individual components are robust and can be treated as autonomous units. Each component should be able to deal with the possibility that another component or subsystem is not available and respond in a non-destructive manner.
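One way a component can respond non-destructively when a dependency is unavailable is to keep serving its last known value. The following is a minimal sketch under that assumption; `ExchangeRateProvider`, `fetch` and `default_rate` are hypothetical names chosen for illustration, and other strategies (queues, defaults, timeouts) may suit a real system better.

```python
from typing import Callable


class ExchangeRateProvider:
    """Hypothetical component that depends on a remote rate service."""

    def __init__(self, fetch: Callable[[], float], default_rate: float) -> None:
        self._fetch = fetch
        self._last_known = default_rate

    def current_rate(self) -> float:
        try:
            # Normal path: refresh the value from the dependency.
            self._last_known = self._fetch()
        except (ConnectionError, TimeoutError):
            # Dependency unavailable: keep serving the last known value
            # instead of propagating the failure to our own callers.
            pass
        return self._last_known


def rate_service_down() -> float:
    # Stand-in for a dependency that is currently unreachable.
    raise ConnectionError("rate service unavailable")


provider = ExchangeRateProvider(rate_service_down, default_rate=1.0)
rate = provider.current_rate()  # falls back to the default instead of failing
```

Whether a stale value is acceptable depends on the domain; the point of the sketch is only that the failure stays contained inside the component.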

Provide backward-compatible interfaces

Provide backward-compatible interfaces so that whenever a service is upgraded, calling components can still use the ‘old’ version of the service. You can log usage of the old version and encourage your clients to upgrade.
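As an illustration, the old version of an interface can be kept as a thin adapter over the new one that also records its own usage. This is a sketch only: the function names `get_price_v1` and `get_price_v2`, the `Price` type and the in-memory catalogue are all assumptions made for the example.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("price_service")

# Counter so usage of the old version can be reported (illustrative only).
old_version_calls = 0


@dataclass
class Price:
    amount_cents: int
    currency: str


def get_price_v2(product_id: str, currency: str = "EUR") -> Price:
    """New version: returns a structured price."""
    # Hypothetical fixed catalogue standing in for a real lookup.
    catalogue = {"book": 1250, "pen": 199}
    return Price(amount_cents=catalogue[product_id], currency=currency)


def get_price_v1(product_id: str) -> float:
    """Old version, kept for existing callers: delegates to v2 and logs usage."""
    global old_version_calls
    old_version_calls += 1
    logger.warning("get_price_v1 used for %s; please migrate to v2", product_id)
    return get_price_v2(product_id).amount_cents / 100


legacy_price = get_price_v1("book")
```

Because the old entry point delegates to the new one, there is a single implementation to maintain, and the log shows which clients still need to upgrade.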

Ensure graceful degradation

Each component should be able to deal with subcomponents or subsystems that might not return a value. The same goes for components in the top layers. When a top layer of the application (e.g. the controller or presentation layer) detects that a component in a lower layer did not return a proper response, the controller can hide the affected functions from the end user while continuing to support the functions that still work fully.
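A controller that degrades gracefully could look roughly like the sketch below. The names `build_dashboard`, `load_orders` and `load_recommendations` are hypothetical, and the choice to treat only connection and timeout errors as "no proper response" is an assumption.

```python
from typing import Callable, Dict, List


def build_dashboard(loaders: Dict[str, Callable[[], List[str]]]) -> dict:
    """Load every feature; hide the ones whose lower-layer component fails."""
    visible: Dict[str, List[str]] = {}
    hidden: List[str] = []
    for name, load in loaders.items():
        try:
            visible[name] = load()
        except (ConnectionError, TimeoutError):
            # The lower layer gave no proper response: hide this function only.
            hidden.append(name)
    return {"visible": visible, "hidden": hidden}


def load_orders() -> List[str]:
    return ["order-1", "order-2"]


def load_recommendations() -> List[str]:
    # Stand-in for a lower-layer component that is currently down.
    raise ConnectionError("recommendation service unavailable")


dashboard = build_dashboard(
    {"orders": load_orders, "recommendations": load_recommendations}
)
```

Here the end user still sees the orders, while the recommendations are simply left out until the underlying service recovers.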

Create self-healing components

Create self-healing components that can detect different error states and apply appropriate strategies to remove the error condition themselves. If a component cannot resolve the problem independently, it should report the condition so that it is detected by the monitoring system and resolved.
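One possible healing strategy is to drop a broken connection and reconnect a limited number of times, and to report the condition only when that fails. The sketch below assumes this strategy; `SelfHealingClient`, `flaky_connect` and the `healthy` flag (which a monitoring system could poll) are hypothetical names, not part of any particular library.

```python
import logging
import time

logger = logging.getLogger("self_healing")


class SelfHealingClient:
    """Reconnects on failure; reports the condition if it cannot recover."""

    def __init__(self, connect, max_attempts=3, delay_seconds=0.0):
        self._connect = connect
        self._max_attempts = max_attempts
        self._delay_seconds = delay_seconds
        self._connection = None
        self.healthy = True  # hypothetical flag a monitoring system could read

    def call(self, request):
        last_error = None
        for _ in range(self._max_attempts):
            try:
                if self._connection is None:
                    self._connection = self._connect()
                result = self._connection(request)
                self.healthy = True
                return result
            except ConnectionError as error:
                last_error = error
                self._connection = None  # drop it; reconnect on the next attempt
                time.sleep(self._delay_seconds)
        # Could not heal independently: report so monitoring can pick it up.
        self.healthy = False
        logger.error("giving up after %d attempts", self._max_attempts)
        raise ConnectionError("backend still unavailable") from last_error


# Usage with a hypothetical backend that fails twice before recovering.
failures = {"left": 2}


def flaky_connect():
    def send(request):
        if failures["left"] > 0:
            failures["left"] -= 1
            raise ConnectionError("backend unavailable")
        return f"ok: {request}"
    return send


client = SelfHealingClient(flaky_connect)
result = client.call("ping")
```

In a real system the delay between attempts would usually be non-zero and grow with each retry; it is zero here only to keep the example fast.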

Test-driven development

Before you implement functionality, write the test(s) first. This allows you to catch errors and prevent the delivery of broken components to production. Over and over again.
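A small illustration of the test-first order, using Python's standard `unittest` module. The function `parse_quantity` and its rule of falling back to 0 on bad input are assumptions invented for the example; the tests are written first and the implementation follows.

```python
import unittest


# Step 1: write the tests first. They describe the behaviour we want,
# including the error conditions, before any implementation exists.
class ParseQuantityTest(unittest.TestCase):
    def test_parses_valid_number(self):
        self.assertEqual(parse_quantity("3"), 3)

    def test_garbage_falls_back_to_zero(self):
        self.assertEqual(parse_quantity("abc"), 0)

    def test_negative_falls_back_to_zero(self):
        self.assertEqual(parse_quantity("-2"), 0)


# Step 2: write just enough implementation to make the tests pass.
def parse_quantity(raw: str) -> int:
    try:
        value = int(raw)
    except ValueError:
        return 0  # unparseable input must not break the caller
    return value if value >= 0 else 0
```

Saved in a file, these tests can be run with `python -m unittest`, and in a delivery pipeline they run on every change, which is what catches the same class of error over and over again.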

This post is part of a series on Continuous Delivery. Please see our tag Continuous Delivery for more posts on this subject. Or check our Continuous Delivery website to learn how Xebia can help you improve your time to market, reduce costs and improve quality using Continuous Delivery best practices.
