Phase out your legacy applications with confidence by testing in production

14 Apr, 2021

Many of us are dealing with legacy. Think about that application that uses ancient technologies, or one that is thoroughly under-documented and under-tested. At some point, these applications become a liability, and an effort is required to either drastically improve them or phase them out completely.

Shaping this effort is not trivial. We want to modernize without introducing any regression. These legacy applications offer necessary functionality, even though we might not know exactly what that functionality entails. A good way to start is to take an incremental approach: replace or refactor one small piece at a time.

Recently, I was part of a team in an e-commerce environment, and we faced the challenge of phasing out a legacy application. This application was the platform responsible for almost all e-commerce functionality, from shopping basket to fulfillment. We followed this approach of extracting functionality, starting at the edges, with payment and fulfillment, and slowly working our way inward to the shopping basket and order process. Piece by piece, we could replace the whole system.

Now the question becomes how to handle such a piece effectively.

Before we start: Isolate the scope

Before we can start producing something, we need to find out what exactly we should be developing. What is the functional scope of the part we aim to replace? Existing passing tests, when there are any, can offer some help. The code itself is another source, provided it is readable enough. My preferred strategy is to identify the public interfaces that offer the functionality within our scope and then start adding new tests against them. Michael Feathers describes this as characterization testing: writing tests that record what a piece of software actually does today, in order to discover its functionality.
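As a rough illustration of characterization testing, the sketch below pins down what a piece of legacy code returns today, without judging whether that output is right. The `legacy_basket_total` function, its parameters, and the pinned values are hypothetical stand-ins, not the actual system described in this post.

```python
from decimal import Decimal


def legacy_basket_total(prices, discount_percent):
    """Hypothetical stand-in for a legacy calculation with undocumented rules."""
    subtotal = sum(prices, Decimal("0"))
    discount = (subtotal * discount_percent / Decimal("100")).quantize(Decimal("0.01"))
    return subtotal - discount


# Step 1: call the legacy code with representative inputs and observe the output.
# Step 2: pin the observed values as expectations, whatever they happen to be.
PINNED_BEHAVIOR = [
    # (prices, discount_percent, total observed from the legacy code)
    ([Decimal("10.00"), Decimal("5.50")], Decimal("0"), Decimal("15.50")),
    ([Decimal("10.00"), Decimal("5.50")], Decimal("10"), Decimal("13.95")),
]


def deviations(calculate):
    """Return the pinned cases where `calculate` differs from the recorded output."""
    found = []
    for prices, discount_percent, expected in PINNED_BEHAVIOR:
        actual = calculate(prices, discount_percent)
        if actual != expected:
            found.append((prices, discount_percent, expected, actual))
    return found


if __name__ == "__main__":
    # An empty list means the current behavior is fully pinned by the cases above.
    print(deviations(legacy_basket_total))
```

The same `deviations` check can later be pointed at a new implementation, so every pinned case doubles as a regression test for the replacement.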

Isolating the scope with a set of tests worked well for us when we extracted the payment functionality. When it came time to migrate the shopping basket value calculation to a separate service, we faced another challenge. The incoming request for this calculation contained a lot of data that influenced the outcome, e.g., the basket’s products, added discounts, some user data, and supply chain information. The business rules were not always clear. It was, however, imperative that they did not change.

We can spend endless time adding tests and investigating the code; chances are we will never completely figure out all the business rules hidden within the software. We can never be confident we correctly reimagined the existing functionality unless we use actual production feedback.

Testing in production


Our users can be a valuable source of test data. When we duplicate the production traffic and route the copy to our new implementation, we can use it to validate the new implementation’s behavior. In this validation, the legacy application provides the expected response. The process is much like the one the tool Diffy uses to validate a new version of the same service.

There are two parts to this process that can get complicated:

First, we want to duplicate the data stream as close as possible to the boundary of our isolated scope. If that boundary is a simple REST service, we may be able to duplicate the request in an API gateway. In other cases, we need to provide a duplication mechanism ourselves. Either way, we shouldn’t hinder the user: the duplicated request should be handled asynchronously and consume few resources, so that it does not degrade performance.

Second, the validation is often not a simple comparison of responses. The two responses might be modeled differently. In other cases, the incoming data stream causes side effects, such as a write to a database or a message placed on a queue, and we need those for our comparison as well.
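The first part, duplicating a request without hindering the user, could be sketched as follows when we have to build the mechanism ourselves. This is a minimal sketch under assumptions: the `ShadowDispatcher` class, its `send` callback, and the `max_pending` limit are hypothetical names, and a real setup might duplicate traffic in a gateway or through a message queue instead.

```python
import queue
import threading


class ShadowDispatcher:
    """Hands duplicated traffic to a background worker without blocking the caller."""

    def __init__(self, send, max_pending=100):
        self._send = send  # hypothetical callable that forwards to the new service
        self._pending = queue.Queue(maxsize=max_pending)  # bounded: caps memory use
        self.dropped = 0   # approximate counters, good enough for a dashboard
        self.failed = 0
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, request, response):
        """Called on the user path; returns immediately, even when overloaded."""
        try:
            self._pending.put_nowait((request, response))
        except queue.Full:
            self.dropped += 1  # prefer losing a sample over slowing down the user

    def _run(self):
        while True:
            request, response = self._pending.get()
            try:
                self._send(request, response)
            except Exception:
                self.failed += 1  # a shadow failure must never reach the user
            finally:
                self._pending.task_done()

    def drain(self):
        """Block until all queued items are processed (useful in tests)."""
        self._pending.join()
```

The bounded queue is the design choice that matters here: when the new service is slow or down, samples are dropped instead of piling up in the process that serves real users.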

Before we use this process for additional validation of our software, it is essential to have a clear picture of what we want to validate and whether it’s worth the effort. This analysis can be time-consuming. My advice is to start small and experiment. Stop when the additional benefits become marginal or too costly to obtain.

In practice: the shopping basket calculation

In the case of our shopping basket calculation functionality, we wanted to find out if there were any combinations of incoming data that influenced the calculation result without our knowledge. Our calculation result should be the same as the result of the legacy application.

To validate our new implementation, we extended the calculation REST endpoint of the legacy application. When the original response was about to be sent back to the consuming system, we asynchronously called a special shadow endpoint of our new service and passed it both the original request and the original response. The shadow endpoint used the request to run the new calculation and validated the new service’s response against the original response. The validation consisted of comparing all the monetary amounts.
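A comparison of monetary amounts along these lines might look like the sketch below. The field names and the response shape are assumptions made for illustration; the actual responses in this project are not shown in this post.

```python
from decimal import Decimal

# Hypothetical field names; real responses will have their own shape.
MONETARY_FIELDS = ("subtotal", "discount", "delivery_costs", "total")


def compare_amounts(legacy_response, new_response, fields=MONETARY_FIELDS):
    """Return {field: (legacy_amount, new_amount)} for every amount that differs."""
    mismatches = {}
    for field in fields:
        # Convert via str() so float inputs do not carry binary noise into Decimal.
        legacy_amount = Decimal(str(legacy_response[field]))
        new_amount = Decimal(str(new_response[field]))
        if legacy_amount != new_amount:
            mismatches[field] = (legacy_amount, new_amount)
    return mismatches


if __name__ == "__main__":
    legacy = {"subtotal": "25.00", "discount": "2.50",
              "delivery_costs": "3.95", "total": "26.45"}
    new = {"subtotal": 25, "discount": "2.50",
           "delivery_costs": "0.00", "total": "22.50"}
    # Reports delivery_costs and total; subtotal matches because 25 == 25.00.
    print(compare_amounts(legacy, new))
```

Comparing only the amounts, rather than the full payloads, keeps the check usable when the two services model their responses differently.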

We added a dashboard to our observability tooling that showed a success metric and the data of the failed requests. Our first version had a ~50% success rate, simply because we had not implemented discounts yet. The feature-complete version was about 85% accurate. Among other things, we had misinterpreted a business rule related to delivery costs. In the end, we consistently got to about 99% accuracy.
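The numbers behind such a dashboard can come from a very small amount of bookkeeping. The sketch below is an assumption about how one might collect them in-process; the `ShadowMetrics` class and its fields are hypothetical, and in practice the counts would be exported to whatever observability tooling is in use.

```python
class ShadowMetrics:
    """Tracks how often the new implementation agreed with the legacy one."""

    def __init__(self, max_failures_kept=100):
        self.total = 0
        self.matched = 0
        self.failures = []  # sample of failed requests to inspect later
        self._max_failures_kept = max_failures_kept

    def record(self, request, mismatches):
        """Register one comparison; an empty `mismatches` counts as a success."""
        self.total += 1
        if not mismatches:
            self.matched += 1
        elif len(self.failures) < self._max_failures_kept:
            self.failures.append({"request": request, "mismatches": mismatches})

    def success_rate(self):
        """Fraction of comparisons that matched, or None before any were recorded."""
        return self.matched / self.total if self.total else None
```

Keeping the failed requests next to the rate is what turns the metric into a discovery tool: each failure is a concrete input that can be replayed and turned into a regular test.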

Why it’s OK not to have 100% accuracy

So why not 100%? Well, for a variety of reasons:

Uncover bugs – Just like characterization testing, using production data for validation is a method of discovery. We found out that the original calculation service had a 1 cent rounding error in some discount-related edge cases.

Prevent degradation – We also applied this process when we replaced a package shipment routing algorithm. The new algorithm’s goal was to offer an improvement, so the responses would not be the same in all cases. We wanted to validate that there would be no performance degradation. We also wanted insight into the instances where the algorithms’ outcomes differed, to see if the new one indeed offered the improvement we expected.

Discover unforeseen side-effects – Another reason 100% might not be achievable is an unforeseen side-effect. A shopping basket is always calculated just before the order is placed. Because the comparison is made asynchronously, it sometimes happened that a discount triggered by a voucher was no longer applicable by the time the new service ran its calculation: placing the order had already locked the voucher.
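To make the one-cent rounding difference mentioned above more concrete: this post does not say what caused it in the legacy service, so the sketch below only illustrates one common way two otherwise identical discount calculations can end up a cent apart, namely a different rounding mode. The price, percentage, and function name are made up for the example.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")


def discount_amount(price, percent, rounding):
    """Compute a percentage discount, rounded to whole cents with the given mode."""
    return (price * percent / Decimal("100")).quantize(CENT, rounding=rounding)


price, percent = Decimal("15.50"), Decimal("15")  # unrounded discount: 2.325

half_up = discount_amount(price, percent, ROUND_HALF_UP)      # rounds to 2.33
half_even = discount_amount(price, percent, ROUND_HALF_EVEN)  # rounds to 2.32

if __name__ == "__main__":
    print(half_up - half_even)  # the two modes differ by exactly one cent
```

Differences like this only show up for inputs that land exactly on half a cent, which is why they tend to surface through production data rather than through hand-written tests.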

