Using metrics to find the pain points in a legacy codebase
When you are new to a codebase, you may realise that it’s new to you, but not to the world. It’s code that has been around for ages, and it’s hard to change and hard to maintain. This is legacy code by definition, and it’s your job to work with it. Even if code hasn’t been around for long, or if it was perfect when the project started, it easily gets worse over time. It reminds me of the second law of thermodynamics, which says that the entropy of an isolated system does not decrease over time, but evolves towards an equilibrium of maximum entropy.
In a codebase, that entropy – or disorder – takes the form of large, badly tested classes which we are asked to change. In this post, I will explain how I like to approach such a codebase. Is it hopeless? If not, where do we start cleaning it up?
I work as a software consultant, and as part of my job I dive into new codebases, so I find it useful to get a quick overview of the state of a codebase. One useful metric is the code coverage, which is the percentage of code which is “covered” by tests, for example, unit tests. A unit test gives me the confidence that when I change some lines of code, I get feedback as to whether I’ve broken the code’s functionality. It also serves as documentation, as it can explain the intent of the person who built it.
Another thing I look at is the cyclomatic code complexity of classes. This is a measure of the number of independent execution paths through a part of the code, for example, a method. Each execution path needs at least one test to get 100% code coverage (a level of coverage that is not always achievable, as there are always bits of code that are infeasible to test). Coverage and complexity are often related. If the complexity of a class is too high (say, over 10-15), it has too much logic in it. Perhaps it has more than one responsibility (it violates the Single Responsibility Principle). It then becomes hard to test, and hard to maintain. A hard-to-maintain codebase is by definition a legacy codebase.
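To make the path counting concrete, here is a small, hypothetical Python function (not from any codebase discussed here) with a cyclomatic complexity of 3: one base path plus two decision points.

```python
def discount_rate(order_total: float, is_member: bool) -> float:
    # Two decision points (the two ifs) give a cyclomatic
    # complexity of 3: each independent path through the function
    # needs its own test to reach full branch coverage.
    if order_total >= 100:
        return 0.15
    if is_member:
        return 0.05
    return 0.0
```

Three tests – a large order, a member with a small order, and a non-member with a small order – cover all paths; every extra branch you add means one more test is required.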
Of course, code coverage and cyclomatic code complexity are not the only measures of code quality, but I find them a good starting point because both speak directly to maintainability. If something is maintainable, it’s much easier to change and move forward. And all we want is to move forward, because we want new features!
You could look at just code coverage and complexity, identify the classes that need to be improved, and start the cleanup. Split up classes, add tests, refactor, simplify. The code coverage will increase, complexity will go down, maintainability will go up. This is all good, but in the meantime, you’ll end your working day exhausted, and your manager might wonder what you’ve been up to, and whether this was time well spent. Probably not. This approach will take (depending on the size of your codebase) tremendous effort, and not all of it is necessary.
This is where code churn comes in. Churn is a measure of how often a part of your codebase changes over time. Combining this measure with complexity and code coverage lets you see which problematic parts of your codebase also change often, and would thus benefit from a cleanup. After all, unlike a financial loan, the interest on the tech debt of your legacy code is paid when you change your code, not over time. I do assume here that if a part of the codebase has recently been changed, it’s likely to be changed in the near future (which I think is fair).
Churn vs cyclomatic code complexity
A helpful way to make the above concrete is to plot the code complexity against the churn, as in the example below.
Each point in the chart represents a class, red ones have code coverage < 75%. The churn is the number of changes to a class over the past 12 months.
A good way of looking at this chart is by splitting it up into four quadrants:
- Bottom left: this is the area where code doesn’t change often, and the complexity is low. No issues here!
- Bottom right: also in the bottom (and good) half of the chart, but this code changes often. That’s fine, as the complexity is low! There could be some data classes in here – configuration, for example.
- Top left: this is where the code is complex, but it doesn’t change often. You could improve your code here, but you won’t get many of the benefits, as those only materialise the next time you change it.
- Top right: the main area of focus, here things change often, and the code is complex. It probably takes more time than anywhere else in the code to change things around here. This is where debt needs to be paid off!
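The quadrant split above can be sketched as a small classifier. The cut-off values here are hypothetical – in practice you would read them off your own chart:

```python
def quadrant(churn: int, complexity: int,
             churn_cutoff: int = 10, complexity_cutoff: int = 15) -> str:
    # Split the churn vs complexity chart into four quadrants.
    # The cut-offs are illustrative, not universal thresholds.
    if complexity < complexity_cutoff:
        return "fine" if churn < churn_cutoff else "simple but changes often"
    if churn < churn_cutoff:
        return "complex but stable"
    return "refactor first"  # top right: complex and changes often
```

Running every class through a function like this gives you a ranked to-do list rather than just a picture.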
Also, notice how I had to plot the chart on logarithmic axes to get a sensible view of the state of this codebase.
How to start improving a legacy codebase
So now that it’s time to actually start improving your codebase, here are some tips that have worked well for me.
- If you’re putting some serious effort into refactoring a part of your codebase, make sure you focus on the top right quadrant. This is where you’ll get the most benefit in the future. This doesn’t imply doing a big one-off code cleanup of the top right quadrant. That will be exhausting, and after it’s done, the code will slowly drift back towards legacy. What’s needed is a change of mentality, of discipline, to make sure you don’t end up in this situation again.
- Make it part of your day-to-day feature-building work. Whenever you touch a piece of code, try to keep the complexity down, and the code coverage up. Make a routine of the boy scout rule: leave the campsite cleaner than you found it.
- For any feature you add, I recommend following the Test-Driven Development (TDD) approach, which will prevent you from creating too much complexity and badly tested code in the first place.
- If the code is really in a state where you don’t know how to improve it, or test it properly, I highly recommend reading Michael Feathers’ book Working Effectively with Legacy Code.
I followed this approach on one of my projects; see below for the churn vs complexity charts of the files touched by a pull request, before (left) and after (right) the change:
Although the code coverage wasn’t too bad here, the complexity was. As you can see, the overall complexity has gone down, essentially by splitting up the more complex and often changed classes (the two on the right) into a bunch of new classes with low complexity and high test coverage. As these classes are so new, they will probably move to the right soon, because it may take a bit of time before they reach a more mature design. This is fine, as long as the complexity stays within boundaries.
Note that we haven’t solved it all here; some classes are still in the top half of the chart. This is fine too: as long as they move down each time we change them, eventually we’ll get them in the right place.
How to make these charts?
Creating these charts involves a couple of steps, but is fairly straightforward. You first need to get the metrics. For code coverage and complexity, I ran a SonarQube analysis and used its API to retrieve the metrics. For the code coverage to be available in SonarQube, I generated coverage reports from unit tests with JaCoCo (the project I analysed was written in Java). I took the churn data from git, but any SCM should be able to give you these numbers. For each file, I ran something along the lines of:
git log --oneline --since=12.month some-file | wc -l
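Rather than running that one-liner once per file, you can ask git for the files touched by every commit in one go and count them. A sketch of that idea (the helper names are mine, not from the original setup):

```python
import subprocess
from collections import Counter

def churn_from_log(log_output: str) -> Counter:
    # With `--name-only --pretty=format:`, git prints one touched
    # file path per line for each commit, separated by blank lines.
    # Counting occurrences of each path gives the churn per file.
    return Counter(line for line in log_output.splitlines() if line.strip())

def repo_churn(since: str = "12.month") -> Counter:
    # Run git once for the whole repository instead of per file.
    out = subprocess.run(
        ["git", "log", "--since", since, "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return churn_from_log(out)
```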
The data was put into a chart using Google Visualization. That’s all!
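For the SonarQube side, the measures endpoint returns one value per requested metric key. A parsing sketch – the response shape follows SonarQube’s `/api/measures/component` web API, but the component key in the example payload is hypothetical:

```python
import json

def parse_measures(payload: str) -> dict:
    # A /api/measures/component response carries metric values as
    # strings inside component.measures; convert them to floats
    # keyed by metric name (e.g. "coverage", "complexity").
    measures = json.loads(payload)["component"]["measures"]
    return {m["metric"]: float(m["value"]) for m in measures}

# Example payload, trimmed to just the fields used above:
example = """{"component": {"key": "my:project",
  "measures": [{"metric": "coverage", "value": "73.5"},
               {"metric": "complexity", "value": "42"}]}}"""
```

Joining this dictionary with the churn counts per file gives you exactly the (churn, complexity, coverage) triples the charts are built from.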
In this post, I showed you how to figure out where to start and focus cleaning up your legacy codebase, using code complexity and churn. You may feel hopeless when confronted with building a feature in a legacy codebase, but fear not! If you avoid big one-off code cleanup efforts and instead clean up incrementally – using the boy scout rule, focussing on the top right area of the churn vs code complexity chart in particular, and following the practice of TDD – you’ll get it done.