Trying to produce bad quality code is quite hard when you are using Test Driven Development (TDD), even when you are doing it wrong on purpose.
Recently Iwein and me were preparing some labs for a developer training and the plan was to create some really bad quality Java code as a starting point. Students would then be asked to clean it up and add some new features, all this of course with the intent to show the effect of bad code quality on your ability to quickly add new features. This was going to be a piece of cake!
After some brainstorming for interesting standalone programming challenges, we came up with the idea of writing a JSON to XML converter. It should be able to take any valid JSON string and convert it into a simple XML representation. Out of habit and without really considering the option of skipping this step, we started with a simple failing test. Here it is:
[java]
@Test
public void shouldConvertTopLevelEmptyArray() {
assertThat(converter.convert("[]"), is("<array></array>"));
}
[/java]
Simple, right? To implement our converter we decided to use the well known "red, green, as little refactoring as possible" anti-pattern, which we expected to result in lots of copy-paste duplication, very long methods, and the other typical code smells we all know and loath. Our first implementation approach was to go for some all-time favorite candidates for producing bad code: string manipulation and regular expressions. As Jamie Zawinski famously said: "When some people have a problem, they think: ‘I know, I’ll use regular expressions’. Then they have two problems." We had created a sure thing recipe for disaster. It was going to be all downhill from here, or so we thought.
This worked well for a while, producing green tests and a slowly growing single long convert method that could really use some cleaning up. Unfortunately we soon found that we had programmed ourselves into a corner when we wrote the first few tests for parsing nested objects and arrays. It seemed we needed some form of recursion and that just did not fit well with the direction we had turned into.
I would not say that solving this problem using only regular expressions, without recursion, is impossible, but it is definitely quite hard. This is not a problem when looking only from the objective of writing bad code, but we underestimated ourselves in being able to write bad code that actually works. This turned out to be the fundamental problem in our endeavour.
We thought it would be easier to start parsing the input one character at a time and that this would still leave plenty of room to create horrors beyond imagination. We were right about the first part. Parsing a simple format recursively is as tedious as any parser writing exercise, but it is not hard. Then something interesting started to happen. Every time we fixed a small functional problem by debugging the bad code we had just produced, it started becoming less bad. When it worked in the end it turned out that we didn’t do very well on the primary objective.
We sat there a little bit amazed. "This isn’t nearly as bad as I expected…" "I’ve seen way worse" "I’d let this go to production without losing much sleep". On the one hand it is comforting to know that you’re incapable of producing the crap that you see when reviewing all the time, or even the kind of crap we wrote ourselves just a few years ago. On the other hand it is disappointing. But why is it that we failed?
There are two unconscious behaviors that were in the way here:
- we tried to avoid "well known" traps, i.e the mistakes we have made so often that we avoid them automatically.
- we took small steps to avoid getting lost
These behaviors are generally supported and enforced by TDD. Had we been more conscious of this, we would have known right at the first test that it would not result in overly complex code: if you start with the simplest possible failing test and add complexity (i.e. new failing tests) in small steps, you only make it extra hard on yourself to produce the desired pile of crap ! During the debugging, when we had added a test that failed in some unexpected way, we applied small refactorings to make problems easier to debug or easier to solve, which almost automatically resulted in more modular and simple code. So taking the easy way out results in simple and readable code. Who would have guessed it (Doh!).
Have a look for yourselves at the tests that we worked from and the resulting code. It is definitely not as clean as a it could be or the most elegant solution but is this really bad code ? What do you think ?