In this episode of our middleware pitfalls top 10, we discuss the merits of a clean, standardized set of (test) environments. Some refer to such a set as DTAP, an acronym for Development, Test, Acceptance test (or pre-production) and Production. From here on, capitalized names refer to a specific environment. In a way, DTAP is like testing itself: you will never get it 100% right, but investing in a sound, maintainable DTAP will help you a lot.
The following example is loosely based on the situation at one of our recent customers. They have a pretty impressive set of applications and services that together implement mission-critical functions for various types of end users. The backend consists of pre-web technology that is accessed via web services. The backend application is roughly ten years old and has had its fair share of problems in the past, but is now quite stable. The backend team has developed a release schedule they consider vital for stability: every three months, after rigorous testing in the Development environment, the new release is put in Test. From there, the application is eventually rolled out to Production. It has been like this for years.
The web teams have a Development environment as well (separate from their desktops), but it is populated with odd developer data, which makes it unusable for testers. So the testers maintain their test data in the Test databases. Unlike the backend team, the web developers all work in Scrum teams, so testing here is a continuous activity. By now you can guess the problem: the Scrum testers have to wait up to three months to get their features tested, because they depend heavily on the backend system.
To add to this, there is also a set of user stories that require interaction with external companies. The network guys have decided that their firewalls will only be fully opened in the Acceptance environment. So the B2B functions can only be tested after all functional testing in Test is done, at a stage in the project where the final deadline is approaching (you know the deal: marketing has already started a radio campaign announcing that on that day the website will be totally “new and improved”). Of course, the project planning has turned out a little too optimistic, but so far everything is going pretty smoothly.
Less than a week before going live, someone in the web team discovers that their application in Acceptance is actually connected to the security component in Test. Acceptance and Production are 100% identical (as they should be), but Test differs in unclear ways. So the security component in Acceptance has never been tested, and suddenly there is no guarantee that Production will work! One day later, more horror: security in Test and Acceptance turns out to be totally different. In Acceptance, it does not work!
You can imagine the stress levels at this point. Luckily, the story ends well, although it cost them a lot of hard work.
The problems exist at different levels.
- Organizational: different teams have different views and practices concerning the test environments. In the example, this resulted in testing too late.
- Procedural: changes to the environments are not evaluated and recorded (or only in Production). You will end up with spaghetti!
- Conscious decisions: the firewall policy in the example is not bad in itself, but it results in late testing.
- Errors: unexpected or wrong connections between components in different test environments.
- Documentation debt: the lack of adequate, up-to-date technical documentation (such as diagrams).
- Negligence: changes to one environment are not replicated in other environments.
- Mindset: “this error will probably not show up in Acceptance and Production”. This belief, which stems from existing DTAP differences, opens the door for Mr. Murphy, who is sure to enter the room.
- Scaling differences: the Development environment has no clustering. This can surface errors later on, when applications turn out not to work in a cluster.
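Two of the problems above, wrong connections between environments and changes that are not replicated, can be caught mechanically once the environment configuration is written down. The following is a minimal Python sketch under stated assumptions: the inventory format, the host naming convention (environment tag as the second DNS label) and all function names are hypothetical and not part of any particular middleware product.

```python
from urllib.parse import urlsplit

# Hypothetical naming convention: the second DNS label of every host name
# is the environment tag, e.g. "security.acc.example.internal".
ENV_TAGS = {"Development": "dev", "Test": "tst", "Acceptance": "acc", "Production": "prd"}


def env_tag(url: str) -> str:
    """Return the environment tag embedded in the URL's host name."""
    labels = (urlsplit(url).hostname or "").split(".")
    return labels[1] if len(labels) > 1 else ""


def normalize(url: str) -> str:
    """Replace the environment tag with '*' so equivalent endpoints compare equal."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    if len(labels) > 1:
        labels[1] = "*"
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{'.'.join(labels)}{port}{parts.path}"


def cross_env_links(env: str, endpoints: dict[str, str]) -> list[str]:
    """Components in `env` whose endpoint lives in another environment."""
    expected = ENV_TAGS[env]
    return sorted(name for name, url in endpoints.items() if env_tag(url) != expected)


def drift(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Components missing from one environment or configured differently."""
    names = set(a) | set(b)
    return sorted(
        name for name in names
        if name not in a or name not in b or normalize(a[name]) != normalize(b[name])
    )


# Hypothetical inventory mirroring the story: Acceptance points at Test security.
inventory = {
    "Test": {
        "security": "https://security.tst.example.internal:8443/auth",
        "backend": "https://backend.tst.example.internal/ws",
    },
    "Acceptance": {
        "security": "https://security.tst.example.internal:8443/auth",
        "backend": "https://backend.acc.example.internal/ws",
    },
    "Production": {
        "security": "https://security.prd.example.internal:8443/auth",
        "backend": "https://backend.prd.example.internal/ws",
    },
}

print(drift(inventory["Acceptance"], inventory["Production"]))  # []
print(cross_env_links("Acceptance", inventory["Acceptance"]))   # ['security']
```

Note that the drift check alone passes here: Acceptance has exactly the same shape as Production, just as in the story above. Only the cross-environment check reveals that Acceptance talks to the security component in Test, which is why both checks are worth running.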
The longer companies allow DTAP problems to persist, the harder they are to correct. That is not rocket science; probably everyone in IT knows it, including the responsible managers. It may be harder to explain to the finance guy, the product owner or the customer. It's like code refactoring: you will have to explain the need for internal housekeeping in order to get the necessary budget.
So what should you do? Again, this is not rocket science. The main point of this blog post is awareness: you need to recognize that things are wrong, and that this will result in delayed projects, more bugs than necessary, rising stress levels, and frustration or anger between developers, testers, and administrators.
Some words of advice:
- Turn the right (number of) knobs. All changes must be approved and coordinated by a change manager. Replicate the changes across environments, document them, and test the test environment after every change.
- Keep it simple. Do not introduce too many differences just to lower costs; every difference increases the administrative burden. So create that cluster in Development, but with fewer cluster members.
- Get your teams aligned. They must share a common way of treating the test environments, including release cycles.
- Be defensive. Be pessimistic. Pessimistic people are often right. An administrator's error often has a far bigger impact than a programmer's. Test everything.
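The first piece of advice (approve, replicate, document and test every change) can be supported with very little tooling. The Python sketch below keeps a hypothetical change log that records which environments a change has reached, refuses to skip an environment in the DTAP order, and reports where a change is still missing or failed its smoke test. The `Change` record, `apply_change`, `open_items` and the smoke-test callable are illustrative assumptions; in practice the smoke test would run real checks against the environment.

```python
from dataclasses import dataclass, field
from typing import Callable

# DTAP order: every change is expected to travel through all four environments.
ENVIRONMENTS = ("Development", "Test", "Acceptance", "Production")


@dataclass
class Change:
    """A hypothetical change-log entry: what changed, who approved it, where it went."""
    change_id: str
    description: str
    approved_by: str  # e.g. the change manager
    applied_to: list[str] = field(default_factory=list)
    passed_smoke_test: set[str] = field(default_factory=set)


def apply_change(change: Change, env: str, smoke_test: Callable[[str], bool]) -> bool:
    """Record `change` in `env`, then test the environment itself.

    Raises ValueError if `env` is not the next environment in DTAP order.
    """
    if len(change.applied_to) == len(ENVIRONMENTS):
        raise ValueError(f"{change.change_id} is already applied everywhere")
    expected = ENVIRONMENTS[len(change.applied_to)]
    if env != expected:
        raise ValueError(f"{change.change_id}: expected {expected}, got {env}")
    change.applied_to.append(env)
    passed = smoke_test(env)
    if passed:
        change.passed_smoke_test.add(env)
    return passed


def open_items(log: list[Change]) -> dict[str, list[str]]:
    """Map each change id to the environments where it is missing or failed its smoke test."""
    report = {}
    for change in log:
        # An environment only counts as done when the change was applied there
        # and the smoke test afterwards passed.
        gaps = [env for env in ENVIRONMENTS if env not in change.passed_smoke_test]
        if gaps:
            report[change.change_id] = gaps
    return report


# Usage with a stand-in smoke test that fails in Acceptance.
chg = Change("CHG-042", "Update security component endpoint", approved_by="change manager")
for env in ("Development", "Test", "Acceptance"):
    apply_change(chg, env, smoke_test=lambda e: e != "Acceptance")
print(open_items([chg]))  # {'CHG-042': ['Acceptance', 'Production']}
```

Enforcing the DTAP order is a deliberate choice in this sketch: Production cannot be changed before the other environments have seen the same change. A real tool would also need a way to re-run a failed smoke test after a fix and an explicit path for emergency changes; this sketch only covers the bookkeeping.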