In my previous blog post I wrote about how embracing failure can be a viable strategy, depending on the current quality level. Before we can determine this quality level, we first need to understand what quality is and ask the question: “why do we even want quality?”
Many books have been written on the concept of quality and the Wikipedia page on quality provides the following summary:
"Quality is a perceptual, conditional, and somewhat subjective attribute and may be understood differently by different people. Consumers may focus on the specification quality of a product/service, or how it compares to competitors in the marketplace. Producers might measure the conformance quality, or degree to which the product/service was produced correctly. Support personnel may measure quality in the degree that a product is reliable, maintainable, or sustainable."
Aligning these different perceptions seems to be a challenge, so how do we determine what amount of quality is required? How do we determine the bar for ‘this would be enough quality for now’?
When you start analyzing the various viewpoints, it becomes clear they share one common factor: they all aim to keep a particular risk at an acceptable level. You want to know how well your product performs to prevent customers from walking away. You want to have insights into your operational performance to prevent going bankrupt. You want to know process effectiveness to prevent fines.
Risk management can help with that. The concept of risk management appears in various ISO standards (e.g., 9001, 14001, 27001, 22301), and they all use similar strategies to handle risks. In general, you can summarize them into four categories:
- Avoid: This is the most severe of the risk treatment options, and requires organizations to stop performing any tasks or processes that pose a risk. In software development this translates into ‘prevent bugs from ending up in the codebase’. You either decide not to build a feature because its risk-to-value ratio is out of balance, or you make sure the feature does exactly what is intended. An example of the first would be a team identifying unsolvable issues during refinement, or realizing the value of the feature has diminished. An example of the second would be test-driven development, where a test case is created before any code is written and the tests need to pass before code can be submitted. This provides very high coverage but requires more effort during development.
- Reduce: Decreasing the risk means making it either less likely to occur or less damaging when it does occur. In software development this is mostly done by executing test cases against functional and deployed code. Since coverage will depend on the extensiveness of the test cases and time to execute will grow with the size of the test set, there is a focus on identifying the issues with the biggest impact first.
- Share (or transfer): This option means you transfer the risk – either partly or wholly – to another party. The most extreme form would be purchasing an insurance policy, but within software development this can also be implemented by moving activities from development to operations. Activities like monitoring and SLAs fall into this category.
- Retain (or accept): The last option, retaining the risk, means that you accept the risk without doing anything to address it. This is basically assuming the system is correct until proven otherwise. From a quality perspective this is mostly about making sure deviations can be mitigated as soon as possible.
Designing your quality strategy
To lower risks, we introduce quality assurance. Quality assurance can therefore be considered as a risk management process and spending effort on quality is in fact a risk mitigation strategy. Understanding the risk management strategies enables us to design a quality strategy.
So, how can you determine what quality strategy to use when you start looking at the underlying risks?
Deciding what strategy to use depends on your risk appetite for a specific system or functionality and on how much certainty you already have. Risk profiles can also be applied on multiple levels. Some systems are more critical than others, but even within these systems there are likely components with different risk profiles.
Choosing a strategy that provides less assurance (i.e. allows for more risk) will result in more deviations not being identified as early as possible, but by properly looking at your risk profile that might not be a problem. Since you can only spend an hour or euro once, you must decide where it yields the best results.
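One way to make this trade-off concrete is to rank components by risk exposure and spend a limited budget from the top down. The following Python sketch is only an illustration: the component names, the 0-to-1 likelihood and impact scores, the effort estimates, and the greedy allocation rule are all assumptions, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    likelihood: float  # estimated chance of a defect, 0.0 to 1.0 (assumed scale)
    impact: float      # estimated damage if it fails, 0.0 to 1.0 (assumed scale)
    effort_hours: int  # estimated hours needed to assure this component

    @property
    def exposure(self) -> float:
        # Classic risk exposure: likelihood multiplied by impact.
        return self.likelihood * self.impact


def plan_quality_effort(components, budget_hours):
    """Fund the highest-exposure components first until the budget runs out.

    Returns two lists of names: components that receive quality effort,
    and components whose risk is accepted for now.
    """
    funded, accepted = [], []
    remaining = budget_hours
    for component in sorted(components, key=lambda c: c.exposure, reverse=True):
        if component.effort_hours <= remaining:
            funded.append(component.name)
            remaining -= component.effort_hours
        else:
            accepted.append(component.name)
    return funded, accepted


# Hypothetical components and estimates, for illustration only.
example_components = [
    Component("payments", likelihood=0.6, impact=0.9, effort_hours=40),
    Component("reporting", likelihood=0.3, impact=0.4, effort_hours=20),
    Component("login", likelihood=0.4, impact=0.8, effort_hours=30),
]
funded, accepted = plan_quality_effort(example_components, budget_hours=80)
print("funded:", funded)
print("accepted:", accepted)
```

With a budget of 80 hours, the sketch funds "payments" and "login" and leaves "reporting" as accepted risk. A greedy ranking is the simplest possible choice here; it ignores dependencies between components and partial funding, which a real planning exercise would likely need to consider.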
An example of a high-level risk-based quality strategy is the following:
- Avoid – Test-driven development to make sure issues don’t end up in the product in the first place
- Reduce – Test Automation to make sure problematic issues in the product are identified before they end up in production
- Monitor – Functional monitoring to make sure issues in production are identified before they have an impact
- Accept – Incident Response to make sure issues in production are mitigated if they start having an impact
Figure 1: example of risk management to quality strategy mapping
Risk is not static, Quality is granular
When the risk profile of a system, or the certainty about it, changes, the best quality strategy might change as well. In high-risk situations with little certainty, you want to spend more effort on quality and choose a highly preventive approach such as test-driven development. When you have more certainty about the solution, you can opt for less preventive strategies like classic test automation or even just monitoring your system for problems. A strategy is therefore not static, but should rather act as a guide.
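As a rough illustration, the relationship between impact, certainty, and strategy could be expressed as a simple lookup. The Python sketch below is hypothetical: the 0-to-1 scales, the exposure formula, and the thresholds are assumptions chosen only to show the idea, and any real team would calibrate them to its own risk appetite.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    AVOID = "test-driven development"
    REDUCE = "test automation"
    MONITOR = "functional monitoring"
    ACCEPT = "incident response"


@dataclass(frozen=True)
class RiskProfile:
    impact: float     # 0.0 (negligible) to 1.0 (business-critical); assumed scale
    certainty: float  # 0.0 (unknown territory) to 1.0 (well understood); assumed scale


def choose_strategy(profile: RiskProfile) -> Strategy:
    """Pick a quality strategy from an assumed exposure score."""
    # Assumption: exposure grows with impact and shrinks as certainty increases.
    exposure = profile.impact * (1.0 - profile.certainty)
    # Hypothetical thresholds; tune these to your own risk appetite.
    if exposure >= 0.5:
        return Strategy.AVOID
    if exposure >= 0.25:
        return Strategy.REDUCE
    if exposure >= 0.1:
        return Strategy.MONITOR
    return Strategy.ACCEPT


# The same business-critical system, re-evaluated as certainty grows.
for certainty in (0.2, 0.6, 0.85, 0.95):
    profile = RiskProfile(impact=0.9, certainty=certainty)
    print(certainty, choose_strategy(profile).value)
```

Re-running the function whenever the impact or certainty estimate changes reflects the point made above: the chosen strategy is an outcome of the current risk profile rather than a fixed decision.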
Let’s take a fictional example about the evolution of a product, the risk profile at each stage, and the quality strategy that would fit that situation:
- A new business-critical application is being developed from scratch with new technologies. As the team has little experience with the technology and the stakes are high, they decide to use Test-Driven Development to provide the highest level of assurance.
- Later, the team decides to embed additional integrity checks in the core service. This reduces the risk of the application, but since it’s another new technology, the team decides to continue with TDD.
- One of the riskiest components is going to be migrated to a well-known provider to reduce the overall risk of the application. Since this service has extensive documentation and testing facilities, the team decides to do a set of basic tests and start with functional monitoring.
- Due to the good experiences with the external provider, they decide to connect more services to it. As this increases the risk of fraud, the team decides to expand the test suite and verify as many fraudulent examples as possible.
- The external provider is moving towards a SaaS solution that is able to verify transactions more extensively. This will reduce the possibilities for fraud, but the team has to convert all interfaces. Again, they decide to increase the test suite to properly verify all transactions.
- Due to the good experiences with the SaaS solution the team decides to move more functionality towards the solution. Since the riskiest transactions are already properly tested, the team decides to expand the functional monitoring dashboard.
- All transactions have now been moved to the SaaS solution and functional monitoring is used to monitor all transaction types. The team decides to stop spending time on testing and do incident management when necessary.
Figure 2: the risk profile journey
Shift-left is continuous risk management
Managing these different aspects is not new within the field of quality assurance. Approaches like risk-based testing have been around for some time and can help quickly determine the most effective strategy for a certain component or system.
The shift-left movement requires that we start doing continuous risk management. An accurate risk profile can be determined with techniques like risk storming or threat modeling. It allows you to choose the most fitting quality strategy and adjust your quality assurance process where needed. Using the optimal quality assurance process prevents ‘over-processing’ and ‘gold-plated engineering’ and makes sure your quality assurance process fits the risk profile.