
One Change at a Time

17 Jun, 2014

One of the Agile Manifesto’s twelve principles states, “At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.” Many Agile teams hold biweekly retrospectives that result in concrete actions to execute in the next sprint. There are many types of tuning and adjustment a team can do, such as improving the workflow, automating tasks, and increasing team cooperation.

Is it a good habit for retrospectives to focus on the same type of improvement every time, or should the team vary the type of improvement? In this post, I look into the effect of multiple consecutive actions that affect the flow of work.

The simulation is inspired by the getKanban Board Game, a physical game designed to teach the concepts and mechanics of Kanban for software development in a class or workshop setting.

An Experiment

Ideally, an experiment would compare two equivalent teams. The first team would perform consecutive actions to improve the workflow; the other team would make only one adjustment to the workflow and then focus subsequent improvements on other areas. After a set period, the workflow of each team would be measured to see which achieved better results. But such an experiment is difficult to perform, so in this post I will use a simulation.

Simulation

For this simulation, a team consists of three specialists: a designer, a developer, and a tester. The team uses a kanban process to achieve flow. See the picture below for the beginning situation.

[Image: the starting kanban board, with a WIP limit of 3 on each column]

At the beginning of each workday, the team determines how much work it will complete that day; the average cycle time is measured over the course of the simulation. The initial work in progress (WIP) limits are set to 3 for each column, indicated by the red 3s.

The average amount of work done by the team and the average effort of one work item are such that, on average, it takes one card about 5.5 days to complete.
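As a concrete parameterization (my assumption; the post doesn’t publish its exact numbers): if the per-column effort of an item is drawn uniformly from 6–16 points (mean 11) and a specialist’s daily capacity uniformly from 4–8 points (mean 6), shared over the 3 items in a column, then an item receives on average 2 points per day and spends about 11 / 2 = 5.5 days in each column. A quick sanity check:

```python
import random
import statistics

rng = random.Random(0)

# Assumed distributions (not from the original post):
# per-column effort of one work item, in points
mean_effort = statistics.mean(rng.randint(6, 16) for _ in range(100_000))
# daily capacity of one specialist, shared over the 3 items in a column
mean_capacity = statistics.mean(rng.randint(4, 8) for _ in range(100_000))

per_item_rate = mean_capacity / 3            # points per item per day
avg_days_per_column = mean_effort / per_item_rate
print(round(avg_days_per_column, 1))         # ≈ 5.5
```

Any pair of distributions with this ratio of means would give the same average time per column.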

At the end of each workday, cards are pulled into the next columns (if allowed by the WIP limits). The policy is to always pull in as much work as allowed so the columns are maximally filled. Furthermore, the backlog is assumed to always have enough user stories ready to be pulled into the “design” column. This resembles developing a new product when the backlog is filled with more than enough stories.
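The end-of-day pull step can be sketched as follows. The data model is my own assumption: each column is a list of items carrying the effort remaining in that column, and items are moved rightmost column first, so downstream slots free up before upstream columns pull.

```python
import random

rng = random.Random(0)
wip = {"design": 3, "dev": 3, "test": 3}
board = {"design": [], "dev": [], "test": []}

def new_effort():
    # Assumed: per-column effort uniform on 6..16 points.
    return rng.randint(6, 16)

def end_of_day_pull(board, wip):
    """Move finished items one column to the right, filling each column
    up to its WIP limit; finished test items leave the board ("done")."""
    for col, nxt in [("test", None), ("dev", "test"), ("design", "dev")]:
        for item in [i for i in board[col] if i["remaining"] == 0]:
            if nxt is not None and len(board[nxt]) >= wip[nxt]:
                break  # WIP limit reached: the item stays put and blocks
            board[col].remove(item)
            if nxt is not None:
                item["remaining"] = new_effort()
                board[nxt].append(item)
    # The backlog never runs dry, so "design" is always filled back up.
    while len(board["design"]) < wip["design"]:
        board["design"].append({"remaining": new_effort()})

end_of_day_pull(board, wip)
print([len(board[c]) for c in ("design", "dev", "test")])  # [3, 0, 0]
```

On an empty board the pull step simply fills the “design” column to its limit, which is exactly the clean-board start of the experiment.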

The system starts with a clean board, all columns empty. After letting the system run for 75 simulated workdays, we trigger a policy change and increase the WIP limit of the design column from three to five. After this change, the system runs for another 100 workdays.

From the chart showing the average cycle time, we will be able to study the effect of adjusting WIP limits.
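Putting the pieces together, the whole experiment can be sketched as below. This is my own reconstruction under assumed parameters (uniform draws, and only the three work columns, without the board’s “ready” queue), so the absolute numbers will differ from the post’s 24 and 31 days, but the qualitative behaviour should be similar: a delayed rise to a higher equilibrium after the WIP change.

```python
import random

rng = random.Random(1)
COLS = ["design", "dev", "test"]

def new_item(day):
    # Assumed: per-column effort uniform on 6..16 points (mean 11).
    return {"remaining": rng.randint(6, 16), "start": day}

def simulate(days, wip_change_day=None, new_design_wip=5):
    wip = {"design": 3, "dev": 3, "test": 3}
    board = {c: [] for c in COLS}
    completions = []  # (finish_day, cycle_time) per finished item
    for day in range(days):
        if day == wip_change_day:
            wip["design"] = new_design_wip      # the policy change
        # Each specialist spends an (assumed) uniform 4..8 points of
        # daily capacity on the items in their own column.
        for c in COLS:
            capacity = rng.randint(4, 8)
            for item in board[c]:
                spend = min(capacity, item["remaining"])
                item["remaining"] -= spend
                capacity -= spend
        # End of day: pull finished work downstream, rightmost column
        # first, respecting WIP limits; finished test items leave the board.
        for col, nxt in [("test", None), ("dev", "test"), ("design", "dev")]:
            for item in [i for i in board[col] if i["remaining"] == 0]:
                if nxt is not None and len(board[nxt]) >= wip[nxt]:
                    break
                board[col].remove(item)
                if nxt is None:
                    completions.append((day, day - item["start"] + 1))
                else:
                    item["remaining"] = rng.randint(6, 16)
                    board[nxt].append(item)
        # The backlog never runs dry: refill "design" up to its limit.
        while len(board["design"]) < wip["design"]:
            board["design"].append(new_item(day))
    return completions

# 75 days warm-up, policy change, then 100 more days (as in the post).
completions = simulate(days=175, wip_change_day=75)
before = [ct for d, ct in completions if 40 <= d < 75]
after = [ct for d, ct in completions if d >= 145]
print(round(sum(before) / len(before), 1), round(sum(after) / len(after), 1))
```

Comparing the average cycle time in the two stable windows (after day 40 and after day 145) shows the shift to a new, higher equilibrium.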

Note:

The simulation assumes a simple uniform distribution for both the amount of work done by the team and the effort assigned to a work item. I assume this is fine for the purpose of this post. A consequence is that the result probably doesn’t scale: for instance, the case in which each column in the picture above represents a whole Scrum team is not covered, since that would require a more complex probability distribution than the uniform one.

Results

The picture below shows the result of running the experiment.
[Chart: average cycle time per workday over the 175 simulated workdays]
After the start, it takes the system a little over 40 workdays to reach a stable state with an average cycle time of about 24* days. This is roughly the cycle time one would expect: the board has four columns, each limited to 3 items (remember, even the “ready” column has a limit of 3), and an item spends about 5.5 days in a column, so one would expect a cycle time of around 4 × 5.5 = 22 days, which is close to the measured 24.

At day 75 the WIP limit is changed. As can be inferred from the chart, the cycle time starts to rise only around day 100; it takes about one cycle time (24 days) for the change to become visible. The new stable state, with an average cycle time of around 30** days, is reached at day 145. In other words, it takes 70 days (!) to reach the new equilibrium.

The chart shows the following interesting features:

  • It takes roughly two times the (new) average cycle time to reach the equilibrium state.
  • The response time (when one begins to notice an effect of the policy change) is about the length of the average cycle time.

*One can calculate (using transition matrices) the theoretical average cycle time for this system to be 24 days.

**Similarly, the theoretical average cycle time of the system after the policy change is 31 days.

Conclusion

In this post, we have seen that when a team makes adjustments that affect the flow, the system needs time to settle into its new stable state. Until that state has been reached, any further tuning of the flow is questionable. The simulation shows that reaching the new stable state takes about two times the average cycle time.

For Scrum teams with two-week sprints, the system may need about two months before new tuning of the flow becomes effective. Meanwhile, the team can very well focus on other improvements, e.g. retrospectives that address teamwork or collaboration with the team’s environment.

Moreover, after making a flow-affecting change, don’t expect to see any effect in measurements of e.g. cycle time within one average cycle time.

To summarize, after making flow-affecting changes (e.g. increasing or decreasing WIP limits):

  • Let the system run for at least the duration of the average cycle time so it has time to respond to the change.
  • After it responds, notice the effect of the change.
  • If the effect is positive, let the system run for another duration of the average cycle time, to get to the new stable state.
  • If the effect is negative, do something else, e.g. go back to the old state, and remember that the system needs to respond to this as well!
Comments
Freek Wielstra
7 years ago

This is definitely interesting, however it’s also quite high-level, i.e. from a statistics point of view; whenever we (as a scrum team) do retrospective points, we often base our retrospective points on current events and observations (“the build was red a lot”), on feelings (“I feel X is slowing us down”), etcetera; we rarely maintain metrics, so we can rarely measure concrete results of retrospective points we made X sprints earlier.
A second issue we have / had is that the only measurement of velocity we had was that of all teams combined; therefore, the impact of velocity as a result of changes in team members or decisions made in individual teams’ retrospectives couldn’t accurately be measured, or people weren’t aware of it (or didn’t particularly care).
I’d like to say “All teams should maintain statistics” on the one hand, but on the other I can hear some of my colleagues grinding their teeth about them or the team getting quantified into statistics (team A comprised of 4 FTE’s achieved a velocity of 42 story points this sprint, a 2.3% decline from the average of the last 3 sprints). So it’s a tough one. Maybe these things are best left to higher-level agile coaches and management, 🙂
