# Accurate Forecasting Without Estimation

Often the team or somebody in the role of product owner gets either the question ‘When will it be finished?’ or ‘What will be finished at <some future date>?’ and some forecasting is necessary to answer the question.

This blog shows an alternative way of forecasting that is based on empirical data and skips estimation altogether. In a follow-up blog I will address the ordering of the backlog of items.

In order to answer these questions the ‘it’ needs to be (i) estimated and (ii) ordered. Neither estimating nor ordering the items comes for free. Estimation requires valuable time from the team, time they could also spend creating value for the customer. Keeping an ordered list of items up to date requires continuous attention, usually from the person fulfilling the product owner role. An ordered list of items is not stable over time: changing or expired market windows affect the ordering.

## Introduction

There are more ways of dealing with the question ‘When is it finished?’:

- Not answering! Instead providing a statement of what service can be expected from the team (SLA).
- Using tools such as ‘velocity’ and ‘Release Burn Down’ charts.
- Estimation in hours.
- Using the team’s history on throughput: probabilistic forecasting.

What makes estimation and planning so hard are the uncertainties in the work, dependencies on others, assumptions on how the system currently works, assumptions on how the work gets done, and many more that I have forgotten to mention. In the end we are interested in forecasting things of interest to our customers.

A good approach to estimation in story points that does not take too much of the team’s time is given in ‘Tips for ScrumMasters: Estimate user stories outside Sprint Planning Meetings’ [5].

While many teams use velocity for forecasting, and the blog by Mike Cohn titled ‘Improving On Traditional Release Burn Down Charts’ [1] provides an effective way of handling variations in ‘velocity’, it is a ‘white box’ approach. By this I mean that it makes assumptions on *how* the team operates.

An alternative approach is to take a ‘black box’ view of the team, thereby eliminating any assumptions on *how* the team operates. This can be done either by using *Cycle Time* (“the time between the moments the customer gets stuff”) or *Throughput* (“the amount of stuff the customer gets every week”). A very good introduction to forecasting based on *Cycle Times* is given in [2].

## Throughput

Let’s call the ‘stuff’ that the customer wants ‘Stories’. Then the throughput equals the number of stories delivered to the customer every two weeks.

The magic is to use empirical data: data that is already available to the team and that is based on history. To this end, just count the number of delivered stories. Using 10 to 20 measurements already gives reasonable forecasts (see [3] for details). The example on the right shows throughput data collected for a team over the course of 12 months. It shows the number of times a certain throughput occurs. The data collected in this example is:

3, 3, 4, 4, 7, 5, 1, 11, 5, 6, 3, 6, 6, 5, 4, 10, 4, 5, 8, 2, 4, 12, 5

Each number equals the number of stories delivered in that particular time period.

It is important to note that these numbers are not random numbers. They reflect the type of requests from the customer, the size of the requests, and perhaps the way they have already been sliced. Furthermore, these numbers also reflect, in a complicated way, how the team works, the dependencies they might have, and other factors. No matter how complicated these factors are, we do not need to know the details, nor does it matter if we have missed a factor. All of this is reflected in the distribution of the numbers (how often a certain throughput occurs).
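The distribution mentioned above can be tabulated with a few lines of Python. This is only a sketch for illustration: it counts how often each throughput value occurs in the sample data listed earlier and prints a simple text histogram. The variable names are made up for this example.

```python
from collections import Counter

# Throughput samples from the article: stories delivered per two-week period.
throughput = [3, 3, 4, 4, 7, 5, 1, 11, 5, 6, 3, 6, 6, 5, 4, 10, 4, 5, 8, 2, 4, 12, 5]

# Count how often each throughput value occurs.
frequency = Counter(throughput)

# Print one row per possible value, including values that never occurred.
for value in range(min(throughput), max(throughput) + 1):
    print(f"{value:2d} | {'#' * frequency[value]}")
```

A throughput of 4 or 5 stories occurs most often (5 times each), while a value such as 9 never occurred in this period.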

## Forecasting

Using Troy Magennis’ *Throughput Forecaster* [4] we will turn the sequence of numbers from the previous section into a forecast. We will take two usages as an example: (a) when will 60 items be done, and (b) how many items will be done in 3 sprints?

Follow the steps:

1. Download the spreadsheet.
2. Copy-paste the sequence of throughput numbers into the worksheet ‘Throughput Samples’.
3. On the worksheet ‘Forecast’, fill in the start date of the forecast (2017/10/24), the length of the sprint (14 days), the throughput source (‘Data’), and the number of items to forecast (60).
4. Read off the forecast at the desired confidence level.
5. Optionally, forecast the number of items done by specifying the number of sprints to look ahead.
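For readers who prefer code over a spreadsheet, the idea behind such a forecast can be sketched in Python. This is not the implementation of the spreadsheet in [4]; it is a minimal Monte Carlo sketch that assumes future sprints look like randomly chosen past sprints (resampling with replacement). The function names, the number of trials, and the seed are choices made for this illustration.

```python
import math
import random
from datetime import date, timedelta

# Throughput samples from the article: stories delivered per 14-day sprint.
THROUGHPUT = [3, 3, 4, 4, 7, 5, 1, 11, 5, 6, 3, 6, 6, 5, 4, 10, 4, 5, 8, 2, 4, 12, 5]


def sprints_to_finish(samples, items, rng):
    """Simulate one possible future: replay randomly chosen past sprints
    until `items` stories are done, and return how many sprints that took."""
    done = sprints = 0
    while done < items:
        done += rng.choice(samples)
        sprints += 1
    return sprints


def forecast_sprints(samples, items, confidence, trials=10_000, seed=1):
    """Smallest sprint count that was enough in at least `confidence` of the trials."""
    if max(samples) <= 0:
        raise ValueError("need at least one sprint with a positive throughput")
    rng = random.Random(seed)
    outcomes = sorted(sprints_to_finish(samples, items, rng) for _ in range(trials))
    return outcomes[math.ceil(confidence * trials) - 1]


start = date(2017, 10, 24)  # start date used in the article
for confidence in (0.50, 0.85, 0.95):
    sprints = forecast_sprints(THROUGHPUT, 60, confidence)
    finish = start + timedelta(days=14 * sprints)
    print(f"{confidence:.0%} confidence: {sprints} sprints, done by {finish}")
```

The exact numbers depend on the random seed and the number of trials, so they may differ slightly from what the spreadsheet reports.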

## Accurate Statements

Based on an observed sequence of throughput numbers one can make a forecast. With the insight gained we can make accurate statements.

### When will it be done?

From step 4 we get the result shown at the right.

From the table we can derive the accurate statement:

In 85 out of 100 cases we will deliver 60 stories before 24 April 2018.

This is very powerful because the team is fully transparent about what they are capable of, while leaving the risk taking to the customer. The customer knows how much risk he/she is willing to take.

The table shows that delivering before May 8th, 2018 is almost a certainty (95% confidence), April 24th, 2018 is reasonable, and the end of March 2018 is unrealistic. It is up to the requestor, stakeholder, or product owner to use this information wisely. Only they know the value of promising early dates and the risk they are willing to take.
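The dates quoted above can be translated back into sprint counts. The short sketch below assumes that each date in the table is a sprint boundary counted from the start date 2017/10/24 with 14-day sprints, as entered in the spreadsheet; the helper name `sprints_until` is made up for this illustration.

```python
from datetime import date, timedelta

start = date(2017, 10, 24)           # forecast start date from the article
sprint_length = timedelta(days=14)   # sprint length from the article


def sprints_until(deadline):
    """Number of whole sprints that fit between the start date and `deadline`."""
    return (deadline - start) // sprint_length


print(sprints_until(date(2018, 4, 24)))  # the 85% date
print(sprints_until(date(2018, 5, 8)))   # the 95% date
```

Under that assumption the 85% date corresponds to 13 sprints and the 95% date to 14 sprints, so one extra sprint of slack raises the confidence from 85% to 95%.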

### What will be done?

Step 5 tells us with confidence the number of work items delivered within a fixed time period.

From this we can infer the statement:

In 85 out of 100 cases we will complete at least 13 items, and in 95 out of 100 cases at least 11 items.

In practice I have found the previous type of statement useful in the case of a fixed scope, when we need to answer the question ‘When will it be done?’. This type of statement is useful when:

- a deadline exists and the team needs to know ‘What is likely to be finished?’
- we need to know how many options to have ready in advance (that is, how many refined stories we need up front).

Completing 11 items in 3 sprints is almost a certainty, while 23 is highly unlikely and unrealistic. In addition, the above shows that having 13 refined stories is enough.
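The fixed-time question can be sketched in the same way as the fixed-scope one. The Python below is a minimal illustration, not the spreadsheet's implementation: it assumes each of the 3 future sprints behaves like a randomly chosen past sprint, sums the throughput per simulated future, and reads off the total that is reached or exceeded in the desired share of trials. The function name, number of trials, and seed are assumptions for this example.

```python
import math
import random

# Throughput samples from the article: stories delivered per 14-day sprint.
THROUGHPUT = [3, 3, 4, 4, 7, 5, 1, 11, 5, 6, 3, 6, 6, 5, 4, 10, 4, 5, 8, 2, 4, 12, 5]


def forecast_items(samples, sprints, confidence, trials=10_000, seed=1):
    """Number of items reached or exceeded in at least `confidence` of the trials."""
    rng = random.Random(seed)
    # One simulated total per trial, sorted from best to worst outcome.
    totals = sorted(
        (sum(rng.choice(samples) for _ in range(sprints)) for _ in range(trials)),
        reverse=True,
    )
    return totals[math.ceil(confidence * trials) - 1]


for confidence in (0.50, 0.85, 0.95):
    items = forecast_items(THROUGHPUT, 3, confidence)
    print(f"{confidence:.0%} confidence: at least {items} items in 3 sprints")
```

Note that a higher confidence level gives a lower number of items: the more certain the statement has to be, the less can be promised.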

## Conclusion

Probabilistic forecasting based on throughput is a good alternative to more traditional methods. It:

- is based on empirical data,
- uses data that is easy to obtain,
- provides the means to make very accurate statements,
- is model and evidence based,
- is something you can start using tomorrow!

## References

[1] Mike Cohn, ‘*Improving On Traditional Release Burn Down Charts*’, Mountain Goat Software, https://www.mountaingoatsoftware.com/blog/improving-on-traditional-release-burndown-charts

[2] Dimitar Bakardzhiev, ‘*#NoEstimates Project Planning Using Monte Carlo Simulation*’, InfoQ, 2014, https://www.infoq.com/articles/noestimates-monte-carlo/

[3] Pieter Rijken, Jasper Sonnevelt, ‘*Probabilistic forecasting using empirical task data for a Lean process*’, https://www.dropbox.com/s/w3fa9bl60a6m7fp/ErrorEstimatesInProbabilisticForecasting.pdf

[4] Troy Magennis, ‘*Single Feature Forecast Spreadsheet*’, Focused Objective, http://focusedobjective.com/single-feature-forecast-spreadsheet/

[5] Marco Mulder, ‘*Tips for ScrumMasters: Estimate user stories outside Sprint Planning Meetings*’, Xebia Blog, 2009, https://xebia.com/blog/tips-for-scrummasters-estimate-user-stories-outside-sprint-planning-meetings/


## Comments

This Monte Carlo method is definitely very interesting, but at the end of the day the recommended throughput estimate comes very close to the lower bound of the confidence interval indicated by average and variance. As a formula:

Throughput ( n iterations ) = n * AVG - SQRT ( n * VAR )

(this was at least close for the limited set of numbers I tried)

Obviously this is less sophisticated, but the simplicity of the formula makes it less of a black box.
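For readers who want to try the formula above on the article's throughput data, a minimal Python sketch follows. It assumes the population variance, since the comment does not say which variance is meant; the function name `rough_lower_bound` is made up for this illustration.

```python
import math
import statistics

# Throughput samples from the article: stories delivered per 14-day sprint.
THROUGHPUT = [3, 3, 4, 4, 7, 5, 1, 11, 5, 6, 3, 6, 6, 5, 4, 10, 4, 5, 8, 2, 4, 12, 5]


def rough_lower_bound(samples, n):
    """Evaluate n * AVG - SQRT(n * VAR) for `n` iterations.

    Assumption: VAR is the population variance of the samples.
    """
    avg = statistics.mean(samples)
    var = statistics.pvariance(samples)
    return n * avg - math.sqrt(n * var)


print(round(rough_lower_bound(THROUGHPUT, 3), 1))
```

For 3 sprints this gives a value between 11 and 12 items, which is in the same range as the 11 to 13 items quoted in the article.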

Thanks for pointing this out. Indeed, for a rough estimate the formula you provided works fine.

In this case the average is 5.3 and the sqrt(var) is 2.4 and therefore statements based on twice the sqrt(variance) are questionable and the Monte Carlo method helps to formulate statements based on other confidence levels.

Again, thanks for the addition.

Hi!

Nice stuff!

I have one little, perhaps nit-picky, comment though 🙂

All of what you describe is estimation. Not “without”.

“Estimate” is the umbrella term for all of this:

– Using tools such as ‘velocity’ and ‘Release Burn Down’ charts.

– Estimation in hours.

– Using the team’s history on throughput: probabilistic forecasting.

Using a metaphor, it’s the same as “wood chopping tool” is the umbrella term for axes and chainsaws etc.

Not a biggy. Just wanted to point that out 🙂

Thanks!

Henrik, Thanks for taking the time to read and comment!

You are right that forecasting is also a form of estimation. The difference is that it requires no additional estimation of individual work items, whether in story points, in hours, or by any other means.

The text articulates this difference by using the terms ‘forecasting’ and ‘estimation’ to distinguish between the two.

Pieter

Got it! Personally I’d probably make the distinction as between “gut feel” or “professional judgment” and “forecasting”. But I understand your point 🙂