Stateful Testing in Scala

14 Mar, 2023
Xebia Background Header Wave

One of the most used testing libraries in the Scala ecosystem is Scalacheck. The functional programming team at Xebia has always advocated for this mature tool. We’ve published content about this topic previously and are currently offering a relevant Property-Based Testing in Scala course in the Xebia Functional Academy. The official user guide is also a great starting point to learn about this library.

In this post, I want to show you something I’ve been digging into lately: stateful testing. This technique can be seen as a non-formal adaptation of model checking. Stateful testing is useful when verifying systems that behave differently across a series of commands, like in a temporal line.

A quick intro to property-based testing

Before diving into testing stateful systems, let’s introduce property-based testing (feel free to skip this section if you already know about it). The first step is to stop thinking about tests and think about properties; thinking of property-based testing as just testing with random data is too narrow of a view. This way, we check that a specific part of our application (or a system, function, whatever) has some property. It can be a challenging task because properties need to be deterministic.

For a specific example, let’s compare a unit test with a property-based test for code that should work properly with numbers between 0 and 100. Good coverage with unit testing requires at least the following cases:

  • A value of 0 should succeed
  • A value of 100 should succeed
  • A value of -1 should fail
  • A value of 101 should fail

Something not closer to the boundaries is also a good practice. With property-based, we think about properties, so two properties are enough:

for all numbers
between 0 and 100 inclusive
the test should succeed

for all numbers
not in between 0 and 100 inclusive
the test should fail

The framework generates values that are not entirely random but also those closer to the boundaries or limits (like the maximum integer). As you can imagine, property-based testing brings multiple benefits, and some of the more important ones are:

  • Changes how you model your system and the tests specifically by thinking more about properties and how your business logic should behave with inputs that meet different criteria.
  • The generated data is not limited to fixed values; any value that meets the pre-condition can be a candidate for the input. Consequently, the data includes extreme values, like empty strings or the maximum integer, and hard-to-identify values, like special Unicode characters. These types are difficult to think about when you are developing unit tests.
  • Shrinks inputs when the property fails so users can get friendlier counter-examples of their failed tests.
  • They are reproducible. The property-based frameworks generate data using a seed shared in case of failure, and users can pass that seed to the property to run the test against the same generated data.

It’s worth mentioning that property-based testing doesn’t completely replace unit testing. It’s a different approach that is better for some use cases, but not all.

In Scala, Scalacheck has become a standard de facto for property-based testing, and consequently, there are excellent resources out there (especially talks) that cover the basics and functionalities. Some of them are:

Model and commands

Now, let’s see what stateful testing is and how we can use Scalacheck to develop these specialized types of property-based tests.

Based on a model defined by the user, the closest to the real system, the framework generates a set of commands representing the execution flow, runs the system against that flow, and compares the system to the model.

With that in mind, we can identify three major components of stateful tests:

  • The model
  • The commands sequence generation
  • The validation

Let’s use a simple example that we will later implement using Scalacheck. Imagine we have a SizedQueue we want to verify. A SizedQueue is just a queue that can keep N elements. Once you insert an N + 1 element, the first inserted element should be discarded.

trait SizedQueue[A] {
  def push(a: A): Unit
  def pop: Option[A]
  def clear(): Unit

The operations that our sized queue accepts are "Push," "Pop," and "Clear." We decided that those three were our commands to keep the exercise simple.

The model comprises two parts, a state and the operations that should update that state based on the commands provided. Remember that we defined the model as a state closest to the actual system so that we can use a simple list for our example’s state. The push operation adds an element to the list; the pop gets the latest element, and the clear generates a new list.

The command sequence generation is built with the list of the commands with their respective value generators and pre-conditions. The pre-conditions are an essential element that decides if the command makes sense about the current state.

In our example, we’ll need to generate sequences of the three commands with different frequencies. We could define some pre-conditions. For example, don’t apply a "Clear" if the state is empty. More advanced use cases require more complex pre-conditions.

Finally, the validation should consider the system validation against the model. In this case, we need to provide post-conditions to check the results using the current state.

Execution model

Most of the frameworks that implement stateful testing use a two-phase execution model. In the first phase, the framework uses the model and the commands sequence generator to create a list of potential commands, and pre-conditions are used to discard commands and generate new ones.

In the second phase, the framework checks the system with the provided state and post-conditions. The pre-conditions are rechecked, causing a failure in this phase.

Stateful testing two phases

Luckily, Scalacheck provides a simple API for modeling stateful tests based on the abovementioned concepts. Everything is in one file, Commands.scala, with excellent documentation.

Scalacheck uses the concept of "Commands" that represent an aggregated model for the commands + pre-conditions + post-conditions. To create our stateful property test, we need to:

  • extends the Commands trait
  • Define our Commands
  • Implement the methods for initializing the system, generating initial states, tearing down, and generating commands

Commands are defined using two types: State, representing the model, and Sut, the system under test (SUT).

Let’s implement the Commands for our SizedQueue example. We use a List to represent our State. We’re also adding a capacity param to our state to test different queue max sizes.

object SizedQueueCommands extends Commands {

  case class TestState(capacity: Int, internal: List[Int])

  override type State = TestState
  override type Sut = SizedQueue[Int]

Now it’s time to implement some methods. We start with the initial state generators and SUT initializer and destroyer.

  override def genInitialState: Gen[State] = Gen.posNum[Int].map(TestState(_, Nil))

  override def newSut(state: State): SizedQueue[Int] = SizedQueue.of(state.capacity)

  override def destroySut(sut: SizedQueue[Int]): Unit = {}

Our initial state is a random positive int and an empty list. We don’t allow the initialization of new-sized queues with existing data. Then, in newSut, we create a new instance of SizedQueue using the capacity of the initial state. Lastly, we don’t perform any action in the destroySut, but here you can release resources, for example.

Now, we’re going to implement a couple of special methods in Scalacheck commands:

  override def canCreateNewSut(
    newState: State,
    initSuts: Iterable[State],
    runningSuts: Iterable[SizedQueue[Int]]
  ): Boolean = true

  override def initialPreCondition(state: State): Boolean = state.internal.isEmpty

canCreateNewSut allow us to limit the number of co-existing Sut instances. On the other hand, initialPreCondition is used during the shrinking phase to determine if the state is an initial state. In our case, we allow multiple Sut instances, and our initial state is one that is empty.

Now it’s time to define our Commands. Scalacheck provides the type Command and the subtypes SuccessCommand, UnitCommand, and NoOp.

trait Command {
  type Result

  def run(sut: Sut): Result

  def nextState(state: State): State

  def preCondition(state: State): Boolean

  def postCondition(state: State, result: Try[Result]): Prop

Let’s start with the Clear. In this case, it extends UnitCommand since this operation generates no result.

case object Clear extends UnitCommand {
  override def run(sut: Sut): Unit = sut.clear()
  override def nextState(state: State): State = state.copy(internal = Nil)
  override def preCondition(state: State): Boolean = true
  override def postCondition(state: State, success: Boolean): Prop = success

We see four methods in this trait. run and nextState are self-explanatory. On the other hand, preCondition and postConditions are new concepts in this approach.

As explained at the beginning of the post, with the pre-conditions, we make sure that the current command makes sense for the current state. In this case, we always allow Clear.

In the post-conditions, we need to verify that the result is what we expect with the current state. In this case, Clear is a UnitCommand, so the API passes a boolean indicating if the command execution succeeded or not. A simple implementation is to return that boolean.

The Push command is also a UnitCommand:

case class Push(v: Int) extends UnitCommand {
  def run(sut: Sut): Unit = sut.push(v)
  def nextState(state: State): State =
    state.copy(internal = (v :: state.internal).take(state.capacity))
  def preCondition(state: State): Boolean = true
  def postCondition(state: State, success: Boolean): Prop = true    

run calls to our Sut.push(v) with the case class Push value and nextState puts the element in the first position of the list and limits the capacity.

Finally, the Pop, which extends Command with the type Option[Int]:

case object Pop extends Command {
  override type Result = Option[Int]
  override def run(sut: SizedQueue[Int]): Result = sut.pop
  override def nextState(state: State): State = 
    state.copy(internal = state.internal.drop(1))
  override def preCondition(state: State): Boolean = true
  override def postCondition(state: State, result: Try[Result]): Prop =
    Success(state.internal.headOption) == result
  • The run calls to our Sut.pop
  • The nextState removes the first element
  • In the postCondition, we succeed if the result corresponds with the first element. Bear in mind that, here, the state is the state before applying the nextState

Now it’s time to implement the last Commands method that should generate the commands:

override def genCommand(state: State): Gen[Command] = {

  val pushValueGen: Gen[Int] =
    if (state.capacity > 0 && state.internal.size > state.capacity / 2) {
        Gen.chooseNum(0, 10)
    } else Gen.chooseNum(0, 10)

    (3, Gen.const(Pop)),
    (1, Gen.const(Clear))

In general, one of the more challenging parts of developing property-based tests is to avoid being too naive when generating the properties. In this case, we have applied some decisions:

  • If the queue is half full, we have a 50% percent of inserting a repeated value. In the other case, we just generate a number between 0 to 10
  • We want a bigger frequency of pop and push operations than clear

Now it’s time to run the property checks. One easy way is to create a Properties object as a suite:

object SizedQueueCommandsProperty extends Properties("Test sized queue") {
  property("check") =

Then start an SBT console and run our property:

testOnly com.xebia.functional.testing.SizedQueueCommandsProperty
[info] compiling 1 Scala source to /Users/fede/development/workspace/state-machine-testing/target/scala-2.13/test-classes ...
[info] + test sized queue.check: OK, passed 100 tests.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 1 s, completed Feb 24, 2023, 7:59:02 PM

Now imagine we have an odd issue in our sized queue: after pushing the same item three times, the item is not removed from the queue with a Pop. This illustration might sound crazy, but, as an example, at LambdaJam in Chicago, Joe Norton described how he used QuickCheck to test Google’s LevelDB and find in a first iteration a failing counter-example of 17 steps.

We increase the minimum number of successful tests to increase our chances and then run the test:

[info] ! Test sized queue.check: Falsified after 1109 passed tests.
[info] > Labels of failing property:
[info] Initial state:
[info]   TestState(5,List())
[info] Sequential Commands:
[info]   1. Push(0)
[info]   2. Push(0)
[info]   3. Push(0)
[info]   4. Pop => Some(0)
[info]   5. Pop => Some(0)
[info]   6. Pop => Some(0)
[info]   7. Pop => Some(0)
[info] > ARG_0: Actions(TestState(5,List()),List(Push(0), Push(0), Push(0), Pop, Pop, Pop, Pop),List())
[info] > ARG_0_ORIGINAL: Actions(TestState(5,List()),List(Push(0), Pop, Push(0), Push(0), Pop, Push(8), Push(1), Pop, Pop, Pop, Pop),List())
[info] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error]   com.xebia.functional.testing.SizedQueueCommandsProperty
[error] (Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 1 s, completed Mar 1, 2023, 11:39:04 AM

As you can see from the message above, Scalacheck found the issue after "1109 passed tests" (this was a very odd one) and shrunk the arguments to show us a minimal sequence that found the problem. After three Push(0)s and three Pops, the test found that the fourth Pop should have returned None but returned Some(0) instead.

This was just one case, but these techniques are extremely powerful in these scenarios where our logic evolves alongside a series of commands. Stateful testing, therefore, brings two essential benefits:

  • Detect issues that only appear when our system evolves through temporal logic, like a series of commands.
  • Increase the knowledge of your system. Creating these tests requires thinking about the structure and behavior of the systems under tests, creating a more behavioral view of them.

Other frameworks, like Hedgehog, also provide these utilities for performing stateful testing. We’ll revisit the exercise with that library in a future post.

At Xebia, we continuously improve our expertise around testing and other software engineering techniques with specialized research teams, a knowledge we transfer to our clients. If your team wants to improve the understanding and confidence in the software they’re building, feel free to reach out. We are ready to help you.


Get in touch with us to learn more about the subject and related solutions

Explore related posts