# The Union-Find Algorithm in Scala: a Purely Functional Implementation

In this post I will implement the union-find algorithm in Scala, first in an impure way and then in a purely functional manner, without any mutable state or side effects. We can then compare the two implementations, both in terms of code and in terms of performance.

The reason I chose union-find for this post is that it is relatively simple. It is a classic algorithm, used to solve the following problem: suppose we have a set of objects, each of which can be connected to zero or more others. Connections are transitive: if A is connected to B and B is connected to C, then A is connected to C as well. Now we take two objects from the set, and we want to know: are they connected or not?

This problem comes up in a number of areas, such as social networks (are two people connected via friends or not?) or image processing (are two pixels connected or separated?).

Because the total number of objects and connections in the set might be huge, the performance of the algorithm is important.

**Quick Union**

The implementation I chose is the so-called Quick Union implementation. It scales well, but there are still faster implementations around; one of them is given in the references below the article. For this post I chose to keep things simple so we can focus on comparing the two implementations.

The algorithm keeps track of connected elements with a simple data structure: it represents every element as a Node which can point to another element it is connected to. Every Node points to at most one other Node, which is called its parent. This way, groups of connected Nodes form trees. The root of such a connected tree is a Node whose parent property is empty.

When the question is asked if two Nodes are connected, the algorithm looks up the roots of the connected trees of both Nodes and checks if they are the same.

The tricky part in union find algorithms is to be able to add new connections to a set of elements without losing too much performance. The data structure with the connected trees enables us to do this really well. We start by looking up the root of both elements, and then set the parent element of one tree to the root of the other tree.

Some care must still be taken when doing this, because over time the connected trees might become unbalanced. Therefore the size of every tree is kept in its root Node; when connecting two trees, the root of the smaller one is always attached to the root of the larger one. This guarantees that the height of a tree with n elements never exceeds log₂(n), so a root lookup takes at most a logarithmic number of steps.

This was only a brief description of the algorithm but there are some excellent explanations on the Internet. Here is a nice one because it is visual and interactive: visual algo

**The Impure Implementation**

Now let’s see some code! The impure implementation:

[code language=”scala”]

import scala.annotation.tailrec

class IUnionFind(val size: Int) {

  private case class Node(var parent: Option[Int], var treeSize: Int)

  private val nodes = Array.fill[Node](size)(new Node(None, 1))

  def union(t1: Int, t2: Int): IUnionFind = {
    if (t1 == t2) return this

    val root1 = root(t1)
    val root2 = root(t2)
    if (root1 == root2) return this

    val node1 = nodes(root1)
    val node2 = nodes(root2)

    // Attach the root of the smaller tree to the root of the larger one,
    // and update the size bookkeeping in the surviving root.
    if (node1.treeSize < node2.treeSize) {
      node1.parent = Some(root2)
      node2.treeSize += node1.treeSize
    } else {
      node2.parent = Some(root1)
      node1.treeSize += node2.treeSize
    }
    this
  }

  def connected(t1: Int, t2: Int): Boolean = t1 == t2 || root(t1) == root(t2)

  @tailrec
  private def root(t: Int): Int = nodes(t).parent match {
    case None => t
    case Some(p) => root(p)
  }
}

[/code]

As you can see, I used an array of Nodes to represent the connected components. Most textbook implementations use two integer arrays instead: one for the parent of every element, and one for the tree size of the component the element belongs to. Memory-wise that is more efficient than my implementation, but apart from that the concept of the algorithm stays the same, and in terms of speed the difference doesn't matter much. I do think that using Node objects is more readable than having two integer arrays, so I chose the Nodes.
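To make the usage concrete, here is a small sketch (the class from the listing above is repeated in condensed form so the snippet runs on its own):

```scala
import scala.annotation.tailrec

// Condensed copy of the IUnionFind class from the listing above.
class IUnionFind(val size: Int) {
  private case class Node(var parent: Option[Int], var treeSize: Int)
  private val nodes = Array.fill[Node](size)(Node(None, 1))

  def union(t1: Int, t2: Int): IUnionFind = {
    if (t1 == t2) return this
    val root1 = root(t1)
    val root2 = root(t2)
    if (root1 == root2) return this
    val node1 = nodes(root1)
    val node2 = nodes(root2)
    if (node1.treeSize < node2.treeSize) {
      node1.parent = Some(root2)       // smaller tree hangs under the larger root
      node2.treeSize += node1.treeSize
    } else {
      node2.parent = Some(root1)
      node1.treeSize += node2.treeSize
    }
    this
  }

  def connected(t1: Int, t2: Int): Boolean = t1 == t2 || root(t1) == root(t2)

  @tailrec
  private def root(t: Int): Int = nodes(t).parent match {
    case None => t
    case Some(p) => root(p)
  }
}

val uf = new IUnionFind(10)
uf.union(0, 1)
uf.union(1, 2)               // 0, 1 and 2 now share one root
println(uf.connected(0, 2))  // true: connected transitively via 1
println(uf.connected(0, 3))  // false: 3 was never linked to anything
```

Since union mutates the internal array and returns `this`, calls can also be chained.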

**The purely functional implementation**

[code language=”scala”]

import scala.annotation.tailrec

case class Node(parent: Option[Int], treeSize: Int)

object FUnionFind {
  def create(size: Int): FUnionFind = {
    val nodes = Vector.fill(size)(Node(None, 1))
    new FUnionFind(nodes)
  }
}

class FUnionFind(nodes: Vector[Node]) {

  def union(t1: Int, t2: Int): FUnionFind = {
    if (t1 == t2) return this

    val root1 = root(t1)
    val root2 = root(t2)
    if (root1 == root2) return this

    val node1 = nodes(root1)
    val node2 = nodes(root2)
    val newTreeSize = node1.treeSize + node2.treeSize

    // Create replacements for both roots: the smaller tree's root gets the
    // other root as its parent, the larger tree's root gets the new size.
    val (newNode1, newNode2) =
      if (node1.treeSize < node2.treeSize) {
        val newNode1 = Node(Some(root2), newTreeSize)
        val newNode2 = Node(node2.parent, newTreeSize)
        (newNode1, newNode2)
      } else {
        val newNode2 = Node(Some(root1), newTreeSize)
        val newNode1 = Node(node1.parent, newTreeSize)
        (newNode1, newNode2)
      }

    val newNodes = nodes.updated(root1, newNode1).updated(root2, newNode2)
    new FUnionFind(newNodes)
  }

  def connected(t1: Int, t2: Int): Boolean = t1 == t2 || root(t1) == root(t2)

  @tailrec
  private def root(t: Int): Int = nodes(t).parent match {
    case None => t
    case Some(p) => root(p)
  }
}

[/code]

Compared with the first implementation, some parts remained the same, such as the Node class, except that it is no longer an inner class. The connected and root methods did not change either.

What did change is the method which updates the connections: union. In a purely functional implementation we can't update an array in place, so union instead creates a new FUnionFind object and returns it at the end. Two new Node objects also need to be created when subtrees are merged: the root of the smaller one because it gets a new parent, and the root of the larger one because its tree size needs to be increased.

Perhaps surprisingly, the pure implementation needs more lines of code than the impure one.
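To make the immutability concrete, here is a small sketch showing that union leaves the original instance untouched (the definitions from the listing above are repeated in condensed form so the snippet runs on its own):

```scala
import scala.annotation.tailrec

// Condensed copy of the definitions from the listing above.
case class Node(parent: Option[Int], treeSize: Int)

class FUnionFind(nodes: Vector[Node]) {
  def union(t1: Int, t2: Int): FUnionFind = {
    if (t1 == t2) return this
    val root1 = root(t1)
    val root2 = root(t2)
    if (root1 == root2) return this
    val node1 = nodes(root1)
    val node2 = nodes(root2)
    val newTreeSize = node1.treeSize + node2.treeSize
    val (newNode1, newNode2) =
      if (node1.treeSize < node2.treeSize)
        (Node(Some(root2), newTreeSize), Node(node2.parent, newTreeSize))
      else
        (Node(node1.parent, newTreeSize), Node(Some(root1), newTreeSize))
    // updated returns a fresh Vector; the old one is never touched
    new FUnionFind(nodes.updated(root1, newNode1).updated(root2, newNode2))
  }

  def connected(t1: Int, t2: Int): Boolean = t1 == t2 || root(t1) == root(t2)

  @tailrec
  private def root(t: Int): Int = nodes(t).parent match {
    case None => t
    case Some(p) => root(p)
  }
}

object FUnionFind {
  def create(size: Int): FUnionFind =
    new FUnionFind(Vector.fill(size)(Node(None, 1)))
}

val before = FUnionFind.create(5)
val after  = before.union(0, 1).union(1, 2)
println(after.connected(0, 2))   // true in the new version
println(before.connected(0, 2))  // false: the original is unchanged
```

This persistence is what the extra allocations buy you: older versions of the structure remain valid, which is what makes sharing it between threads safe.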

**The Performance**

The pure implementation has to do a bit of extra work when it creates the new objects in its union method. The question is how much this costs in terms of performance.

To find this out, I ran both implementations through a series of performance tests (using ScalaMeter) in which I added a large number of connections to a set of objects. I added an (impure) Java 8 implementation to the test as well.

Here are the results:

| Nr of elements and connections | Impure | Pure | Java 8 |
|---|---|---|---|
| 10000 | 2.2 s | 3.8 s | 2.3 s |
| 15000 | 4.4 s | 7.9 s | 4.2 s |
| 20000 | 6.2 s | 10.3 s | 6.3 s |

Not surprisingly, the time grows with the number of connections and elements. The growth is a bit faster than linear, which matches the asymptotic time complexity of quick union: O(n log n) for n unions on n elements.

The pure implementation is roughly 65-80% slower than the impure one. The cause is clear: on every call to union, the pure version has to allocate (and later garbage collect) three extra objects: the two new Nodes and the new FUnionFind instance.

For completeness I added Java 8 to the test too. The code is not given here but if you’re interested, there’s a link to the complete source below the article. Its implementation is really similar to the Scala version.

**Conclusion**

Purely functional code has a couple of advantages over impure implementations: because of the lack of side effects it can be easier to reason about blocks of code, and concurrency becomes simpler because there is no shared mutable state.

In general it also leads to more concise code, because collection methods like map and filter can easily be used. In this example that was not the case, though; the pure implementation even needed a few extra lines.

The biggest disadvantage of the pure union-find implementation is its performance. Whether that is a showstopper, or whether the better concurrency behavior of the pure implementation outweighs it, depends on the requirements of the project in which the code is used.

- Explanation of a faster union-find with path compression
- All the source code in the article, including the tests

## Comments

You mention your pure implementation ‘surprisingly’ takes more lines of code compared to the mutable original. I think this is due to the fact that this is mostly a one-to-one adaptation of the original algorithm, but avoiding mutation.

I came up with a (perhaps?) more ‘idiomatic’ adaptation in https://github.com/misja-alma/unionfind/pull/1 . This deviates from the original algorithm, but (I think) keeps intact the spirit of the approach.

I couldn’t quite make sense of your benchmarks, but assume that this implementation will still be slower than the mutable one. Summarizing the performance in a couple of (average?) values seems rather simplistic to me, though. Those numbers are almost certainly not applicable to a practical situation you might encounter: there are assumptions about the number of elements, the number of connections, whether the algorithm will be run in parallel, whether you want to optimize for latency or throughput, etc etc.

Still, interesting challenge :).

Thanks for the alternative, more functional algorithm! I merged your pull request.

The reason that I kept the pure implementation close to the mutable original is that this way I could compare their performance without having to worry too much about the implementation of the benchmarks. Since both implementations follow the same concept, I could stick with a simple benchmark which just adds a large number of nodes and the same number of connections, and still get a meaningful outcome from the comparison.

The benchmark becomes flawed when implementations are added which use a different algorithm, such as the one you committed. In that case you’d like to have more variations in the scenario, such as having many connected groups, or just a couple of huge groups, etc. And you would want to measure the performance of the ‘connected’ method as well.

You are correct that each practical scenario demands its own set of benchmarks. For instance, the pure functional implementation would be easier to parallelize. The question remains of course if that is something that your application requires.