Blog

Real world functional programming in Scala – comparing Java, Clojure and Scala

04 Jul, 2009
Xebia Background Header Wave

I recently started reading Stuart Halloway’s book ‘Programming in Clojure’. I don’t think I will be writing much enterprise applications in that language in the near future, but it never hurts to broaden the mind, and it’s a very good read. In his book, he demonstrates some of the advantages of functional programming by taking an example from the Apache commons library: StringUtils.indexOfAny. He has also written a blog about it.
In this blog post, we’ll compare the original function in Java, the Clojure version and a Scala implementation.

To start: here’s the original implementation in Java:
[java]
// From Apache Commons Lang, https://commons.apache.org/lang/
public static int indexOfAny(String str, char[] searchChars) {
if (isEmpty(str) || ArrayUtils.isEmpty(searchChars)) {
return -1;
}
for (int i = 0; i < str.length(); i++) {
char ch = str.charAt(i);
for (int j = 0; j < searchChars.length; j++) {
if (searchChars[j] == ch) {
return i;
}
}
}
return -1;
}
[/java]
Sample results of this function are:
[java]
StringUtils.indexOfAny(null, ) = -1
StringUtils.indexOfAny("" ,
) = -1
StringUtils.indexOfAny(, null) = -1
StringUtils.indexOfAny(
, []) = -1
StringUtils.indexOfAny("zzabyycdxx" ,[‘z’ ,’a’ ]) = 0
StringUtils.indexOfAny("zzabyycdxx" ,[‘b’ ,’y’ ]) = 3
StringUtils.indexOfAny("aba" , [‘z’ ]) = -1
[/java]
And now the Clojure code. I’ve taken the version from the book, (free to download here). The book version defines three functions, instead of the version shown in the blog post.


(defn indexed [s] (map vector (iterate inc 0) s))
(defn index-filter [pred coll]
  (when pred
    (for [[idx elt] (indexed coll) :when (pred elt)] idx)))
(defn index-of-any [pred coll]
  (first (index-filter pred coll)))

The three functions defined are:

  1. indexed returns a sequence of pairs, the first element being the index, the sequence the element.
  2. index-filter returns a sequence of the indexes of matching predicates in the collection. For example, the function call (index-filter #{\a \b} “abcdbbb”) yields (0 1 4 5 6)
  3. index-of-any is the StringUtils function. This is now a simple function, just returning the first result of the index-filter function.

As Stuart has mentioned in his book, the Clojure version is shorter, less complex, and also more general than the Java version. No special case handling, such as null or empty search character checking. This all just works naturally in the functional version. It can also be easily extended for other functions than just indexOfAny
Since I’ve been implementing all kinds of nice, but useless little functions in Scala recently, I thought it might be a good idea to try out a more useful example. So let’s see how well we can do with Scala in implementing this in a similar way as the Clojure version.
Almost immediately, we hit on the dreaded null problem. Scala is not a pure functional language, and you have to beware that parameters that are passed on can be null. You could specify the input arguments as Options, of course, but than the caller of the function still needs to construct either the Some or None value, and to do that the caller would have to check for it being null or not.
So, after spending some time grumbling about this issue, I decided to admit defeat and do the null check. 1-0 for Clojure.
Without further ado, here are the three functions implemented in Scala (N.B. I’ve used the trunk here, in case you might want to try this out yourself, I haven’t tried whether this also works version 2.7.5):
[scala]
def indexed(coll:Seq[Any]):Seq[Pair[Int,Any]] = {
if (coll==null) List()
(0 until coll.length).zip(coll)
//below an alternative, but here we have to traverse the list another time to reverse the elements…
//coll.zipWithIndex.map(p => p.swap)
}
def indexedFilter(pred:Seq[Any], coll:Seq[Any]):Seq[Int] = {
if (pred==null)List()
val idxList = indexed(coll)
for (p <- pred; idxPair <- idxList; if (p == idxPair._2)) yield (idxPair._1)
}
def indexOfAny(pred:Seq[Any], coll:Seq[Any]):Option[Int] = {
val res=indexedFilter(pred, coll)
if (res.isEmpty) None
else Some(res.first)
}
[/scala]
Following the definition of the indexed function is a no-brainer: Scala’s zip function on the List class does the job here. There’s also zipWithIndex which does almost what we want, this returns a List of tuples, but in reverse order as the Clojure sample: element first, index later.
The indexedFilter uses a for-comprehension just as the Clojure example (Clojure’s for is not a traditional loop, but a sequence comprehension). For each element in our predicate sequence it checks whether it is present in the indexed tuples that is the result of the indexed function. If it is, it is added to the resulting list using yield. indexOfAny then becomes a trivial function that just checks whether the resulting sequence is empty or not. Instead of returning -1 if nothing is found, I’ve used Scala’s Option class in this case to make the result more expressive.
The function at least works as expected:

scala> indexOfAny("ab", "cdef")
res3: Option[Int] = None
scala> indexOfAny("ab", "bcdefbbb")
res3: Option[Int] = None
scala> indexOfAny("fb", "bcdefbbb")
res4: Option[Int] = Some(4)

Conclusion
All in all, the scala version is not that bad. Its main drawback over the Clojure version is the two null checks that we need to do, and also we need to check the resulting list being empty or not in the indexOfAny function. Clojure wins in this respect. However, the Scala version still is pretty neat and generic and improves (in my opinion at least) the original Java version.
Update: handling infinite collections
Stuart asked the question in his comment how the Scala version would handle infinite collections. The version presented above does not handle this very well. For instance, you’ll get a OEM if you try to pass on Stream.from(0) (which constructs a lazy infinite list of all natural numbers) to the indexed function. Also the indexedFilter function evaluates the entire collection, which is strictly not necessary for the indexOfAny function. To handle an infinite collection, we need to modify and optimize the functions somewhat, to resemble a bit more the functions that Stuart has used in his blog post, instead of the book versions. The Clojure functions used in the blog post look like this:


(defn indexed [s] (map vector (iterate inc 0) s))
(defn index-of-any [s chars]
  (some (fn [[idx char]] (if (get chars char) idx))
          (indexed s)))

The Scala version that I came up with is as follows:
[scala]
def lazyIndexed(coll:Seq[Any]):Seq[Pair[Int,Any]] = {
if (coll==null) Stream.empty
Stream.from(0).zip(coll)
}
def lazyIndexOfAny(pred:Seq[Any], coll:Seq[Any]):Option[Int] = {
if (pred==null) None
lazyIndexed(coll).dropWhile(ip => !pred.exists(p => (p == ip._2))).headOption.map(ip => ip._1)
}
[/scala]
As said, Scala’s Stream implements a lazy list, in which the elements are only evaluated when needed. In this case, we keep on drop elements from it list that do not appear in the predicate sequence. The head of the remaining list is the first matching result.
Obvious drawback here is that this indeed does works on infinite collections, but only if a match can be found, else we’re still in trouble.

Questions?

Get in touch with us to learn more about the subject and related solutions

Explore related posts