
Matching strings in Scala

02 Jan, 2017

Over December I had a lot of fun doing the Advent of Code coding challenges with some colleagues.
Many of those, such as day 21, require interpreting some kind of string input. While normally I’d probably marshal those strings into case classes before processing, in this case that seemed like overkill: a quick pattern-match should be sufficient.
It turns out there are a couple of ways to approach that, which is also a good excuse to look under the hood and see which Scala concepts they’re built on.

Pattern-matching with regexes

Throughout this post we’ll use matching the "swap position X with position Y" command as an example. We can create a regex for this command, and then use it to match the command:
[code language="scala"]
// Triple quotes keep the backslashes in \d literal
val SwapPositions = """swap position (\d+) with position (\d+)""".r

def applyCommand(in: String, command: String): String =
  command match {
    // x and y are bound to the two capture groups, as Strings
    case SwapPositions(x, y) =>
      in.updated(x.toInt, in(y.toInt)).updated(y.toInt, in(x.toInt))
  }
[/code]

Extractor patterns: unapply

If you have only pattern-matched on case classes so far, you might be surprised that we can pattern-match on a scala.util.matching.Regex here. This is called the Extractor Pattern: when you use a reference to an object with an unapply (or unapplySeq) method in a pattern-match, it will:

  • Pass the object that is to be matched to this method
  • If the unapply method returns None, the pattern does not match
  • If it returns values, they will be passed on for further matching or binding

Indeed, scala.util.matching.Regex has an unapplySeq method, so it can be used as an extractor.
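
To make these steps concrete, here is a minimal sketch of a hand-written extractor next to a direct call to the regex's unapplySeq. The names ExtractorSketch, Even and describe are made up for this illustration and are not part of any library.

[code language="scala"]
object ExtractorSketch {
  // Hypothetical extractor: matches only even numbers and binds their half
  object Even {
    def unapply(n: Int): Option[Int] =
      if (n % 2 == 0) Some(n / 2) else None
  }

  def describe(n: Int): String = n match {
    case Even(half) => s"twice $half" // Even.unapply(n) returned Some(half)
    case _          => "odd"          // Even.unapply(n) returned None
  }

  // Roughly what a regex pattern relies on: unapplySeq returns the capture
  // groups when the whole input matches, and None otherwise
  val SwapPositions = """swap position (\d+) with position (\d+)""".r
  val groups: Option[List[String]] =
    SwapPositions.unapplySeq("swap position 4 with position 0")
}
[/code]

Here describe(10) takes the first case because Even.unapply returns a value, while describe(7) falls through to the default case.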

Extractors for conversion

In our naive implementation above, notice we have to call .toInt on the matched integers repeatedly. We can leverage the Extractor Pattern to do this conversion while matching:
[code language="scala"]
import scala.util.Try

// Extractor that only matches strings that can be converted to an Int
object ToInt {
  def unapply(in: String): Option[Int] = Try(in.toInt).toOption
}

val SwapPositions = """swap position (\d+) with position (\d+)""".r

def applyCommand(in: String, command: String): String =
  command match {
    // Nested extractors: the captured groups are passed on to ToInt
    case SwapPositions(ToInt(x), ToInt(y)) =>
      in.updated(x, in(y)).updated(y, in(x))
  }
[/code]
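
Since ToInt is an ordinary extractor, it can also be used on its own. The sketch below (ToIntSketch and parse are hypothetical names, for illustration only) shows that input which cannot be converted simply falls through to the next case. That includes digit strings too large for an Int, which the regex group alone would happily accept.

[code language="scala"]
import scala.util.Try

object ToIntSketch {
  object ToInt {
    def unapply(in: String): Option[Int] = Try(in.toInt).toOption
  }

  // Hypothetical helper that reports whether the conversion succeeded
  def parse(s: String): String = s match {
    case ToInt(n) => s"int $n"    // conversion succeeded, n is an Int
    case _        => "not an int" // toInt threw, so unapply returned None
  }
}
[/code]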

Extractors need a ‘Stable Id’

You might have noticed the actual regex and the match are not in the same place. In larger applications that might be an advantage, but in this case it would be nicer to have a one-liner like this:
[code language="scala"]
// Does not compile: see the explanation below
def applyCommand(in: String, command: String): String =
  command match {
    case """swap position (\d+) with position (\d+)""".r(x, y) =>
      in.updated(x.toInt, in(y.toInt)).updated(y.toInt, in(x.toInt))
  }
[/code]
Unfortunately the above does not work, as the Extractor Pattern syntax is defined as:
[code]
SimplePattern ::= StableId ‘(’ [Patterns] ‘)’
[/code]
As "xxx".r is not a StableId, we cannot use it inline here.
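
One way to at least keep the regex close to the match is to bind it to a local val inside the method, which gives the pattern a stable identifier. This is only a sketch: the wrapper object LocalRegexSketch and the fallback case for unknown commands are assumptions added for illustration, and the regex is recompiled on every call.

[code language="scala"]
object LocalRegexSketch {
  def applyCommand(in: String, command: String): String = {
    // A local val is a stable identifier, so it can be used as an extractor
    val SwapPositions = """swap position (\d+) with position (\d+)""".r
    command match {
      case SwapPositions(x, y) =>
        in.updated(x.toInt, in(y.toInt)).updated(y.toInt, in(x.toInt))
      case _ => in // assumption: unknown commands leave the input unchanged
    }
  }
}
[/code]

This is still two lines rather than the one-liner we were after.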

String interpolation

A trick we can use here is string interpolation. You might have seen that strings prefixed with s have special meaning in Scala, but you can also define your own prefixes like this:
[code language="scala"]
// Adds a custom `re` prefix that turns the interpolated string into a Regex
implicit class RegexHelper(val sc: StringContext) extends AnyVal {
  def re: scala.util.matching.Regex = sc.parts.mkString.r
}

def applyCommand(in: String, command: String): String =
  command match {
    case re"swap position \d+ with position \d+" =>
      ???
  }
[/code]
This correctly matches the string, but does not capture the two positions yet. We can achieve that by adding capture groups and expressions to the interpolated string:
[code language="scala"]
implicit class RegexHelper(val sc: StringContext) extends AnyVal {
  def re: scala.util.matching.Regex = sc.parts.mkString.r
}

def applyCommand(in: String, command: String): String =
  command match {
    // $x and $y are bound to the first and second capture group
    case re"swap position (\d+)$x with position (\d+)$y" =>
      in.updated(x.toInt, in(y.toInt)).updated(y.toInt, in(x.toInt))
  }
[/code]
This works because re"swap position (\d+)$x with position (\d+)$y" is desugared to:
[code language="scala"]
StringContext("""swap position (\d+)""", """ with position (\d+)""", "").re(x, y)
[/code]
Note that this reveals that the location of the variables in the string does not actually matter: putting them near the matching groups is purely a matter of convenience and readability.
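
As a small illustration of that point, the sketch below puts both variables at the very end of the string. PlacementSketch and positions are hypothetical names, and the helper is declared inside an object here (without AnyVal) only to keep the example self-contained.

[code language="scala"]
object PlacementSketch {
  implicit class RegexHelper(val sc: StringContext) {
    def re: scala.util.matching.Regex = sc.parts.mkString.r
  }

  // x still binds to the first group and y to the second:
  // only the order of the variables is significant, not their position
  def positions(command: String): Option[(String, String)] =
    command match {
      case re"swap position (\d+) with position (\d+)$x$y" => Some((x, y))
      case _                                               => None
    }
}
[/code]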
We can even do further matching in those expressions, allowing:
[code language="scala"]
import scala.util.Try

object ToInt {
  def unapply(in: String): Option[Int] = Try(in.toInt).toOption
}

implicit class RegexHelper(val sc: StringContext) extends AnyVal {
  def re: scala.util.matching.Regex = sc.parts.mkString.r
}

def applyCommand(in: String, command: String): String =
  command match {
    // Each captured group is converted by ToInt while matching
    case re"swap position (\d+)${ToInt(x)} with position (\d+)${ToInt(y)}" =>
      in.updated(x, in(y)).updated(y, in(x))
  }
[/code]

Less general patterns

If we don’t really need the full power of regular expressions to match our strings, we can also simply match the ‘holes’ in the interpolated string with (.*), yielding:
[code language="scala"]
import scala.util.Try

object ToInt {
  def unapply(in: String): Option[Int] = Try(in.toInt).toOption
}

implicit class RegexHelper(val sc: StringContext) extends AnyVal {
  def re: scala.util.matching.Regex =
    sc.parts
      .map(java.util.regex.Pattern.quote) // literal parts are matched verbatim
      .reduce(_ + "(.*)" + _)             // each hole becomes a capture group
      .r
}

def applyCommand(in: String, command: String): String =
  command match {
    case re"swap position ${ToInt(x)} with position ${ToInt(y)}" =>
      in.updated(x, in(y)).updated(y, in(x))
  }
[/code]
This is the approach shipped with Ammonite in ammonite.ops.

Performance

Because the interpolator builds and compiles a new Regex every time the match is evaluated, this is not the most efficient way to match strings. If you suspect string parsing might be your application's bottleneck, be sure to profile to see whether this solution is suitable for you, but beware of premature optimization.
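
If profiling does point at regex compilation, one possible mitigation is to cache the compiled Regex, keyed by the literal parts of the interpolated string. The sketch below is an assumption about how that could look rather than a measured improvement: CachedRegexSketch, cache, isSwap and cachedPatterns are hypothetical names, and the StringContext and the cache lookup are still evaluated on every match.

[code language="scala"]
import scala.collection.concurrent.TrieMap
import scala.util.matching.Regex

object CachedRegexSketch {
  // Hypothetical cache: literal string parts -> compiled regex
  private val cache = TrieMap.empty[Seq[String], Regex]

  implicit class CachedRegexHelper(val sc: StringContext) {
    // Compile the regex only the first time these parts are seen
    def re: Regex = cache.getOrElseUpdate(sc.parts, sc.parts.mkString.r)
  }

  def isSwap(command: String): Boolean = command match {
    case re"swap position \d+ with position \d+" => true
    case _                                       => false
  }

  // Exposed only so the effect of the cache can be observed
  def cachedPatterns: Int = cache.size
}
[/code]

Hoisting the regex into a val, as in the first example of this post, avoids the problem altogether at the cost of separating the regex from the match again.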

Conclusion

By combining pattern-matching, string interpolation and regular expressions we have a powerful and succinct way to match strings in Scala.
Regular expressions can already look like ‘character soup’, and inline interpolated expressions make that a bit worse, so for readability this technique is probably best reserved for relatively simple patterns.
