This article was originally published at 47deg.com on February 10, 2021.
Welcome to the final installment of our lambda calculus using JavaScript tour. It has been a fun ride, and we have seen how numbers, recursion, data types, everything can be encoded within the three restrictive rules we described in part 1. In this post, we are going to step back and write our own evaluator for lambda terms.
Lambda terms as values
Wait a minute! Weren’t we already representing lambda terms as JavaScript functions? Indeed, that’s what we were doing in the previous parts of the series: use the fact that JavaScript is a superset of lambda calculus to piggyback on an interpreter to execute our lambda terms. So at the end of this post, you’ll have an interpreter for lambda calculus running on top of an interpreter for JavaScript! You know, it’s turtles all the way down.
The first step is to stop using JavaScript expressions to represent our lambda terms. Instead, we are going to introduce a data structure that explicitly represent our lambda terms. The technical name for this is reification: we turn something that only existed implicitly that we couldn’t inspect into something we can manipulate within our language. For example, the application of the identity function to the number 3
is going to look as follows:
{
kind: 'app',
fun: identity,
arg: { kind: 'number', n: 3 }
}
More concretely, each lambda term has a kind
field that tells whether you are in front of a variable ('var'
), application ('app'
), or function ('fun'
). For convenience, we are also going to introduce numbers, even though we know that Church numerals can take their role. To make it easier to write out those expression, we’ll introduce constructor functions.
const vr = index => ({ kind: 'var', index })
const ap = (fun, arg) => ({ kind: 'app', fun, arg })
const nb = n => ({ kind: 'number', n })
If you are wondering why parentheses and braces, the reason is that otherwise JavaScript parses the braces as beginning a code block and
kind
a label, instead of an object literal and a key.
The application of identity to the number 3 can be then written out as:
ap(identity, nb(3))
Question for the reader: could you write a function that applies not one, but several arguments to a function? That way we do not have to write
ap(ap(f), x), y)
, but ratheraps(f, x, y)
.
Solution: we can iterate over the argument list, building a larger application at each step.
const aps = (fun, ...args) => args.reduce((acc, arg) => ap(acc, arg), fun)
The idea of using a host programming language – JavaScript in this case – to manipulate another programming language – lambda calculus terms in this case – is the starting point of metaprogramming. The best known example of metaprogramming is macros from the Lisp family of languages. Elixir also makes powerful use of macros, and Template Haskell is its realization in the context of Haskell.
Meet meneer de Bruijn
I’ve consciously “forgotten” to describe any example of how a function looks (identity
above hasn’t been defined); the reason is that doing so in a nice way is far from trivial. The most natural choice, copying what we would do in JavaScript, is to introduce an argument name and a body.
const fn = (arg, body) => ({ kind: 'fun', arg, body })
// this is the identity function
const identity = fn('x', vr('x'))
Unfortunately, this is a very problematic choice. With proper argument names comes shadowing. Imagine that we would write the following term:
const f = fn('x', fn('x', vr('x')))
What does the inner 'x'
refer to? Most languages have rules to deal with those cases. In the case of lambda calculus, we speak of α-conversion, which specifies that the term is equivalent to fn('x', fn('y', vr('y')))
but not to fn('x', fn('y', vr('x')))
.
Instead of solving a hard problem, we can choose another representation that makes this problem trivial. In the 1970s, Nicolaas Govert de Bruijn (the pronunciation of the surname is close to the English word “brown,” which is its literal meaning), a Dutch mathematician, came up with what we nowadays call de Bruijn indices. The idea is that, instead of using proper names, variables specify indices that describe how many “layers” we need to navigate to get to the one we want. Here are two examples that showcase this idea:
fn(fn(0)) // corresponds to a => b => b
fn(fn(1)) // corresponds to a => b => a
Starting with 0 as the innermost enclosing fn
, each additional index moves one fn
out. Note that, as a result of this encoding, we no longer give names to arguments. Some people have invented different representations that keep the good parts of de Bruijn indices, but are simple for human readability.
Finally, evaluation
We are now ready to write the function that takes one of these lambda terms and returns its evaluated form. The skeleton is a big switch
over the kind
property, which tells us how to handle each case. For example, in the case of numbers, we simply return it unchanged:
switch (expression.kind) {
case 'number':
return expression
case 'var':
...
case 'app':
...
case 'fun':
...
}
The subtle part here is the interplay between application and functions. We are going to use the most common evaluation strategy, in which to obtain the result of a function call, we first recursively evaluate the function itself and its argument. And if the function is of the 'fun'
kind, we have additional work to do. That corresponds to a JavaScript code of the form ‘(x => body) argument`.
case 'app':
const f = // evaluate expression.fun
const a = // evaluate expression.arg
switch (f.kind) {
case 'fun':
... // the important case
default:
return { kind: 'app', function: f, argument: a }
}
How does the body of the function know the value of its arguments? One possibility is to directly substitute the value in the code of the function body; but, once again, this is complicated and error-prone code. Instead, we are going to keep the value of those arguments in a list, which is usually called the context. When going into evaluating the body, we must remember to add the value of the argument in that list – those are the lines marked with !
.
const evaluate = (context, expression) => {
switch (expression.kind) {
... // other cases
case 'app':
const f = evaluate(context, expression.fun) // recur
const a = evaluate(context, expression.arg) // recur
switch (f.kind) {
case 'fun':
const newContext = [a, ...context] // !
return evaluate(newContext, f.body) // !
default:
return { kind: 'app', function: f, argument: a }
}
case 'fun':
return expression
}
}
The fact that a
has been added at the front of the context is not an arbitrary choice. By doing so, the list is ordered from innermost to outermost argument. As a result, we can use the de Bruijn index in a variable literally as an index for that list!
case 'var':
return context[expression.index]
The rest of the cases are left unchanged: if you have a function, we do not look inside the body. If your application does not have a function in front, we also stop evaluation. This corresponds to the notion of head normal form, which one usually hears in the context of laziness in Haskell.
Does this work?
Let’s open a Node.js REPL and load our file with the definition of the constructors and the evaluate
function.
> .load lambdaEval.js
// lots of things
undefined
We define the identity
function using our representation, and then ask to evaluate its application to the number 3. Since we start at top level, evaluation uses an empty context.
> const identity = fn(vr(0))
undefined
> evaluate([], ap(identity, nb(3)))
{ kind: 'number', n: 3 }
Victory! We can also blow the Node.js interpreter by asking it to evaluate the never-ending term Ω we introduced in part 3.
> const omega = fn(ap(vr(0), vr(0)))
> const Omega = ap(omega, omega)
> evaluate([], Omega)
Uncaught RangeError: Maximum call stack size exceeded
Conclusion
You are now the proud author of an interpreter! Compilers and interpreters usually look like mythical beasts, but they are simply programs that manipulate some representation of a program. And that representation looks no different than any other data you manipulate in your daily programming job.
My hope is that this series has shed some light on what lambda calculus is. Oftentimes, the mathematical presentation hides the fact that most programmers nowadays already know lambda calculus, but they don’t know that they know it.