Easy Foundations for Programming Languages I — Introduction

Programming language theory (PLT) is a branch of computer science that deals with the design, implementation, analysis, characterization, and classification of programming languages.

In some ways, the history of programming language theory predates even the development of programming languages themselves. The lambda calculus, developed by Alonzo Church and Stephen Cole Kleene in the 1930s, is considered by some to be the world’s first programming language, even though it was intended to model computation rather than being a means for programmers to describe algorithms to a computer system. Many modern functional programming languages have been described as providing a “thin veneer” over the lambda calculus, and many are easily described in terms of it.

There are two generally recommended introductory textbooks in this field: one is the Foundations for Programming Languages by John C. Mitchell; another book is the Types and Programming Languages by Benjamin C. Pierce, which is not that comprehensive but more readable. However, either book has a fairly high level of difficulty for beginners. Confronting nearly a thousand pages filled of abstract definitions and obscure proofs, it is very likely for someone exploring on her own to quickly get lost in the sea of definitions that come and go, and lose the grasp of the totality of this field.

To help beginners comfortably comprehend fundamental concepts in the programming language field, this series of articles will follow the content in the Foundations for Programming Languages, and try to use the most readable sentences and clear logic to outline the mountains and rivers of this realm. The author will carefully select the most insightful knowledge to present and provide sufficient examples to ensure that the reader can follow the text with relative ease. Believe you can have a joyful experience when go back to classic textbooks after reading through this series.

Without further ado, let our journey start!

In the first article of this series, we are going to introduce our powerful modeling tool — lambda calculus — and give an entry-level explanation of its syntax and semantics. At the end of this article, we will briefly discuss types and type systems.

It is recommended to read my blog Programming Language Pragmatics to learn some background of programming languages before this series.

Model Programming Languages

The formulation of “model” programming languages is the beginning of a mathematical analysis. The word “model” here means that the language we want to analyze is a simplified language, which discards all irrelevant details of the original large language. We can call such a simplified version as a sublanguage of the original one. This process is quite useful in both analyzing existing languages and designing a new language.

Of course, some good properties we find on the sublanguage may not hold on the original language. Having a good taste and careful judgment is important for the “modeling”. In this series, we will go through fundamental programming language concepts using the framework of typed lambda calculus, which allows us to see beyond the surface syntax and to understand the meanings of program phrases (expressions, commands, declarations, etc.) at an appropriate level of detail.

This “appropriate” description is intriguing, since there is no single “appropriate level of detail” that is appropriate for all forms of programming and program analysis. So we concentrate on general techniques that may be applied in a variety of ways.

Lambda calculus is a system for reasoning about lambda expressions (whose definition will be given in the next section). It was first introduced by mathematician Alonzo Church in the 1930s as part of an investigation into the foundations of mathematics, a study of the philosophical and logical. The lambda calculus provides simple semantics for computation which are useful for formally studying properties of computation.

To date, lambda calculus has proven useful in describing, analyzing, and implementing programming languages. With little practice, the reader may become familiar enough with the notation to see C, OCaml, Pascal program phrases as syntactic variants of lambda expressions. This will make the theory described in this series more useful, and also make it much easier to understand a variety of programming languages.

First, let’s look at the lambda notation, which is used to express the syntax of lambda calculus.

Lambda Notation

Lambda notation has two primary features: lambda abstraction and application. In the next two subsections, we will explain their details.

Lambda Abstraction

Lambda abstraction is what we use to write function expressions. Expressions written in lambda notation are called lambda expressions or lambda terms. A simple example of the lambda expression is below.

\[\lambda x: nat. x\]

This is an identity function on natural numbers. Simply speaking, it represents a mathatical function $f(x)=x$. The notation “$x:nat$” specifies that the domain of this function is $\boldsymbol{nat}$, the type of natural numbers. In other words, this function only accepts a positive integer as the input, and returns this value directly, so the function range is also $nat$.

Writing the lambda expression in the $f(x)=x$ notation is helpful for us to understand, but there are two details to notice:
This notation doesn’t provide any constraint on the type of input;
The notation forces us to make up a name (here, $f$) for every function we write, while lambda notation gives us a succinct way of defining functions directly without a name. ($\lambda$ is just a binding operator.)

More generally, we can write a lambda expression in the following format:

\[\lambda x: \sigma.\: M\]

, where $M$ is some expression acting as the function body. This expression defines the function mapping any $x$ in $\sigma$ type to the value given by $M$.

The lambda calculus we introduced here is typed lambda calculus. There is another kind of pure lambda calculus, which does not have type constraint, whose expression is written as $\lambda x. M$. Further discussion about types is at the end of this article.

Lambda Application

Application allows us to make use of the functions we defined during the abstraction. In lambda notation, we write function application just by putting a function expression next to one or more arguments.

For example, we apply the natural-number identity function to the number 3 by writing the following.

\[(\lambda x: nat.\: x) \: 3\]

This function will return 3 as the output, which gives us such an equation.

\[(\lambda x: nat.\: x) \: 3 = 3\]

There are two conventions in writing lambda notations:

Application associates to the left, so that $MNP$ should be read as $(MN)P$. In words, we can read $MNP$ as the expression, “apply the function $M$ to argument $N$, and then apply the resulting function to argument $P$.” (In short, apply from left to right.)
The scope of each lambda is interpreted as being as large as possible. For example, we read $\lambda x: \sigma. MN$ as $\lambda x: \sigma. (MN)$, rather than $(\lambda x: \sigma. M)N$. (In short, extend the function body as far as you can.)

The two conventions work well together. For example, a multi-argument function application may be written

\[(\lambda x:\sigma .\: \lambda y:\tau .\: \lambda z:\rho . M) \: N \: P \: Q .\]

According to the two conventions, this expression is fully parenthesized as

\[(((\lambda x:\sigma . (\lambda y:\tau . (\lambda z:\rho . M))) \: N) \: P) \: Q .\]

These conventions make lambda expressions easier to write and read, but they are hard to understand at first. You need some practice to get used to it.

Example

Following is a snippet of the C program, which defines a function to add two integers together, and then invokes this function using parameters 3 and 5.

  
int add(int x, int y) {
    return x + y;
}
add(3, 5);

Using the lambda notation, we can use the following expression to model the behavior of this program.

\[(\lambda x:int .\: \lambda y:int .\:x+y) \: 3 \: 5\]

Scope

A final point we must address about the syntax of the lambda-calculus is the scopes of variables.

An occurrence of the variable $x$ is said to be bound when it occurs in the body $M$ of an abstraction $\lambda x: \sigma.\: M$. Equivalently, we can say $\lambda x$ is a binder whose scope is $M$. Otherwise, an occurrence of $x$ is said to be free.

For example, the occurrences of $x$ in $x \: y$ and $\lambda y: \sigma.\: x\: y$ are free, while the one in $\lambda x: \sigma.\: x$ is bound. In $(\lambda x: \sigma.\: x) \:x$, there are two occurrences of $x$ (not including the $x$ in the binder), the first of which is bound and the second is free.

Expressions that differ only in the names of bound variables are called $\boldsymbol{\alpha}$-equivalent; we write $M =_{\alpha} N$ if $M$ and $N$ are $\alpha$-equivalent. Following are some examples.

\[\begin{split} \lambda x: \sigma.\: x &=_{\alpha} \lambda y: \sigma.\: y \\ (\lambda x: \sigma.\: x)\: x &=_{\alpha} (\lambda y: \sigma.\: y)\:x \\ (\lambda x: \sigma.\: x)\: x &\neq_{\alpha} (\lambda y: \sigma.\: y)\:y \end{split}\]

Bound variables serve like “placeholders”. So, changing their names will not affect the meaning of any expression. Just like $f(x) = x^2$ and $f(y) = y^2$ represents the same quadratic function.

We write $FV(M)$ for the set of free variables of $M$. An expression is closed if it has no free variables. The simplest closed expression is the identity function

\[\lambda x: \sigma.\: x;\]

does nothing but return its argument.

Equations, Reduction, and Semantics

In addition to syntax (lambda notation), semantics is also an important part of lambda calculus. Simply speaking, semantics studies the meaning of program phrases in a language.

There are three kinds of semantics in lambda calculus: axiomatic semantics, operational semantics, and denotational semantics. A logician might call the first two “proof systems” and the third a “model theory”. Following list presents the summarized description for each of them.

Axiomatic semantics works by deriving equations between expressions.
Operational semantics is based on a directed form of equation reasoning called reduction (a form of symbolic evaluation).
Denotational semantics tries to describe language using model theory.

Now, we are gonna check them out one by one in subsequent sections.

Axiomatic Semantics

In the equational axiom system, we have an axiom for renaming bound variables through substitution, and an axiom relating function application to substitution.

To formally describe the substitution, we use the notation $[N/x]M$ for the result of substituting expression $N$ for variables $x$ in $M$. In this process, free variables should not become bound, which can be easily ensured by properly renaming some bound variables before the substitution.

For example, the result of $[N/y](x \: y \: z)$ is $x\:N\: z$.

Using substitution, the axiom for renaming bound variables is written

\[\lambda x: \sigma.\: M = \lambda y: \sigma.\: [y/x]M, \quad y \text{ not free in } M \tag*{($\alpha$)}\]

For example, we have $\lambda x: \sigma.\: x = \lambda y: \sigma.\: y$.

Since the expression $\lambda x: \sigma.\: M$ defines the function with value $M$ on argument $x$, we can compute the value at argument $N$ by substituting $N$ for $x$. For example, the result of applying function $\lambda x: nat.\: x + 5$ to argument $3$ is

\[(\lambda x: nat.\: x + 5)\:3 = [3/x](x+5) = 3+5 .\]

More generally, we have the equational axiom

\[(\lambda x: \sigma.\: M)\:N = [N/x]M, \tag*{$(\beta)_{eq}$}\]

called $\boldsymbol{\beta}$-equivalence.

Essentially, $\beta$-equivalence says that we can evaluate a function application by substituting the actual argument for the formal parameter within the function body.

Operational Semantics

Operational semantics is based on a series of reduction rules. The reduction rule is a directed form of the equational reasoning in axiomatic semantics.

Intuitively, the basic reduction rules describe single computation steps that can be repeated to evaluate an expression symbolically. This symbolic “evaluation procedure”, or “interpreter,” gives lambda calculus its computational character.

Although $\beta$-equivalence is an equation, we often tend to read it as a simplification rule, from left to right. For example, the equation $(\lambda x: nat.\: x + 5)\:3 = 3+5$ suggests that we can simplify the function application to $3+5$.

Since reduction is asymmetric (one-way direction), an arrow $\rightarrow$ is commonly used for one-step reduction, and a double-headed arrow $\twoheadrightarrow$ for zero or more reduction steps.

The central reduction rule is a directed version of $\beta$-equivalence called $\boldsymbol{\beta}$-reduction, written

\[(\lambda x: \sigma.\: M)\:N \xrightarrow{\beta} [N/x]M. \tag*{$(\beta)_{red}$}\]

Use the expression $(\lambda x:int .\: \lambda y:int .\:x+y) \: 3 \: 5$ in the previous example to illutrate the reduction process. This expression can be reduced in such a way:

\[\begin{split} (\lambda x:int .\: \lambda y:int .\:x+y) \: 3 \: 5 &\xrightarrow{\beta} [3/x](\lambda y:int .\:x+y) \: 5 = (\lambda y:int .\:3+y) \: 5\\ &\xrightarrow{\beta} [5/y](3+y) = 3 + 5, \end{split}\]

which can be also written as

\[(\lambda x:int .\: \lambda y:int .\:x+y) \: 3 \: 5 \twoheadrightarrow 3 + 5.\]

There may be more than one way to begin reducing an expression. In fact, you are free to choose which parts you want to start reducing first. Hence, $\twoheadrightarrow$ might be viewed as “nondeterministic” model of execution. For example, the following expression has two ways to reduce:
$(\lambda x:\sigma.\: M)((\lambda y:\tau.\: N)\:P) \twoheadrightarrow (\lambda x:\sigma.\: M)([P/y]N) \twoheadrightarrow [([P/y]N)/x]M$
$(\lambda x:\sigma.\: M)((\lambda y:\tau.\: N)\:P) \twoheadrightarrow [((\lambda y:\tau.\: N)\:P)/x]M \twoheadrightarrow [([P/y]N)/x]M$
However, choosing a different place to begin does not keep us from reaching the same final result. As we shall see, this is true in general about many forms of lambda calculus.

There are two notable properties of reduction on pure typed lambda calculus.

Church-Rosser property (confluence): if $M \twoheadrightarrow N$ and $M \twoheadrightarrow N'$, then there exists an expression $P$, such that $N \twoheadrightarrow P$ and $N' \twoheadrightarrow P$. This property states that the ordering in which the reductions are chosen does not make a difference to the eventual result.
Strong normalization property (termination): no matter how we try to reduce an expression, we cannot go on applying single-step reductions indefinitely. We eventually produce a normal form, an expression that cannot be simplified further.

The termination property will be destroyed once we add recursion to typed lambda calculus in later articles, because recursion makes it possible to write nonterminating algorithms.

Denotational Semantics

In the denotational semantics of typed lambda calculus, each type expression is associated with a set, called the set of values of this type. For example, the $nat$ type is associated with the natural numbers set $\mathbb{N}$.

A term of type $\delta$ is interpreted as an element of the set of values of type $\delta$. The set of values of type $\delta \rightarrow \tau$ is a set of functions, so that a typed lambda term $\lambda x: \delta. \: M$ is interpreted as a mathematical function.

While the semantics of pure typed lambda calculus seems simple, the semantics will become considerably more complex when more extensions are added! One example extension is the definitions of recursive function, which are computationally important but difficult to accommodate in classical set theory.

Types and Type System

Speaking at a high level, a type system is a logical system comprising a set of rules that assigns a property called a type (such as integer, floating point, and string) to every term (a word, phrase, or other set of symbols).

In any type system, types provide a division or classification of some universe of possible values: a type is a collection of values that share some property. Therefore, it always makes sense to ask what the elements of a type are.

Throughout this series, we distinguish between types, which are collections, and values, which are members of types. For example, $3$ is a value of type $nat$.

In some systems, there may be types with types as memebers, which will be usually called something else, such as universe, orders, or kinds to avoid the impression of circularity.

In most programming languages, types are “checked” in some way, either during compilation, or during execution. In compiler-time checking, we attempt to guarantee that each program phrases (expressions, declarations, etc.) defines an element of a type that is either specified explicitly as part of the program text, or determined implicitly by the way the program phrase is used. Run-time type checking performs similarly, but postpones the test until the program is executed.

An important aspect of adopting a typed theory of programming languages instead of an untyped one is that untyped languages arise naturally as a special case of typed languages. The main idea is that we could use a type $untyped$ for all the untyped expressions of a language. Suppose we have a typed language with integer type $int$, float point type $float$, and boolean type $bool$, we can construct the associated $untyped$ type using the following recursive type definition:

\[untyped ::= int \mid float \mid bool\]

This kind of type definition notation will be introduced in a later article. Intuitively, the type $untyped$ is defined as the union of the integers, float points, and booleans.