$$ \usepackage{amssymb} \newcommand{\N}{\mathbb{N}} \newcommand{\C}{\mathbb{C}} \newcommand{\R}{\mathbb{R}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\ZZ}{\ooalign{Z\cr\hidewidth\kern0.1em\raisebox{-0.5ex}{Z}\hidewidth\cr}} \newcommand{\colim}{\text{colim}} \newcommand{\weaktopo}{\tau_\text{weak}} \newcommand{\strongtopo}{\tau_\text{strong}} \newcommand{\normtopo}{\tau_\text{norm}} \newcommand{\green}[1]{\textcolor{ForestGreen}{#1}} \newcommand{\red}[1]{\textcolor{red}{#1}} \newcommand{\blue}[1]{\textcolor{blue}{#1}} \newcommand{\orange}[1]{\textcolor{orange}{#1}} \newcommand{\tr}{\text{tr}} \newcommand{\id}{\text{id}} \newcommand{\im}{\text{im}\>} \newcommand{\res}{\text{res}} \newcommand{\TopTwo}{\underline{\text{Top}^{(2)}}} \newcommand{\CW}[1]{\underline{#1\text{-CW}}} \newcommand{\ZZ}{% \ooalign{Z\cr\hidewidth\raisebox{-0.5ex}{Z}\hidewidth\cr}% } % specific for this document \newcommand{\cellOne}{\textcolor{green}{1}} \newcommand{\cellTwo}{\textcolor{red}{2}} \newcommand{\cellThree}{\textcolor{brown}{3}} \newcommand{\cellFour}{\textcolor{YellowOrange}{4}} $$

Automated theorem proving

automated theorem proving
formal verification

Luca Leon Happel


February 21, 2024


February 22, 2024


If you are a programmer, most likely you know that feeling you get when you work on your code for too long, that maybe, however unlikely, there is a bug in your code. You have tested it, you have written unit tests, but still, you are not sure. You are not sure if your code is correct. This is where formal verification comes in. Formal verification is the process of proving that a program is correct. This can be done in many ways, but one of the most interesting ways is automated theorem proving. In this article, I will provide a glimpse into automated theorem proving and show you how to use the Lean4 programming language to prove theorems. This is not really intended as an introduction to Lean4 or automated theorem proving, but rather as a demonstration of what I learned after half holding a seminar about it, attending half a year of a course about it at the university of Düsseldorf and particiating at the conference Lean for the Curious Mathematician 2023.

From functional programming to automated theorem proving

Lean4 is a functional programming language. This means that it is a programming language that is based on the \(\lambda\)-calculus. The \(\lambda\)-calculus is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. In Lean4, we write programs by defining functions, composing them in clever ways and finally applying them to specific arguemnts. In this regard, Lean4 is very similar to Haskell, about which I held a talk a year ago.

By enriching the \(\lambda\)-calculus with a sufficiently strong type system like in Lean4, we can use the resulting programming language to write programs that are provably correct. The foundation of this is the Curry-Howard correspondence, whose motto is “propositions as types, proofs as terms”. In \(\lambda\)-calculus, terms are well-defined mathematical objects as are types.

A quick example of the power that a strong type system provides us with would be the following Lean4 code snippet:

def swap {α β : Type} (p : α × β) : β × α := (p.snd, p.fst)

theorem swap_swap {α β : Type} (p : α × β) : swap (swap p) = p :=
  match p with
  | (a, b) => rfl

Here we define a function swap that swaps the elements of a pair. We then prove that swap is involutive, i.e. that swap (swap p) = p for all pairs p. The theorem keyword is used to define a theorem, but is just syntactic sugar for defining a function that represents a proof. Notice that we state our theorem using Lean4’s type system and then provide a proof using Lean4’s programming language. The proof is verified by the Lean4 compiler, which checks if each step in the program has valid types. If we were to change the proof to the following:

theorem swap_swap {α β : Type} (p : α × β) : swap (swap p)  p :=
  match p with
  | (a, b) => rfl

The Lean4 compiler would throw the following error regarding the rlf function:

type mismatch
has type
  ?m.149 = ?m.149 : Prop
but is expected to have type
  swap (swap (a, b))  (a, b) : Prop

Notice that the error message is very helpful (if you know how to decode it) and it tells us exactly what went wrong. It tells us that rfl (the reflexivity axiom, which states that for all \(a\), we have \(a=a\)) has type ?m.149 = ?m.149 : Prop, but the term we provided has type swap (swap (a, b)) ≠ (a, b) : Prop. We did not get this error message previously, because previously we wanted to show that swap (swap p) = p, which we did by unfolding the definition of swap using the match keyword and then using the rfl function to show that (a,b) = (a,b). This last step is where everything went wrong in the second proof. After unpacking we get swap (swap (a, b)) = (a, b), and now by applying rfl we get a term (a,b) = (a,b), which is not what we wanted to show. We wanted to show that swap (swap (a, b)) ≠ (a, b), which is not true, and thus we get a type error.

Untyped \(\lambda\)-calculus

Let \(V = \{x,y,z, \dots\}\) be a set, which we call the set of variables and let \(K = \{\a,\b,\f,\g, \dots\}\) be a set, which we call the set of constants. We can now define the set of terms \(\Lambda\) inductively as follows:

Rule Name
If \(x \in V\), then \(x \in \Lambda\) Variable
If \(\f \in K\), then \(\f \in \Lambda\) Constant
If \(s,t \in \Lambda\), then \((st) \in \Lambda\) Application
If \(x \in V\) and \(t \in \Lambda\), then \((\lambda x.t) \in \Lambda\) Abstraction

Using the notation of the Backus–Naur form, we can define \(\Lambda\) succinctly as follows:

\[ \Lambda ::= V \mid K \mid \Lambda \Lambda \mid \lambda V.\Lambda \]

This strongly resembles the syntax of Lean4 and also that of Haskell.

Simply typed \(\lambda\)-calculus

In the simply typed \(\lambda\)-calculus, we enrich the untyped \(\lambda\)-calculus with a type system. We define the set of types \(\T\) inductively using a set of base types \(\T_0\) as follows:

  1. If \(A \in \T_0\), then \(A \in \T\)
  2. If \(A,B \in \T\), then \(A \to B \in \T\)

Using the notation of the Backus–Naur form again, we can define \(\T\) succinctly as follows:

\[ \T ::= \T_0 \mid \T \to \T \]

Using the set of types \(\T\), we can now define the set of typed terms \(\Lambda_\T\) inductively using the Backus–Naur form as follows:

\[ \begin{align*} \Lambda_\T &::= V \mid K \mid \Lambda_\T \Lambda_\T \mid \lambda V:\T.\Lambda_\T \\ & \text{where } \\ & V \text{ is the infinite set of variables } \\ & K \text{ is the set of constants } \\ \end{align*} \]

Type inference

Using typed \(\lambda\)-calculus as a basis for our programming language, we can now associate a type with each valid term. For example:

\[ \begin{align*} (+1) &= \lambda x : \N \to \N. x+1 & : \N \to \N \\ \text{hello} &= \text{IO.println} \, \text{"Hello, world!"} & : \text{IO} \, \text{Unit} \\ \f &= \lambda \g : \alpha \to \alpha. g \, g & : (\alpha \to \alpha) \to \alpha \to \alpha \end{align*} \]

Not all terms have a type, however. For example, assume \(\a, \b \in K\) are some constants. We say, that our context \(\Gamma\) consists of \(\a, \b : \kappa\), denoted as \(\Gamma, \, \a, \b : \kappa\), say in the context \(\Gamma\), it holds that \(\a, \b\) are of type \(\kappa\). Then the term \(\a \b\) does not have a type, because we cannot infer a type for \(\a\) and \(\b\), as we cannot apply \(\a\) to \(\b\).

Lean4 and other programming languages with sufficiently strong type systems can infer the type of a term. This is called type inference. For example, in Lean4, we can write the following code:

def f x := x + 1

This snippet defines a function f that takes an argument x and returns x + 1. We did not specify the type of x, but Lean4 can infer that x is of type , because we used 1, which is of type , in the expression x + 1 and if the second parameter of the + function is of type , then the first parameter must also be of type (although this is not true in general, but we do not care about this here).

Implicitly we used the context \(\Gamma\), such that 1:ℕ. To achieve a fully sound and rigorous type inference, we would have to specify the context \(\Gamma\) beforehand and we need to define some general rules for type inference. These rules are:

\[ \begin{align*} \text{Konst} & \frac{}{\Gamma \vdash \a : \kappa} & \text{if } \Gamma, \a : \kappa \\ \text{Var} & \frac{\Gamma, \, x : \tau \vdash t : \sigma}{\Gamma \vdash \lambda x : \tau.t : \tau \to \sigma} \\ \text{App} & \frac{\Gamma \vdash t : \sigma \to \tau \quad \Gamma \vdash s : \sigma}{\Gamma \vdash t \, s : \tau} \\ \text{Abs} & \frac{\Gamma, \, x : \tau \vdash t : \sigma}{\Gamma \vdash ( \lambda x : \tau.t ) : \tau \to \sigma} \\ \end{align*} \]

Type Inference Example

A simple example of type inference would be the following:

Consider the term \(\lambda x. x + 1\). To determine its type under the context \(\Gamma\), we use the rules of type inference as outlined above. Let’s break down the process step by step.

  1. Variable Rule (Var):

The term is an abstraction \(\lambda x.t\) where \(t = x + 1\). We assume \(x : \N\) as part of our context for the body \(t\) of the abstraction. This is because we are dealing with addition and the constant 1 is of type \(\N\).

  1. Application Rule (App):

The subterm \(x + 1\) is an application of the addition function to \(x\) and \(1\). In \(\Gamma\), the addition function has the type \(\N \to \N \to \N\). This means it takes two arguments of type \(\N\) and returns a result of type \(\N\). Since \(x : \N\) and \(1 : \N\), the term \(x + 1\) is well-typed and has the type \(\N\).

  1. Abstraction Rule (Abs):

We now return to our original term \(\lambda x. x + 1\). Since the body of the abstraction \(x + 1\) has type \(\N\) and we assumed \(x : \N\), the whole abstraction has the type \(\N \to \N\). Therefore, the term \(\lambda x. x + 1\) is well-typed under the context \(\Gamma\) and has the type \(\N \to \N\). This represents a function that takes a natural number as input and returns a natural number as output.

This results in the following derivation: