DISTILLED SUMMARY
Proofs Are Going to Get Cheap
Speaker: Mike (Principal, Galois) | Venue: UW | Date: 2026
The Thesis
AI is on track to make formal mathematical proof radically cheaper. Proof has always been
expensive because humans had to produce it; AI capability in mathematics and in Lean
theorem proving is improving exponentially and shows no signs of slowing. If this trajectory
holds, proof shifts from a luxury reserved for critical security infrastructure to a broadly
applicable verification technology.
Why It Matters
Cheap proof enables a new trust architecture for AI systems: formal oversight. AI produces
solutions through opaque processes; a formal proof of correctness, checked by Lean's kernel,
provides independent verification without trusting the AI itself. This pattern scales with AI
capability — as AI outputs grow harder for humans to review, the same AI improvements
make proof generation easier. The seL4 microkernel required 165,000 lines of proof for
8,500 lines of C; AI-assisted proof could make such ratios economically viable for far more
systems.
The Evidence
AI mathematical performance has improved exponentially across benchmarks. On Frontier
Math (closed, graduate-level problems), scores rose from near-zero in mid-2024 to 45–50%
by 2026. AI Lean theorem proving has gone from poor to respectable within nine months.
Multiple companies have built business models around AI mathematical solvers targeting
Lean. Capability jumps can be sudden: Anthropic's Mythos model showed a ~100×
improvement over its predecessor on one security benchmark.
Three Unsolved Problems
World modeling: Formal verification requires translating real code into formal models. The
translation must correctly represent language semantics, hardware, OS, and libraries. A
buffer overflow was found in the Lean runtime itself this week. Wherever the verification
boundary is drawn, bugs will exist just outside it.
Specification: The deepest problem. Formal oversight only works if the specification
correctly captures what "correct" means — and most real systems lack clean formal
specifications. The AI cannot write its own spec without grading its own homework.
Humans traditionally refine specs by working on proofs; if AI handles proofs, that feedback
loop breaks. This is the challenge least likely to be solved by AI capability improvements
alone.
Bottom line: Proof generation may soon be automated. Specification — formally
stating what we actually want — remains a hard, human-shaped problem. The formal
methods community should build Lean infrastructure now and invest heavily in
specification technology, the problem with the longest shelf life and the greatest
unsolved need.