Distilled (Software and Proof) — MISC TOPICS

DISTILLED SUMMARY
Proofs Are Going to Get Cheap
Speaker: Mike (Principal, Galois) | Venue: UW | Date: 2026
The Thesis
AI is on track to make formal mathematical proof radically cheaper. Proof has always been
expensive because humans had to produce it; AI capability in mathematics and in Lean
theorem proving is improving exponentially and shows no signs of slowing. If this trajectory
holds, proof shifts from a luxury reserved for critical security infrastructure to a broadly
applicable verification technology.
Why It Matters
Cheap proof enables a new trust architecture for AI systems: formal oversight. AI produces
solutions through opaque processes; a formal proof of correctness, checked by Lean's kernel,
provides independent verification without trusting the AI itself. This pattern scales with AI
capability — as AI outputs grow harder for humans to review, the same AI improvements
make proof generation easier. The seL4 microkernel required 165,000 lines of proof for
8,500 lines of C; AI-assisted proof could make such ratios economically viable for far more
systems.
The Evidence
AI mathematical performance has improved exponentially across benchmarks. On Frontier
Math (closed, graduate-level problems), scores rose from near-zero in mid-2024 to 45–50%
by 2026. AI Lean theorem proving has gone from poor to respectable within nine months.
Multiple companies have built business models around AI mathematical solvers targeting

Lean. Capability jumps can be sudden: Anthropic's Mythos model showed a ~100×
improvement over its predecessor on one security benchmark.
Three Unsolved Problems
World modeling: Formal verification requires translating real code into formal models. The
translation must correctly represent language semantics, hardware, OS, and libraries. A
buffer overflow was found in the Lean runtime itself this week. Wherever the verification
boundary is drawn, bugs will exist just outside it.
Specification: The deepest problem. Formal oversight only works if the specification
correctly captures what "correct" means — and most real systems lack clean formal
specifications. The AI cannot write its own spec without grading its own homework.
Humans traditionally refine specs by working on proofs; if AI handles proofs, that feedback
loop breaks. This is the challenge least likely to be solved by AI capability improvements
alone.
Bottom line: Proof generation may soon be automated. Specification — formally
stating what we actually want — remains a hard, human-shaped problem. The formal
methods community should build Lean infrastructure now and invest heavily in
specification technology, the problem with the longest shelf life and the greatest
unsolved need.

MISC TOPICS: Distilled (Software and Proof)