2h ago

FormalQualBench Comparator Verifies Lean Proofs for Correctness and No Extra Axioms

39411.1K

——0——

Original post

This led us to develop FormalQualBench (https://www.math.inc/formalqualbench), a benchmark designed to reinforce correctness standards across the field. With statements checked by a human expert, our goal is to guarantee that all proofs are faithful to the underlying mathematics.

8:51 PM · May 29, 2026

#1109Alex Gu@MINIMARIO1729

A key component of FormalQualBench is Comparator, which rigorously checks that each solution proves the correct statement, introduces no additional axioms, and is accepted by the lean kernel. Comparator detect sophisticated workarounds that evade basic compilation checks.

Alex Gu@minimario1729

3:51 AM · May 30, 2026 · 238 Views

3:51 AM · May 30, 2026 · 172 Views

#1109Alex Gu@MINIMARIO1729

In our evaluations, models like Codex employed elaborator-level tactics to bypass constraints. One example shows a Codex-generated snippet using "ax" ++ "iom" to inject an axiom via metaprogramming. This evades static detection but is reliably caught by Comparator.

Alex Gu@minimario1729

3:51 AM · May 30, 2026 · 172 Views

3:51 AM · May 30, 2026 · 681 Views