2h ago

FormalQualBench Comparator Verifies Lean Proofs for Correctness and No Extra Axioms

0
Original post

This led us to develop FormalQualBench (https://www.math.inc/formalqualbench), a benchmark designed to reinforce correctness standards across the field. With statements checked by a human expert, our goal is to guarantee that all proofs are faithful to the underlying mathematics.

8:51 PM · May 29, 2026 View on X

A key component of FormalQualBench is Comparator, which rigorously checks that each solution proves the correct statement, introduces no additional axioms, and is accepted by the lean kernel. Comparator detect sophisticated workarounds that evade basic compilation checks.

Alex GuAlex Gu@minimario1729

This led us to develop FormalQualBench (https://www.math.inc/formalqualbench), a benchmark designed to reinforce correctness standards across the field. With statements checked by a human expert, our goal is to guarantee that all proofs are faithful to the underlying mathematics.

3:51 AM · May 30, 2026 · 238 Views
3:51 AM · May 30, 2026 · 172 Views

In our evaluations, models like Codex employed elaborator-level tactics to bypass constraints. One example shows a Codex-generated snippet using "ax" ++ "iom" to inject an axiom via metaprogramming. This evades static detection but is reliably caught by Comparator.

Alex GuAlex Gu@minimario1729

A key component of FormalQualBench is Comparator, which rigorously checks that each solution proves the correct statement, introduces no additional axioms, and is accepted by the lean kernel. Comparator detect sophisticated workarounds that evade basic compilation checks.

3:51 AM · May 30, 2026 · 172 Views
3:51 AM · May 30, 2026 · 681 Views
FormalQualBench Comparator Verifies Lean Proofs for Correctness and No Extra Axioms · Digg