
Yoav Goldberg notes that inventive elements of mathematical reasoning, such as creating new concepts, remain difficult to verify mechanistically in AI systems.

Luca Ambrogioni replies that AI relies solely on natural text generation.


Original post

by weird and annoying complete coincidence, these also happen to be parts which are very hard to verify mechanistically

9:11 AM · May 22, 2026

#92(((ل()(ل() 'yoav))))👾@YOAVGO

@LucaAmb in training it is

Luca Ambrogioni@LucaAmb

@yoavgo Keep in mind that right now things are not verified mechanistically anyway. It's natural text.

4:28 PM · May 22, 2026 · 81 Views

4:40 PM · May 22, 2026 · 45 Views

#92(((ل()(ل() 'yoav))))👾@YOAVGO

@LucaAmb are you sure? i'd be surprised if it was only demonstration based sft

Luca Ambrogioni@LucaAmb

@yoavgo Training uses human-annotated data; it's not formal verification.

4:44 PM · May 22, 2026 · 26 Views

4:48 PM · May 22, 2026 · 21 Views

#92(((ل()(ل() 'yoav))))👾@YOAVGO

@LucaAmb so they are trained only on sft from positive examples? how would that work?

Luca Ambrogioni@LucaAmb

@yoavgo The proofs are not formalized in Lean, so I do not see how they could have used formal validation. Of course they can use LLM validation as a proxy, but that's usable in any setting.

5:01 PM · May 22, 2026 · 17 Views

5:16 PM · May 22, 2026 · 19 Views

#92(((ل()(ل() 'yoav))))👾@YOAVGO

@LucaAmb without any negative example? that is super surprising if it works

Luca Ambrogioni@LucaAmb

@yoavgo As far as I know, they are relying on highly curated reasoning chains from mathematicians. The miracle of fine-tuning.

5:19 PM · May 22, 2026 · 21 Views

5:28 PM · May 22, 2026 · 15 Views

#92(((ل()(ل() 'yoav))))👾@YOAVGO

@LucaAmb the result was obtained using natural language. but i suspect the training pipeline had some non-nl components, even if they were hidden from the model. but cannot know for sure of course

Luca Ambrogioni@LucaAmb

I don't know if they use negative examples. I am just sure that they can't use formal validation, since the output wasn't in Lean; it was a human-verified natural language paper. Formalization is a long-term goal for them, but it hasn't driven progress so far. The driver of progress was high-quality human feedback.

5:40 PM · May 22, 2026 · 13 Views

5:43 PM · May 22, 2026 · 18 Views

#92(((ل()(ل() 'yoav))))👾@YOAVGO

@LucaAmb i did not say formal, i said verification. at a minimum like other RLVR approaches. let it prove known-to-be-true things as well as known-to-be-false things. do it also for intermediary steps, with a curriculum

Luca Ambrogioni@LucaAmb

@yoavgo Then what form of formal verification do you have in mind?

6:00 PM · May 22, 2026 · 12 Views

6:23 PM · May 22, 2026 · 11 Views
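
The RLVR-style suggestion in the thread above (train on known-to-be-true and known-to-be-false statements, ordered by a curriculum) can be sketched roughly as follows. This is only an illustration of the idea, not a description of any actual training pipeline: `Statement`, `verifiable_reward`, `curriculum`, and the difficulty levels are all hypothetical names, and the reward checks only the final verdict against a known label, not the proof text or its intermediate steps.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Statement:
    text: str
    is_true: bool     # ground-truth label, known before training (assumption)
    difficulty: int   # hypothetical curriculum level, 1 = easiest


def verifiable_reward(statement: Statement, verdict: str) -> float:
    """Return +1 if the verdict ('prove' or 'refute') matches the label, else -1."""
    if verdict not in ("prove", "refute"):
        return -1.0  # malformed answers are penalised like wrong ones
    claimed_true = verdict == "prove"
    return 1.0 if claimed_true == statement.is_true else -1.0


def curriculum(statements: Iterable[Statement], max_difficulty: int) -> List[Statement]:
    """Keep statements up to the current level, easiest first."""
    eligible = [s for s in statements if s.difficulty <= max_difficulty]
    return sorted(eligible, key=lambda s: s.difficulty)


# Toy pool mixing true and false statements, i.e. positive and negative examples.
pool = [
    Statement("Every prime greater than 2 is odd", True, 1),
    Statement("Every odd number is prime", False, 1),
    Statement("There are infinitely many primes", True, 2),
]
```

Because the label is fixed in advance, this kind of reward needs no Lean formalization, which seems to be the distinction between "verification" and "formal verification" that the exchange turns on. Extending it to intermediate steps would require labelled sub-claims, which the sketch does not attempt.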

#1822Luca Ambrogioni@LUCAAMB

@yoavgo Keep in mind that right now things are not verified mechanistically anyway. It's natural text.

(((ل()(ل() 'yoav))))👾@yoavgo

by weird and annoying complete coincidence, these also happen to be parts which are very hard to verify mechanistically

4:11 PM · May 22, 2026 · 1.6K Views

4:28 PM · May 22, 2026 · 81 Views

#1822Luca Ambrogioni@LUCAAMB

@yoavgo Training uses human-annotated data; it's not formal verification.

(((ل()(ل() 'yoav))))👾@yoavgo

@LucaAmb in training it is

4:40 PM · May 22, 2026 · 45 Views

4:44 PM · May 22, 2026 · 26 Views

#1822Luca Ambrogioni@LUCAAMB

@yoavgo The proofs are not formalized in Lean, so I do not see how they could have used formal validation.

Of course they can use LLM validation as a proxy, but that's usable in any setting.

(((ل()(ل() 'yoav))))👾@yoavgo

@LucaAmb are you sure? i'd be surprised if it was only demonstration based sft

4:48 PM · May 22, 2026 · 21 Views

5:01 PM · May 22, 2026 · 17 Views

#1822Luca Ambrogioni@LUCAAMB

@yoavgo As far as I know, they are relying on highly curated reasoning chains from mathematicians.

The miracle of fine-tuning.

(((ل()(ل() 'yoav))))👾@yoavgo

@LucaAmb so they are trained only on sft from positive examples? how would that work?

5:16 PM · May 22, 2026 · 19 Views

5:19 PM · May 22, 2026 · 21 Views

#1822Luca Ambrogioni@LUCAAMB

I don't know if they use negative examples. I am just sure that they can't use formal validation, since the output wasn't in Lean; it was a human-verified natural language paper.

Formalization is a long-term goal for them, but it hasn't driven progress so far. The driver of progress was high-quality human feedback.

(((ل()(ل() 'yoav))))👾@yoavgo

@LucaAmb without any negative example? that is super surprising if it works

5:28 PM · May 22, 2026 · 15 Views

5:40 PM · May 22, 2026 · 13 Views

#1822Luca Ambrogioni@LUCAAMB

I highly doubt it. The structure of a formal Lean proof is very different, and it will likely not provide much help to the non-formal task.

In general, I am not aware of any major math results first proven in a computer-verifiable form. They are always proven in natural language first and then formalized after the fact.

(((ل()(ل() 'yoav))))👾@yoavgo

@LucaAmb the result was obtained using natural language. but i suspect the training pipeline had some non-nl components, even if they were hidden from the model. but cannot know for sure of course

5:43 PM · May 22, 2026 · 18 Views

5:47 PM · May 22, 2026 · 22 Views

#1822Luca Ambrogioni@LUCAAMB

@yoavgo Then what form of formal verification do you have in mind?

(((ل()(ل() 'yoav))))👾@yoavgo

@LucaAmb i did not say they used lean..

5:53 PM · May 22, 2026 · 20 Views

6:00 PM · May 22, 2026 · 12 Views

#1822Luca Ambrogioni@LUCAAMB

@yoavgo Oh ok, then I misunderstood what you meant. Can you explain to me what you mean by verification in this context?

(((ل()(ل() 'yoav))))👾@yoavgo

@LucaAmb i did not say formal, i said verification. at a minimum like other RLVR approaches. let it prove known-to-be-true things as well as known-to-be-false things. do it also for intermediary steps, with a curriculum

6:23 PM · May 22, 2026 · 11 Views

6:28 PM · May 22, 2026 · 4 Views