Gemini 3.1 Pro and Gemini 3 Flash have most qualitative behaviors set by SFT, not RL, contrary to my expectations!
New GDM interp research: SFT is a big deal for safety-relevant behaviors.
We recently investigated root causes for some of Gemini’s behaviors. We were surprised to find that many behaviors actually came from the initial supervised finetuning stage, not later stages like RL!
🧵



