I’m really surprised that people don’t see this.
It’s mathematically true that LLMs can’t come up with novel ideas, because the whole point of training is to reduce loss and gain rewards so that the model adheres to rules and ground truth.
If you have a model that can come up with novel ideas, it must have high loss during SFT or low reward during RL.
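
To make the loss part concrete, here is a minimal sketch of per-token cross-entropy, the usual SFT objective. The 4-token vocabulary and both probability distributions are made up for illustration. It only shows that putting probability mass on anything other than the ground-truth token raises the loss on that example. It is a toy picture of the argument, not a proof of it.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target token under the model's distribution."""
    return -math.log(probs[target])

# Hypothetical 4-token vocabulary; token 0 is the ground-truth next token.
target = 0
conforming = [0.97, 0.01, 0.01, 0.01]   # mass concentrated on the ground truth
exploratory = [0.40, 0.20, 0.20, 0.20]  # mass spread onto other continuations

loss_conforming = cross_entropy(conforming, target)
loss_exploratory = cross_entropy(exploratory, target)

print(f"conforming:  {loss_conforming:.3f}")
print(f"exploratory: {loss_exploratory:.3f}")
```

The exploratory distribution always scores worse here, since every bit of probability it gives to a non-ground-truth token is taken away from the target.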





