And when it breaks (it always breaks), you analyze the failure cases, characterize them, and ask: why here, why this input, why this regime? The failure cases slowly turn into your problem statement, and the idea is something the data forced on you
The second way is less romantic. You don't start with a specific idea, but you start with the best thing that already exists. You take the strongest baseline you can find, you reproduce it honestly, and then you push it until it breaks.