DeepSeek's hyperparameter configurations remain exceptionally difficult to beat, says Teortaxes, recalling Cartwheel co-founder Andrew Carr's use of them

Grid searches struggle to improve upon DeepSeek's training baselines.

Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

It's really, really hard to improve much on DeepSeek. No matter how much compute you have, odds are that your grid searches won't find a much better global optimum. By V1, they were locked in. I remember @andrew_n_carr getting a lot of use out of their hparams after V2.