We have seen inference-time compute help in math and code, but does it help in forecasting, a domain more often associated with knowledge?
In FutureSim, we found that performance scales gracefully with test-time compute, with no foreseeable ceiling! 🚀