🏀🤖 There was lovely weather this weekend, so I let my agents run without babysitting. I combined the two things all my feeds are obsessed with right now: the New York Knicks and autonomous research agents. So, who will win the game today? 🧵

The Knicks are up 2–0 on the Spurs in the 2026 Finals. Game 3 is tonight at MSG. Instead of a hot take, I ran an experiment: can an autonomous AI research agent (in the style of @karpathy's "autoresearch") build and tune the prediction model by itself? 👀

The idea: take Karpathy's autoresearch loop (let an LLM edit the training code, run a short experiment, keep the change only if a metric improved, and repeat) and point it at tonight's game.
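For the curious, here's a minimal sketch of that keep-if-better loop. It is not the actual agent: `propose_change` is a hypothetical stand-in for the LLM's edit (here just a random tweak), `run_experiment` is a toy error function instead of a real training run, and the config keys are made up.

```python
import random

def run_experiment(config):
    """Hypothetical stand-in for a short training run; returns an error (lower is better)."""
    # Toy objective: smallest error near learning_rate=0.1, depth=4.
    return abs(config["learning_rate"] - 0.1) * 10 + abs(config["depth"] - 4)

def propose_change(config, rng):
    """Hypothetical stand-in for the LLM's edit: perturb one setting at random."""
    new = dict(config)
    if rng.random() < 0.5:
        new["learning_rate"] = max(0.01, new["learning_rate"] + rng.choice([-0.05, 0.05]))
    else:
        new["depth"] = max(1, new["depth"] + rng.choice([-1, 1]))
    return new

def autoresearch_loop(config, steps=50, seed=0):
    rng = random.Random(seed)
    best_error = run_experiment(config)
    history = [best_error]  # best-so-far error after each step
    for _ in range(steps):
        candidate = propose_change(config, rng)
        error = run_experiment(candidate)
        if error < best_error:  # keep the change only if the metric improved
            config, best_error = candidate, error
        history.append(best_error)
    return config, best_error, history

best_config, best_error, history = autoresearch_loop({"learning_rate": 0.3, "depth": 8})
```

`history` is the best-so-far curve: it can only go down or stay flat, because rejected candidates are thrown away.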

@ziv_ravid A layman's question (as usual): so this autoresearch thing is just automating iterations on models?
I feel that this tweet thread didn't get enough credit. TL;DR: my agents built a prediction model, and it predicts the Knicks will win today's game.

So, what's the model predicting for Game 3? 🏀 Knicks 108–105, Knicks 59% to win.
That sits right between Vegas (57%) and a separate Monte Carlo "blend" model (68%).
Player projections: • Brunson ~27 pts / 7 ast (±8) • Towns ~18 / 11 reb • Wembanyama ~25 / 11 reb (±9)

What's the baseline? We compare to Vegas - the sportsbook's closing line (the point spread and total set right before tip-off). It already prices in injuries, lineups, rest, and sharp-money opinion, which makes it the gold-standard public predictor of a game.
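A rough sketch of how such a comparison could be scored, assuming "error" means mean absolute error on the point margin (the thread doesn't name the exact metric). The function names and every number below are made up for illustration; they are not real games or real lines.

```python
def margin_mae(predicted, actual):
    """Mean absolute error between predicted and actual margins (home minus away)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def winner_accuracy(predicted, actual):
    """Share of games where the predicted margin has the same sign as the actual one."""
    hits = sum((p > 0) == (a > 0) for p, a in zip(predicted, actual))
    return hits / len(actual)

# Toy margins, purely illustrative.
actual = [5, -3, 12, -8]
model = [2, 1, 9, -10]
vegas = [4, -2, 7, -6]  # for Vegas, the predicted margin is the negated home spread

model_mae, vegas_mae = margin_mae(model, actual), margin_mae(vegas, actual)
model_acc, vegas_acc = winner_accuracy(model, actual), winner_accuracy(vegas, actual)
```

A model can pick more winners than the line and still have a worse margin error, so it's worth tracking both.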

The results? Much better! 🚀 Holdout error 9.6, almost matching Vegas. 🤝

My friend CC and I started by scraping 6 full seasons (2021 → 2026), including the playoffs (date, teams, scores, playoff flag). For each experiment, the agent proposes a change to the config, runs it, compares the error, keeps the change if the error improved, then repeats.

I used the first 4 seasons as training data, 2024-25 as the dev season the agent iterates on, and the final season (2025-26) as a locked holdout - scored exactly once, at the very end. No peeking.
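In code, the split could look like this sketch. The `season` field (the year a season ends in, so 2024-25 is 2025) and the record layout are assumptions for illustration, not the actual schema.

```python
def split_by_season(games, dev_season=2025, holdout_season=2026):
    """Split game records by season end year (assumed field name: 'season')."""
    train = [g for g in games if g["season"] < dev_season]
    dev = [g for g in games if g["season"] == dev_season]
    holdout = [g for g in games if g["season"] == holdout_season]
    return train, dev, holdout

# One placeholder record per season, just to show the shape of the split.
games = [{"season": s, "game_id": i}
         for i, s in enumerate([2021, 2022, 2023, 2024, 2025, 2026])]
train, dev, holdout = split_by_season(games)
```

Splitting by whole seasons rather than at random keeps future games out of the training data. The agent's loop only ever sees `train` and `dev`.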

So how good is the model, and how does the optimization actually look? Every dot is one candidate config (green = kept, hollow = rejected). The teal line tracks the best so far. You can literally watch the agent learn.

Starting from a plain gradient-boosting baseline, the agent found:
* pre-game Elo ratings (with margin-of-victory updates)
* back-to-back / fatigue flags
* Boosting is all you need.
Result: holdout error 11.7 → 11.4, winner accuracy 66% → 69% (above Vegas's 67%).
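For anyone who hasn't seen Elo with a margin-of-victory term, here's a minimal sketch. The constants (`K`, `HOME_ADV`) and the multiplier's form follow a style used in some public NBA Elo systems. They are assumptions, and the thread doesn't say which variant the agent ended up with. The back-to-back flag is the simplest possible version.

```python
from datetime import date

K = 20.0          # assumed update rate
HOME_ADV = 100.0  # assumed home-court bonus, in Elo points

def expected_home_win(home_elo, away_elo):
    """Standard Elo win expectation for the home team."""
    return 1.0 / (1.0 + 10 ** (-(home_elo + HOME_ADV - away_elo) / 400.0))

def update_elo(home_elo, away_elo, home_pts, away_pts):
    """Return post-game (home_elo, away_elo); bigger wins move ratings more."""
    exp_home = expected_home_win(home_elo, away_elo)
    home_won = 1.0 if home_pts > away_pts else 0.0
    margin = abs(home_pts - away_pts)
    diff = home_elo + HOME_ADV - away_elo
    winner_diff = diff if home_won else -diff
    # Margin-of-victory multiplier, damped when the favorite wins.
    mov_mult = (margin + 3) ** 0.8 / (7.5 + 0.006 * winner_diff)
    delta = K * mov_mult * (home_won - exp_home)
    return home_elo + delta, away_elo - delta

def is_back_to_back(game_date, previous_game_date):
    """True if the team also played the day before."""
    return previous_game_date is not None and (game_date - previous_game_date).days == 1

close_win = update_elo(1500.0, 1500.0, 105, 100)
blowout = update_elo(1500.0, 1500.0, 120, 100)
```

The "pre-game" part matters: the feature fed to the model is each team's rating before the game is played, and the update runs only after the final score is known.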

Good, but not perfect. So we tried using more data. I won't go through every detail, but in general we added injury / availability (who's actually playing tonight), player ratings + projected lineups, richer per-game box scores, and player tracking / shot data.

It's important to note: at this stage the model does NOT beat Vegas on margin error. What's missing? The closing line aggregates thousands of sharp bettors pricing in injuries, lineups, rest, minutes limits, and late news - info known only hours before tip.

@_kobim Sort of. It just automatically picks which parameters to run, and for what exactly.

@ziv_ravid My bagel is Jewish, Knicks on Wednesday, loop running in Claude, I removed the code editor

It's of course not betting advice - single NBA games are that random. That's the entire point. 🏀🤖
Go Knicks! 🧡💙

@ziv_ravid OK. I'm going to go with 119-99 Knicks. That's what you get if you exclude regular season and run the playoff results only.