
Arkil Patel shares paper “Forecasting Downstream Performance of LLMs With Proxy Metrics”, showing that cross-entropy loss correlates with downstream performance at below 0.5, while the proposed proxy metrics exceed 0.8

Proxy metrics can rank pretraining datasets using a fraction of the usual compute.

Original post

Excited to share our new paper! “Forecasting Downstream Performance of LLMs With Proxy Metrics” w/ my amazing advisors @sivareddyg, @mariusmosbach, @DBahdanau Cross-entropy loss is a poor predictor of how models perform on downstream tasks (esp. reasoning). We propose something better: proxy metrics computed over expert reasoning traces. 🧵 Thread below 👇

6:44 AM · May 22, 2026

Nature is complex. Why would cross-entropy loss predict the scaling behavior of language models on downstream tasks? Introducing data-driven proxy metrics for scaling laws. Proxy metrics are incredibly useful, especially on tasks where models don't perform strongly yet.

Excellent work by @arkil_patel!

Quoting Arkil Patel @arkil_patel: original post above (May 22, 2026 · 81.2K Views)
9:43 PM · May 22, 2026 · 1.8K Views

To democratize AI, we need to help AI practitioners argue how investment can bring returns in the form of superior intelligence. Forecasting downstream performance is super important! Check out @arkil_patel's work:

Quoting Arkil Patel @arkil_patel: original post above (May 22, 2026 · 81.2K Views)
2:14 PM · May 22, 2026 · 3.8K Views