Maksym Andriushchenko's group releases a trajectory viewer and open-source inference traces for its agent optimization benchmark InferenceBench

Original post

Maksym Andriushchenko@maksym_andr

note: InferenceBench was released only a month ago, so we know that GLM-5.2 couldn't have been optimized on InferenceBench. yet, its results are very strong. this serves as additional evidence that it's indeed a good model, comparable to proprietary frontier models.

you can download and play around with the traces here: https://huggingface.co/datasets/aisa-group/InferenceBench-Trajectories

or you can view the traces directly here: https://inferencebench.ai/

Maksym Andriushchenko@maksym_andr

💥NEW: important updates on InferenceBench:
- GLM-5.2 (Max) results are very strong (7.0x speed-up compared to Opus 4.8's 7.6x speed-up),
- we have a trace viewer now available on our website,
- all our traces are hosted on HuggingFace.

8:24 AM · Jun 22, 2026 · 181 Views

Jehyeok Yeon @ ICML 2026 🇰🇷@jehyeoky248

🎉More updates for InferenceBench v1.0.2! Some highlights:
- Added a Trajectory Viewer for InferenceBench runs
- Added results for GLM-5.2 (Max), coming in at 6th place, only behind models like Opus 4.8/4.7 and Fable!

See the changes for yourself at: https://inferencebench.ai

Jehyeok Yeon @ ICML 2026 🇰🇷@jehyeoky248

Many people have been asking for agent traces for InferenceBench, so I've built a trajectory viewer for InferenceBench runs! You can now follow the agent through its run to see what it does and how long it spends at each stage.

Jehyeok Yeon @ ICML 2026 🇰🇷@jehyeoky248

The goal of the trajectory viewer is to give a high-level overview of agent trajectories without having to read through long, complex text files. For deeper analysis, the traces themselves have also been uploaded to HuggingFace here: https://huggingface.co/datasets/aisa-group/InferenceBench-Trajectories
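
For readers who want to pull the traces locally, here is a minimal sketch, assuming the `huggingface_hub` package is installed and that the dataset repo is publicly readable. The repo id comes from the URL above. The helper names `download_traces` and `count_files_by_suffix` are hypothetical. The thread does not describe the file layout of the traces, so the sketch only counts files by extension rather than parsing them.

```python
from collections import Counter
from pathlib import Path

# Dataset id taken from the HuggingFace URL in the post above.
REPO_ID = "aisa-group/InferenceBench-Trajectories"


def download_traces(repo_id: str = REPO_ID) -> Path:
    """Download a snapshot of the dataset repo and return its local path.

    Needs `pip install huggingface_hub` and network access.
    """
    # Imported lazily so the rest of the module works without the package.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, repo_type="dataset"))


def count_files_by_suffix(root: Path) -> Counter:
    """Count files under `root` by extension.

    Makes no assumption about how the traces are organized (hypothetical
    helper, only meant as a first look at what was downloaded).
    """
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            counts[path.suffix or "<none>"] += 1
    return counts
```

Calling `count_files_by_suffix(download_traces())` should give a rough inventory of the download, which is a reasonable starting point before deciding how to parse individual trajectories.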

Shubhendu Trivedi@_onionesque

@maksym_andr 💥 NEW

Jehyeok Yeon @ ICML 2026 🇰🇷@jehyeoky248

Observation #1: GLM-5.2 (Max) does quite well, performing at or near the level of the other top models like Fable 5 or Opus 4.7/4.8 in all scenarios except Scenario B (Decoding Speed). Very excited to see open-source models high up on the list!
