Maksym Andriushchenko of ELLIS Institute Tübingen clarifies that PostTrainBench's Hugging Face page hosts only static evaluation traces, while RewardBench's page hosts an eval set suited to CI integration

Running PostTrainBench needs no new data beyond its seven benchmarks, which are downloaded from their respective Hugging Face pages.

Original post
Maksym Andriushchenko @maksym_andr

i think there is a difference: the RewardBench HF page hosts an eval set which makes sense to integrate in a CI.

running PostTrainBench, however, doesn't require any new data to be downloaded, except the 7 benchmarks used for it, but those are downloaded using their respective HF pages. so our HF page (https://huggingface.co/datasets/aisa-group/PostTrainBench-Trajectories) is only hosting static traces from our evaluations. this makes the whole thing a bit more mysterious :-)

Nathan Lambert @natolambert

@maksym_andr Could be running the eval in a CI. Rewardbench hit like 500K in a month

1:17 PM · Jun 9, 2026 · 247 Views
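
The distinction drawn in the post can be illustrated with a short sketch: the static traces are something one would fetch once for inspection, not something a CI job has to pull on every run. This is a minimal sketch, assuming the `huggingface_hub` package is installed. The helper names `repo_id_from_url` and `download_traces` are hypothetical and are not part of PostTrainBench or RewardBench.

```python
from typing import Optional

HF_DATASETS_PREFIX = "https://huggingface.co/datasets/"


def repo_id_from_url(url: str) -> str:
    """Extract the 'owner/name' repo id from a Hugging Face dataset page URL.

    Hypothetical helper, written only for this illustration.
    """
    if not url.startswith(HF_DATASETS_PREFIX):
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    parts = url[len(HF_DATASETS_PREFIX):].strip("/").split("/")
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"URL does not contain an owner/name pair: {url}")
    return "/".join(parts[:2])


def download_traces(url: str, local_dir: Optional[str] = None) -> str:
    """One-off download of a dataset repo (here: the static traces).

    Hypothetical helper. The import is lazy so that repo_id_from_url
    works even when huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id_from_url(url),
        repo_type="dataset",
        local_dir=local_dir,
    )


TRACES_URL = "https://huggingface.co/datasets/aisa-group/PostTrainBench-Trajectories"
TRACES_REPO_ID = repo_id_from_url(TRACES_URL)
```

Calling `download_traces(TRACES_URL)` would fetch the repository contents over the network and return the local path. Nothing in the post suggests that running PostTrainBench itself requires this step.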
Nathan Lambert @natolambert

@maksym_andr ohhhh
