Maksym Andriushchenko of ELLIS Institute Tübingen clarifies that PostTrainBench's Hugging Face page hosts only static evaluation traces, while RewardBench's page hosts an eval set suited to CI integration

PostTrainBench uses seven benchmarks, each downloaded from its own Hugging Face page.

Original post
Maksym Andriushchenko @maksym_andr

i think there is a difference: the RewardBench HF page hosts an eval set which makes sense to integrate in a CI.

running PostTrainBench, however, doesn't require any new data to be downloaded, except the 7 benchmarks used for it, but those are downloaded using their respective HF pages. so our HF page (https://huggingface.co/datasets/aisa-group/PostTrainBench-Trajectories) is only hosting static traces from our evaluations. this makes the whole thing a bit more mysterious :-)

Nathan Lambert @natolambert

@maksym_andr Could be running the eval in a CI. Rewardbench hit like 500K in a month

1:17 PM · Jun 9, 2026 · 502 Views

Nathan Lambert @natolambert

@maksym_andr ohhhh