AI · 36m ago

Maksym Andriushchenko of ELLIS Institute Tübingen clarifies that the PostTrainBench Hugging Face page hosts only static evaluation traces, while the RewardBench page hosts an eval set that makes sense to integrate in a CI

Running PostTrainBench requires no new data beyond its seven benchmarks, which are downloaded from their respective Hugging Face pages.

Original post

Maksym Andriushchenko@maksym_andr · #1063 in AI

i think there is a difference: the RewardBench HF page hosts an eval set which makes sense to integrate in a CI.

running PostTrainBench, however, doesn't require any new data to be downloaded, except the 7 benchmarks used for it, but those are downloaded using their respective HF pages. so our HF page (https://huggingface.co/datasets/aisa-group/PostTrainBench-Trajectories) is only hosting static traces from our evaluations. this makes the whole thing a bit more mysterious :-)

Nathan Lambert@natolambert

@maksym_andr Could be running the eval in a CI. Rewardbench hit like 500K in a month

1:17 PM · Jun 9, 2026 · 247 Views
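
For readers who want to inspect the static traces mentioned in the post, here is a minimal sketch of fetching them. It assumes the `huggingface_hub` package is installed and that the dataset URL quoted in the post is publicly accessible; the helper names `dataset_repo_id` and `download_traces` are hypothetical and not part of PostTrainBench.

```python
from urllib.parse import urlparse

# Dataset page URL as quoted in the post.
TRACES_URL = "https://huggingface.co/datasets/aisa-group/PostTrainBench-Trajectories"


def dataset_repo_id(url: str) -> str:
    """Extract the 'owner/name' repo id from a Hugging Face dataset page URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:3])


def download_traces(url: str = TRACES_URL) -> str:
    """Download a snapshot of the dataset repo and return its local path.

    Hypothetical helper: the import is lazy so that dataset_repo_id works
    even when huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=dataset_repo_id(url), repo_type="dataset")


if __name__ == "__main__":
    # Only parses the URL; call download_traces() to actually fetch files.
    print(dataset_repo_id(TRACES_URL))
```

Calling `download_traces()` would pull the trace files into the local Hugging Face cache. This is separate from running the benchmark, which per the post only needs the seven underlying benchmarks from their own pages.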

Nathan Lambert@natolambert

@maksym_andr ohhhh
