Please read the RoboArena website for the updated rules and results! The changes should make the benchmark fairer and more helpful for generalist robotics benchmarking.
We have observed evidence of benchmark hacking on RoboArena since April. We have taken steps to prevent this in the future, and we have rolled back evals in accordance with these steps to preserve the integrity of the benchmark. Read more about our changes here: https://robo-arena.github.io.
