if a model refuses, it should score as 0 on that task
Florian Brand of Prime Intellect argues that AI models should receive a score of zero when they refuse a benchmark task.
Scoring refusals this way would eliminate the fallback routing used in GPQA and MMLU.
Users are reacting to a critic urging zero scores for AI model refusals in benchmarks, with some agreeing it makes sense while others sarcastically dismiss the evaluations as worthless or mock their design.
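As a rough illustration of what is being argued over, the sketch below compares counting a refusal as a wrong answer with dropping refused tasks from the denominator. The `None`-for-refusal encoding, the function names, and the sample results are all hypothetical; this is not the scoring code of GPQA, MMLU, or any real harness.

```python
from typing import Optional, Sequence

# Hypothetical encoding: True = correct, False = incorrect, None = model refused.
Result = Optional[bool]


def accuracy_refusal_as_zero(results: Sequence[Result]) -> float:
    """Score every task; a refusal earns 0, the same as a wrong answer."""
    if not results:
        return 0.0
    correct = sum(1 for r in results if r is True)
    return correct / len(results)


def accuracy_excluding_refusals(results: Sequence[Result]) -> float:
    """Score only answered tasks; refusals are dropped from the denominator."""
    answered = [r for r in results if r is not None]
    if not answered:
        return 0.0
    correct = sum(1 for r in answered if r is True)
    return correct / len(answered)


# Made-up example: five tasks, two of them refused.
results = [True, True, None, False, None]
print(accuracy_refusal_as_zero(results))     # 2 correct out of 5 tasks
print(accuracy_excluding_refusals(results))  # 2 correct out of 3 answered
```

With these made-up results the refusal-as-zero score is 2/5, while excluding refusals gives 2/3, so a model that refuses more can look better under the second rule.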
Also, refusals on MMLU?? What are they even doing over there
@xeophon 100%

@xeophon wtf are these evals.
Someone should just fine tune gemma to just refuse to do anything and route to Best-of-n across all models

@xeophon Might as well report 0 on all new AI benchmarks, saves cost

@jconorgrogan tempting…