Florian Brand, who works on LLM evaluations at Prime Intellect, teases unresolved upstream bugs in popular AI benchmarks without details · Digg