Florian Brand and creator @teortaxesTex debate whether LLM benchmarks should distinguish intentional safety refusals from API failures · Digg