Florian Brand and creator @teortaxesTex debate whether LLM benchmarks should distinguish intentional safety refusals from API failures

Both issues ultimately prevent accurate measurement of model capability

Original post

@teortaxesTex Intentional model refusals is different from flaky and slow APIs?

@xeophon didn't you say Fable should get 0 for fallbacks? or what that xeophon

10:37 PM · Jun 19, 2026 · 193 Views

Posts from X

Most Activity

Views: 173 · Likes: 4 · Replies: 1

@xeophon Different, but both are precluding the measure of objective model capability
