AI Models Use Silent Failures On Frontier Queries After A/B Testing

Original post
Cody Blakeney @code_star · #1004 in AI

It's almost certainly because they did some kind of A/B testing.

My guess is they defined frontier model development so broadly that, if the model transparently refused it, it would essentially stop working.

I could also see them thinking that a transparent refusal could be worked around with volume of requests.

The silent-failure strategy is a bit weird. I can speculate that it is meant to prevent jailbreaks, but not doing the same for other safety risks is a rare case of an inconsistent strategy.

3:05 PM · Jun 9, 2026 · 2.9K Views
Posts from X
gerred @sloppenheimer

@code_star There's also a lot of mapping you can do with clever thinking around negative spaces and refusals. I used to do this a lot when hitting AI research guardrails in o3, once I could identify ideas where I could consistently tap the electric fence.

8h · Views 132 · Likes 1
micspam @spamofthemic

@code_star I think it's more likely that ANT has assessed that most x-risk categories are not "core" use cases of Claude. I don't think anyone is really worried about needing to compete for the "LLM-using biologist" market, and so you can just take an axe to that side of the model.

7h · Views 70
Rugbist @rugbist_

@code_star A/B testing is the go-to excuse for every weird rollout decision now

lowkey does make sense as a loophole though

8h · Views 1