Tech · 1d ago

AVERI's Miles Brundage criticizes OpenAI for silent A/B testing on frontier models, warning it harms reproducibility of safety research

The critique followed reports of query nerfing on Fable

Original post
Miles Brundage@Miles_Brundage

Prompted by the Fable "nerfing on frontier AI development related queries" stuff but the point is more general...

I have criticized OAI many times for silent A/B testing, which I think is inappropriate for such a critical technology

Miles Brundage@Miles_Brundage

I tentatively think that silent model switching is never a good idea.

It's horrible for research (including safety research), among many other effects

12:09 PM · Jun 9, 2026 · 1.8K Views
Posts from X
Miles Brundage@Miles_Brundage

That doesn't mean Ant + others should just sit there and tolerate abuse.

There is a large action space, including throttling + issuing warnings, investigating the abuse, etc.

1d · Views 1.8K · Likes 21 · Bookmarks 0
Miles Brundage@Miles_Brundage

@sandersted That’s interesting pushback + I am certainly quite system evaluation pilled so am receptive… I’m not sure it’s unreasonable for at least some people to be fixated in that way for research purposes. And Ant has shown no interest in a large scale researcher vetting thing.

17h · Views 199 · Likes 2 · Bookmarks 0
Ted Sanders@sandersted

@Miles_Brundage disagree on this one, actually

I will continue to beat the drum of evaluating systems not models. And if you choose to sell a Fable/Opus hybrid system to the public, then that’s what should be evaluated.

It’s only annoying if you’re fixated on evaluating the Fable model.

17h · Views 558 · Likes 9 · Bookmarks 0
Miles Brundage@Miles_Brundage

It also means you don't get a feedback signal on false positives - people can't complain if they don't know it's happening.

1d · Views 547 · Likes 8
Miles Brundage@Miles_Brundage

@BlackHC @yong_zhengxin One might choose to call it something other than model-switching (PET, steering vectors... sounds like effectively model switching to me, but anyway)... point is, it is a silent degradation

@Miles_Brundage @yong_zhengxin It is not switching though. Still using Fable but sandbagging via prompt injection?

22h · Views 94 · Likes 2 · Bookmarks 0
Ted Sanders@sandersted

@Miles_Brundage Yeah, evaluating the system tells about the public thing. Evaluating the model tells us about potential future things or internal things. But that can be extended - e.g., maybe Fable + steering vector is stronger than Fable alone. Should they allow people to test that too?

14h · Views 100 · Likes 1 · Bookmarks 0
Miles Brundage@Miles_Brundage

@sandersted So it seems to create huge error bars around model capability, what’s a capability issue vs a silent degradation issue, etc. for relatively little safety gain - or at least, they did not really articulate the safety gain/tradeoff well. They have info we don’t…

17h · Views 156 · Likes 0 · Bookmarks 0
Miles Brundage@Miles_Brundage

@sandersted I guess I have a very strong “no secrets” prior when it comes to research that this is in tension with though not sure where best to draw the line (eg I am ok with secret weights. Secret model identity seems worse, but dunno if that is valid)

17h · Views 133 · Likes 0 · Bookmarks 0
Miles Brundage@Miles_Brundage

@BlackHC @yong_zhengxin Not sure I follow what point you're trying to make. Sounded like you were defending the [model/system/whatever] switching thing, but now I am not sure

22h · Views 57 · Likes 0 · Bookmarks 0

@Miles_Brundage @yong_zhengxin I guess that's why Fable and Mythos are separate offerings because one can simply view Fable as the whole system (incl steering vectors etc)? Obv this won't allow valid inferences for Mythos

22h · Views 56 · Likes 0 · Bookmarks 0
Miles Brundage@Miles_Brundage

@sandersted (I think they should let a small number, yes, though as noted they have no apparent interest in doing the large scale vetting needed to make it more than that - only just starting a bio trusted access thing now. So I’m focusing on the large scale case)

14h · Views 77 · Likes 1 · Bookmarks 0