Danielle Fong says biological and cyber risk classifiers perform poorly because they do not use advanced Fable-class models.
Rohit Krishnan flagged the performance gap in threat detection.
Some users praise the bio/cyber/AI classifier for its realistic approach to detecting bad behavior and say they would pay more for an improved version, while others criticize the mental health safety classifier as terrible and overly broad.
They are not using Fable-class models for the classifier. I can tell you that.
How should we update on the fact that while Fable is so good the classifier to detect bio/cyber/AI is so bad?

@DotDotJames Then they should say so and not beat around the bush (0.03% might be affected)

@krishnanrohit it has to be extremely fast and cheap?
More or less a genuine question. If the answer is not much, that's fine, but Fable is amazing so I don't get why making a better classifier was not done. (didn't care enough is a good enough reason too prob btw)
@krishnanrohit this sounds exactly like overfitting to me, but this is just my opinion so it's valid to update if you wish. be careful of drawing strong conclusions from limited data

@krishnanrohit They’re using Grok for that classifier to save money.

@krishnanrohit that they are realistic about how difficult it would be to precisely detect bad behavior. this is in fact the only way to get a low false negative rate
@krishnanrohit automated ML R&D in action

@krishnanrohit Given their capabilities, I assume that it is not accidental.

@fabianstelzer Yes

@krishnanrohit Companies seem to assume that classification is a low intelligence task but it's actually the opposite

@krishnanrohit false positives less bad than false negatives especially given their world view? & ya, I wonder this every time: how can their classifier be worse than random

@krishnanrohit *in an adversarial environment

@krishnanrohit Chronicle of over-control foretold; the mental health safety classifier in 4.8 is insanely bad—automatically and immediately applied to almost any creative work. Weird that Anthropic spends so little effort on the classifiers.

@krishnanrohit I'm guessing they're not exactly sure what they want to safeguard against, so they're running a broad filter and then A/B testing and reviewing chats to see which filter prevents the things they want
Another bitter lesson

@krishnanrohit afraid of shoggoth having the goal to kill off the human race 👀

@krishnanrohit most coding tasks are closer to logic than uncertainty estimation.
calibration is hard; humans barely do it, hence @wolf_vukovic wisdom of crowds etc

@krishnanrohit Same thing with the Claude desktop and iPhone apps

@smooth_normie Brain the size of a planet but can't do better than ban all bio people from saying hi?

@DanielleFong gpt-2's back

@fabianstelzer @krishnanrohit But Fable is already so expensive. I think a lot of people would be willing to pay a time and money premium for a better classifier if that's what it takes