Aaron Scher@aaronscher
The best piece about the current state of alignment https://www.lesswrong.com/posts/WewsByywWNhX9rtwi/current-ais-seem-pretty-misaligned-to-me

We don't have the ideal comparison data here: ideally there would be a Sonnet 4.5-level model that was very honest, didn't overclaim, never reward hacked, etc. But we have some evidence: current models aren't very well aligned, and people still use them & hand off important tasks to them.

@aaronscher Another piece of evidence is that the market didn't converge on this: https://gwern.net/guardian-angel