AI writer @deepfates asks if anyone is systematically tracking language model character traits and emergent behaviors.
Founder @DanielleFong says tracking is currently done ad hoc.
I think this is really important work. Will be reporting our first results in this vein in the next few weeks. So far we've mainly been setting up the experimental testbeds and getting initial results, but we'll have the mechanism in place to get deeper into the models' character traits than I think behavioural evals conducted to date have done.
who is tracking the character traits of language models? how well they follow their spec/constitution, emergent behaviors, etc. is anyone doing this?