Views 320 · Likes 1 · Replies 1

Yonatan Belinkov@boknilev
@lanqui on refusal formation areas in different layers and the gap between recognizing harm and refusing
21h
Users praised the poster session for its effort to make interpretability techniques accessible to experts from other domains.

@lanqui in the poster session: a nice effort to make interpretability techniques accessible to experts from other domains