@eliebakouch hard to really know what's going on as nothing seems fully public here?
like this seems like sys prompt <> user prompt // instruction following
also: "It seems very likely that awareness of being evaluated for honesty influences its behavior on this evaluation." (lol)
@xeophon There are Apollo Research blog posts as well