it's pretty cool that the steering evaluation approach that @ZhengxuanZenWu designed in axbench is now more or less standard in papers that eval steering
10:25 PM · Jun 11, 2026 · 1.5K Views
@ZhengxuanZenWu these are Guide Labs' new technical report and Goodfire's "Anatomy of Post-Training", respectively