DTU researchers challenge Stanford NLP's AXBENCH findings, claiming sparse autoencoders can outperform simple baselines for steering LLMs

The debate recalled past feedback that led to GATv2