22h ago: METR's Daniel Filan argues 'computer use evals' should be renamed to 'GUI use evals' to improve benchmark precision. David Manheim says current benchmarks conflate GUI use with computer use.