my favorite thing is when i drop a post that gets more bookmarks than likes
synth envs can give you a broad lens on the types of things a model is good or bad at, and clever filters + diagnostics can highlight gaps you wouldn't normally find. e.g. sorting tasks by what GLM saturates but GPTmini flails at lets you ask: what's the correlated meta problem?
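
a rough sketch of that kind of filter, assuming you already have per-task pass rates for each model. everything here is hypothetical: the task ids, tags, numbers, and the 0.9 / 0.3 cutoffs are made up for illustration, not real eval results or anyone's actual pipeline.

```python
from collections import Counter

# Toy, made-up pass rates per task (hypothetical; NOT real eval results).
# Each row: task id, hypothetical tags, pass rate per model.
TASKS = [
    {"id": "t1", "tags": ["long_context", "tool_use"], "glm": 0.98, "gptmini": 0.10},
    {"id": "t2", "tags": ["arithmetic"], "glm": 0.95, "gptmini": 0.92},
    {"id": "t3", "tags": ["long_context"], "glm": 0.97, "gptmini": 0.25},
    {"id": "t4", "tags": ["tool_use"], "glm": 0.40, "gptmini": 0.35},
    {"id": "t5", "tags": ["long_context", "state_tracking"], "glm": 0.99, "gptmini": 0.05},
]


def gap_tasks(tasks, strong, weak, saturated=0.9, flailing=0.3):
    """Tasks the `strong` model saturates and the `weak` model flails at.

    The thresholds are arbitrary assumptions. Results are sorted by the
    pass-rate gap, biggest first.
    """
    hits = [t for t in tasks if t[strong] >= saturated and t[weak] <= flailing]
    return sorted(hits, key=lambda t: t[strong] - t[weak], reverse=True)


def tag_counts(tasks):
    """Count how often each tag shows up across the given tasks."""
    return Counter(tag for t in tasks for tag in t["tags"])


gaps = gap_tasks(TASKS, "glm", "gptmini")
print([t["id"] for t in gaps])
print(tag_counts(gaps).most_common())
```

whichever tag dominates the gap set is a candidate for the shared meta problem, though with real data you'd want to compare it against how common that tag is across all tasks before reading much into it.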
