alright folks, we're talking to big eval master @xeophon in 3h. tune in to ask sensible questions like:
- how to run your own evals without going insane?
- are evals and environments kind of the same??
- why most benchmarks are janked???
- why LLM cheats?????
to kick off our big boss frontier research series we have the evals master florian joining us this friday from 12:00-14:00 to talk about LLM benchmarking
send your questions by comments, dm, fax, ping alexine, text or any other means and I'll weave them right in
