LLM evaluation platform Arena launches Agent Mode to test frontier models like Mistral 3.5 on multi-step, tool-enabled tasks · Digg