Researcher Launches Series Testing RLMs on AppWorld Benchmark

Original post
@A1ZHANG #853 (OP)
Gabriel Lespérance | @GABLESPERANCE

First in a two-part series where I throw RLMs at benchmarks and see how far they can go.
We start with AppWorld 🌎
Next: TerminalBench 2.1

9:09 AM · May 30, 2026