6h ago

Researcher Launches Series Testing RLMs on AppWorld Benchmark

Original post
alex zhang (@A1ZHANG) · Gabriel Lespérance (@GABLESPERANCE)

First in a two-part series where I throw RLMs at benchmarks and see how far they can go. We start with AppWorld 🌎 Next: TerminalBench 2.1

9:09 AM · May 30, 2026