Omar Khattab#160
Jasper Lu (@lu__jasper)
Getting back around to this. OBLIQ is a really interesting benchmark, and feels like the right one for this space.
It's almost gratuitously hard, but seems pretty well-aligned with interesting agent observability problems. Saturation on this set would probably solve a lot of more common real-world use cases along the way.
8:49 PM · Jun 1, 2026 · 3.5K Views