If you're building or evaluating data agents, you should check out the Data Agent Benchmark (DAB) by @ruiyingm1120, @sh_reya et al. DAB evaluates agents on more realistic enterprise data problems involving multiple databases, messy joins, unstructured text, and domain-specific knowledge. This imo is more vital than evaluating agents on SQL generation alone.
We’re making Codex more useful for your work by expanding plugins beyond individual tools.
These plugins turn Codex into a specialist for a specific role with a single install, no coding required.
Codex can access 62 popular apps and 110 skills for work across sales, data analytics, creative production, product design, and public equity investing.
https://openai.com/index/codex-for-every-role-tool-workflow/