18h ago

Microsoft Research releases SkillOpt, an optimization method that treats AI agent skills as trainable external states of a frozen model

A textual learning rate governs validation-gated skill rewrites.

Original post

New research from Microsoft Research.

I see a lot of AI engineers handwriting agent skill docs and hoping they generalize. Probably not optimal. This work shows why.

It treats the skill doc as a trainable external state of a frozen agent instead. It introduces SkillOpt, where an optimizer model makes validation-gated edits to the skill file. It adds, deletes, or replaces instructions, with a textual learning rate that controls how aggressively each round rewrites the doc. The agent itself never changes.

SkillOpt is best or tied on all 52 (model, benchmark, harness) cells. On GPT-5.5 it adds 23.5 points in direct chat, 24.8 with Codex, and 19.1 with Claude Code over no skill. It beats human-written skills, TextGrad, GEPA, and EvoSkill, carries zero extra inference-time cost, and the learned skills transfer across models and harnesses.

Paper: https://arxiv.org/abs/2605.23904

Learn to build effective AI agents in our academy: https://academy.dair.ai/

8:40 AM · May 25, 2026
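
For readers who want a concrete picture of the loop the post describes, here is a minimal sketch of validation-gated skill editing. It is not the paper's implementation. The names `Edit`, `apply_edit`, `optimize_skill`, `toy_propose`, and `toy_score` are all hypothetical, the optimizer model is replaced by a hand-written stub, and the "textual learning rate" is approximated by a simple `edit_budget` (the maximum number of edits applied per round), which is an assumption about how aggressiveness could be capped.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Edit:
    op: str          # "add", "delete", or "replace"
    index: int       # position in the instruction list
    text: str = ""   # new instruction text for "add" / "replace"


def apply_edit(skill: List[str], edit: Edit) -> List[str]:
    """Return a new instruction list with one edit applied."""
    out = list(skill)
    if edit.op == "add":
        out.insert(min(max(edit.index, 0), len(out)), edit.text)
    elif edit.op == "delete":
        if 0 <= edit.index < len(out):
            del out[edit.index]
    elif edit.op == "replace":
        if 0 <= edit.index < len(out):
            out[edit.index] = edit.text
    else:
        raise ValueError(f"unknown op: {edit.op}")
    return out


def optimize_skill(
    skill: List[str],
    propose: Callable[[List[str]], List[Edit]],  # stand-in for the optimizer model
    score: Callable[[List[str]], float],         # stand-in for validation accuracy
    rounds: int,
    edit_budget: int,                            # assumed proxy for the textual learning rate
) -> Tuple[List[str], float]:
    """Keep a candidate skill only if it improves the validation score."""
    best, best_score = list(skill), score(skill)
    for _ in range(rounds):
        candidate = list(best)
        # Apply at most edit_budget edits, so each round rewrites only part of the doc.
        for edit in propose(candidate)[:edit_budget]:
            candidate = apply_edit(candidate, edit)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # the validation gate
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-ins so the sketch runs without any model calls.
REQUIRED = ["run the tests", "read the readme"]


def toy_score(skill: List[str]) -> float:
    text = "\n".join(skill).lower()
    hits = sum(kw in text for kw in REQUIRED)
    penalty = sum("guess" in line.lower() for line in skill)
    return hits - penalty


def toy_propose(skill: List[str]) -> List[Edit]:
    edits: List[Edit] = []
    for i, line in enumerate(skill):
        if "guess" in line.lower():
            edits.append(Edit("delete", i))
            break
    text = "\n".join(skill).lower()
    for kw in REQUIRED:
        if kw not in text:
            edits.append(Edit("add", len(skill), kw.capitalize() + "."))
    return edits


if __name__ == "__main__":
    final, final_score = optimize_skill(
        ["Guess the answer if unsure."], toy_propose, toy_score,
        rounds=3, edit_budget=1,
    )
    print(final, final_score)
```

Only the skill text changes between rounds. The scoring function, which stands in for running the frozen agent on a validation set, is never modified, and a rewrite that fails to improve the score is discarded.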
Reposted by

@omarsar0 Hell yeah this is awesome

Garry Tan @garrytan

These concepts coming soon to GBrain this week

5:07 AM · May 26, 2026 · 17.6K Views
5:09 AM · May 26, 2026 · 1.7K Views

elvis @omarsar0

3:40 PM · May 25, 2026 · 87.3K Views