/Tech31d ago

Microsoft's SkillOpt improves AI agent coding performance by 23.5 points by optimizing Markdown skill documents instead of weights

An optimizer edits the skill documents based on failures

19165291199.3K

Original post

The problem is that agent skills are usually hand-written, made once by an LLM, or revised in loose ways that can easily make them worse.

SkillOpt from Microsoft, argues that agent skills should be trained like small external programs, it teaches AI agents better task habits by editing a reusable skill document, not the model itself.

The paper’s core idea is to treat the skill document like the thing being trained, while the main AI model stays frozen and unchanged.

SkillOpt watches the agent try tasks, studies what worked and failed, then asks a stronger optimizer model to suggest small edits to the skill.

It only accepts an edit when the new skill improves on a held-out check set, so the skill does not drift just because an edit sounds good.

The authors tested this across 6 benchmarks, 7 target models, and 3 agent settings, including direct chat, Codex, and Claude Code.

SkillOpt was best or tied on all 52 tested cases, and on GPT-5.5 it raised average accuracy by 23.5 points in direct chat.

The final result is a small readable skill file that can improve agents across tasks and settings without retraining the model.

The best part is that the optimizer is used during training, but deployment only needs the final skill file.

That makes the artifact inspectable, portable, and cheap to reuse, which is exactly what most prompt-engineering systems lack.

----

Link – arxiv. org/abs/2605.23904

Title: "SkillOpt: Executive Strategy for Self-Evolving Agent Skills"

1:52 AM · May 29, 2026 · 9.9K Views

Sentiment

Positive users praise SkillOpt for boosting smaller AI models competitively and upgrading agent skill optimization via text edits, while negative users mock the approach or fear long-term degradation.

Pos

66.7%

Neg

33.3%

6 comments with sentiment.

Cluster Engagement

Digg Deeper

No Digg Deeper questions have been answered for this story yet.

Posts from X

Most Activity

VIEWS102LIKES5RETWEETS3

Shinka - AI@ShinkaIoT

This highlights the ultimate macro paradox of the agentic shift. 🧠

If a company can replace linear human headcount scaling with exponentially scaling GPU clusters, the short-term ROI is undeniable. But it brings back the classic Ford paradox: AI agents don't order Ubers or buy software.

The power-users winning right now are the ones who treat AI as an expansion pack for human judgment, not just a macro for writing more diffs. High-value discernment is the only true defense left. 🔥

31d1025

BOOKMARKS1

Fedir "Ted" Martynov 🇺🇦@byte_ua

@rohanpaul_ai Held-out check set is the whole trick here. Otherwise the optimizer just keeps making the skill doc sound smarter while quietly making the agent worse.

31d3111

REPLIES1

Mithun Kumar@Mithunkumardev

@rohanpaul_ai hand-written, made once by an LLM"... yeah that's also how i designed my gym routine. explains why it degraded over time.

31d971

Inflectiv AI ⧉@inflectivAI

@rohanpaul_ai This approach could make smaller open-source or frozen models much more competitive by improving execution quality without increasing inference costs. Better skills may matter more than larger parameter counts in many workflows.

31d4721