Dimitris Papailiopoulos of Microsoft Research AI Frontiers teases how Codex used 'Idea Descent' to shrink a solver below 50 kB

The optimization run spanned 110 iterations using LZMA.

Original post
Dimitris Papailiopoulos @DimitrisPapail

Codex may have legitimately done something novel using "Idea Descent" today. Gonna write about this one soon.

Funny aside: the spike is me suggesting a stupid idea that it tried, and clearly was bad.

8:22 AM · Jun 7, 2026 · 52.3K Views
Sentiment

Many users like Codex's Idea Descent for its bold scientific iteration on novel solvers and code optimization, while others call out Claude models as worsening or accuse Anthropic of misleading claims.

Positive: 62.5% · Negative: 37.5% (19 comments with sentiment)
Cluster Engagement
Posts from X · Most Activity
Views 17.3K · Bookmarks 35 · Likes 185 · Retweets 9 · Replies 17

Claude Code with 4.8 was given the same prompt, and the same goal, but kept insisting I was wrong and the goal is not achievable.

My number 1 reason for preferring Codex the past 2 weeks. So frustrating.

1d · Views 17.3K · Likes 185 · Bookmarks 35

@DimitrisPapail The weekly Dimitris-tease-post that will result in a banger blog down the line

1d · Views 3.4K · Likes 22 · Bookmarks 0

@DimitrisPapail user: "you did great job. next step - 25kB target. go ahead, you have 10 hours till I wake up"

1d · Views 921 · Likes 1 · Bookmarks 1

@xeophon

@DimitrisPapail The weekly Dimitris-tease-post that will result in a banger blog down the line

1d · Views 2.8K · Likes 9 · Bookmarks 0
Yuntian Deng @yuntiandeng

@DimitrisPapail Similar experience here. I tried opus 4.8 on the symbolic gsm8k solver you proposed. Even after I told it others had reached 15% and pointed it to your original tweet, it kept insisting the goal was impossible and that 15% must have been an LLM predicting the operations.

19h · Views 544 · Likes 4 · Bookmarks 0
Wilkins Micawber @Me5466255992308

@DimitrisPapail which Claude effort level did you use?

1d · Views 45
Patty @Patty_H93

@DimitrisPapail Seeing a lot of people upset with 4.8. Is this a sign of model collapse? Or is it impossible to tell because Anthropic could be throttling compute? But why would they do that right after releasing the best available model?

1d · Views 37
Dan McAteer @daniel_mac8

@DimitrisPapail Interested.

1d · Views 347 · Likes 1

@DimitrisPapail Trying to win the Hutter Prize :P

20h · Views 796 · Likes 0 · Bookmarks 0
Noah @Noah64165746

@DimitrisPapail What happened with the small target? You wanted it to be as close as possible to 50kB, and not smaller?

1d · Views 519
Matt Rickard @mattrickard

@DimitrisPapail i like the concept already

1d · Views 518
Nicholas Bardy @NicholasBardy

@DimitrisPapail This is cool, would be happy to chat, I have done something similar sounding with 4D Gaussian splats.

1d · Views 505
Alok Bishoyi @alokbishoyi97

@DimitrisPapail would love to learn more about your setup!

1d · Views 494
Alpha Batcher @alphabatcher

@DimitrisPapail something new from Codex again?

we love it

23h · Views 382
Garima @sincerelycheesy

@DimitrisPapail love when the model has the courtesy to show you exactly where you went wrong

22h · Views 326
Daniel Auras @rasdani_

@DimitrisPapail i'm sorry Dave. i'm afraid i can't do that.

20h · Views 80 · Likes 1
Maapu @Rajesh23MD

@DimitrisPapail What’s this graph about?

23h · Views 172
Shuying Luo @shuying_luo

@DimitrisPapail I had a similar experience! Or it claimed to solve something it never did.

23h · Views 71
Aux @__12124__

@DimitrisPapail 2 weeks?? More like 6 months

1d · Views 45