Codex may have legitimately done something novel using "Idea Descent" today. Gonna write about this one soon.
Funny aside: the spike is me suggesting a stupid idea that it tried, and clearly was bad.
The optimization run spanned 110 iterations using LZMA.
Many users like Codex's Idea Descent for its bold scientific iteration on novel solvers and code optimization, while others say Claude models are getting worse or accuse Anthropic of making misleading claims.
Claude Code with 4.8 was given the same prompt and the same goal, but kept insisting I was wrong and that the goal was not achievable.
My number 1 reason for preferring Codex over the past 2 weeks. So frustrating.
@DimitrisPapail The weekly Dimitris-tease-post that will result in a banger blog down the line

@DimitrisPapail user: "you did great job. next step - 25kB target. go ahead, you have 10 hours till I wake up"
@DimitrisPapail Similar experience here. I tried Opus 4.8 on the symbolic GSM8K solver you proposed. Even after I told it others had reached 15% and pointed it to your original tweet, it kept insisting the goal was impossible and that the 15% result must have come from an LLM predicting the operations.

@vorushin i literally asked it to go to 21kB lol

@DimitrisPapail which Claude effort level did you use?

@DimitrisPapail Seeing a lot of people upset with 4.8... Is this a sign of model collapse? Or is it impossible to tell because Anthropic could be throttling compute? But why would they do that right after releasing their best available model??

@DimitrisPapail Interested.
@DimitrisPapail Trying to win the Hutter Prize :P

@DimitrisPapail What happened with the small target? You wanted it to be as close as possible to 50kB, and not smaller?

@DimitrisPapail i like the concept already

@DimitrisPapail This is cool, would be happy to chat, I have done something similar sounding with 4D Gaussian splats.

@DimitrisPapail would love to learn more about your setup!

@DimitrisPapail something new from Codex again?
we love it

@DimitrisPapail love when the model has the courtesy to show you exactly where you went wrong

@DimitrisPapail i'm sorry Dave. i'm afraid i can't do that.

@DimitrisPapail What’s this graph about?

@DimitrisPapail I had a similar experience! Or it claimed to have solved something it never did.

@DimitrisPapail 2 weeks?? More like 6 months