/Tech · 12h ago

Sutter Hill Ventures partner Alex Peysakhovich says Codex spent $400 on Modal with zero results when run autonomously

It succeeded when used directly as a coding assistant.

Original post

Stanislav Fort @stanislavfort · #232 in Tech

This matches my recent experience with AI agents as research assistants. Amazing at coding, sub-good-masters-student at navigating the space of ideas and updating based on experimental results.

alex peysakhovich @alex_peys

i gave codex a /goal to improve an ml training pipeline i am working on while i went for a hike.

during the hike i had an idea. which i came back and (codex) implemented and it worked to bump things up a bit.

in the meantime /goal spent $400 on modal and a lot of tokens to achieve nothing. i went through the ideas it had come up with and they were decent generic ml ideas (eg try this normalization) but terrible for the thing i was working on.

so… coding assistant? very good. even jr researcher? not yet.

3:35 PM · Jun 8, 2026 · 1.7K Views

Sentiment

Positive users highlight successful custom loops with Gemini for research goals like lowering loss, while negative users dismiss Codex as falling short on basic training runs and evals or blame user skill.

Positive: 33.3%

Negative: 66.7%

3 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 366 · Likes: 8

Ravid Shwartz Ziv @ziv_ravid

@alex_peys But did you tell it not to make any mistakes?

6h36680

Replies: 1

JJ Schultz @jjschultz

@alex_peys hmm - over a year ago I did this using a manually scripted loop + gemini (for the big context window). I defined a goal (ie decrease loss) and fed in the logs of prev runs in the context. it tweaked the hyperparams and architecture.

and it worked AMAZING!

8h130
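
The loop described in the reply above (state a goal, feed the logs of previous runs back in as context, let the model pick the next hyperparameters) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual script: `toy_train` and `toy_propose` are hypothetical stand-ins for a real training job and a real model call, and the `budget` cap on the number of runs is an added assumption motivated by the cost concern in the original post.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Run:
    params: dict
    loss: float


def toy_train(params: dict) -> float:
    """Hypothetical stand-in for a real training run; returns a final loss."""
    lr = params["lr"]
    return (lr - 0.01) ** 2 * 1e4 + 0.5


def format_history(history: list[Run]) -> str:
    """Render previous runs as plain-text logs to place in the model's context."""
    return "\n".join(
        f"run {i}: params={r.params} loss={r.loss:.4f}"
        for i, r in enumerate(history)
    )


def toy_propose(context: str, history: list[Run]) -> dict:
    """Hypothetical stand-in for the model call.

    A real loop would send `context` to a model and parse its reply.
    This stub ignores `context` and applies a fixed rule instead:
    scale the best learning rate seen so far by the first untried factor.
    """
    best = min(history, key=lambda r: r.loss)
    tried = {r.params["lr"] for r in history}
    for factor in (0.5, 2.0, 0.75, 1.5):
        candidate = best.params["lr"] * factor
        if candidate not in tried:
            return {"lr": candidate}
    return dict(best.params)


def research_loop(
    train: Callable[[dict], float],
    propose: Callable[[str, list[Run]], dict],
    initial: dict,
    budget: int,
) -> list[Run]:
    """Run at most `budget` training jobs, feeding all prior logs back each time."""
    history = [Run(initial, train(initial))]
    for _ in range(budget - 1):
        context = "Goal: decrease loss.\n" + format_history(history)
        params = propose(context, history)
        history.append(Run(params, train(params)))
    return history


if __name__ == "__main__":
    runs = research_loop(toy_train, toy_propose, {"lr": 0.08}, budget=6)
    print(format_history(runs))
```

The hard cap on runs is the one design choice worth noting: the loop stops after a fixed number of training jobs regardless of whether the proposals are helping.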

alex peysakhovich @alex_peys

@cs_serdar they are fine for some things, if i was trying to squeeze the last 5% out of my architecture by throwing known tricks at the wall and seeing what sticks this is totally doable by a coding agent.

for figuring out that one of my training datasets has a subtle issue, not so much

8h2761

serdarml @cs_serdar

@alex_peys I fell into the autoresearch hype today and tried to get codex to do some basic stuff like run a default training run and eval it on some pipeline. The models are simply not good enough for agentic research, they do plenty of stupid things. This is with 5.5.

8h2681

serdarml @cs_serdar

@alex_peys It's definitely not useless, especially if the environment is set up perfectly. But it's prone to making slight mistakes it won't notice and blame other things/the experiment itself. The more "agentic" the workflow, the more likely it is for the result to be garbage.

8h361

alex peysakhovich @alex_peys

@jjschultz

8h85

Pulkit @puhlkit

In my mind, /goal exists to make sure the agent actually completes the task. There are many situations where models regularly fail:

1. Tell it to do 10 things. It will do some, say “I’ll do rest next”. Use /goal to get it to do all 10.
2. Tell it to do something repeatedly. It will stop without finishing. /goal makes sure it finishes. e.g.

7h23

sakanade @0xsakanade

@alex_peys Maybe it’s a you thing

7h11