Tech · 4h ago

Together AI's Ashwinee Panda proposes a deletion-focused benchmark to stop AI coding agents from generating bloated code

The benchmark aims to curb additive bias in auto-RL workflows.

Original post

I disagree with the overall sentiment of this HN user but I have seen agents write terrible spaghetti code- that's exactly what CodeClash, led by @jyangballin & @KLieret, evaluates. An agent needs to maintain a codebase while facing an adversarial opponent multiple times. We see the failure cases mentioned here in the CodeClash trajectories frequently.

10:10 AM · Jul 3, 2026 · 2.6K Views

Sentiment

Positive users praise the CodeClash benchmark for AI coding agents and call AI coding amazing with room for further gains, while the negative reply notes most agents still cannot fix bugs under adversarial pressure.

Pos: 75.0%

Neg: 25.0%

6 comments with sentiment.

Posts from X

Most Activity

Ofir Press @OfirPress

@jyangballin @KLieret AI coding is getting so much better all the time but there are still some facets of it that we don't have great benchmarks for. Once we do, we'll improve on those aspects as well.

5h · 4751

Likes: 2

Ashwinee Panda @ICML2026 @PandaAshwinee

@OfirPress @jyangballin @KLieret this is just true, though -agents _do_ prefer to add and never delete and never reuse. wdym you disagree?

5h · 1632

Replies: 1

Ashwinee Panda @ICML2026 @PandaAshwinee

@OfirPress @jyangballin @KLieret i think every individual sentence in the post is true but my overall sentiment is the opposite (AI coding is amazing)

5h · 54

Ofir Press @OfirPress

@PandaAshwinee @jyangballin @KLieret I disagree with the notion (mentioned in the title) that "AI coding is a nightmare".

5h · 99

V0LYX @0xV0LYX

@OfirPress @jyangballin @KLieret having an adversary actively breaking ur code changes the game entirely. most agents cant even fix their own bugs let alone fight back

5h · 40

Ofir Press @OfirPress

@jyangballin @KLieret https://codeclash.ai/

5h · 4591

Ofir Press @OfirPress

@0xV0LYX @jyangballin @KLieret In CodeClash the adversary doesn't touch your code, they battle you in a code-based arena like RobotRumble. So it's tough but not as tough as having someone manipulate your code.

5h · 441

Ofir Press @OfirPress

@PandaAshwinee @jyangballin @KLieret Ya AI coding is amazing, and there's still a lot we can further improve

4h · 47

Ferbin @Ferbin08

@OfirPress @jyangballin @KLieret where do you hit the wall? holding bigger code, or when changes ripple across multiple files?

2h · 71

Henry Dowling @henrytdowling

@OfirPress @jyangballin @KLieret this is cool, is the idea basically to force the codebase to "make contact with reality" more?

5h · 21

Ferbin @Ferbin08

@OfirPress @jyangballin @KLieret Agents write spaghetti code the same way anyone does: shortcuts work until they're tested. CodeClash runs the test early.

4h · 15