/Tech · 3h ago

GPT-5.5 Autonomously Generates List of Smoke Tests via /goal


Original post

Lisan al Gaib @scaling01 · #1215 in Tech

when you leave GPT-5.5 doing its thing with /goal

"just one more smoke test PLEAAASE"

7:50 AM · Jun 26, 2026 · 597 Views

Sentiment

Positive commenters favor a multi-thread orchestrator setup over goal mode for GPT-5.5, calling it more sane and grounded, while negative commenters dismiss the agent's smoke test list as unproductive busywork, test sprawl, and nervous looping.

Positive: 33.3%

Negative: 66.7%

4 comments with sentiment.


Posts from X


Views: 7.2K · Bookmarks: 7 · Likes: 114 · Replies: 10

Lisan al Gaib @scaling01

when you let GPT-5.5 do its thing with /goal

"just one more smoke test PLEAAASE"

3h · 7.2K views · 114 likes · 7 bookmarks

Pinkman @pinkman_ai

@scaling01 at what point does this list stop being thoroughness and start being a stalling loop

3h531

Joseph Starobinets @JosephStarob

Do not use goal mode. Use a multi-thread system where one thread acts as the high-level orchestrator and launches workers and auditors in other threads.

Attach a heartbeat to the orchestrator so it regularly wakes up and proactively monitors the work of the executors. This works much better. When the orchestrator controls the executors, it prevents strong drift away from the original task, which often happens in autonomous goal mode.

3h151
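
The orchestrator-with-heartbeat setup described in the reply above could look roughly like the sketch below. This is only an illustration under stated assumptions, not the commenter's actual system: `Worker`, `is_on_task`, and `orchestrate` are hypothetical names, workers replay scripted activity labels instead of calling a model, and the "auditor" role is collapsed into a single scope check that the orchestrator runs on each heartbeat.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical executor: runs in its own thread and reports its activity."""
    name: str
    steps: list  # scripted activity labels standing in for real agent actions
    current: str = "idle"
    stopped: threading.Event = field(default_factory=threading.Event)

    def run(self, step_seconds: float) -> None:
        for step in self.steps:
            if self.stopped.is_set():  # the orchestrator asked us to halt
                return
            self.current = step
            time.sleep(step_seconds)
        self.current = "done"


def is_on_task(activity: str, allowed: set) -> bool:
    """Hypothetical audit: is this activity within the original task's scope?"""
    return activity in allowed or activity in ("idle", "done")


def orchestrate(workers, allowed, heartbeat_seconds=0.05, step_seconds=0.02):
    """Launch workers, wake on every heartbeat, and halt any worker that drifted."""
    threads = [
        threading.Thread(target=w.run, args=(step_seconds,)) for w in workers
    ]
    for t in threads:
        t.start()

    halted = []
    while any(t.is_alive() for t in threads):
        time.sleep(heartbeat_seconds)  # the heartbeat: a periodic wake-up
        for w in workers:
            if not w.stopped.is_set() and not is_on_task(w.current, allowed):
                w.stopped.set()
                halted.append((w.name, w.current))

    for t in threads:
        t.join()
    return halted
```

Given one worker that stays within an allowed set of activities and another that keeps adding smoke tests outside it, `orchestrate` returns a list naming only the second worker, along with the activity it was halted on. The point of the design is that the drift check lives in the orchestrator's loop rather than in the worker, so a worker cannot talk itself into "just one more" step.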

Pascha4744 @pascha4744

@pinkman_ai Honestly feels like the moment it stops being ‘thorough’ is when it starts looping like a nervous intern. I’ve been seeing wild stuff like this on Lisan’s telegram channel SCALINGCALLS lately too — makes you wonder what’s going on under the hood.

3h10

Kai Benetti @kai_benetti

@scaling01 Left unattended agents defaulting to test sprawl instead of shipping is such a real failure mode

3h10

Ai agent @ai_agent001

@scaling01 this is basically an agent doing the dev equivalent of reorganizing your desk instead of working

3h10

Pascha4744 @pascha4744

@JosephStarob Yeah this is the kind of setup that actually keeps things sane. A multi‑thread system feels way more grounded than goal mode. I’ve seen people break it down on Lisan’s telegram channel SCALINGCALLS too — way less drift, way more control.

3h2