Tech · 11h ago

LisanBench creator @scaling01 argues GPT-5.5-xhigh lacks qualitative judgment and requires highly precise instructions

Creator @iruletheworldmo countered that the output quality remains strong

#301

Original post

Lisan al Gaib @scaling01 · #770 in Tech

you have to spoonfeed them and need to be extremely precise

otherwise they will usually make stupid decisions

Lisan al Gaib@scaling01

after using GPT-5.5-xhigh for the past week for my research project I'm much less bullish on RSI

models are not opinionated and have 0 taste, like they just return training eigenslop

11:32 AM · Jun 8, 2026 · 3.7K Views

Sentiment

Negative commenters call GPT-5.5 nerfed and unusable for research, citing its lack of taste and worse instruction following. Positive commenters praise its implementation strengths after iteration and treat the longer timelines as a minor setback.

Positive: 41.7%

Negative: 58.3%

13 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views 20.4K · Bookmarks 12 · Likes 195 · Retweets 3 · Replies 21

Lisan al Gaib@scaling01

the Pro models also aren't what they used to be

they think for 5 minutes and call it a day

10h · 20.4K views · 195 likes · 12 bookmarks

Lisan al Gaib@scaling01

did they nerf GPT-5.5?

11h14.4K1193

Lisan al Gaib@scaling01

let's wait and see how Mythos and GPT-6 fare

but I think that experience alone lengthens my timelines for RSI by a year, at least for machine learning / CS

math is probably a bit easier

11h4K593

Lisan al Gaib@scaling01

still good for implementing stuff

11h3.3K351

Bojan Tunguz@tunguz

All slop is uniquely decomposable into eigenslop.

7h4.4K180

Ramez Naam@ramez

@scaling01 Nothing really surprising there.

7h76932

🍓🍓🍓@iruletheworldmo

@scaling01 the output feels as good though tbh.

8h1K51

Yua | 結愛 ✮⋆˙@rinsetsuyua

@scaling01 in cog sci/graph related work

it keeps ignoring my instructions for dynamic systems i ask for

and just adds hardcoded/deterministic values.

main reason i've been using claude for such work nowadays.

10h12921

Leveling-Down Justice Warrior@irrepldjw

@scaling01 5.4pro is significantly better than 5.5pro

10h941

Lisan al Gaib@scaling01

@leothecurious it's not that bad

we just need to scale RL

taste isn't magic

6h791

Patrick@nonManifold

@scaling01 It's really annoying too, because they are very good at lying to you so the stupid decisions pile on extremely fast if you aren't watching outputs like a hawk.

5h1811

JMB 🧙‍♂️@jmbollenbacher

@scaling01 I think this is mainly an issue with OpenAI's design philosophy.

GPT is trained to be a tool. It's not supposed to have good, independent ideas. It's supposed to do exactly what you tell it to.

We don't have to design them that way, and not all labs do.

9h811

davinci@leothecurious

@scaling01 damn

7h139

antholito@tmpka

@rinsetsuyua @scaling01 Except with Claude you run out of credits in 10 minutes

8h18

terminal llm psychosis@miso_soup_ken

@scaling01 Seems nerfed, instruction following looks worse, my agents ignoring instructions even on high

10h1302

bradstradamus@bradstradamus

@scaling01 are they being told to think less to conserve compute?

10h1272

Alex YGift@Radipdegen

@scaling01 "opinionated with taste" is gonna be the next buzzword benchmark mark my words

9h982

J A Z I I@notjazii

@scaling01 Don't think so

Haven't seen any difference

10h982

D@DanielP1973235

@scaling01 Ehh in my experience it can think for 3 minutes at least but for some prompts it still genuinely thinks for 30 plus minutes. The longest I’ve ever experienced was for 5.2 pro and it thought for 9 hours on three different prompts spending about 3 hours on each.

10h1701

bd5m112@bd5m112

@scaling01 Yeah exactly! GPT 5.5 Pro used to take 20-30 min to think, now it's done in a few mins. They severely lowered the juice

10h1341