/Tech · 1d ago

Anjali argues AI labs must build software to harvest interactive "Work Data" as scaling laws hit data limits

These interaction traces train agents via reinforcement learning

#972

Original post

Herbie Bradley#1081

anjali@anjali_shriva

the scaling laws in models might feel like inevitable progress if compute and data continue growing. but data has some underrated limitations…

a thread on a new kind of data ("Work Data"): what it is, and why labs now need to build and sell product for continued growth

judah@joodalooped

all aboard the data train!

https://anjalishriva.com/work-data/

7:41 AM · Jun 9, 2026 · 11.3K Views

Sentiment

Many users are excited about the new primer on work data for training AI agents because of its insightful content, collaborative reviews, and playful emphasis on the core concept.

Pos: 100.0%

Neg: 0.0%

6 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

anjali@anjali_shriva

For most domains, real work is the only environment that useful data can come from.

A big reason why labs are building products, acquiring companies, and forwardly deploying engineers into enterprises is to gather enough work data to train their agents on a wider range of tasks.

1d9969

BOOKMARKS2

jihad@jaesmail

@anjali_shriva two most anticipated pieces of media of 2026:
- This essay
- Iceman

1d14332

LIKES14

anjali@anjali_shriva

with thanks to the many people who reviewed and gave comments

@divya_venn @annihalated @aishdoingthings @peytoncasper @herbiebradley @JoshPurtell @nobu_hibiki @shacrw_ @akbirthko @seconds_0

and @analoguegroup

🫶

1d11614

RETWEETS16

anjali@anjali_shriva

the scaling laws in models might feel like inevitable progress if compute and data continue growing. but data has some underrated limitations…

a thread on a new kind of data ("Work Data"): what it is, and why labs now need to build and sell product for continued growth

judah@joodalooped

all aboard the data train!

https://anjalishriva.com/work-data/

1d11.3K7033

REPLIES2

Aashish Reddy@_AashishReddy

@anjali_shriva @ankit2119 My take is that what an agent needs to learn is basically representations for how to take actions in the world, how to chain together sequences of actions and make plans and so on. So synthetic data makes RL-generalisation feasible

1d192

anjali@anjali_shriva

if work data is what matters, where do you get it?

you can't just scrape it from the web. Work data is a fundamentally different distribution, and the corrective signals that matter aren’t in any textbook, manual, or written wiki.

1d160111

anjali@anjali_shriva

But once you understand it, it raises many, many questions about the future.

Read the full post for our predictions (co-written with @joodalooped) 👉 http://anjalishriva.com/work-data

1d8591

anjali@anjali_shriva

Okay, no dataset. Can't we just build a simulation? A "work gym" (RL env) where agents learn by trial and error?

unfortunately, knowledge work lacks the verifiability that RL relies on: the feedback is too sparse, too delayed, too noisy to learn from (h/t @gwern)

1d839

anjali@anjali_shriva

It's hard to grasp how much work data you generate in a session, let alone the sheer scale of data that's needed

Our examples of its specific nature / excerpt from recent @dwarkesh_sp post:

1d649

aishwarya🍎@aishdoingthings

@anjali_shriva @divya_venn @annihalated @peytoncasper @herbiebradley @JoshPurtell @nobu_hibiki @shacrw_ @akbirthko @seconds_0 @analoguegroup WORK DATA!!!!!

1d416

Analogue@analoguegroup

@anjali_shriva @divya_venn @annihalated @aishdoingthings @peytoncasper @herbiebradley @JoshPurtell @nobu_hibiki @shacrw_ @akbirthko @seconds_0 we love supporting your work!

1d388

Aashish Reddy@_AashishReddy

@anjali_shriva Are you bearish on synthetic data

1d972

anjali@anjali_shriva

@jaesmail i'm fr laughing at how long it took

what can i say, we enjoy the finishing touches

1d472

anjali@anjali_shriva

@_AashishReddy for anti-inductive domains, yeah. and i think this is a *huge* portion of white collar work

good writeup from @ankit2119 https://ankitmaloo.com/anti-inductive/

1d342

Soren Larson@hypersoren

@anjali_shriva @jaesmail 🥲

1d92

Aashish Reddy@_AashishReddy

@anjali_shriva @ankit2119 Whereas it wouldn't for learning models of the world, since synthetic data doesn't tell you what the world is like. But fortunately we already get that from pretraining. I still have exams but hope to have time to flesh this out before the singularity

1d231

anjali@anjali_shriva

@aishdoingthings @divya_venn @annihalated @peytoncasper @herbiebradley @JoshPurtell @nobu_hibiki @shacrw_ @akbirthko @seconds_0 @analoguegroup #werk data

1d264

judah@joodalooped

@hypersoren @anjali_shriva @jaesmail sometimes it includes a CSS rewrite

1d62

harsh@harshh_jainn

@anjali_shriva would be hilarious if the models got dumber after post training on work data 😅

1d181

anjali@anjali_shriva

@_AashishReddy @ankit2119 i can see this, yeah. the full post is a bit more nuanced and comes with an author's note

we mainly wanted to get across

1) this type of data is important, even for compute-rich labs (see labs starting deployment companies, cursor-xai partnership)
2) and it's under-discussed

1d19