
Anjali argues AI labs must build software to harvest interactive "Work Data" as scaling laws hit data limits

These interaction traces train agents via reinforcement learning

anjali@anjali_shriva

the scaling laws in models might feel like inevitable progress if compute and data continue growing. but data has some underrated limitations…

a thread on a new kind of data ("Work Data"): what it is, and why labs now need to build and sell product for continued growth

judah@joodalooped

all aboard the data train!

https://anjalishriva.com/work-data/

7:41 AM · Jun 9, 2026 · 11.3K Views
Sentiment

Many users are excited about the new primer on work data for training AI agents because of its insightful content, collaborative reviews, and playful emphasis on the core concept.

Positive: 100.0% · Negative: 0.0% (6 comments with sentiment)
Posts from X
anjali@anjali_shriva

For most domains, real work is the only environment that useful data can come from.

A big reason why labs are building products, acquiring companies, and forward-deploying engineers into enterprises is to gather enough work data to train their agents on a wider range of tasks.

1d · 996 Views · 9 Likes
jihad@jaesmail

@anjali_shriva two most anticipated pieces of media of 2026:
- This essay
- Iceman

1d · 143 Views · 3 Likes · 2 Bookmarks
anjali@anjali_shriva

with thanks to the many people who reviewed and gave comments

@divya_venn @annihalated @aishdoingthings @peytoncasper @herbiebradley @JoshPurtell @nobu_hibiki @shacrw_ @akbirthko @seconds_0

and @analoguegroup

🫶

1d · 116 Views · 14 Likes
Aashish Reddy@_AashishReddy

@anjali_shriva @ankit2119 My take is that what an agent needs to learn is basically representations for how to take actions in the world, how to chain together sequences of actions and make plans and so on. So synthetic data makes RL-generalisation feasible

1d · 19 Views · 2 Likes
anjali@anjali_shriva

if work data is what matters, where do you get it?

you can't just scrape it from the web. Work data is a fundamentally different distribution, and the corrective signals that matter aren’t in any textbook, manual, or written wiki.

1d · 160 Views · 11 Likes · 1 Bookmark
anjali@anjali_shriva

But once you understand it, it raises many, many questions about the future.

Read the full post for our predictions (co-written with @joodalooped) 👉 http://anjalishriva.com/work-data

1d · 85 Views · 9 Likes · 1 Bookmark
anjali@anjali_shriva

Okay, no dataset. Can't we just build a simulation? A "work gym" (RL env) where agents learn by trial and error?

unfortunately, knowledge work lacks the verifiability that RL relies on: the feedback is too sparse, too delayed, too noisy to learn from (h/t @gwern)

1d · 83 Views · 9 Likes
anjali@anjali_shriva

It's hard to grasp how much work data you generate in a session, let alone the sheer scale of data that's needed

Our examples of its specific nature / excerpt from recent @dwarkesh_sp post:

1d · 64 Views · 9 Likes
aishwarya🍎@aishdoingthings

@anjali_shriva @divya_venn @annihalated @peytoncasper @herbiebradley @JoshPurtell @nobu_hibiki @shacrw_ @akbirthko @seconds_0 @analoguegroup WORK DATA!!!!!

1d · 41 Views · 6 Likes
Analogue@analoguegroup

@anjali_shriva @divya_venn @annihalated @aishdoingthings @peytoncasper @herbiebradley @JoshPurtell @nobu_hibiki @shacrw_ @akbirthko @seconds_0 we love supporting your work!

1d · 38 Views · 8 Likes
Aashish Reddy@_AashishReddy

@anjali_shriva Are you bearish on synthetic data?

1d · 97 Views · 2 Likes
anjali@anjali_shriva

@jaesmail i'm fr laughing at how long it took

what can i say, we enjoy the finishing touches

1d · 47 Views · 2 Likes
anjali@anjali_shriva

@_AashishReddy for anti-inductive domains, yeah. and i think this is a *huge* portion of white collar work

good writeup from @ankit2119 https://ankitmaloo.com/anti-inductive/

1d · 34 Views · 2 Likes
Soren Larson@hypersoren

@anjali_shriva @jaesmail 🥲

1d · 9 Views · 2 Likes
Aashish Reddy@_AashishReddy

@anjali_shriva @ankit2119 Whereas it wouldn't for learning models of the world, since synthetic data doesn't tell you what the world is like. But fortunately we already get that from pretraining. I still have exams but hope to have time to flesh this out before the singularity

1d · 23 Views · 1 Like
anjali@anjali_shriva

@aishdoingthings @divya_venn @annihalated @peytoncasper @herbiebradley @JoshPurtell @nobu_hibiki @shacrw_ @akbirthko @seconds_0 @analoguegroup #werk data

1d · 26 Views · 4 Likes
judah@joodalooped

@hypersoren @anjali_shriva @jaesmail sometimes it includes a CSS rewrite

1d · 6 Views · 2 Likes
harsh@harshh_jainn

@anjali_shriva would be hilarious if the models got dumber after post training on work data 😅

1d · 18 Views · 1 Like
anjali@anjali_shriva

@_AashishReddy @ankit2119 i can see this, yeah. the full post is a bit more nuanced and comes with an author's note

we mainly wanted to get across:

1) this type of data is important, even for compute-rich labs (see labs starting deployment companies, cursor-xai partnership)
2) it's under-discussed

1d · 19 Views