
Anjali argues AI labs must build software to harvest interactive "Work Data" as scaling laws hit data limits

These interaction traces train agents via reinforcement learning

Original post · Herbie Bradley #1012
anjali @anjali_shriva

the scaling laws in models might feel like inevitable progress if compute and data continue growing. but data has some underrated limitations…

a thread on a new kind of data ("Work Data"): what it is, and why labs now need to build and sell product for continued growth

judah @joodalooped

all aboard the data train!

https://anjalishriva.com/work-data/

7:41 AM · Jun 9, 2026 · 8.4K Views · 67 Likes · 16 Retweets · 28 Bookmarks
Sentiment

Many users are excited about the new primer on work data for training AI agents because of its insightful content, collaborative reviews, and playful emphasis on the core concept.

Pos 100.0% · Neg 0.0% (6 comments with sentiment)
Posts from X
anjali @anjali_shriva

For most domains, real work is the only environment that useful data can come from.

A big reason why labs are building products, acquiring companies, and forward-deploying engineers into enterprises is to gather enough work data to train their agents on a wider range of tasks.

11h · Views 996 · Likes 9
jihad @jaesmail

@anjali_shriva two most anticipated pieces of media of 2026:
- This essay
- Iceman

11h · Views 143 · Likes 3 · Bookmarks 2
anjali @anjali_shriva

with thanks to the many people who reviewed and gave comments

@divya_venn @annihalated @aishdoingthings @peytoncasper @herbiebradley @JoshPurtell @nobu_hibiki @shacrw_ @akbirthko @seconds_0

and @analoguegroup

🫶

11h · Views 116 · Likes 14
Aashish Reddy @_AashishReddy

@anjali_shriva @ankit2119 My take is that what an agent needs to learn is basically representations for how to take actions in the world, how to chain together sequences of actions and make plans and so on. So synthetic data makes RL-generalisation feasible

10h · Views 19 · Likes 2
anjali @anjali_shriva

if work data is what matters, where do you get it?

you can't just scrape it from the web. Work data is a fundamentally different distribution, and the corrective signals that matter aren’t in any textbook, manual, or written wiki.

11h · Views 160 · Likes 11 · Bookmarks 1
anjali @anjali_shriva

But once you understand it, it raises many, many questions about the future.

Read the full post for our predictions (co-written with @joodalooped) 👉 http://anjalishriva.com/work-data

11h · Views 85 · Likes 9 · Bookmarks 1
anjali @anjali_shriva

Okay, no dataset. Can't we just build a simulation? A "work gym" (RL env) where agents learn by trial and error?

unfortunately, knowledge work lacks the verifiability that RL relies on: the feedback is too sparse, too delayed, too noisy to learn from (h/t @gwern)

11h · Views 83 · Likes 9
anjali @anjali_shriva

It's hard to grasp how much work data you generate in a session, let alone the sheer scale of data that's needed

Our examples of its specific nature / excerpt from recent @dwarkesh_sp post:

11h · Views 64 · Likes 9
aishwarya🍎 @aishdoingthings

@anjali_shriva @divya_venn @annihalated @peytoncasper @herbiebradley @JoshPurtell @nobu_hibiki @shacrw_ @akbirthko @seconds_0 @analoguegroup WORK DATA!!!!!

10h · Views 41 · Likes 6
Analogue @analoguegroup

@anjali_shriva @divya_venn @annihalated @aishdoingthings @peytoncasper @herbiebradley @JoshPurtell @nobu_hibiki @shacrw_ @akbirthko @seconds_0 we love supporting your work!

10h · Views 38 · Likes 8
Aashish Reddy @_AashishReddy

@anjali_shriva Are you bearish on synthetic data?

10h · Views 97 · Likes 2
anjali @anjali_shriva

@jaesmail i'm fr laughing at how long it took

what can i say, we enjoy the finishing touches

10h · Views 47 · Likes 2
anjali @anjali_shriva

@_AashishReddy for anti-inductive domains, yeah. and i think this is a *huge* portion of white collar work

good writeup from @ankit2119 https://ankitmaloo.com/anti-inductive/

10h · Views 34 · Likes 2
Soren Larson @hypersoren

@anjali_shriva @jaesmail 🥲

10h · Views 9 · Likes 2
Aashish Reddy @_AashishReddy

@anjali_shriva @ankit2119 Whereas it wouldn't for learning models of the world, since synthetic data doesn't tell you what the world is like. But fortunately we already get that from pretraining. I still have exams but hope to have time to flesh this out before the singularity

10h · Views 23 · Likes 1
anjali @anjali_shriva

@aishdoingthings @divya_venn @annihalated @peytoncasper @herbiebradley @JoshPurtell @nobu_hibiki @shacrw_ @akbirthko @seconds_0 @analoguegroup #werk data

10h · Views 26 · Likes 4
judah @joodalooped

@hypersoren @anjali_shriva @jaesmail sometimes it includes a CSS rewrite

10h · Views 6 · Likes 2
harsh @harshh_jainn

@anjali_shriva would be hilarious if the models got dumber after post training on work data 😅

10h · Views 18 · Likes 1
anjali @anjali_shriva

@_AashishReddy @ankit2119 i can see this, yeah. the full post is a bit more nuanced and comes with an author's note

we mainly wanted to get across

1) this type of data is important, even for compute-rich labs (see labs starting deployment companies, cursor-xai partnership)
2) and it's under-discussed

10h · Views 19