The Cybernetic Teammate experiment shows that AI-assisted individuals match the output quality of unaided teams
The Cybernetic Teammate field experiment randomly assigned professionals to work on tasks either individually or in two-person teams, with or without AI assistance. Individuals using AI reached standardized quality scores near 0.39, matching the 0.38 achieved by unaided teams and exceeding the 0.25 of individuals working alone without AI. The study used GPT-4 and GPT-4o models and found that the gains came mainly on work outside participants' core expertise.
And, yes, our experiments used a mix of GPT-4 & GPT-4o (publishing takes a while). I think we would see much larger effects with more recent models, not to mention recent agentic tools.
"The Cybernetic Teammate" is a fascinating field experiment by a superstar team of researchers, including @raffasadun and @emollick.

💡 The bottom line: this field experiment suggests that one way AI can yield productivity benefits is not unlike the way team production among humans boosts performance: by providing above-average performance on the tasks where workers have limited skills. This points to lessons about the contexts in which AI productivity gains should be relatively greater.

🧱 The design: randomly assign professionals at a large company to work either with or without AI, and either individually or teamed up with another human colleague.

👉 Individuals with AI matched the performance of human teams without AI. Notably, individuals working alone tend to produce “unbalanced” solutions that favor their own expertise.

🧪 My reading (through the lens of the theory of Superstar Teams I've worked on): the "AI as a teammate" perspective also yields testable predictions about the conditions under which AI can boost productivity. AI productivity gains should be greater when (i) humans are more specialized (i.e., their productivity varies across the tasks bundled together into their job), (ii) AI capabilities are negatively correlated with human skills (i.e., AI is best at the tasks the human or humans are worst at), and (iii) human and AI overall capabilities are comparable (to avoid weak-link effects).
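To make predictions (i) and (ii) concrete, here is a toy sketch. It is an illustrative assumption of ours, not the Superstar Teams model and not the paper's method: a job is a bundle of tasks, each teammate has a skill level per task, and a team's output on each task is the higher of its members' skills (each task is handled by whoever is better at it), averaged over tasks. The function names (`team_output`, `ai_gain`) and all skill numbers are hypothetical.

```python
# Toy model (illustrative assumption, not the Superstar Teams model or the
# paper's method): output on each task is the max of the teammates' skills,
# and overall output is the average over tasks.
from statistics import mean


def team_output(*skill_profiles: list[float]) -> float:
    """Average over tasks of the best skill any teammate brings to that task."""
    return mean(max(skills) for skills in zip(*skill_profiles))


def ai_gain(human: list[float], ai: list[float]) -> float:
    """Output gain from adding an AI teammate to a human working alone."""
    return team_output(human, ai) - team_output(human)


# Hypothetical humans with the same average skill (0.5) over three tasks.
specialist = [0.9, 0.5, 0.1]  # strong on task 0, weak on task 2
generalist = [0.5, 0.5, 0.5]  # even skills across tasks

# Hypothetical AIs, also with average skill 0.5.
ai_mirror = [0.1, 0.5, 0.9]   # negatively correlated with the specialist
ai_aligned = [0.9, 0.5, 0.1]  # positively correlated with the specialist

print(f"specialist + mirrored AI: {ai_gain(specialist, ai_mirror):.3f}")   # 0.267
print(f"specialist + aligned AI:  {ai_gain(specialist, ai_aligned):.3f}")  # 0.000
print(f"generalist + mirrored AI: {ai_gain(generalist, ai_mirror):.3f}")   # 0.133
```

Under this max rule, the mirrored AI lifts the specialist's output, while the aligned AI adds nothing because it is strongest exactly where the human already is (prediction ii); the same mirrored AI adds less to the generalist (prediction i). The rule ignores coordination costs and weak-link effects, so it says nothing about prediction (iii).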