Stanford's Vishakh Padmakumar introduces Offloading Score, a metric to quantify how much cognitive effort users delegate to AI

We propose a new way to quantify AI overreliance: the Offloading Score 🧐 @vishakh_pk

It measures the fraction of cognitive work you hand off to AI 🤖 by simulating how you'd have done each step without AI, then counting the steps the AI saved. It works directly from interaction traces (keystrokes, screenshots), so it's reusable across many tools!!

Vishakh Padmakumar@vishakh_pk

People are increasingly worried that AI tools make us overreliant. But how do we actually measure this? We introduce Offloading Score, a measure of reliance based on the fraction of cognitive effort offloaded to AI while completing a task. In a controlled user study, Offloading Score detects increased reliance under time pressure, while several common alternatives do not. (1/9)

Zora Wang@ZhiruoW

We are building AI technologies to empower humans, and this requires awareness of human reliance. Our latest work measures human cognitive offload using our workflow induction toolkit. Beyond showing the accuracy of our measure, we find that high reliance isn't inherently harmful. When users bring intentional engagement and genuine task understanding, AIs can facilitate human learning ✨

Nishant Balepur@NishantBalepur

I always wished tools like Cursor, CC, etc. could detect and tell me when I'm being too lazy 🙏

Awesome work by the awesome @vishakh_pk towards this!

Vishakh Padmakumar@vishakh_pk

Paper: https://arxiv.org/abs/2605.29392
Code: https://github.com/vishakhpk/offloading-score
Some examples and concise findings: https://vishakhpk.github.io/measuring-reliance/

With my amazing co-authors! @lujainmibrahim @ZhiruoW @jennjwang @QVeraLiao @Diyi_Yang

More below!🧵 (2/9)

Vishakh Padmakumar@vishakh_pk

We introduce Offloading Score. The idea is simple: we examine the AI-assisted steps of the user’s workflow, estimate how the user would have completed them without the tool, then count how many steps from the counterfactual workflow were ‘saved’ by using the tool. (4/9)
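A minimal sketch of the step-counting idea in Python, assuming each workflow has already been reduced to a list of step labels. The thread does not say how steps are represented or how counterfactual steps are matched to observed ones, so the exact-label matching and the function name `offloading_score` below are hypothetical stand-ins, not the authors' implementation (the linked repository has that).

```python
from collections import Counter


def offloading_score(counterfactual_steps: list[str], observed_user_steps: list[str]) -> float:
    """Fraction of counterfactual (no-AI) steps the user did not perform personally.

    Hypothetical simplification: steps are compared by exact label, counting
    repeated steps with multiplicity. The real method's matching is not
    specified in the thread.
    """
    if not counterfactual_steps:
        return 0.0
    needed = Counter(counterfactual_steps)
    performed = Counter(observed_user_steps)
    # Counterfactual steps the user still did themselves (capped at how often each was needed).
    kept = sum(min(count, performed[step]) for step, count in needed.items())
    saved = len(counterfactual_steps) - kept
    return saved / len(counterfactual_steps)


# Invented example: the no-AI workflow has six steps; the user still reads the
# spec and runs the tests, while the AI handles the rest.
counterfactual = ["read spec", "search docs", "write parser", "write tests", "debug", "run tests"]
observed = ["read spec", "prompt AI for parser and tests", "run tests"]
print(round(offloading_score(counterfactual, observed), 2))  # 0.67
```

Exact label matching is the crudest possible choice; a real pipeline would need a fuzzier notion of when an observed action covers a counterfactual step.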

Vishakh Padmakumar@vishakh_pk

Two users might spend the same amount of time interacting with an AI assistant and generate the same amount of code with its help, yet one may be delegating substantially more of the task. Measures based on AI usage alone are no longer sufficient for contemporary AI-assisted workflows. (3/9)
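A toy illustration of this point, with invented numbers rather than study data: two hypothetical sessions that tie on interaction count and AI-generated lines of code can still differ sharply once the counterfactual steps saved are counted. The `Session` fields are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Session:
    name: str
    ai_interactions: int       # prompts sent to the assistant
    ai_lines_of_code: int      # accepted AI-generated lines
    counterfactual_steps: int  # estimated steps the task would take without AI (hypothetical)
    steps_saved: int           # counterfactual steps the AI performed instead of the user

    @property
    def offloading(self) -> float:
        return self.steps_saved / self.counterfactual_steps


sessions = [
    Session("user_a", ai_interactions=12, ai_lines_of_code=150, counterfactual_steps=20, steps_saved=6),
    Session("user_b", ai_interactions=12, ai_lines_of_code=150, counterfactual_steps=20, steps_saved=16),
]

# Usage-based measures tie; the offloading fraction does not (0.30 vs 0.80).
for s in sessions:
    print(f"{s.name}: interactions={s.ai_interactions}, lines={s.ai_lines_of_code}, offloading={s.offloading:.2f}")
```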

Vishakh Padmakumar@vishakh_pk

Offloading Score reliably recovers this relationship, assigning significantly higher reliance to users under time pressure (short vs long below!). Other common measures, including AI interactions, lines of code, and self-reported measures, do not reliably distinguish between the two conditions. (6/9)
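The thread does not say which statistical test backs "significantly higher". As one hedged illustration, a one-sided permutation test on per-participant scores checks whether the short-deadline group's mean score exceeds the long-deadline group's by more than random relabeling would produce. The scores below are invented, and `permutation_test` is a hypothetical helper, not the paper's analysis.

```python
import random
from statistics import mean


def permutation_test(short: list[float], long: list[float],
                     n_resamples: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """One-sided test of whether mean(short) exceeds mean(long) beyond chance relabeling."""
    rng = random.Random(seed)
    observed = mean(short) - mean(long)
    pooled = short + long  # new list, so shuffling does not touch the inputs
    n_short = len(short)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        # Reassign condition labels at random and recompute the mean difference.
        if mean(pooled[:n_short]) - mean(pooled[n_short:]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_resamples + 1)


# Invented per-participant Offloading Scores (not study data).
short_deadline = [0.62, 0.71, 0.58, 0.80, 0.66, 0.74]
long_deadline = [0.41, 0.52, 0.47, 0.39, 0.55, 0.44]
diff, p = permutation_test(short_deadline, long_deadline)
print(f"mean difference = {diff:.3f}, one-sided p = {p:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions and fits in a few lines; a study with repeated measures per participant would call for a different model.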

Vishakh Padmakumar@vishakh_pk

High reliance is not always undesirable. We examine the interaction between reliance and a desirable task outcome, code understanding. While high reliance generally leads to low code understanding, we also find a cluster of high-reliance + high-understanding users who are often _learning_ with AI and augmenting their own skills. This suggests that reliance should be interpreted alongside task outcomes, not in isolation. (8/9)
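The thread reports a cluster rather than a rule for finding it. A median split on both axes is one simple, hypothetical way to flag the high-reliance + high-understanding quadrant for closer inspection; the scores and the `quadrant_labels` helper are invented, and the paper's clustering method may differ.

```python
from statistics import median


def quadrant_labels(reliance: dict[str, float], understanding: dict[str, float]) -> dict[str, str]:
    """Label each user by a median split on reliance and on understanding (hypothetical rule)."""
    r_cut = median(reliance.values())
    u_cut = median(understanding.values())
    labels = {}
    for user, r in reliance.items():
        r_label = "high-reliance" if r > r_cut else "low-reliance"
        u_label = "high-understanding" if understanding[user] > u_cut else "low-understanding"
        labels[user] = f"{r_label} + {u_label}"
    return labels


# Invented scores: Offloading Score in [0, 1] and a hypothetical code-understanding score in [0, 1].
reliance = {"p1": 0.85, "p2": 0.80, "p3": 0.30, "p4": 0.25}
understanding = {"p1": 0.40, "p2": 0.90, "p3": 0.85, "p4": 0.50}

for user, label in quadrant_labels(reliance, understanding).items():
    print(user, label)
```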

Vishakh Padmakumar@vishakh_pk

To evaluate whether Offloading Score captures reliance effectively, we need a setting where reliance can be predictably manipulated. For this, we ran a controlled study with developers using AI coding tools under different levels of time pressure. Prior work suggests that people rely more heavily on decision aids under time pressure. (5/9)

Vishakh Padmakumar@vishakh_pk

Beyond the scalar value of Offloading Score, we also obtain richer behavioral insights from the workflows, finding that under time pressure, users shift toward execution-oriented interactions, delegate more subtasks to the AI, and directly reuse AI-generated outputs more often. (7/9)
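A sketch of how such behavioral shifts could be tallied, assuming each AI interaction has already been labeled with a type and with whether its output was reused directly. The labels ("execution", "other"), the records, and `behavior_summary` are all hypothetical; the thread does not describe the paper's interaction taxonomy.

```python
from collections import defaultdict

# Hypothetical labeled interactions: (condition, interaction_type, output_reused_directly).
events = [
    ("short", "execution", True),
    ("short", "execution", True),
    ("short", "execution", False),
    ("short", "other", False),
    ("long", "execution", False),
    ("long", "other", False),
    ("long", "other", True),
    ("long", "other", False),
]


def behavior_summary(records):
    """Per condition: share of execution-oriented interactions and rate of direct output reuse."""
    totals = defaultdict(int)
    execution = defaultdict(int)
    reused = defaultdict(int)
    for condition, kind, was_reused in records:
        totals[condition] += 1
        execution[condition] += int(kind == "execution")
        reused[condition] += int(was_reused)
    return {
        c: {"execution_share": execution[c] / totals[c], "reuse_rate": reused[c] / totals[c]}
        for c in totals
    }


for condition, stats in behavior_summary(events).items():
    print(condition, stats)
```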

Vishakh Padmakumar@vishakh_pk

Offloading Score is calculated directly from screen and keyboard data, making it reusable across _any_ AI tool and interface. By making reliance measurable, we hope to enable more systematic study of overreliance, deskilling, and loss of agency in human–AI collaboration. (9/9)

Andrew Dobrow@andrewdobrow

From a very anecdotal point of view, I can confirm that in some ways, my thinking process has changed or has become externally lazier, such as writing my own content and even spelling words correctly. I'm 100% working more than ever, but my brain is working differently than it was.

Prithvi Jadwani | AI SEO | GEO | REDDIT SEO | GMB@Prithvi_Jadwani

@ZhiruoW So, it boils down to whether users can genuinely learn from the process or just outsource their thinking.

Vishakh Padmakumar@vishakh_pk

@andrewdobrow @ZhiruoW @Diyi_Yang @QVeraLiao @jennjwang This resonates a lot with our motivation! I think it's inevitable that our behavior patterns change as we engage more with AI tools. Our goal with Offloading Score is to provide a way to analyze those changes and make more informed decisions about how we work.

Zora Wang@ZhiruoW

@Prithvi_Jadwani imo it's not entirely on the users. It also depends on how we build AI tools and how much user learning/upskilling they afford.

@saba_1121@S4B4_1121

@vishakh_pk @ZhiruoW @Diyi_Yang @QVeraLiao @jennjwang I recently released an #npm #package for #token compression and prompt optimisation. The package works with #HuggingFace models to estimate token probabilities. The goal is to reduce #tokenAPI costs. I'd really appreciate🌟on GitHub. https://github.com/SaBA26-void/project_void
