People are increasingly worried that AI tools make us overreliant. But how do we actually measure this? We introduce Offloading Score, a measure of reliance based on the fraction of cognitive effort offloaded to AI while completing a task. In a controlled user study, Offloading Score detects increased reliance under time pressure, while several common alternatives do not. (1/9)
Stanford's Vishakh Padmakumar introduces Offloading Score, a metric to quantify how much cognitive effort users delegate to AI.
The metric successfully tracked increased AI reliance under tight deadlines.
Commenters say the metric resonates with their own experience of AI changing how they work; the authors respond that such behavior changes are inevitable and that Offloading Score is meant to help analyze them.
We propose a new way to quantify AI overreliance: the Offloading Score 🧐 @vishakh_pk
It measures the fraction of cognitive work you hand off to AI 🤖 by simulating how you'd have done each step without AI, then counting the steps the AI saved. It works directly from interaction traces (keystrokes, screenshots), so it's reusable across many tools!!
We are building AI technologies to empower humans, and this requires awareness of human reliance. Our latest work measures human cognitive offload using our workflow induction toolkit. Beyond showing the accuracy of our measure, we find that high reliance isn't inherently harmful. When users bring intentional engagement and genuine task understanding, AIs can facilitate human learning ✨
I've always wished tools like Cursor, CC, etc. could detect when I'm being too lazy and tell me 🙏
Awesome work by the awesome @vishakh_pk towards this!

Paper: https://arxiv.org/abs/2605.29392 Code: https://github.com/vishakhpk/offloading-score Some examples and concise findings: https://vishakhpk.github.io/measuring-reliance/
With my amazing co-authors! @lujainmibrahim @ZhiruoW @jennjwang @QVeraLiao @Diyi_Yang
More below!🧵 (2/9)
Two users might spend the same amount of time interacting with an AI assistant and generate the same amount of code with its help, yet one may be delegating substantially more of the task. Measures based on AI usage alone are no longer sufficient for contemporary AI-assisted workflows. (3/9)

We introduce Offloading Score. The idea is simple: we examine the AI-assisted steps of the user’s workflow, estimate how the user would have completed them without the tool, then count how many steps of this counterfactual workflow were ‘saved’ by using the tool. (4/9)
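
As a rough illustration of the step-counting idea in (4/9), here is a minimal Python sketch. It assumes the counterfactual step counts have already been estimated and are given as plain integers. The `Segment` class, the `offloading_score` function, and the choice to normalize saved steps by the total estimated no-AI effort are hypothetical; this is not taken from the linked repository and may differ from the paper's exact definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One segment of a user's task workflow (hypothetical structure).

    user_steps: steps the user actually performed in this segment.
    counterfactual_steps: estimated steps the user would have needed without
        the AI tool; None if the segment was done without AI assistance.
    """
    user_steps: int
    counterfactual_steps: Optional[int] = None


def offloading_score(segments: list[Segment]) -> float:
    """Fraction of the estimated no-AI effort (in steps) that the AI saved."""
    total_effort = 0  # steps the whole task would take without AI
    saved = 0         # steps the AI made unnecessary
    for seg in segments:
        if seg.counterfactual_steps is None:
            # Unassisted segment: the counterfactual is what the user did.
            total_effort += seg.user_steps
        else:
            total_effort += seg.counterfactual_steps
            # Clamp at zero so a segment where the user took more steps than
            # estimated does not subtract from the savings.
            saved += max(seg.counterfactual_steps - seg.user_steps, 0)
    return saved / total_effort if total_effort else 0.0


# Usage: one manual segment and two AI-assisted segments.
workflow = [
    Segment(user_steps=4),
    Segment(user_steps=1, counterfactual_steps=6),
    Segment(user_steps=2, counterfactual_steps=5),
]
print(round(offloading_score(workflow), 3))  # 8 saved of 15 steps -> 0.533
```

Normalizing by the whole task's estimated effort, rather than only by the AI-assisted segments, keeps the score a fraction of the entire task, which matches the "fraction of cognitive effort offloaded while completing a task" framing; other normalizations are equally plausible.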

To evaluate whether Offloading Score captures reliance effectively, we need a setting where reliance can be predictably manipulated. For this, we ran a controlled study with developers using AI coding tools under different levels of time pressure. Prior work suggests that people rely more heavily on decision aids under time pressure. (5/9)

Offloading Score reliably recovers this relationship, assigning significantly higher reliance to users under time pressure (short vs. long below!). Other common measures, including AI interactions, lines of code, and self-reported measures, do not reliably distinguish between the two conditions. (6/9)

Beyond the scalar value of Offloading Score, we also obtain richer behavioral insights from the workflows, finding that under time pressure, users shift toward execution-oriented interactions, delegate more subtasks to the AI, and directly reuse AI-generated outputs more often. (7/9)

High reliance is not always undesirable. We examine the interaction between reliance and a desirable task outcome, code understanding. While in general high reliance leads to low code understanding, we also find a cluster of high-reliance + high-understanding users who are often _learning_ with AI and augmenting their own skills. This suggests that reliance should be interpreted alongside task outcomes, not in isolation. (8/9)
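
One simple way to read reliance alongside an outcome, as (8/9) suggests, is to place each user in a reliance × understanding quadrant. The sketch below assumes per-user Offloading Scores and code-understanding scores are already available and uses a median split. The `quadrant_labels` function, the split rule, and the toy values are illustrative assumptions, not the study's analysis or data.

```python
from statistics import median


def quadrant_labels(users: dict[str, tuple[float, float]]) -> dict[str, str]:
    """Label each user by reliance and understanding relative to the medians.

    users maps a user id to (offloading_score, understanding_score).
    A value strictly above the median counts as "high"; otherwise "low".
    """
    reliance_cut = median(r for r, _ in users.values())
    understanding_cut = median(u for _, u in users.values())
    labels = {}
    for uid, (reliance, understanding) in users.items():
        r = "high-reliance" if reliance > reliance_cut else "low-reliance"
        u = "high-understanding" if understanding > understanding_cut else "low-understanding"
        labels[uid] = f"{r} + {u}"
    return labels


# Toy values for illustration only (not study data).
users = {"u1": (0.8, 0.3), "u2": (0.7, 0.9), "u3": (0.2, 0.8), "u4": (0.3, 0.4)}
labels = quadrant_labels(users)

# Users who rely heavily on AI yet still understand the code well.
learners = [uid for uid, lab in labels.items() if lab == "high-reliance + high-understanding"]
print(learners)  # ['u2']
```

A median split is only a crude stand-in for finding clusters; it is used here because it makes the "interpret reliance alongside outcomes" point with no extra dependencies.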

Offloading Score is calculated directly from screen and keyboard data, making it reusable across _any_ AI tool and interface. By making reliance measurable, we hope to enable more systematic study of overreliance, deskilling, and loss of agency in human–AI collaboration. (9/9)

From a very anecdotal point of view, I can confirm that my thinking process has changed in some ways and has become lazier in outward tasks, such as writing my own content and even spelling words correctly. I'm 100% working more than ever, but my brain is working differently than it used to.

@ZhiruoW So, it boils down to whether users can genuinely learn from the process or just outsource their thinking.

@andrewdobrow @ZhiruoW @Diyi_Yang @QVeraLiao @jennjwang This resonates a lot with our motivation! I think it's inevitable that our behavior patterns change as we engage more with AI tools. Our goal with Offloading Score is to provide a way to analyze those changes and make more informed decisions about how we work.

@Prithvi_Jadwani imo it's not entirely on the users. It also depends on how we build AI tools and how much user learning/upskilling they afford.
