
Shreya Shankar, a databases and HCI researcher, finds AI agents converge on superficial interpretations and fail to adapt when given gradual human feedback on qualitative tasks such as tweet sensemaking

Participants reported high fatigue during iterative feedback rounds.

Original post

link to post: https://www.sh-reya.com/blog/ai-qual-analysis/ i also included interactive traces/codebooks for all the experimental conditions so people can inspect the workflows step-by-step themselves: https://www.sh-reya.com/blogimages/ai-qual-analysis/transcripts.html

10:06 AM · May 22, 2026

i'm restarting my blog! i want to kickstart productive conversations around: what should AI agents look like for hard, subjective knowledge work?

a lot of agent setups work well when tasks are objective and easy to verify. but many workflows (e.g., qualitative analysis, strategy, sensemaking) are messy and interpretive.

as a first post, i explore different ways of doing agent-assisted qualitative analysis on tweets, with varying levels of human feedback/intervention.

tldr: they all kinda sucked. turns out it’s hard to: (a) stop agents from converging too quickly on shallow interpretations (b) get agents to adapt to preferences that emerge gradually across many turns (i.e., evolving context) (c) capture human judgment without making humans fatigued

5:06 PM · May 22, 2026 · 4.4K Views

The experiments conducted in this post illustrate how early we are as an industry on eval tooling.

Some takeaways and related thoughts:

1. Naively applying automation (which many current frameworks do) is likely to fail.

2. It's easy to be fooled into thinking that automation (especially overzealous automation) is giving you valuable insights. Stay skeptical at all times!

3. We have to design eval workflows so that the human in the loop speeds up the work while also helping you externalize what "good" looks like.

4. Qualitative analysis hasn't made its way into eval tooling as much as it should. There are opportunities to design better automation here. (Qualitative analysis is super underrated for evals, btw.)

Quoting Shreya Shankar (@sh_reya)

5:24 PM · May 22, 2026 · 1.7K Views