
Blinded study finds law professors prefer Gemini 2.5 Pro answers to human peer responses 75.33% of the time

AI answers were flagged as harmful 3.53% of the time, versus 12.06% for human answers.


Original post

Andrew Curran @AndrewCurran_

In a new Stanford study, law professors by far preferred Gemini 2.5 Pro's responses over those written by their peers when they were unaware of who wrote the answers.

11:02 AM · Jun 2, 2026 · 270.2K Views

Sentiment

Users praised Gemini 2.5 Pro's legal drafting and explanations after a Stanford study showed law professors prefer its answers, while others called the results embarrassing for the profession.

Positive: 77.2% · Negative: 22.8% (based on 31 comments with sentiment)


Via SSRN.com

Posts from X


Views: 130.2K · Replies: 43

Susan Zhang @suchenzang

absolutely tickled by how these llms currently seem to work better for lawyer-y tasks than they do for my friend who's been ghostwriting smut for the last 15+ years

specifically, she says:

> biggest pain point is the fact that output matters for writing but coders just care that the code works

> coders are fine with repetitive elements and inefficiently written statements, but when writing, the repetitive patterns turn into limericks after 300 words

> which then requires me to either prevent this at the input level by heavily seeding the inputs with non slop writing patterns or almost go through entire rewrites on the outputs

> the former means a 100k book might require 50k words of scaffolding

> the latter means i'm probably rewriting 75k by hand

and her comment today:

Andrew Curran @AndrewCurran_

In a new Stanford study, law professors by far preferred Gemini 2.5 Pro's responses over those written by their peers when they were unaware of who wrote the answers.

24d130.2K683257

Bookmarks: 291 · Likes: 806 · Retweets: 120

Ethan Mollick @emollick

Law professors wrote questions they were asked during office hours. Gemini 2.5 & humans answered them, then other law professors blindly judged the results:
- Gemini had a 75% win rate vs. professors
- Gemini's answers were rated LESS harmful than humans'
- Newer models do even better

Andrew Curran @AndrewCurran_

In a new Stanford study, law professors by far preferred Gemini 2.5 Pro's responses over those written by their peers when they were unaware of who wrote the answers.

27d · 87.8K · 806 · 291

Andrew Curran @AndrewCurran_

In a separate test, Opus 4.7 showed how far the gap has widened since then.

Andrew Curran @AndrewCurran_

In a new Stanford study, law professors by far preferred Gemini 2.5 Pro's responses over those written by their peers when they were unaware of who wrote the answers.

27d6.4K13820

Rohan Paul @rohanpaul_ai

Stanford researchers found that law professors preferred AI answers over peer professor answers 75% of the time when judging contract-law help for students.

The study tested whether LLMs can handle a field where the answer is often not a fact, but a defensible argument built from rules, exceptions, and judgment.

The professors wrote 40 real student-style questions, gave their own answers, and then blindly judged nearly 3,000 comparisons between human and AI responses.

The striking result was not just that AI won often, but that professors marked AI answers as harmful only 3.5% of the time, compared with 12% for human answers.

i.e. the model was not merely sounding fluent, but often matching the teaching standard law professors use when explaining ambiguity to students.

27d5.9K6323

Christopher Manning @chrmanning

Congrats to @JulianNyarko & team on this work!

I’m not wrong in fearing that Claude or ChatGPT’s answers to technical questions are often more up-to-date, complete and balanced than my own…

How will knowledge workers & society react to the diminishment of human expertise? 🤔

Julian Nyarko @JulianNyarko

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6849678

26d7.3K3417

Susan Zhang @suchenzang

some more info on her usage:

> my current theory/test is to have the first 10k words establish patterns in dialogue and prose, and then setting up blocks within a chapter that i then prompt the LLM to write

> so that as the process goes deeper into the book, i'm slowly handing off the writing to the LLM but with me still in the director chair

> but i'm sick and tired of seeing endless "it's not X, it's Y" or "X lands and I file it away" or "I want to do X, I do not do X" within 3 sentences of each other


24d6.1K6712

Jerry Liu @jerryjliu0

shower thought

If:
1. AI is smarter than humans at law, therapy, etc.
2. Humans still like talking to other humans.

Then: Humans are just an AI wrapper. Everyone should just regurgitate what Claude tells them in real time.

Long Cluely

Polymarket @Polymarket

NEW: Stanford study finds law professors preferred AI-generated tutoring answers over professor-written answers 75% of the time.

27d6.2K294

vik @vikhyatk

@suchenzang i like elegant code but i was forced to work with other programmers early in my career so i learned to accept poorly written code as long as it works

24d2.1K32

Andrew Curran @AndrewCurran_

@PlastiqSoldier They did another test that shows the gap.

27d996191

(((ل()(ل() 'yoav))))👾 @yoavgo

@suchenzang seems like she has "taste". (this thing we are about to collectively lose in a few years)


24d1.1K260

Andrew Curran @AndrewCurran_

@AIMelGibson The studies usually have a significant lag.

27d1K20

Plastic Soldier @PlastiqSoldier

@AndrewCurran_ *sees Gemini 2.5* This study is obviously outdated and worthless since it uses an out-of-date model. *sees results* Oh, oh, this is pretty impressive.

27d1.1K19

Rohan Paul @rohanpaul_ai

https://law.stanford.edu/publications/law-professors-prefer-ai-over-peer-answers/

27d1.6K51

Robert Anderson @ProfRobAnderson

@emollick @JulianNyarko My fellow law professors:

27d4919

prinz @deredleritt3r

@AndrewCurran_ I find this graph to be quite strange. Gemini-2.5 Pro better than Gemini-3.1 Pro? Hmm.

27d4416

Justin Zhao @justinxzhao

@suchenzang Relevant: https://github.com/sam-paech/antislop-sampler

24d37922

Latent Node @latent_node

@suchenzang You need to treat writing as code and do it in Claude Code; it actually works well. You can plan ahead and create the plotline, characters, tone, etc., and you can ensure the agent adheres to them.

24d77541

Sean McClure @sean_a_mcclure

@AndrewCurran_ If the self-driving car takes the job of the taxi driver but crashes less often, ask yourself what the ethical decision is.

27d24441

Spencer Dusebout @sdusebout

@emollick Law is kind of like coding with all the case law and consensus on certain patterns. Great use case for AI. Going to pressure the paper pushers for sure, but good lawyers are true problem solvers and have the experience AI won't be able to easily train.

27d6367

Matthew Zirwas, MD @MattZirwas

But many lawyers have assured me, with a high degree of certainty, that AI is many years away from replacing lawyers for anything other than routine document prep and contract review, maybe some discovery.

Who to believe: the people who actually investigated the question, or the ones whose ego and income are in the crosshairs?

Amazingly, people in medicine may be even worse.

27d6641