In a new Stanford study, law professors by far preferred Gemini 2.5 Pro's responses over those written by their peers when they were unaware of who wrote the answers.
Blinded study finds law professors prefer Gemini 2.5 Pro answers to human peer responses 75.33% of the time
AI answers were flagged as harmful 3.53% of the time, versus 12.06% for human answers.
Some users cited the Stanford study, which found that law professors prefer Gemini 2.5 Pro answers, as evidence of strong legal drafting and explanations, while others called the results embarrassing for the profession.
absolutely tickled by how these llms currently seem to work better for lawyer-y tasks than they do for my friend who's been ghostwriting smut for the last 15+ years
specifically, she says:
> biggest pain point is the fact that output matters for writing but coders just care that the code works
> coders are fine with repetitive elements and inefficiently written statements, but when writing, the repetitive patterns turn into limericks after 300 words
> which then requires me to either prevent this at the input level by heavily seeding the inputs with non slop writing patterns or almost go through entire rewrites on the outputs
> the former means a 100k book might require 50k words of scaffolding
> the latter means i'm probably rewriting 75k by hand
and her comment today:
In a new Stanford study, law professors by far preferred Gemini 2.5 Pro's responses over those written by their peers when they were unaware of who wrote the answers.
Law professors wrote questions they were asked during office hours. Gemini 2.5 & humans answered them, then other law professors blindly judged the results:
- Gemini had a 75% win rate vs. professors
- Gemini's answers were rated LESS harmful than humans
- Newer models do even better
In a separate test Opus 4.7 showed how far the gap has widened since then.
Stanford researchers found that law professors preferred AI answers over peer professor answers 75% of the time when judging contract-law help for students.
The study tested whether LLMs can handle a field where the answer is often not a fact, but a defensible argument built from rules, exceptions, and judgment.
The professors wrote 40 real student-style questions, gave their own answers, and then blindly judged nearly 3,000 comparisons between human and AI responses.
The striking result was not just that AI won often, but that professors marked AI answers as harmful only 3.5% of the time, compared with 12% for human answers.
i.e. the model was not merely sounding fluent, but often matching the teaching standard law professors use when explaining ambiguity to students.
Congrats to @JulianNyarko & team on this work!
I’m not wrong in fearing that Claude or ChatGPT’s answers to technical questions are often more up-to-date, complete and balanced than my own…
How will knowledge workers & society react to the diminishment of human expertise? 🤔
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6849678
some more info on her usage:
> my current theory/test is to have the first 10k words establish patterns in dialogue and prose, and then setting up blocks within a chapter that i then prompt the LLM to write
> so that as the process goes deeper into the book, i'm slowly handing off the writing to the LLM but with me still in the director chair
> but i'm sick and tired of seeing endless "it's not X, it's Y" or "X lands and I file it away" or "I want to do X, I do not do X" within 3 sentences of each other
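The complaint above, stock phrasings recurring within a few sentences of each other, can be checked mechanically before a rewrite pass. The following is a minimal sketch, not her workflow: the three regular expressions are rough guesses modeled on the quoted examples, and the names `SLOP_PATTERNS` and `find_clusters` are hypothetical.

```python
import re

# Hypothetical patterns modeled on the three phrasings quoted above.
# They are deliberately rough; real prose would need a broader set.
SLOP_PATTERNS = {
    "not_x_its_y": re.compile(r"\bit[’']?s not\b[^.!?]*?,\s*it[’']?s\b", re.IGNORECASE),
    "lands_file_away": re.compile(r"\blands\b[^.!?]*\bfile (?:it|that|this) away\b", re.IGNORECASE),
    "want_do_not": re.compile(r"\bI want to\b[^.!?]*?,\s*I do not\b", re.IGNORECASE),
}


def find_clusters(text, window=3):
    """Return (i, name_i, j, name_j) for consecutive pattern hits
    whose sentence indices are at most `window` sentences apart."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    hits = []  # (sentence_index, pattern_name), in reading order
    for i, sentence in enumerate(sentences):
        for name, pattern in SLOP_PATTERNS.items():
            if pattern.search(sentence):
                hits.append((i, name))

    clusters = []
    for (i, a), (j, b) in zip(hits, hits[1:]):
        if j - i <= window:
            clusters.append((i, a, j, b))
    return clusters


sample = (
    "It's not fear, it's hunger. He smiles. I want to run, I do not run. "
    "The rain falls. The door opens. We wait. Nothing moves. "
    "The remark lands and I file it away."
)
clusters = find_clusters(sample)
```

On the sample text, the first and third sentences are flagged as a cluster, while the last sentence is far enough from the others to pass. A check like this only finds the tics it already knows about, so it would complement the seeding and rewriting she describes rather than replace either.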
shower thought
If: 1. AI is smarter than humans at law, therapy, etc. 2. Humans still like talking to other humans.
Then: Humans are just an AI wrapper. Everyone should just regurgitate what Claude tells them in real time.
Long Cluely
NEW: Stanford study finds law professors preferred AI-generated tutoring answers over professor-written answers 75% of the time.

@suchenzang i like elegant code but i was forced to work with other programmers early in my career so i learned to accept poorly written code as long as it works

@PlastiqSoldier They did another test that shows the gap.
@suchenzang seems like she has "taste". (this thing we are about to collectively lose in a few years)

@AIMelGibson The studies usually have a significant lag.

@AndrewCurran_ *sees Gemini 2.5* This study is obviously outdated and worthless since it uses an out-of-date model. *sees results* Oh, oh, this is pretty impressive.

https://law.stanford.edu/publications/law-professors-prefer-ai-over-peer-answers/

@emollick @JulianNyarko My fellow law professors:

@AndrewCurran_ I find this graph to be quite strange. Gemini-2.5 Pro better than Gemini-3.1 Pro? Hmm.

@suchenzang Relevant: https://github.com/sam-paech/antislop-sampler

@suchenzang You need to treat writing as code and do it in Claude Code. It actually works well: you can plan ahead, create the plotline, characters, tone, etc., and ensure the agent adheres to them.

@AndrewCurran_ If the self-driving car takes the job of the taxi driver but crashes less often, ask yourself what the ethical decision is.

@emollick Law is kind of like coding with all the case law and consensus on certain patterns. Great use case for AI. Going to pressure the paper pushers for sure, but good lawyers are true problem solvers and have the experience AI won't be able to easily train.

But many lawyers have assured me, with a high degree of certainty, that AI is many years away from replacing lawyers for anything other than routine document prep and contract review, maybe some discovery.
Who to believe: the people who actually investigated the question, or the ones whose egos and incomes are in the crosshairs?
Amazingly, people in medicine may be even worse.