Stanford researchers found that law professors preferred AI answers to answers written by fellow professors 75% of the time when judging contract-law help for students.
The study tested whether LLMs can handle a field where the answer is often not a fact but a defensible argument built from rules, exceptions, and judgment.
The professors wrote 40 real student-style questions, gave their own answers, and then blindly judged nearly 3,000 comparisons between human and AI responses.
The striking result was not just that AI won often, but that professors marked AI answers as harmful only 3.5% of the time, compared with 12% for human answers.
In other words, the model was not merely sounding fluent; it often matched the teaching standard law professors use when explaining ambiguity to students.