
@megamor2 Can you say more about the issues you are seeing in the LLM-generated reviews?
Critics say automated reviews suffer from fluff and hallucinations.
Users criticize LLM-generated academic peer reviews as fluffy and low on real information, with hallucinations and overstated flaws; some would rather train human PhD students instead.

Reviews can seem very detailed, but in practice there's little information there. The summary is often full of fluff, so it's really hard to understand what the paper is about. They often provide long lists of issues, and it's very hard to tell whether the concerns are major or minor. They often make unrealistic or nonsensical suggestions. All this without mentioning mistakes and hallucinations, which are often conveyed with such confidence that they can bias the whole evaluation. The final scores are often borderline, so there's no information there either.
As an author, it creates lots of work that is either stupid or not feasible. As an AC it's a nightmare: I want a concrete evaluation to work with, and I get this noisy, uninformative text. In rebuttals things become a joke... the person who generated the review has no idea how to judge what's going on, so typically you get a short statement like "I read the response and decided to keep my score".

@ipeirotis @TuhinChakr @megamor2 But ACs typically can see who reviewers are and thus know when a reviewer is junior and factor that in (I'd like to think junior reviewers also generally give themselves lower confidence scores). With undisclosed LLM use, that's not the case.

@ipeirotis @megamor2 Lol we should train them instead of thinking AI is the solution ;) I am doing my part training my first PhD students

@TuhinChakr @megamor2 We have the same problem with reviews that come from junior faculty and PhD students 😉.

@megamor2 What are the most insightful public discussion notes or conversations on this topic?

In general, junior people tend to be much stricter and return reviews that are more “nitpicky”.
Editors can handle such “nitpicky” objections relatively easily, especially for journals. It's for sure much easier to deal with a picky LLM, which will not be hurt by having its objections overruled.

I recently reviewed for a workshop, and the other two reviews on the paper I was assigned seemed kinda oblivious to some of the shortfalls that popped out to me. I don't know if they used LLMs, but both of the reviews read quite similar-ish.
Admittedly, I am not an experienced reviewer, and it's very likely that one of the things I pointed out was obvious, but it still felt kinda funky.
