/Tech · 5h ago

Critique Exposes Unreliable False Positives In AI Text Detectors Like Pangram

Original post unavailable.

Sentiment

Many users criticized the Pangram AI Text Detector for high false-positive rates that cause real harm and mocked its accuracy claims as unrealistic marketing that enables misuse like unwarranted witch-hunting.

Pos: 0.0%

Neg: 100.0%

4 comments with sentiment.

Posts from X

Ethan @torchcompiled

Firstly, the false positive rate differs wildly depending on how the text was created. The 1-in-10,000 metric holds only under the most ideal, sterile conditions. Real-life scenarios and mixed text have far worse reliability.

5h

Ethan @torchcompiled

The witch-hunting and call-outs are doing unwarranted damage, and arguably this benefits the product's publicity.

5h

Ethan @torchcompiled

@akhmxt Wait it’s paywalled?

4h

Ethan @torchcompiled

The model is trained and validated on in-house datasets of pre-LLM-era human text against "lab-grown" AI text.

5h

kalim @akhmxt

@torchcompiled Should make the piece free / remove paywall for wider distribution

4h

Ethan @torchcompiled

The Taylor Lorenz Case

5h

Ethan @torchcompiled

Failure rates are conditional on the genre of text. And a big one: the reported failure rate is a population average over a heterogeneous, imbalanced population. Some writers pay the worst-case cost while others pay far less; the FPR is just an average.

5h
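The averaging point can be made concrete with a little arithmetic. The sketch below is illustrative only: the group labels, population shares, and per-group rates are invented assumptions, not measurements of Pangram or any other detector. It shows how a small headline FPR can coexist with a much higher rate for a minority of writers.

```python
# Hypothetical subgroups of human writers: (share of population, false positive rate).
# Every number here is an illustrative assumption, not a measured value.
GROUPS = {
    "group_a": (0.90, 0.0001),
    "group_b": (0.08, 0.002),
    "group_c": (0.02, 0.02),
}


def average_fpr(groups):
    """Population-average FPR: each group's rate weighted by its share."""
    return sum(share * fpr for share, fpr in groups.values())


def worst_case_ratio(groups):
    """How many times higher the worst group's FPR is than the average."""
    worst = max(fpr for _, fpr in groups.values())
    return worst / average_fpr(groups)


if __name__ == "__main__":
    print(f"average FPR: {average_fpr(GROUPS):.5f}")
    print(f"worst group vs average: {worst_case_ratio(GROUPS):.1f}x")
```

Under these made-up numbers the headline average sits well below 0.1%, while the smallest group is flagged at a rate dozens of times higher.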

Ethan @torchcompiled

There are research papers reporting that human spoken language is taking on AI characteristics, even after filtering out scripted works. Naturally, we mimic the culture we're exposed to and adapt our language. Training and validating on pre-LLM human text doesn't account for that.

5h

Ethan @torchcompiled

The classifier output is an inference: given that we see text with XYZ patterns, what is the probability that it came from an LLM vs a human?

5h
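That inference is an application of Bayes' rule, and the answer depends heavily on the base rate of AI text among what gets scanned. A minimal sketch, assuming a binary flag and made-up values for the prior, the true positive rate, and the false positive rate (the function name and all numbers are hypothetical):

```python
def posterior_ai(prior_ai, tpr, fpr):
    """P(AI | flagged) by Bayes' rule.

    prior_ai: assumed share of AI-written texts among those scanned
    tpr:      P(flagged | AI-written)
    fpr:      P(flagged | human-written)
    """
    flagged_ai = tpr * prior_ai
    flagged_human = fpr * (1.0 - prior_ai)
    return flagged_ai / (flagged_ai + flagged_human)


if __name__ == "__main__":
    # Same detector (assumed tpr=0.99, fpr=0.01), different assumed base rates.
    for prior in (0.5, 0.1, 0.01):
        print(f"prior={prior:.2f} -> P(AI | flagged)={posterior_ai(prior, 0.99, 0.01):.3f}")
```

With these assumed rates, a flag on text drawn from a pool where only 1% is AI-written is about as likely to be wrong as right.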

Ethan @torchcompiled

Model updates preserve similar or better false positive rates on average, but don't reveal how individual decisions change. There's a risk that something scans as AI on Monday but gets flagged as human on Friday, and this can be the difference between a case and a nothingburger.

5h
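A toy example of that gap between aggregate and individual behavior: two hypothetical model versions with identical false positive rates that nonetheless disagree on which texts they flag. The verdict lists are invented for illustration.

```python
def fpr(flags):
    """Fraction of human-written texts flagged as AI."""
    return sum(flags) / len(flags)


def churn(old_flags, new_flags):
    """Fraction of texts whose verdict changed between two versions."""
    changed = sum(a != b for a, b in zip(old_flags, new_flags))
    return changed / len(old_flags)


# Hypothetical verdicts on the same 10 human-written texts (True = flagged as AI).
V1 = [True, False, False, False, False, False, False, False, False, False]
V2 = [False, False, False, True, False, False, False, False, False, False]

if __name__ == "__main__":
    print(f"FPR v1={fpr(V1):.2f}, v2={fpr(V2):.2f}, churn={churn(V1, V2):.2f}")
```

Both versions report the same FPR, yet two of the ten verdicts flipped, which an average-only report would never show.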

Ethan @torchcompiled

The evidence tab suffers from confirmation bias and the multiple hypothesis testing problem (when you test many things, at least one is likely to come back positive by chance).

5h
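The multiple-testing effect is easy to quantify. A minimal sketch, assuming each check is independent and has the same per-check chance of a spurious hit (both are simplifying assumptions, and the rates are illustrative, not taken from any detector):

```python
def prob_any_false_hit(per_check_rate, n_checks):
    """P(at least one spurious hit) across n independent checks."""
    return 1.0 - (1.0 - per_check_rate) ** n_checks


if __name__ == "__main__":
    # Illustrative per-check rate of 5%; vary how many things get checked.
    for n in (1, 5, 20, 50):
        print(f"{n:>2} checks -> {prob_any_false_hit(0.05, n):.3f}")
```

The more passages or "tells" an evidence view surfaces, the more likely it is that at least one looks damning by chance alone.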

Ethan @torchcompiled

Studies of the metric on external datasets (APT, Grammarly, and BEEMO) show that mixed text can end up basically anywhere on the scale from AI to human. So a person who did a light AI polish or some editing work can easily be flagged as human or as fully AI.

5h

Ethan @torchcompiled

For mixed-authorship/AI-assistance detection, most benchmarks not only use their own crafted datasets but also create the labels, because there is really no ground truth for how much "AI-ness" a text has. There is high variance and disagreement with human evals here.

5h

Ethan @torchcompiled

The majority of benchmarks are run on internal datasets, where the validation set matches the qualities of the training set. That is basically a better indication of "did we avoid memorizing" than of "does this extrapolate to in-the-wild usage". External audits often follow the same pattern.

5h

Ethan @torchcompiled

Ironically, a Pangram blog post incidentally reinforces this idea without saying so.

5h

Ethan @torchcompiled

A paper by Garland is a reminder that, in this kind of classification, population averages of FPR over a whole validation set don't recognize that some cases are more challenging than others, and that some folks are worse off than others for false positives.

5h

Ethan @torchcompiled

@akhmxt Doesn’t look like I even have paid subscriptions enabled, weird

4h

Ethan @torchcompiled

The papers on human speech resembling AI patterns

5h

Ethan @torchcompiled

Many less official cases won't and can't be investigated but still do damage, and we're stuck with no falsifiability

5h

Ethan @torchcompiled

A library of cases around classifier failure and admitted shortcomings

5h