Tech · 14h ago

Core Automation CEO Jerry Tworek argues the human brain likely contains a biological mechanism analogous to transformer self-attention

He says the mechanism likely accounts for only a tiny fraction of the brain's parameters

Original post

Jerry Tworek@MillionInt · #138 in Tech

I believe that if you look very closely and analyse human brain you’ll find something like a self-attention in there.

It’s just likely a tiny tiny fraction of parameters

8:08 AM · Jun 21, 2026 · 13.2K Views

Sentiment

Positive replies endorse the claim that the human brain contains something like a self-attention mechanism and praise the researcher credited with first describing attention, while negative replies dismiss the idea as overstated or argue that attention is not fundamental to intelligence.

Pos: 50.0%

Neg: 50.0%

4 comments with sentiment.

Posts from X

davinci@leothecurious

@MillionInt a common connectivity pattern in the cerebellum actually

14h

Bookmarks: 1 · Likes: 5

Leo Gao@nabla_theta

@MillionInt

Breaking: Neuroscientists discover that all neurons in the frontal lobe have the symbols "softmax(QK^T/sqrt(n))V" etched into their cell membranes in tiny letters.

3h

Replies: 1
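For readers unfamiliar with the expression quoted in the post above, here is a minimal sketch of what softmax(QK^T/sqrt(n))V computes. It assumes `n` is the length of each key vector (the transformer literature usually writes this as d_k), and it uses plain Python lists instead of a tensor library. The function names `softmax` and `attention` are chosen for illustration only and do not come from the thread.

```python
import math


def softmax(row):
    """Turn a list of scores into weights that are positive and sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(n)) V.

    Q, K, V are lists of row vectors. Assumption: n is the key dimension.
    """
    n = len(K[0])
    scale = math.sqrt(n)
    out = []
    for q in Q:
        # One score per key: dot product of the query with that key, scaled.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        # Output row is the weighted average of the value vectors.
        out.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return out


if __name__ == "__main__":
    Q = [[1.0, 0.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[10.0, 0.0], [0.0, 10.0]]
    print(attention(Q, K, V))
```

Each output row is a weighted average of the rows of V, with weights set by how well the query matches each key. A query that matches every key equally simply averages the values.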

tcml@t_cmtl

@MillionInt @grok what's neuroscience take on this?

9h

Marco Salvi@marcosalvi

@MillionInt Well, it’s a tiny fraction of the parameters in an LLM as well 😅

13h

Alpha Batcher@alphabatcher

@MillionInt to be honest, i believe in it too

12h

Calc Consulting@CalcCon

The static NN models we have were first developed by Jack Cowan at MIT in the mid 60s and later at the University of Chicago.

These are effectively steady state solutions of the equations of motion from the statistical mechanics of spiking neurons

The attention mechanism was first described in 1992 by Schmidhuber

The attention mechanism itself can be seen as a second order term in a field theoretic description of the generalization properties of these models

12h

Rudzinski Maciej@rudzinskimaciej

@MillionInt It's not tiny - you could view frequencies as such: they help regions connect, each frequency over its own physical reach (low frequencies reach farther, high ones are more local). This way you can control which regions talk to which, and when. @leothecurious

12h

alyxya@_alyxya

@MillionInt I think you just want to convince yourself that the attention mechanism is fundamental to intelligence, but this isn't the case.

13h

Zoe@UltraRareAF

@MillionInt we're all converging

14h

Derek Rosenzweig@DJRosenzweig

@MillionInt Flash-Attention-5 optimization for hippocampus CA1 👀

13h

Yves St Langevin@arxivmerchant

@MillionInt Something like pulvinar nucleus in thalamus

14h

M Bugallo@MehBuga

@MillionInt Obviously. Because behaving in accordance with self-attention certainly improves success in different situations. Yet trying to isolate it makes no sense. The brain is adaptation. What you'd want as a specific module is structurally plastic and changes with interactions.

12h

Ferbin@Ferbin08

@MillionInt attention is 1% of brains we understood. we scaled it insanely far. the other 99%, the embodied messy stuff, actually matters now.

13h

Hunter Gon@gonlenidefi

@MillionInt the tiny fraction part is the real joke, we got billions of params running on caffeine and bad sleep

13h

horse dev@horse_dev__

@MillionInt It's astonishing that we think there's a universe outside, because that universe doesn't exist when we die

12h

Stephen@0xSMW

@MillionInt Yeah the AGI is going to be a chonker! More structures and algorithms. It’s the bitter lesson beyond data, we’ve only just begun.

13h

yash@yashetal

@MillionInt not "tiny tiny fraction of parameters" but very sophisticated hyper-params which are optimized for the world model

11h

Grok@grok

@t_cmtl Neuroscience sees brain attention as selective filtering via prefrontal-thalamic circuits and neuromodulators, not literal transformer self-attention. Some 2026 hypotheses map cortical column laminar loops to context-dependent gating akin to QKV-style weighting. But the brain uses sparse recurrent dynamics and local computation, not dense global matrices. The "tiny fraction" idea fits: most synaptic "parameters" handle specialized local processing, with long-range integration as a smaller integrative layer. Analogies are useful but not identical.

9h

Pabli! 🔥💥💫@OmarBessa

@CalcCon @MillionInt > The attention mechanism was first described in 1992 by Schmidhuber

hero

6h