Tech · 8h ago

Thomas Serre Delivers Keynote on Scaling Laws vs Neural Laws at CVPR

#1528

Original post

Ross Wightman#1528

Jack Langerman ✈️ CVPR@jacklangerman

Last keynote: "Scaling Laws vs. Neural Laws: Toward More Natural Artificial Vision" kicking off now with @tserre

10:39 AM · Jun 7, 2026 · 625 Views

Sentiment

Some users express enthusiasm for throwbacks to historical computer vision papers on hierarchical vision models in Thomas Serre's CVPR keynote on scaling laws versus neural laws.

Pos

100.0%

Neg

0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Jack Langerman ✈️ CVPR@jacklangerman

starting with the tl;dr: as we in computer vision are both benchmaxxing and (successfully) improving real performance, we are making our vision systems "more artificial" (less human like).

this talk will advocate ways to make our artificial vision systems less artificial

7h1101

Jack Langerman ✈️ CVPR@jacklangerman

analysis has been repeated in 2026 using many models from timm (h/t @wightmanr).

neural alignment is computed between primates and the models.

we see that CNNs (red dots), and especially those w/ adversarial training (yellow dots), show a positive correlation between perf and alignment

7h182
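A minimal sketch of how a performance-vs-alignment trend like this could be quantified, assuming a rank (Spearman) correlation; the thread does not say which statistic the talk used. The helper names and all numbers below are placeholders invented to exercise the code, not results from the talk or from timm.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Placeholder values (hypothetical): one accuracy and one alignment score per model.
accuracy = [0.55, 0.62, 0.70, 0.76, 0.81]
alignment_rising = [0.20, 0.24, 0.29, 0.31, 0.36]   # alignment improves with accuracy
alignment_falling = [0.36, 0.31, 0.29, 0.24, 0.20]  # alignment degrades with accuracy

print(spearman(accuracy, alignment_rising))
print(spearman(accuracy, alignment_falling))
```

A positive coefficient corresponds to the "better models are more brain-like" regime described here; a negative one corresponds to a reversal of that trend.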

Jack Langerman ✈️ CVPR@jacklangerman

so if these recurrent models are so good, why haven't we heard of them?

well, training and scaling RNNs is (comparatively) hard.

at first, doing some network surgery and adding recurrent layers into existing ViT models and fine tuning a bit

7h16

Jack Langerman ✈️ CVPR@jacklangerman

start with a demo: flashing animal or non-animal images very briefly and showing that we can all identify them

7h79

Jack Langerman ✈️ CVPR@jacklangerman

visual importance data was also collected from humans by crowdsourcing from volunteers.

we see (via gradient-based saliency) that modern models attend quite differently than humans (more diffuse, less interpretable).

human <-> model results mirror primate results closely

7h181

Andrei Bursuc @CVPR@abursuc

Quick demo to assess the gist capabilities of recognizing animal vs non-animal from a few ms of image display. Who remembers the GIST descriptors from the 2000s? #cvpr2026

8h54

Jack Langerman ✈️ CVPR@jacklangerman

Enter State Space Models (gated delta net), which allow parallelizing over the sequence while preserving recurrence.

SSM / GDN yields a new Pareto frontier for these models

7h46
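To illustrate the general idea of "parallel over the sequence, still a recurrence", here is a hedged sketch using a scalar gated linear recurrence h_t = a_t * h_{t-1} + b_t. This is not the gated delta net from the talk (which uses a richer state and update rule); it only shows why a linear recurrence can be evaluated with an associative scan. All function names are made up for this example.

```python
def sequential(a, b):
    """Reference: run the recurrence h_t = a_t * h_{t-1} + b_t step by step, h starts at 0."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out

def combine(left, right):
    """Compose two affine maps h -> a*h + b, applying `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)

def scan(a, b):
    """Inclusive scan over affine maps in about log2(n) passes.

    Within a pass every position is computed independently of the others,
    so a real implementation could run each pass in parallel; here the
    passes are simply simulated with a list comprehension.
    """
    elems = list(zip(a, b))
    n = len(elems)
    offset = 1
    while offset < n:
        elems = [
            elems[i] if i < offset else combine(elems[i - offset], elems[i])
            for i in range(n)
        ]
        offset *= 2
    # With an initial state of 0, the offset term of each composed map is h_t.
    return [b_t for _, b_t in elems]

gates = [0.5, 0.9, 0.1, 0.7, 0.3]      # arbitrary example gates
inputs = [1.0, 2.0, -1.0, 0.5, 4.0]    # arbitrary example inputs
print(sequential(gates, inputs))
print(scan(gates, inputs))
```

Both functions produce the same hidden states up to floating-point rounding; the scan version just reorders the work so it no longer has to be done one step at a time.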

Andrei Bursuc @CVPR@abursuc

Loving these throwbacks to papers from computer vision's past: hierarchical vision models #cvpr2026

8h44

Jack Langerman ✈️ CVPR@jacklangerman

one might think this only applies to CNNs (because architecturally it is true that CNNs grow receptive field with depth), but even models with global attention at each layer (ViT) empirically fail at the pathfinder task

7h35

Jack Langerman ✈️ CVPR@jacklangerman

interestingly - it became clear that abandoning biological inspiration not only improved absolute performance, but also improved correlation with observed activations in primate visual systems (at least at first).

this eval was performed in 2018

7h23

Jack Langerman ✈️ CVPR@jacklangerman

of course AlexNet changed everything. Not only in CV, but also in neuro-bio for vision

7h22

Jack Langerman ✈️ CVPR@jacklangerman

some history: pre-AlexNet there were some biologically inspired hierarchical vision models

7h22

Jack Langerman ✈️ CVPR@jacklangerman

learning positive = "2 dots on the same contour" and negative = "dots on different contours" will be easier with RNNs than with feed forward networks as contour length grows, because feed forward networks only grow their receptive field w/ depth.

7h21
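A toy sketch of the argument above, not the actual Pathfinder benchmark or any model from the talk. It assumes a contour is a set of pixel coordinates and that one "step" spreads activity to 4-connected neighbouring contour pixels. A fixed step budget stands in for a feed forward network's fixed depth, while a recurrent model can keep reapplying the same update. All names are hypothetical.

```python
def spread(contour, start, steps):
    """Apply the same local update up to `steps` times.

    Each update lets activity move from an active pixel to any
    4-connected neighbour that also lies on the contour.
    """
    active = {start}
    for _ in range(steps):
        grown = set(active)
        for (r, c) in active:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (r + dr, c + dc)
                if nxt in contour:
                    grown.add(nxt)
        if grown == active:  # nothing new reached, so further steps change nothing
            break
        active = grown
    return active

def same_contour(contour, a, b, steps):
    """Positive if dot `b` is reached from dot `a` within the step budget."""
    return b in spread(contour, a, steps)

def straight_contour(length, row=0):
    """A horizontal line of `length` pixels on the given row."""
    return {(row, c) for c in range(length)}

short = straight_contour(6)
long_line = straight_contour(30)
two_lines = straight_contour(10, row=0) | straight_contour(10, row=2)

print(same_contour(short, (0, 0), (0, 5), steps=5))        # budget covers the short contour
print(same_contour(long_line, (0, 0), (0, 29), steps=5))   # same budget is too small here
print(same_contour(long_line, (0, 0), (0, 29), steps=29))  # more iterations solve it
print(same_contour(two_lines, (0, 0), (2, 9), steps=100))  # different contours stay negative
```

The only thing that changes between the second and third call is the number of iterations, which is the resource a recurrent model can scale without adding layers.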

Jack Langerman ✈️ CVPR@jacklangerman

so is the image on the right positive or negative?

7h20

Jack Langerman ✈️ CVPR@jacklangerman

Visual illusions (resulting from internal feedback / recurrent / "horizontal" connections in the brain)

seems like a bug: maybe actually a feature?

7h20

Jack Langerman ✈️ CVPR@jacklangerman

2nd part of talk:

Architectural capacity vs. recurrent dynamics

assertion: we don't have a deep multistage pipeline with distinct layers in our brains; we have a small fixed number of stages, but neurons / stages can communicate locally and propagate signals in a reentrant way

7h20

Jack Langerman ✈️ CVPR@jacklangerman

skipping a few results - will add back later (want to catch up to live)

but tl;dr - human salience is stable under viewpoint changes, not so for models. adding certain proxy tasks (e.g. next frame prediction) increases model salience stability.

7h18

Jack Langerman ✈️ CVPR@jacklangerman

however, explicitly supervising ViT gradients succeeds in aligning importance between models and humans (left most column)

7h18

Jack Langerman ✈️ CVPR@jacklangerman

however, as we move closer to SOTA, web scale data, ViT, etc., the trend reverses and we see fairly strong negative correlation between ImageNet multi-label accuracy and Neural alignment

7h18

Andrei Bursuc @CVPR@abursuc

The AlexNet moment was pivotal not only to the computer vision community, but also to neuroscience researchers. Better AI initially meant better neuroscience models #cvpr2026

8h16