.@tserre starts his keynote on Scaling Laws vs. Neural laws: towards more natural artificial vision #cvpr2026
Users express enthusiasm for throwbacks to historical computer vision papers on hierarchical models highlighted during Thomas Serre's CVPR keynote contrasting scaling laws and neural laws.
Last keynote: "Scaling Laws vs. Neural Laws: Toward More Natural Artificial Vision" kicking off now with @tserre
Quick demo to assess the gist capabilities of recognizing animal vs non-animal from a few ms of image display. Who remembers the GIST descriptors from the 2000s? #cvpr2026
On learning simple long range dependencies in images, small ResNets work, but ViTs struggle #cvpr2026
One of the things that we can look at is improving the feedback mechanism in our architectures to mimic recurrent dynamics and circuit models from the human brain. Let’s make our models recurrent again? #cvpr2026
Loving these throwbacks to papers from computer vision's past: hierarchical vision models #cvpr2026
The AlexNet moment was pivotal not only for the computer vision community, but also for neuroscience researchers. Better AI initially meant better neuroscience models #cvpr2026
Taking a look at how babies interact with new objects and where their gaze lands #cvpr2026
On some other metrics for behavioral alignment on areas of focus in the image, they observe the same trend #cvpr2026
However, for the second wave of models (big CNNs, ViTs) and the multi-labeling of ImageNet, the correlation between ImageNet accuracy and neural alignment does not hold. Similar story for SSL models, though DINOv3 is better than JEPAs #cvpr2026

starting with the tl;dr: as we in computer vision are both benchmaxxing and (successfully) improving real performance, we are making our vision systems "more artificial" (less human like).
this talk will advocate ways to make our artificial vision systems less artificial
The same trend is observed for their proposed stability score #cvpr2026
They study, on Co3D, which SSL learning objectives lead to better alignment with human feature importance. CNNs are in general better. Among training objectives, the autoregressive one boosts alignment significantly #cvpr2026

the analysis has been repeated in 2026 using many models from timm (h/t @wightmanr).
neural alignment is computed between primates and the models.
we see that CNNs (red dots), and especially those with adversarial training (yellow dots), show a positive correlation between perf and alignment
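a rough sketch of how a trend like this can be checked: rank-correlate per-model accuracy against per-model alignment scores. Spearman correlation is my assumption here (these notes don't record the exact statistic used in the talk), and `ranks`, `pearson` and `spearman` are hypothetical helper names, not anything from the talk or from timm.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

calling `spearman(accuracies, alignment_scores)` on two equal-length lists (one entry per model) gives a value in [-1, 1]; a positive value corresponds to the "better perf, better alignment" trend, and a value near zero or below to the trend breaking down.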

start with a demo: images, each either animal or non-animal, are flashed very briefly, showing that we can all still identify them

visual importance data was also collected from humans by crowdsourcing from volunteers.
we see (via gradient-based saliency) that modern models attend quite differently than humans (more diffuse, less interpretable).
human <-> model results mirror primate results closely

Enter State Space Models (gated delta net), which allow parallelizing over the sequence while preserving recurrence.
SSM / GDN yields a new Pareto frontier for these models
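for intuition, here is a minimal sketch of the recurrent side of a gated delta rule as I understand it from the Gated DeltaNet literature; the exact variant used in the talk may differ. The update form `S_new = alpha * S (I - beta k k^T) + beta v k^T` is an assumption, and `gated_delta_step` and `read` are hypothetical names. Only the step-by-step recurrence is shown, not the parallel form used for training.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One recurrent update of a matrix memory S (d_v rows, d_k columns).

    Assumed form: S_new = alpha * S (I - beta * k k^T) + beta * v k^T
    alpha in [0, 1] decays the old memory; beta in [0, 1] is the write strength.
    """
    d_k = len(k)
    # What the memory currently returns for key k, i.e. the vector S k.
    Sk = [sum(row[j] * k[j] for j in range(d_k)) for row in S]
    # S (I - beta k k^T) equals S - beta (S k) k^T: erase the old value stored
    # under k, then write the new value v under the same key.
    return [
        [alpha * (row[j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
         for j in range(d_k)]
        for i, row in enumerate(S)
    ]


def read(S, q):
    """Query the memory: returns the vector S q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


# Toy usage: write a value under a unit-norm key, then overwrite it.
S = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
S = gated_delta_step(S, key, [2.0, 3.0], alpha=1.0, beta=1.0)
first_read = read(S, key)
S = gated_delta_step(S, key, [5.0, 7.0], alpha=1.0, beta=1.0)
second_read = read(S, key)
```

with `alpha = 1`, `beta = 1` and a unit-norm key, the second write replaces the first instead of adding to it, which is the "delta" part; `alpha < 1` adds the gated forgetting.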

one might think this only applies to CNNs (because architecturally it is true that CNNs grow receptive field with depth), but even models with global attention at each layer (ViT) empirically fail at the pathfinder task

interestingly - it became clear that abandoning biological inspiration not only improved absolute performance, but also improved correlation with observed activations in primate visual systems (at least at first).
this eval was performed in 2018

of course AlexNet changed everything. Not only in CV, but also in neuro-bio for vision

some history: pre-AlexNet there were some biologically inspired hierarchical vision models