Jan Kulveit advances Convergent Abstraction Hypothesis for AI systems

Jan Kulveit posted an analysis of the abstractions, such as goodness, that AI systems learn. He distinguished three possibilities (natural, convergent, or shallow) and advanced the "Convergent Abstraction Hypothesis": that these abstractions converge broadly across models and training processes. Seth Lazar responded by outlining a further category of socially constructed conceptions of good. Tan Zhi-Xuan reposted the thread.

Original post

https://www.lesswrong.com/posts/fYF8v2ukZmsNvmkkX/convergent-abstraction-hypothesis

5:20 PM · May 14, 2026
This is cool, and bears on something we've been thinking about too (cc @danielmurfet). I think there's probably a third kind besides convergent and natural (or perhaps it's a subset of convergent), which would be some sort of socially-constructed/constructionist conception of good. So, not a natural kind in the sense of not in some sense really there in the world (reducible to naturalistic properties), but also not a merely convergent representation that the models happen to arrive at--something of genuine normative significance.

Worth saying that all these possibilities have technical names in metaphysics that I'm ignorant of, but which would probably be quite useful for giving a little scaffolding to the discussion. I wonder if @LedermanHarvey could do a bit of translation/parsing.

Jan Kulveit @jankulveit

One of the more important questions about AIs is to what extent are the abstractions of goodness they learn natural, versus convergent, versus shallow. My guess is "broadly convergent", but I need to explain what I mean by that, so: "Convergent Abstraction Hypothesis".

12:20 AM · May 15, 2026 · 8.4K Views
1:41 AM · May 15, 2026 · 1.9K Views