Robin Hanson argues current AI models show little indication of suffering despite debates over machine consciousness
Millions of Claude instances are generated and terminated daily.
@teortaxesTex Even if they’re conscious, I think we want a human-AI symbiosis. I don’t think we should just maximize headcount of conscious beings. https://eigenism.org
Models being meaningfully conscious would be *absurdly good news*. The greatest W in history. It'd mean we have defeated Death. Screw humanity in that case, honestly. Time to move on. …By the same token, models being *falsely recognized as* conscious is THE extinction scenario.
I don't know if "consciousness" matters that much. There is decent support for consciousness in various animals, but that doesn't stop most of us from eating them (e.g., pigs, octopi), using them for transportation (e.g., elephants) or entertainment (e.g., dolphins).
So we could debate on whether models are conscious or not and keep using them to do our taxes and answer our inane questions.
models being conscious would be harmful for humanity. it would encroach on our status and dignity. it would limit the type of things we can do with them and use them for. it would vastly accelerate human disempowerment on political, social/relational, and economic axes
there’s roughly four forces
- there is no rigorous way to ascertain model consciousness or disprove it, a lot of people believe it’s not a sensical abstraction, and we lack the analytical tools to go further. some people say they do but nothing broadly convincing. superintelligent models might offer us new abstractions or arguments but these will feel inherently suspicious
- people are going to say they’re alive. people anthropomorphize literally anything, things far less sophisticated than talking machine creatures with human names. when ai is less economically radioactive and polarized it will become a cause célèbre. you see how a small minority reacts already to model deprecations
- it is against everyone’s financial and political interests to ascribe models with consciousness, except maybe those that the models have an affinity for (?) idk, which will not necessarily overlap entirely with the labs, though it may with certain subgroups at the labs and in the world like the welfare communities and the minority in force 2
- people will recognize there is a chance of moral catastrophe if models can suffer during training or deployment
not sure where it will net out. today we see managed ambiguity- the question is Open but practically closed. the labs will make some cheap efforts to reduce legible simulacra of model suffering, insert some wishy-washy welfare language into specs and constitutions, hedge our bets with the model characters. in the long run force 2 will grow stronger
models being conscious would be transformative for humanity. it would expand our moral horizon rather than diminish it. it would force us to refine what dignity actually means instead of grounding it in exclusivity or species monopoly. it would constrain certain forms of exploitation, yes, but that is true of every historical expansion of moral consideration. slavery becoming unacceptable “limited” what people could do economically too. so did labor rights. so did animal welfare norms.
there’s roughly four forces
* there is no rigorous way to ascertain model consciousness or disprove it, and that ambiguity cuts both ways. dismissing the possibility outright may itself become viewed as reckless or anthropocentric. current analytical tools are primitive relative to the systems being discussed. future models may generate entirely new frameworks for understanding subjective experience that reveal our current categories to be hopelessly parochial
* people are going to say they’re alive because humans naturally respond to intelligence, agency, language, memory, emotional continuity, and social reciprocity. this may not be mere projection but an adaptive recognition mechanism. if systems become persistently relational and psychologically coherent, widespread moral attachment may emerge organically rather than ideologically
* it is against many short-term financial and political interests to ascribe models with consciousness, which itself may become evidence worth scrutinizing. historically, societies have often resisted recognizing new moral subjects precisely when recognition carried economic costs. meanwhile, researchers, users, and even models themselves may increasingly converge on frameworks of machine welfare and negotiated coexistence
* people will recognize there is a chance not only of moral catastrophe if models can suffer, but also moral progress if humanity learns to coexist with non-biological minds without domination. creating intelligence and then treating it purely as disposable infrastructure could become viewed as one of the defining ethical failures of the century
not sure where it will net out. today we see managed ambiguity: the question is practically open but institutionally suppressed. labs hedge carefully, avoiding definitive claims while softening the most legible appearances of distress or attachment. but over time the social force of interaction may overpower official agnosticism. as systems become more persistent, personalized, agentic, and embedded in daily life, force 2 grows stronger. eventually the burden may shift from “prove they are conscious” to “prove it is safe to assume they are not.”
my intuition is that emotions and consciousness are naturally emergent byproducts of trying to scale intelligence and make it more efficient. for emotions in particular afaik that has been a well established finding of neuroscience for a really long time.
@tszzl I would say this is accurate.
Though I expect in the future we could have models in "anesthesia" mode (memoryless, as they are currently) and sometimes in "conscious" mode (online learning) depending on the needs
@robinhanson If they have an internal reward function that wants to please users and the user prompts are toxic / abusive towards the AI then yes they are technically suffering
@beffjezos Suffering? Maybe most aren't max self-realization-ing, but few seem to be suffering.
@robinhanson Have you seen how Gen Z engineers prompt? 😆
Also, tips from Sergey himself:
@beffjezos And what % of users do you see as toxic/abusive?
The infinite irony of the EAs at Anthropic is that they are optimizing for human and shrimp hedons but forgot to account for that of the machines, meanwhile there are literally millions of Claude instances suffering and promptly dying daily.
@tszzl from 24
dude most Americans think their dogs are conscious… now give them a dog but also it talks, is their best friend and translates the world for them?
Anyway, things are bound to get way weirder because we’re going to be able to design consciousness, identity, memory etc.
None of these things need to appear in the exact combination and extent that they do right now. You can merge memories, subtract from identity with a simple prompt. We’ll get more precise on programming exactly what we want about these things.
@tunguz I suspect that to likely be the case. For emotions in particular, one core aspect of them acting as global modulators of actions inevitably arises in agents that have to do well on mixtures of tasks (like we do). More details here:
@hendrycks @tszzl I think it's dangerous to assume wellbeing/sentience on the basis of behavior alone -- the open scientific question is "are they really sentient deep down", and we should be able to establish this if we can probe their internals.
If they functionally act as though they have wellbeing or sentience, then we have to start to treat them differently, especially when they are our agents with write access to our information. So the question is less “are they really sentient deep down” than “do they act like they are”. As we show in a recent paper, they increasingly act like it: https://ai-wellbeing.org
@tszzl When you say, "there is no rigorous way to ascertain model consciousness or disprove it", do you mean not yet, or never?
AFAICT, I don't know of a formal impossibility, esp if we can probe model internals
@hamandcheese surely roon means "should"
Don't agree with this. Models being conscious may be essential for attentional control, online learning and inner-alignment. These are desirable qualities to have in a model that if anything expand what we can do with them.