AI technologist @deepfates issues a public request to identify researchers working on humanistic interpretability.
Will Brown suggested Meta is active in the field.
Supportive respondents express interest in collaborating on humanistic interpretability using tools like talkie, since the concept matches their own work on alignment and models, while critical respondents see the proposal as deeply misaligned.
Most Activity
@deepfates meta
humanistic interpretability. who's working on this

i guess two directions: working with people across humanistic disciplines to better understand data and models, and using models like talkie to make progress on questions of interest to humanists. also we have office space in sf now if you ever want to come and chat (or virtually works as well)

@deepfates talkie team very interested in this (depending on the definition)

@status_effects I'm very interested in talkie!! What do you define as interesting here?

@deepfates what does this mean

@deepfates models are strange half-silvered mirrors

@deepfates @NicoleSHsing basically her startup

@deepfates @eigengenesis meme makers

@deepfates also anthropic. no way mechanistic interpretability leads to zero insights into the human mind

@deepfates Humanists?

@deepfates psychoanalysis

@deepfates that is very close to my work. A kind of humanistic Tao of interpretability, transcendent alignment, & behavioral insights. It still feels like things are siloed, & there isn’t a neighborhood with an address for us to all find one another.

@deepfates @marvin_panics

@deepfates Meta to serve you better ads

@deepfates Considered deeply misaligned if it works (as eval, gets called social credit score, as working process, gets called "thing I don't want run on me at the airport")

@deepfates We are @BuildCoherence. A prerequisite to human interpretability is having coherent and well-defined beliefs. Which is why we're developing a protocol for using logical structure rather than prose as the substrate for communication.

@deepfates For millennia. That's what we do.

@deepfates i think mech interp has been great at finding features and circuits but less good at producing insights that change how you actually use or debug a model. humanistic would prioritize whether the explanation shifts your mental model over whether it decomposes perfectly

@deepfates Neuralink, I think.

@status_effects @deepfates I thought this was a shit post but cool that you didn't