@comdak

in /technology 3 hours ago

Exposing biases, moods, personalities, and abstract concepts hidden in large language models

Exposing biases, moods, personalities, and abstract concepts hidden in large language models | MIT News | Massachusetts Institute of Technology

news.mit.edu
TLDR

A new method developed by MIT and UC San Diego researchers can identify and manipulate hidden biases, moods, personalities, and abstract concepts within large language models (LLMs). The approach uses recursive feature machines to pinpoint specific representations within LLMs and can steer these representations to enhance or diminish certain concepts in model responses. This method can improve LLM safety and performance by illuminating hidden concepts and potential vulnerabilities.
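To make the idea of "steering" a representation concrete, here is a minimal sketch. It is not the researchers' method: instead of recursive feature machines it uses a simple difference-of-means direction as a stand-in, and the toy lists stand in for hidden activations from a real model. All function names are hypothetical.

```python
# Toy illustration of concept steering. Assumption: a concept corresponds to a
# direction in activation space. The direction here is a difference of means,
# NOT the recursive feature machine approach described in the article.

def mean_vector(rows):
    """Average a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def concept_direction(with_concept, without_concept):
    """Unit vector pointing from 'concept absent' toward 'concept present'."""
    mu_pos = mean_vector(with_concept)
    mu_neg = mean_vector(without_concept)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]

def steer(hidden, direction, alpha):
    """Shift an activation along the direction: alpha > 0 enhances, alpha < 0 diminishes."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

def projection(hidden, direction):
    """How strongly an activation expresses the concept direction."""
    return sum(h * d for h, d in zip(hidden, direction))

# Hypothetical activations collected with and without the concept in the prompt.
direction = concept_direction(
    with_concept=[[1.0, 0.0], [3.0, 0.0]],
    without_concept=[[0.0, 0.0], [0.0, 0.0]],
)
boosted = steer([0.5, 0.5], direction, alpha=2.0)
dampened = steer([0.5, 0.5], direction, alpha=-2.0)
print(projection(boosted, direction), projection(dampened, direction))
```

In a real model the same shift would be applied to hidden states at one or more layers during generation; how the article's method finds and applies its directions is described in the linked piece, not here.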

Score: 18

0 Comments