Conformal Prediction Improves Safe Language Model Steering with Limited Data

Original post

We can perform this conformalization process with limited data as predicting whether improvement is *possible* is often easier than knowing the *precise* language sequence that will elicit the improvement. [5/n]

Gokul Swamy@g_k_swamy

However, this LFP can generalize poorly OOD. In response, we use techniques from conformal prediction to figure out when to "fall back" to the default user prompt, making sure steering is "mostly harmless." [4/n]

1:41 PM · Jun 18, 2026 · 65 Views
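
The "fall back" rule described in [4/n] can be illustrated with a split-conformal threshold. This is only a minimal sketch under stated assumptions, not the authors' method: it assumes a held-out calibration set of scores (a hypothetical predictor's confidence that steering will improve the response) collected on cases where steering turned out to be harmful. The names `conformal_threshold`, `choose_prompt`, and `harmful_scores` are made up for illustration.

```python
import math


def conformal_threshold(harmful_scores, alpha):
    """Return a score threshold tau calibrated on harmful cases.

    harmful_scores: predictor scores on calibration cases where steering hurt.
    alpha: tolerated probability of steering on a new harmful case.
    If calibration and test cases are exchangeable, a new harmful case
    scores <= tau with probability at least 1 - alpha.
    """
    n = len(harmful_scores)
    # Rank of the conformal quantile with the usual (n + 1) correction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: never steer.
        return math.inf
    return sorted(harmful_scores)[k - 1]


def choose_prompt(user_prompt, steered_prompt, score, tau):
    """Steer only when the score clears tau; otherwise fall back."""
    return steered_prompt if score > tau else user_prompt


tau = conformal_threshold([0.9, 0.2, 0.5], alpha=0.5)
print(choose_prompt("default prompt", "steered prompt", 0.7, tau))
```

With very little calibration data the threshold becomes infinite and the rule always returns the default user prompt, which is the conservative behavior one would want from a "mostly harmless" fallback.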

Sentiment

Users praise the research on conformal prediction for safer language model steering as an incredibly impressive first PhD paper, citing the author's work ethic and depth of thought.

Positive: 100.0% · Negative: 0.0% (1 comment with sentiment)

Via GOKUL.DEV

Posts from X

Views: 31 · Likes: 1 · Replies: 1

Gokul Swamy@g_k_swamy

We propose a method for doing a tractable "local search" (similar to what we explored in https://gokul.dev/sailor/) in the space of natural language, before using expert iteration to train our "language feedback policy" (LFP). [3/n]

Gokul Swamy@g_k_swamy

Anyhow, I think this is an *incredibly* impressive first paper of the PhD for @hyunjoej, who continually impressed me with his work ethic and depth of thought. I can't recommend working with him enough :). Check out our website for more: https://hyunjoe.xyz/LanguagePolicy/.
