Classic GANs Fail Due to Lack of Relativity in Data Manifold Compression

Original post

cross entropy reduces NLL error. reducing NLL error =/= reducing sampling error: as soon as you sample from the tail (mass not explicitly punished by improved NLL), you're off-manifold. i achieved the latter pic's improvement on a qwen w/o any direct SFT at all via a GAN-like loop

kalomaze@kalomaze

my intuitions around "NTP distillation is fairly weak, GAN-like macro approximation of subsequences via RL / density ratio estimation is strong, nobody has executed the latter well so far at scale" is something that i may have to flesh out into a concrete research program

10:23 PM · Jun 20, 2026 · 619 Views
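
A small arithmetic sketch of why a low NLL and a low sampling error can come apart. It is a toy model, not the author's measurement: it assumes every position independently leaves a fixed `tail_mass` on off-manifold tokens, and that a single off-manifold token is enough to derail a sample. Both function names are made up for illustration.

```python
import math


def per_token_nll_cost(tail_mass):
    """Extra NLL per token (in nats) paid for leaving `tail_mass` on the tail.

    Toy assumption: the remaining 1 - tail_mass sits on the correct token.
    """
    return -math.log(1.0 - tail_mass)


def prob_sample_stays_on_manifold(tail_mass, length):
    """Chance that `length` sampled tokens all avoid the tail.

    Toy assumption: positions are independent and share the same tail mass.
    """
    return (1.0 - tail_mass) ** length


# The NLL penalty is nearly linear in the tail mass, so it stays small,
# while the on-manifold probability decays geometrically with length.
tail_mass = 0.01
nll_cost = per_token_nll_cost(tail_mass)
survival = prob_sample_stays_on_manifold(tail_mass, length=500)
```

Under these assumptions a 1% tail costs about 0.01 nats per token, yet fewer than 1 in 100 samples of length 500 avoid the tail entirely, which is the gap between likelihood and sampling quality that the post describes.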

Sentiment

Users are optimistic that a GAN-like loop offers a remarkably direct cure for mode collapse in Qwen sampling accuracy.

Pos: 100.0% · Neg: 0.0% · 1 comment with sentiment.

Posts from X

Views: 413 · Bookmarks: 1 · Likes: 5 · Replies: 2

kalomaze@kalomaze

classic GANs failed and were unstable for a few important reasons. most are contingent, but the most fundamental one is lacking relativity. compression of where something lies on a data manifold is degenerate in an absolute judgement form (see the rpGAN paper, they prove this)
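
To make the absolute-versus-relative distinction concrete, here is a minimal sketch of the two discriminator losses on scalar critic scores. The function names are illustrative and not taken from the thread or any codebase. The relativistic form is the usual pairing loss, -log sigmoid(D(real) - D(fake)). The sketch only shows that the relative loss depends on the score gap alone; it does not reproduce the proof the post refers to.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)); note softplus(-x) == -log(sigmoid(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def absolute_d_loss(real_score, fake_score):
    """Classic GAN discriminator loss: each sample is judged on its own.

    -log sigmoid(real_score) - log(1 - sigmoid(fake_score))
    """
    return softplus(-real_score) + softplus(fake_score)


def relativistic_d_loss(real_score, fake_score):
    """Relativistic pairing loss: only the gap between the two scores matters.

    -log sigmoid(real_score - fake_score)
    """
    return softplus(-(real_score - fake_score))


def relativistic_g_loss(real_score, fake_score):
    """Generator side of the pairing: push the fake score above the real one."""
    return softplus(-(fake_score - real_score))
```

Shifting both scores by the same constant leaves `relativistic_d_loss` unchanged, for example (2.0, 1.0) versus (12.0, 11.0), while `absolute_d_loss` changes a lot. In the relative form the critic has no absolute "realness" level to drift along; it can only rank real against generated samples.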

kalomaze@kalomaze

which means "mode collapse" as originally coined has an extremely direct cause and a remarkably direct cure. estimating macro subsequence differences + pulling an RL generator towards maximization over the difference axis at the same time, is something that can be made to work!

kalomaze@kalomaze

RLHF is literally the open loop form of this, because bradley terry is relative/pairwise, and the reward model is frozen! a GAN-like version of RLHF applied to a fixed "chosen" data dist (with live feedback from the "rejected" RL policy samples), in principle, closes that loop
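
A toy sketch of the open-loop versus closed-loop distinction, under heavy assumptions: the "sequences" are just indices into a small categorical distribution, the critic is a table of scores, and the policy is updated with a plain REINFORCE step. None of the names below come from the thread. With `update_critic=False` the critic acts as a frozen reward model (open loop). With `update_critic=True` it keeps training with a Bradley-Terry loss on fixed "chosen" data against live "rejected" policy samples (closed loop). No convergence is claimed; an unregularized loop like this can oscillate.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loop_step(data_probs, policy_logits, critic, rng,
              update_critic=True, lr_critic=0.1, lr_policy=0.1):
    """One step of the toy loop; mutates policy_logits and critic in place."""
    policy_probs = softmax(policy_logits)
    indices = range(len(data_probs))
    chosen = rng.choices(indices, weights=data_probs, k=1)[0]      # fixed data dist
    rejected = rng.choices(indices, weights=policy_probs, k=1)[0]  # live policy sample

    if update_critic:
        # Gradient descent on the Bradley-Terry loss
        # -log sigmoid(critic[chosen] - critic[rejected]).
        g = 1.0 - sigmoid(critic[chosen] - critic[rejected])
        critic[chosen] += lr_critic * g
        critic[rejected] -= lr_critic * g

    # REINFORCE on the sampled index, with the expected score as baseline.
    baseline = sum(p * r for p, r in zip(policy_probs, critic))
    advantage = critic[rejected] - baseline
    for k in range(len(policy_logits)):
        grad_log_prob = (1.0 if k == rejected else 0.0) - policy_probs[k]
        policy_logits[k] += lr_policy * advantage * grad_log_prob


def run(data_probs, critic, steps, update_critic, seed=0):
    """Run the toy loop from a uniform policy and return its final probabilities."""
    rng = random.Random(seed)
    policy_logits = [0.0] * len(data_probs)
    for _ in range(steps):
        loop_step(data_probs, policy_logits, critic, rng, update_critic)
    return softmax(policy_logits)
```

Calling `run(data_probs, critic, steps, update_critic=False)` leaves `critic` untouched, which mirrors a frozen reward model. Passing `update_critic=True` lets the critic keep re-ranking data against whatever the policy currently produces.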

Cosmic Raven@RavenOfSpace

@kalomaze NTP optimizes locally, sampling happens globally. Relative discrimination fixes the degeneracy. How are you handling stability when you scale this beyond qwen?

Cosmic Raven@RavenOfSpace

@kalomaze This tracks. NTP optimizes the bulk but does nothing to penalize the tail where bad samples live. Density ratio at least gives you an on-manifold signal. How's discriminator stability as you scale though?