ZyphraAI converts ZAYA1-8B-base model to diffusion LLM

ZyphraAI converted its ZAYA1-8B-base autoregressive model into a diffusion LLM through mid-training rather than training from scratch. The company applied a TiDAR-based diffusion-conversion step followed by diffusion supervised fine-tuning on its existing stack. The resulting model diffuses 16-token blocks in a single step from a mask prior, verifies the drafted tokens against autoregressive logits via speculative decoding, and mixes diffusion and autoregressive logits during sampling, delivering speedups over earlier methods such as TiDAR. Commenters noted that diffusion language models are increasingly adopting autoregressive traits, operating on small sequential blocks of tokens.
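The decoding loop described above can be sketched in a toy form. The block size of 16 and the mask prior come from the post; the denoiser, the autoregressive verifier, the vocabulary size, and the mixing weight `alpha` are all hypothetical stand-ins, since the actual model internals are not public:

```python
import numpy as np

BLOCK = 16       # tokens diffused per block, per the post
VOCAB = 32       # toy vocabulary size (illustrative only)
MASK = VOCAB     # id of the [MASK] token acting as the diffusion prior

rng = np.random.default_rng(0)

def toy_denoiser(tokens):
    """Stand-in for the diffusion model: returns one row of logits per
    position in the block. A real model would condition on the prefix."""
    return rng.normal(size=(len(tokens), VOCAB))

def toy_ar_verifier(tokens):
    """Stand-in for the autoregressive head used to verify drafted tokens."""
    return rng.normal(size=(len(tokens), VOCAB))

def diffuse_block(prefix, alpha=0.5):
    """One-step block diffusion: initialize every position in the block
    to MASK, denoise once, mix diffusion and autoregressive logits with
    a hypothetical weight `alpha`, and decode greedily."""
    block = [MASK] * BLOCK
    d_logits = toy_denoiser(block)
    a_logits = toy_ar_verifier(block)
    mixed = alpha * d_logits + (1 - alpha) * a_logits
    return list(mixed.argmax(axis=-1))

out = diffuse_block(prefix=[1, 2, 3])
print(len(out))  # 16 tokens produced in a single denoising step
```

The speedup claimed in the post comes from producing all 16 positions in one forward pass instead of 16 sequential autoregressive steps; the verification and mixing are what keep quality close to the autoregressive baseline.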

Original post

We present ZAYA1-8B-Diffusion-Preview, the first diffusion language model trained on @AMD. Autoregressive LLMs generate one token at a time; diffusion generates a block in parallel, speeding up inference. We show a 4.6-7.7x decoding speedup with minimal quality degradation 🧵

2:33 PM · May 14, 2026
Reposted by

@teortaxesTex
>ask the performant dlm author if its seqlevel denoising or block-causal
>"its a good diffusion language model, sir"
>look inside
>its block-causal

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

Yuxi keeps winning

9:37 PM · May 14, 2026
10:18 PM · May 14, 2026

Cool.
