#diffusion

Google DeepMind releases DiffusionGemma, a 26B diffusion model that generates text 4x faster than autoregressive decoding

LLMs Jun 11, 2026

DiffusionGemma uses parallel text diffusion instead of sequential token generation, reaching more than 1,000 tokens/sec on H100 GPUs at some cost in output quality.

NVIDIA's Nemotron-Labs Diffusion Models Generate Multiple Tokens in Parallel, Bypassing Autoregressive Bottleneck

LLMs May 24, 2026

NVIDIA releases diffusion language models at 3B, 8B, and 14B scales that generate and refine tokens in parallel, offering latency improvements for GPU-constrained inference workloads.