Google DeepMind releases DiffusionGemma, a 26B diffusion model 4x faster than autoregressive generation
DiffusionGemma uses parallel text diffusion instead of sequential token generation, achieving 1000+ tokens/sec on H100 GPUs with trade-offs in output quality.
NVIDIA's Nemotron-Labs Diffusion Models Generate Multiple Tokens in Parallel, Bypassing Autoregressive Bottleneck
NVIDIA releases diffusion language models at 3B, 8B, and 14B scales that generate and refine tokens in parallel, offering latency improvements for GPU-constrained inference workloads.