NVIDIA's Nemotron-Labs Diffusion Models Generate Multiple Tokens in Parallel, Bypassing Autoregressive Bottleneck
NVIDIA releases diffusion language models at 3B, 8B, and 14B parameters that generate and refine multiple tokens in parallel instead of one at a time, which can reduce latency for GPU-constrained inference workloads.