Google DeepMind releases DiffusionGemma, a 26B diffusion model 4x faster than autoregressive generation
DiffusionGemma generates text by parallel diffusion rather than sequential token generation, reaching more than 1,000 tokens per second on H100 GPUs at some cost to output quality.
Google DeepMind's Gemma 4 12B Brings Encoder-Free Multimodal AI to Consumer Laptops
Google DeepMind releases Gemma 4 12B, a 12-billion-parameter model with unified vision and audio processing that runs on 16GB consumer hardware.
A 3-billion-parameter economy: small models as viable multi-agent platforms
Hugging Face hackathon project demonstrates how tiny language models can power real-time simulations that frontier models cannot economically support.
JetBrains Releases Mellum2, a 12B Sparse Model for Sub-Second Inference
JetBrains' new Mixture-of-Experts model achieves 2x speedup over dense peers while activating just 2.5B parameters per token.
Liquid AI Releases 8B-A1B Mixture-of-Experts Model Trained on 38 Trillion Tokens
Liquid AI unveils a sparse 8-billion-parameter model with 1 billion active parameters, trained on 38T tokens, a scale comparable to frontier model training runs.
NVIDIA's Nemotron-Labs Diffusion Models Generate Multiple Tokens in Parallel, Bypassing Autoregressive Bottleneck
NVIDIA releases diffusion language models at 3B, 8B, and 14B scales that generate and refine tokens in parallel, offering latency improvements for GPU-constrained inference workloads.
Stability AI's Stable Audio 3.0 extends music generation to six-minute compositions
Stability AI releases four new audio models capable of generating full-length songs, with open-weights tiers and licensing deals backing the release.
OlmoEarth v1.1 cuts satellite-imagery inference costs by 3x through token optimization
Allen Institute releases OlmoEarth v1.1, a more efficient earth-observation model family that maintains v1 performance while reducing compute through shorter token sequences.