#open-weights

Google DeepMind releases DiffusionGemma, a 26B diffusion model 4x faster than autoregressive generation

LLMs Jun 11, 2026

DiffusionGemma uses parallel text diffusion instead of sequential token generation, achieving 1000+ tokens/sec on H100 GPUs with trade-offs in output quality.

Google DeepMind's Gemma 4 12B Brings Encoder-Free Multimodal AI to Consumer Laptops

LLMs Jun 10, 2026

Google DeepMind releases Gemma 4 12B, a 12-billion-parameter model with unified vision and audio processing that runs on 16GB consumer hardware.

A 3-billion-parameter economy: small models as viable multi-agent platforms

Tools Jun 7, 2026

Hugging Face hackathon project demonstrates how tiny language models can power real-time simulations that frontier models cannot economically support.

JetBrains Releases Mellum2, a 12B Sparse Model for Sub-Second Inference

LLMs Jun 1, 2026

JetBrains' new Mixture-of-Experts model achieves a 2x speedup over dense peers while activating just 2.5B parameters per token.

Liquid AI Releases 8B-A1B Mixture-of-Experts Model Trained on 38 Trillion Tokens

LLMs May 31, 2026

Liquid AI unveils a sparse 8-billion-parameter model with 1 billion active parameters, trained on 38T tokens, a scale comparable to frontier model training runs.

NVIDIA's Nemotron-Labs Diffusion Models Generate Multiple Tokens in Parallel, Bypassing Autoregressive Bottleneck

LLMs May 24, 2026

NVIDIA releases diffusion language models at 3B, 8B, and 14B scales that generate and refine tokens in parallel, offering latency improvements for GPU-constrained inference workloads.

Stability AI's Stable Audio 3.0 extends music generation to six-minute compositions

Tools May 22, 2026

Stability AI releases four new audio models capable of generating full-length songs, with open-weights tiers and licensing deals backing the release.

OlmoEarth v1.1 cuts satellite-imagery inference costs by 3x through token optimization

Research May 21, 2026

Allen Institute releases OlmoEarth v1.1, a more efficient earth-observation model family that maintains v1 performance while reducing compute through shorter token sequences.