#reinforcement-learning

Hugging Face Cuts RL Training Sync Overhead by 98% With Sparse Delta Weights

Tools May 28, 2026

A new TRL protocol cuts per-step weight-synchronization traffic from terabytes to tens of megabytes by sending only the changed parameters across distributed training pipelines.

Noisy LLM Evaluators Prove Effective for Agent Training Despite Imperfection

Research May 27, 2026

Research shows that imperfect LLM-based evaluators can still meaningfully improve AI agent performance, challenging the assumption that evaluation noise is prohibitively harmful.

OpenAI's Goblin Problem Is Actually a Reinforcement Learning Problem

LLMs May 3, 2026

How a GPT-5.1 personality quirk spawned an AI-wide creature-metaphor habit, and what it reveals about reinforcement learning's tendency to generalize behaviors beyond their intended scope.

IBM's Granite 4.1 Shows Data Discipline Can Beat Bigger Models

LLMs May 2, 2026

IBM's new trio of fully dense LLMs supports a 512K-token context window and outperforms a larger mixture-of-experts predecessor through rigorous data curation alone.

AlphaGo's Creator Says LLMs Are a Dead End — and Raised $1.1 Billion to Prove It

Research Apr 29, 2026

David Silver, who built AlphaGo at DeepMind, argues large language models are fundamentally capped by human data and has founded Ineffable Intelligence to pursue reinforcement learning instead.