Hugging Face Cuts RL Training Sync Overhead by 98% With Sparse Delta Weights
A new TRL protocol reduces per-step model synchronization from terabytes to tens of megabytes by shipping only changed parameters across distributed training pipelines.
A new TRL protocol reduces per-step model synchronization from terabytes to tens of megabytes by shipping only changed parameters across distributed training pipelines.
Research shows that imperfect LLM-based evaluators can still meaningfully improve AI agent performance, challenging the assumption that evaluation noise is prohibitively harmful.
How a GPT-5.1 personality quirk spawned an AI-wide creature metaphor habit — and what it reveals about reinforcement learning's tendency to generalize behaviors beyond their intended scope.
IBM's new trio of fully-dense LLMs reaches 512K-token context and outperforms a larger mixture-of-experts predecessor through rigorous data curation alone.
David Silver, who built AlphaGo at DeepMind, argues large language models are fundamentally capped by human data and has founded Ineffable Intelligence to pursue reinforcement learning instead.