Hugging Face and IBM Research Launch Open Agent Leaderboard to Measure Real-World System Performance
A new benchmarking framework evaluates complete AI agent systems (not just the underlying models) across six diverse tasks and reports both quality and cost metrics to inform practical deployment decisions.