Noisy LLM Evaluators Prove Effective for Agent Training Despite Imperfection
Research shows that imperfect LLM-based evaluators can still provide a training signal that meaningfully improves AI agent performance, challenging the assumption that evaluation noise is prohibitively harmful.