Enterprise AI Hits a Wall: Frontier Models Struggle Below 50% on Real-World IT Operations Tasks
A new benchmark reveals that even the most capable AI systems struggle with diagnosing complex infrastructure failures, scoring below 50% on Site Reliability Engineering scenarios.