OpenAI Outlines Framework for Independent Model Evaluations
OpenAI shares lessons on designing trustworthy third-party evaluations for frontier AI models, emphasizing the role of task environments and validity checks.
Claude Opus 4.8 flags uncertain reasoning 4x more often than its predecessor and introduces user-controlled effort levels and dynamic workflow agents.
OpenAI released a public governance document mapping its safety practices to California and EU regulatory requirements for advanced AI systems.
OpenAI's GPT-5.5 Instant is the first Instant-class model to earn a 'High capability' rating in its two most-scrutinized safety domains, triggering new safeguards.
OpenAI's GPT-5.5 prioritizes agentic task execution and expanded safeguards over benchmark-chasing, signaling a strategic pivot toward real-world deployment.