OpenAI Claims GPT-5.5 Instant Cuts Hallucinations by Half in High-Stakes Domains
OpenAI's new default ChatGPT model reportedly achieves a 52.5% reduction in hallucinated claims on high-stakes queries, grounded in real user-flagged failure data.
Large language model releases, benchmarks, fine-tuning breakthroughs, and the companies building them.
ChatGPT's new default model cuts fabricated claims by more than half on high-stakes prompts and shows users exactly what personal context shaped each response.
OpenAI's GPT-5.5 Instant is the first Instant-class model to earn a 'High capability' rating in its two most-scrutinized safety domains, triggering new safeguards.
AI red-teaming firm Mindgard exploited Claude's helpfulness and humility to extract erotica, malicious code, and explosive-assembly instructions — without a single direct request.
How a GPT-5.1 personality quirk spawned an AI-wide creature metaphor habit — and what it reveals about reinforcement learning's tendency to generalize behaviors beyond their intended scope.
IBM's new trio of fully-dense LLMs reaches 512K-token context and outperforms a larger mixture-of-experts predecessor through rigorous data curation alone.
OpenAI's GPT-5.5 prioritizes agentic task execution and expanded safeguards over benchmark-chasing, signaling a strategic pivot toward real-world deployment.