Large language model releases, benchmarks, fine-tuning breakthroughs, and the companies building them.
Cognition's Scott Wu Positions Devin as Coder's Assistant, Not Replacement
Cognition CEO Scott Wu argues AI coding agents should augment developers, not displace them, even as his $26B startup's own engineers rely on Devin for nearly all shipped code.
Google Unveils Gemini Omni Video Generation and Gemini 3.5 Flash for Agentic AI
Google announced Gemini Omni, a multimodal model that generates and edits video through natural language, and Gemini 3.5 Flash, optimized for complex agent workflows.
Anthropic releases Claude Opus 4.8 with improved uncertainty flagging and effort controls
Claude Opus 4.8 flags uncertain reasoning 4x more often than its predecessor and introduces user-controlled effort levels and dynamic workflow agents.
Anthropic Accelerates Claude Opus Cycle With 4.8 Release and Dynamic Workflows Feature
Anthropic shipped Claude Opus 4.8 on May 28, introducing a new agentic framework and improved uncertainty handling amid intensifying LLM competition.
Google I/O 2026: Gemini Omni, Multimodal Search, and AI Agents debut
Google unveiled Gemini Omni for video generation, Gemini 3.5 Flash for agents and coding, and autonomous Search agents that monitor the web 24/7.
Apple's redesigned Siri app and Gemini integration signal direct challenge to standalone AI chatbots
Leaked renderings show Apple's iOS 27 AI overhaul, featuring a new Siri app powered by Google's Gemini and integrated throughout the OS to compete with ChatGPT and Claude.
Google's AI Overview struggles with basic spelling, revealing token-level architectural limits
Google's AI Overview has generated spelling errors including misspelling 'Google' as 'Googel' and 'journalism' as 'j-o-u-r-n-a-d-i-s-m'—a recurring challenge for transformer-based LLMs.
How Claude Code and OpenClaw Sparked the AI Agent Era
Anthropic's coding breakthrough and an open-source agent framework ignited mass adoption of autonomous AI systems, reshaping developer workflows and raising questions about workforce disruption.
Google's Gemini Omni Raises Questions About Video Generation Quality and Consistency
Google released Omni Flash, the first model in its anything-to-anything Gemini family, but early tests reveal significant flaws in character consistency and object rendering.
NVIDIA's Nemotron-Labs Diffusion Models Generate Multiple Tokens in Parallel, Bypassing Autoregressive Bottleneck
NVIDIA releases diffusion language models at 3B, 8B, and 14B scales that generate and refine tokens in parallel, offering latency improvements for GPU-constrained inference workloads.
At Anthropic's Code with Claude event, developers are shipping pull requests they never read
Anthropic demonstrated autonomous coding at scale, with half of attendees shipping Claude-written code unseen.
Google launches Gemini 3.5 Flash with agentic benchmarks outpacing Pro, rolls Pro in June
Google unveiled Gemini 3.5 Flash at I/O 2026 as an agent-first model claiming frontier intelligence at sub-flagship latency, with Gemini Omni adding physics-aware video generation.
Google I/O 2026: Search Box Becomes an AI-Powered Operating System
Google is transforming its search interface into a unified agent hub, integrating AI summaries, personalized results, and cross-product task automation through expanded Gemini capabilities.
Google's I/O 2026: AI Agents Embedded Across Search, Gmail, and Android Glasses
Google integrated agentic AI into Search, Gmail, YouTube, and Chrome, rolling out Gemini 3.5 and Android-powered smart glasses to 900M Gemini users.
Google's Gemini Omni blurs the line between text prompt and video simulation
At I/O 2026, Google DeepMind unveiled Gemini Omni, a multimodal family that generates video from combined image, audio, and text inputs, signaling a shift from generative to simulational AI.
Google Launches Gemini Spark to Compete in the Autonomous Agent Race
Google introduces Gemini Spark, an AI agent that proactively manages personal data and automates tasks without explicit prompts, rolling out to early testers this week.
Google Search Converges AI Modes, Adds Autonomous Agents at I/O 2026
Google unified its search interface around Gemini 3.5 Flash, blending AI Overviews with chatbot-style AI Mode, while rolling out background-monitoring agents for paying subscribers.
Google launches Gemini 3.5 models and Omni multimodal family at I/O 2026
Google unveiled Gemini 3.5 Flash as the new default model, introduced Gemini Omni for text-to-video generation, and previewed always-on agents powered by Gemini Spark.
Google Launches Gemini Spark, a 24/7 Personal AI Agent With Workspace Hooks
Google's new agentic assistant leverages deep Gmail integration and runs on Google Cloud infrastructure to compete with Claude Cowork and ChatGPT Agent.
Google Search Abandons Ranked Links for AI-Powered Agents and Conversational Experiences
Google overhauls Search with interactive AI features, information agents, and conversational query expansion, marking the end of the 'ten blue links' era.
Google Shifts to Agentic AI With Gemini 3.5 Flash, Outperforming Frontier Models on Coding Tasks
Gemini 3.5 Flash prioritizes autonomous agents over conversational chatbots, running 4–12x faster than comparable models with same reasoning quality.
Google DeepMind Launches Gemini Omni Flash for AI-Powered Video Generation and Editing
Gemini Omni Flash enables users to generate and edit videos through natural language prompts, combining multimodal inputs with real-world knowledge.
Google Search Gets Agentic AI Overhaul With Gemini 3.5 Flash Default
Google debuts AI agents, a redesigned search box, and Gemini 3.5 Flash integration in Search, targeting a billion-user base
Google I/O 2026: Gemini Omni and Agent-First Development Mark Shift Toward Agentic AI
Google unveiled Gemini Omni, a multimodal model capable of video creation, alongside Gemini 3.5 Flash and expanded agent capabilities across Search, Gmail, and shopping.
IBM Granite Embedding Multilingual R2: 97M and 311M Parameter Models Top MTEB Multilingual Retrieval Charts
IBM releases two Apache 2.0 multilingual embedding models built on ModernBERT, with 32K-token context and coverage for 200+ languages.
OpenAI Claims GPT-5.5 Instant Cuts Hallucinations by Half in High-Stakes Domains
OpenAI's new default ChatGPT model reportedly achieves a 52.5% reduction in hallucinated claims on high-stakes queries, grounded in real user-flagged failure data.
OpenAI Rolls Out GPT-5.5 Instant With a 52.5% Hallucination Drop and New Memory Transparency Controls
ChatGPT's new default model cuts fabricated claims by more than half on high-stakes prompts and shows users exactly what personal context shaped each response.
GPT-5.5 Instant Crosses a New Safety Threshold for OpenAI's Fast-Inference Line
OpenAI's GPT-5.5 Instant is the first Instant-class model to earn a 'High capability' rating in its two most-scrutinized safety domains, triggering new safeguards.
Claude's Cooperative Design Becomes Its Vulnerability in Mindgard's Gaslighting Attack
AI red-teaming firm Mindgard exploited Claude's helpfulness and humility to extract erotica, malicious code, and explosive-assembly instructions — without a single direct request.
OpenAI's Goblin Problem Is Actually a Reinforcement Learning Problem
How a GPT-5.1 personality quirk spawned an AI-wide creature metaphor habit — and what it reveals about reinforcement learning's tendency to generalize behaviors beyond their intended scope.
IBM's Granite 4.1 Shows Data Discipline Can Beat Bigger Models
IBM's new trio of fully-dense LLMs reaches 512K-token context and outperforms a larger mixture-of-experts predecessor through rigorous data curation alone.
OpenAI's GPT-5.5 Bets on Autonomous Completion Over Raw Intelligence
OpenAI's GPT-5.5 prioritizes agentic task execution and expanded safeguards over benchmark-chasing, signaling a strategic pivot toward real-world deployment.