OpenAI's GPT-5.2 Tops Independent Coding Benchmarks, Closes Gap on Reasoning
Third-party evaluators say GPT-5.2 now leads on SWE-bench Verified and narrows the reasoning gap with rival frontier models.
Independent benchmarking groups report that OpenAI's GPT-5.2 has taken the top spot on SWE-bench Verified, a widely used measure of real-world software engineering ability, edging out competing frontier models for the first time since GPT-5 launched.
Evaluators also note meaningful gains on multi-step reasoning tasks, though they caution that the remaining gap with rival labs' models is within the margin of normal benchmark volatility. OpenAI has not commented on the results beyond confirming that GPT-5.2 is now generally available via the API.
Originally published by The Verge.