OpenAI announced Monday that it’s releasing a new version of GPT-5 to its AI coding agent, Codex. The company says its new model, called GPT-5-Codex, spends its “thinking” time more dynamically than previous models, and could spend anywhere from a few seconds to seven hours on a coding task. As a result, it performs better on agentic coding benchmarks.
The new model is now rolling out in Codex products — which can be accessed via a terminal, IDE, GitHub, or ChatGPT — to all ChatGPT Plus, Pro, Business, Edu, and Enterprise users. OpenAI says it plans to make the model available to API customers in the future.
The update is part of OpenAI’s effort to make Codex more competitive with other AI coding products, such as Claude Code, Anysphere’s Cursor, or Microsoft’s GitHub Copilot. The market for AI coding tools has become much more crowded in the last year, as a result of intense user demand. Cursor surpassed $500 million in ARR earlier in 2025 and Windsurf, a similar code editor, was the subject of a chaotic acquisition attempt that saw its team split between Google and Cognition.
OpenAI says that GPT-5-Codex outperforms GPT-5 on SWE-bench Verified, a benchmark measuring agentic coding abilities, as well as a benchmark measuring performance on code refactoring tasks from large, established repositories.

The company also says it trained GPT-5-Codex to conduct code reviews, and asked experienced software engineers to evaluate the model’s review comments. The engineers reportedly found that GPT-5-Codex submitted fewer incorrect comments while adding more “high-impact” comments.
In a briefing, OpenAI’s Codex product lead Alexander Embiricos said that much of the improved performance comes from GPT-5-Codex’s dynamic “thinking abilities.” Users may be familiar with GPT-5’s router in ChatGPT, which directs queries to different models based on the complexity of a task. Embiricos said GPT-5-Codex works similarly, but has no router under the hood, and can adjust how long it works on a task in real time.
Embiricos said this is an advantage over a router, which decides how much computational power and time to spend on a problem at the outset. GPT-5-Codex, by contrast, can decide five minutes into a problem that it needs to spend another hour. Embiricos said he’s seen the model take upwards of seven hours in some cases.