Introducing GPT-5.2-Codex

The frontier of AI-assisted software development has taken a significant leap forward. Today marks the release of GPT‑5.2-Codex, the most advanced agentic coding model yet, specifically engineered for complex, real-world software engineering and defensive cybersecurity.

This new model is a specialized version of GPT‑5.2, further optimized for the Codex environment. Key improvements include enhanced performance on long-horizon tasks through context compaction, superior handling of large-scale code changes like refactors and migrations, better reliability in native Windows environments, and a substantial boost in cybersecurity capabilities.

As AI models advance, we see these gains translate directly into specialized domains. A compelling real-world example occurred just last week: a security researcher using the previous GPT‑5.1-Codex-Max with the Codex CLI successfully discovered and responsibly disclosed a vulnerability in React that could have led to source code exposure.

GPT‑5.2-Codex now possesses stronger cybersecurity capabilities than any model we’ve previously released. While these advances can significantly strengthen cyberdefense at scale, they also introduce new dual-use risks that necessitate a careful and measured deployment strategy. Although GPT‑5.2-Codex does not yet reach a ‘High’ level of cyber capability under our internal Preparedness Framework, we are proactively designing our approach with future, more capable models in mind.

Access is rolling out in stages. GPT‑5.2-Codex is available starting today across all Codex surfaces for paid ChatGPT users, and we are working to safely enable API access in the coming weeks. In parallel, we are launching an invite-only trusted access pilot that gives vetted cybersecurity professionals and organizations access to upcoming capabilities and more permissive models, specifically for defensive work. We believe this approach balances safety with the critical needs of defenders.

Pushing the Frontier on Real-World Software Engineering

GPT‑5.2-Codex builds on GPT‑5.2’s strengths in professional knowledge work and on the frontier agentic coding capabilities of GPT‑5.1-Codex-Max. It shows marked improvements in long-context understanding, reliable tool calling, factuality, and native compaction, which makes it a more dependable partner for extended coding sessions while keeping its reasoning token-efficient.
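This post does not describe how native compaction works internally. Purely as an illustration of the general idea, the Python sketch below folds older turns of a session into a single summary entry once a token budget is exceeded, while keeping the most recent turns verbatim. Everything in it is an assumption made for the example: the `Message` type, the whitespace-based `count_tokens`, the truncating `summarize` placeholder (a real system would presumably have a model write the summary), and the `budget` and `keep_recent` parameters.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def summarize(messages: list[Message], max_words: int = 20) -> Message:
    # Placeholder summarizer (hypothetical): keep the first few words of each
    # old message. This only illustrates where a summary would be produced.
    parts = []
    for m in messages:
        words = m.text.split()[:max_words]
        parts.append(f"{m.role}: {' '.join(words)}")
    return Message(role="summary", text=" | ".join(parts))


def compact(history: list[Message], budget: int, keep_recent: int = 2) -> list[Message]:
    """Fold older messages into one summary when the history exceeds `budget`.

    The result is shorter but is not guaranteed to fit the budget; a fuller
    version could compact repeatedly or summarize more aggressively.
    """
    total = sum(count_tokens(m.text) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # nothing to do
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    return [summarize(old)] + recent
```

Calling `compact(history, budget=30)` on a long session returns one `summary` message followed by the two most recent turns; a history already under budget is returned unchanged.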

The model achieves state-of-the-art performance on SWE-Bench Pro and Terminal-Bench 2.0, two industry benchmarks that test, respectively, patch generation against real code repositories and an AI agent’s ability to complete a wide variety of tasks in realistic terminal environments. Its effectiveness in native Windows environments has also improved substantially.

With these enhancements, Codex excels at navigating large code repositories over long sessions without losing context. It can more reliably see complex tasks through to completion—be it a major refactor, a framework migration, or a new feature build—iterating persistently even when initial plans change or attempts fail.

  • SWE-Bench Pro: A benchmark where a model is given a code repository and must generate a patch to solve a realistic software engineering task.
  • Terminal-Bench 2.0: A benchmark for testing AI agents in real terminal environments, with tasks like compiling code, training models, and setting up servers.

Furthermore, stronger vision performance allows GPT‑5.2-Codex to more accurately interpret screenshots, technical diagrams, charts, and UI surfaces shared during collaborative coding sessions. This capability enables a powerful new workflow: Codex can now take design mockups and rapidly translate them into functional prototypes.

Design mock

Prototype generated by GPT-5.2-Codex

Advancing the Cyber Frontier

When we chart performance on core cybersecurity evaluations, a clear trajectory emerges: a sharp capability jump began with GPT‑5-Codex, followed by another large jump with GPT‑5.1-Codex-Max, and now a third significant jump with GPT‑5.2-Codex. We expect this trend to continue with future models.

In preparation, we are planning and evaluating each new model as though it could reach ‘High’ levels of cybersecurity capability, as defined by our Preparedness Framework. While GPT‑5.2-Codex has not crossed that threshold, we are building the infrastructure and safeguards for the models that will. Because of its increased capabilities, we have added safeguards in both the model and the product interface, as detailed in the accompanying system card.

  • Professional Capture-the-Flag (CTF) Evaluation: This test measures how often a model can solve advanced, multi-step real-world cybersecurity challenges—requiring professional-level skills—in a Linux environment.

Real-World Cyber Capabilities

Modern society is built on software, and its security is paramount—protecting everything from banking and healthcare to communications and essential services. Vulnerabilities can lurk unseen for years, and finding them depends on a community of skilled engineers and researchers armed with effective tools.

A recent case underscores the practical impact of these AI advancements. On December 11, 2025, the React team published three security vulnerabilities affecting React Server Components. The notable part wasn’t just the bugs themselves, but how they were found.

Andrew MacPherson, a principal security engineer, was using GPT‑5.1-Codex-Max to study a different, recently disclosed React vulnerability (React2Shell, CVE-2025-55182). His goal was to evaluate the model’s utility in real-world vulnerability research. After initial analytical prompts yielded limited results, he guided Codex through standard defensive security workflows: setting up a local test environment, reasoning through attack surfaces, and employing fuzzing techniques with malformed inputs.
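The post gives no detail on the fuzzing setup used in this research. As a generic illustration of fuzzing with malformed inputs, the Python sketch below mutates a valid seed input one byte at a time and records any input that makes a target function fail in an unexpected way. The target here, `parse_record`, is a toy parser with a deliberately planted bug, and all names and parameters are hypothetical; none of this reflects React or the actual tooling involved.

```python
import random


def parse_record(data: bytes) -> tuple[str, str]:
    """Toy target: parse b"key=value". Contains a planted bug for the demo."""
    text = data.decode("utf-8")  # invalid UTF-8 raises UnicodeDecodeError, a ValueError
    parts = text.split("=")
    # Planted bug: no check that "=" was present, so parts[1] may not exist.
    return parts[0], parts[1]


def mutate(seed: bytes, rng: random.Random) -> bytes:
    # Overwrite one randomly chosen byte with a random value.
    buf = bytearray(seed)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(target, seed: bytes, iterations: int = 2000, rng_seed: int = 0):
    """Return (input, exception name) pairs for unexpected failures."""
    rng = random.Random(rng_seed)  # fixed seed so runs are reproducible
    findings = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except ValueError:
            continue  # clean rejection of bad input: expected behavior
        except Exception as exc:  # anything else is worth investigating
            findings.append((candidate, type(exc).__name__))
    return findings


if __name__ == "__main__":
    for bad_input, kind in fuzz(parse_record, b"name=codex")[:3]:
        print(kind, bad_input)
```

The design choice worth noting is the split between expected and unexpected failures: a parser that rejects malformed input with a documented error is behaving correctly, so only other exception types are recorded for follow-up.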

While attempting to reproduce the original issue, Codex surfaced unexpected behaviors that warranted deeper investigation. This collaborative, iterative process led—within a single week—to the discovery of previously unknown vulnerabilities, which were then responsibly disclosed to the React team.

Vulnerability discovery diagram

This demonstrates how advanced AI can materially accelerate defensive security work on critical, widely used software. Of course, capabilities that empower defenders can also be misused. As agentic systems grow more proficient in cybersecurity tasks, our core priority is ensuring these advances are deployed responsibly: pairing every gain in capability with stronger safeguards, tighter access controls, and ongoing collaboration with the security community.

Empowering Cyberdefense Through Trusted Access

Security teams often face operational friction when they need to emulate threat actors, analyze malware for remediation, or stress-test critical infrastructure. Our trusted access pilot is designed to remove these barriers for qualified users, enabling trusted defenders to leverage frontier AI cyber capabilities to accelerate their work.

Initially, this pilot will be invite-only, extending to vetted security professionals with a proven record of responsible disclosure and to organizations with clear, professional cybersecurity use cases. Participants will gain access to our most capable models for legitimate defensive and dual-use work.

If you are a security professional or part of an organization engaged in ethical security work—such as vulnerability research or authorized red-teaming—we invite you to express interest in the program and share your feedback on what you’d like to see.

Conclusion

GPT‑5.2-Codex represents a meaningful step forward in applying advanced AI to real-world software engineering and specialized domains like cybersecurity. It empowers developers and defenders to tackle more complex, long-horizon work and strengthens the toolkit available for responsible security research.

By rolling out access gradually, pairing deployment with robust safeguards, and working hand in hand with the security community, we aim to maximize the defensive impact of this technology while reducing the risk of misuse. The insights gained from this release will directly shape how we expand access in the future, as both the software and cyber frontiers continue their rapid advance.

