OpenAI has just released o3-mini, the newest and most cost-efficient model in its reasoning series. Available in both ChatGPT and the API starting today, this fast, powerful model pushes the boundaries of what small models can achieve. It delivers exceptional STEM capabilities, with particular strength in science, math, and coding, while maintaining the low cost and reduced latency of its predecessor, OpenAI o1-mini.
This is the first small reasoning model to support highly requested developer features like function calling, structured outputs, and developer messages, making it production-ready from day one. Like previous models, o3-mini supports streaming and offers three reasoning effort options—low, medium, and high—so developers can optimize for specific use cases. This flexibility allows the model to “think harder” on complex challenges or prioritize speed when latency is critical. Note that o3-mini does not support vision capabilities, so developers should continue using OpenAI o1 for visual reasoning tasks.
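For a concrete picture of how these features fit together, here is a minimal sketch using the official openai Python SDK: the reasoning_effort parameter, the developer message role, and the o3-mini model name follow the features described above, while the prompts themselves are illustrative. Check the API reference for the exact interface available on your account.

```python
# Minimal sketch: calling o3-mini through the Chat Completions API with the
# official openai Python SDK (v1.x). Assumes OPENAI_API_KEY is set in the
# environment and your account has access to the model.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low", "medium", or "high"
    messages=[
        # Reasoning models take developer messages in place of system messages.
        {"role": "developer", "content": "Answer concisely and show the final result."},
        {"role": "user", "content": "Factor x^4 - 5x^2 + 4 over the integers."},
    ],
)

print(response.choices[0].message.content)
```

Setting reasoning_effort is the main knob here: low trades some accuracy for latency, while high spends more tokens thinking before answering.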
Access is rolling out today to select developers in API usage tiers 3-5 across the Chat Completions API, Assistants API, and Batch API. ChatGPT Plus, Team, and Pro users can start using o3-mini immediately, with Enterprise access coming in February. The model will replace o1-mini in the model picker, offering higher rate limits and lower latency—making it an excellent choice for coding, STEM, and logical problem-solving. As part of this upgrade, rate limits for Plus and Team users triple from 50 to 150 messages per day.
Additionally, o3-mini now integrates with search to provide up-to-date answers with links to relevant web sources. This is an early prototype as OpenAI works to integrate search across all reasoning models. Starting today, free plan users can also try o3-mini by selecting ‘Reason’ in the message composer or regenerating a response—marking the first time a reasoning model has been available to free ChatGPT users.
While OpenAI o1 remains the broader general knowledge reasoning model, o3-mini provides a specialized alternative for technical domains requiring precision and speed. In ChatGPT, it uses medium reasoning effort by default for a balanced trade-off between speed and accuracy. All paid users can also select o3-mini-high in the model picker for a higher-intelligence version with slightly longer response times. Pro users get unlimited access to both versions.
Fast, Powerful, and Optimized for STEM Reasoning
Similar to its predecessor, o3-mini has been optimized for STEM reasoning. With medium reasoning effort, it matches o1’s performance in math, coding, and science while delivering faster responses. Expert evaluations show that o3-mini produces more accurate and clearer answers with stronger reasoning abilities than o1-mini. Testers preferred o3-mini’s responses 56% of the time and observed a 39% reduction in major errors on difficult real-world questions. With medium effort, it matches o1 on challenging evaluations like AIME and GPQA.
Competition Math (AIME 2024)
With low reasoning effort, o3-mini achieves comparable performance to o1-mini, while medium effort matches o1. High reasoning effort outperforms both o1-mini and o1.

PhD-level Science Questions (GPQA Diamond)
On PhD-level biology, chemistry, and physics questions, o3-mini with low effort outperforms o1-mini, and with high effort, it matches o1.

FrontierMath
With high reasoning effort, o3-mini performs better than its predecessor on FrontierMath. When prompted to use a Python tool, it solves over 32% of problems on the first attempt, including more than 28% of challenging T3 problems.
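A setup like the one used here can be approximated with the function-calling support mentioned earlier. The sketch below exposes a hypothetical run_python tool to the model; the tool name and schema are illustrative, and safely executing whatever code the model returns is left to the caller.

```python
# Hedged sketch: offering o3-mini a (hypothetical) Python-execution tool via
# function calling. The model returns code as a tool call; running it in a
# sandbox is the caller's responsibility.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "run_python",  # illustrative tool name, not an OpenAI built-in
        "description": "Execute Python code and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}]

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",
    tools=tools,
    messages=[{"role": "user", "content": "Compute the 50th Fibonacci number."}],
)

for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(args["code"])  # hand this to a sandboxed interpreter, then return the output
```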

Competition Code (Codeforces)
On Codeforces competitive programming, o3-mini achieves progressively higher Elo scores with increased reasoning effort, all outperforming o1-mini. With medium effort, it matches o1’s performance.

Software Engineering (SWE-bench Verified)
o3-mini is the highest-performing released model on SWE-bench Verified. For detailed results with high reasoning effort, refer to the system card.

LiveBench Coding
o3-mini surpasses o1-high even at medium reasoning effort, highlighting its coding efficiency. High effort further extends its lead.

General Knowledge
o3-mini outperforms o1-mini in knowledge evaluations across general domains.

Human Preference Evaluation
External expert testers confirm that o3-mini produces more accurate, clearer answers with stronger reasoning, especially in STEM. Testers preferred it 56% of the time and noted a 39% reduction in major errors.


Model Speed and Performance
With intelligence comparable to o1, o3-mini delivers faster performance and improved efficiency. Beyond the STEM evaluations above, it also achieves superior results on additional math and factuality evaluations at medium reasoning effort. In A/B testing, it responded 24% faster than o1-mini, with an average response time of 7.7 seconds versus 10.16 seconds.
Latency Comparison
On average, o3-mini reaches its first token 2,500 ms sooner than o1-mini.
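Developers can reproduce this kind of measurement with streaming, which o3-mini supports. The sketch below times the gap between sending a request and receiving the first content token; the prompt is arbitrary, and a real measurement should average over many requests.

```python
# Sketch: measuring time to first token (TTFT) with a streaming request.
import time
from openai import OpenAI

client = OpenAI()

start = time.monotonic()
stream = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": "Explain the CAP theorem in two sentences."}],
    stream=True,
)

for chunk in stream:
    # Skip role-only or empty deltas; stop the clock at the first content token.
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"time to first token: {time.monotonic() - start:.2f}s")
        break
```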

Safety
OpenAI used deliberative alignment to train o3-mini to reason about safety specifications before responding. It significantly surpasses GPT-4o on challenging safety and jailbreak evaluations; detailed results of the disallowed content and jailbreak evaluations are available in the o3-mini system card.
What’s Next
The release of o3-mini marks another step in OpenAI’s mission to push the frontier of cost-effective intelligence. By optimizing reasoning for STEM domains while keeping costs low, it makes high-quality AI more accessible. The model continues the trend of driving down the cost of intelligence (per-token pricing has fallen 95% since the launch of GPT-4) while maintaining top-tier reasoning. As AI adoption grows, OpenAI remains committed to building models that balance intelligence, efficiency, and safety at scale.