OpenAI API Pricing

Updated May 2026 · 12 models

Complete pricing for all OpenAI API models — GPT-5, GPT-5 Mini, GPT-4.1, GPT-4o, o3, and o4-mini. Calculate your monthly cost based on actual token usage.

GPT-5CACHED

OpenAI's next-generation flagship — significant capability jump over GPT-4.1 with 1M context

input /1M

$32

output /1M

GPT-5 MiniCACHED

Affordable GPT-5 intelligence — brings GPT-5 capability to cost-sensitive workloads

$0.6

input /1M

$2.4

output /1M

GPT-4.1CACHED

Latest flagship with 1M context window and strong coding/instruction following

input /1M

output /1M

GPT-4.1 Mini

Affordable intelligence with 1M context — best cost/performance in the 4.1 family

$0.4

input /1M

$1.6

output /1M

GPT-4.1 Nano

Smallest and cheapest GPT-4.1 model — ideal for simple tasks needing 1M context

$0.1

input /1M

$0.4

output /1M

o4-mini

Fast, efficient reasoning model optimized for STEM and coding tasks

$1.1

input /1M

$4.4

output /1M

Advanced reasoning model at significantly reduced price (80% cut from launch)

$0.4

input /1M

$1.6

output /1M

o1CACHED

OpenAI's original frontier reasoning model — deep thinking for the hardest problems

$15

input /1M

$60

output /1M

GPT-4oCACHED

Multimodal model with strong vision, audio, and text capabilities

$2.5

input /1M

$10

output /1M

GPT-4o Mini

Ultra-affordable model for high-volume tasks with good quality

$0.15

input /1M

$0.6

output /1M

GPT-4 Turbo

Previous generation GPT-4 Turbo — powerful but superseded by GPT-4o in cost-efficiency

$10

input /1M

$30

output /1M

GPT-3.5 Turbo

Classic fast model — still cost-effective for simple chat tasks and legacy integrations

$0.5

input /1M

$1.5

output /1M

Compare OpenAI vs Other Providers

vs Anthropic Claude vs Google Gemini vs DeepSeek GPT-4o vs Claude Sonnet GPT-4.1 vs Claude Opus

Understanding OpenAI API Pricing

OpenAI offers two distinct model families with very different pricing philosophies. The GPT series (GPT-4o, GPT-4.1, GPT-5) is optimized for instruction-following, coding, and structured outputs — pricing is relatively linear with capability. The o-series (o3, o4-mini) adds deliberate chain-of-thought reasoning before responding, making them significantly more expensive but dramatically better at mathematical reasoning, complex coding problems, and multi-step logic.

The GPT-4o vs GPT-5 trade-off. GPT-4o at $2.50/M input tokens remains the workhorse for most production applications — it balances strong capability with predictable pricing. GPT-5, OpenAI's most capable model, commands a premium for tasks where frontier reasoning matters: complex analysis, nuanced writing, advanced coding. For high-volume applications where GPT-4o quality is sufficient, GPT-4o Mini at $0.15/M input tokens provides an 16× cost reduction.

Cached input pricing. OpenAI automatically caches prompt prefixes and offers a 50% discount on repeated context — cached input tokens cost half the standard rate. This is applied automatically for eligible prompts with no code changes required. For applications with static system prompts or repeated document context, you effectively pay 50% less for those tokens on every request after the first. GPT-4o Mini with caching at ~$0.075/M cached input is among the most cost-effective options in the industry.

Batch API for 50% savings. OpenAI's Batch API processes requests asynchronously and returns results within 24 hours at exactly half the standard price. If your use case involves offline document processing, data extraction, content classification, or any non-real-time workload, batch processing is a straightforward way to cut your OpenAI bill in half with no quality trade-off. GPT-4o via Batch API costs $1.25/M input tokens — competitive with smaller models at standard pricing.

Choosing the right model for your workload. Start with GPT-4o Mini for any task that doesn't require frontier-level reasoning — customer support responses, content summarization, simple Q&A, and data extraction. Step up to GPT-4o for tasks that need stronger instruction-following, more coherent long-form output, or better code generation. Reserve o3 or GPT-5 for genuinely hard reasoning tasks where accuracy directly impacts business outcomes. This tiered approach typically reduces overall API spend by 60–80% compared to using a single premium model for everything.