Cost to Process 1 Million Tokens

Exact pricing for 1M tokens across 53 LLM APIs — equivalent to roughly 750,000 words or a 2,500-page book. Sorted cheapest first.

Cheapest model
Qwen3.5 Flash
$0.01/1M input tokens
Most expensive
Claude 3 Opus
$15.00/1M input tokens
Max savings
1500x
by choosing the cheapest model
ModelProviderInput (1M)Output (1M)Total
Qwen3.5 Flashqwen$0.01$0.05$0.06
Llama 3.1 8Bmeta$0.02$0.05$0.07
Qwen3 235Bqwen$0.06$0.06$0.12
Qwen3 8Bqwen$0.05$0.10$0.15
Qwen3 30Bqwen$0.10$0.15$0.25
Llama 4 Scoutmeta$0.17$0.17$0.34
Gemini 2.0 Flash-Litegoogle$0.07$0.30$0.38
Mistral Small 3.1mistral$0.10$0.30$0.40
Mistral Nemomistral$0.10$0.30$0.40
GPT-4.1 Nanoopenai$0.10$0.40$0.50
Gemini 2.5 Flash-Litegoogle$0.10$0.40$0.50
Gemini 2.0 Flashgoogle$0.10$0.40$0.50
Gemini 3 Flash-Litegoogle$0.12$0.48$0.60
Llama 3.3 70Bmeta$0.23$0.40$0.63
GPT-4o Miniopenai$0.15$0.60$0.75
Grok 3 Minixai$0.30$0.50$0.80
Codestralmistral$0.30$0.90$1.20
DeepSeek Chatdeepseek$0.27$1.10$1.37
Llama 4 Maverickmeta$0.50$1.10$1.60
GPT-4.1 Miniopenai$0.40$1.60$2.00
o3openai$0.40$1.60$2.00
GPT-3.5 Turboopenai$0.50$1.50$2.00
Mistral Large 3mistral$0.50$1.50$2.00
Mistral Medium 3mistral$0.40$2.00$2.40
Gemini 3 Flashgoogle$0.50$2.00$2.50
DeepSeek R1deepseek$0.55$2.19$2.74
Gemini 2.5 Flashgoogle$0.30$2.50$2.80
GPT-5 Miniopenai$0.60$2.40$3.00
DeepSeek R2deepseek$0.80$3.20$4.00
Claude 3.5 Haikuanthropic$0.80$4.00$4.80
o4-miniopenai$1.10$4.40$5.50
Claude Haiku 4.5anthropic$1.00$5.00$6.00
Gemini 1.5 Progoogle$1.25$5.00$6.25
Magistral Mediummistral$2.00$5.00$7.00
Llama 3.1 405Bmeta$3.50$3.50$7.00
GPT-4.1openai$2.00$8.00$10.00
Gemini 2.5 Progoogle$1.25$10.00$11.25
GPT-4oopenai$2.50$10.00$12.50
Gemini 3 Progoogle$3.50$14.00$17.50
Claude Sonnet 4.6anthropic$3.00$15.00$18.00
Claude 3.5 Sonnetanthropic$3.00$15.00$18.00
Grok 3xai$3.00$15.00$18.00
Claude Opus 4.8anthropic$5.00$25.00$30.00
Claude Opus 4.7anthropic$5.00$25.00$30.00
Claude Opus 4.6anthropic$5.00$25.00$30.00
Grok 3 Fastxai$5.00$25.00$30.00
GPT-5openai$8.00$32.00$40.00
GPT-4 Turboopenai$10.00$30.00$40.00
Gemini 3 Ultragoogle$10.00$30.00$40.00
Claude Fable 5anthropic$10.00$50.00$60.00
o1openai$15.00$60.00$75.00
Claude Opus 4.5anthropic$15.00$75.00$90.00
Claude 3 Opusanthropic$15.00$75.00$90.00

* Total = input + output cost per 1M tokens. Also see: 100K tokens cost → | Cheapest LLM API →

What 1 Million Tokens Means in Practice

One million tokens is a concrete benchmark for understanding LLM API costs at scale. In English text, 1 million tokens is equivalent to roughly 750,000 words — about the length of a 2,500-page book, 10 full-length novels, or 500 detailed technical articles. For a typical chatbot conversation with 1,000 tokens per turn, 1 million tokens represents 1,000 conversations. For a document processing pipeline with 5,000-token documents, it is 200 documents processed.

Monthly usage milestones. A small SaaS product with 100 daily active users, each sending 5 messages of 200 tokens and receiving 300-token responses, processes about 2.5M tokens per day — 75M tokens per month. At GPT-4o Mini pricing ($0.15/M input, $0.60/M output), that workload costs approximately $45/month for input and $13.50/month for output — about $58.50 total. The same workload on GPT-4o ($2.50/M input, $10/M output) would cost $975/month. The model choice is a 17× cost difference.

Input vs output token distribution matters enormously. The “total” column in the table above assumes equal input and output tokens (500K input + 500K output per 1M total). But real workloads vary widely. A summarization pipeline might be 80% input, 20% output — heavily input-weighted. A creative writing assistant might be 30% input, 70% output — heavily output-weighted. Because output tokens cost 3–5× more, an output-heavy workload changes the ranking significantly. DeepSeek V3 at $0.27/M output may beat Gemini Flash at $0.30/M output for such cases.

Scaling to enterprise volume. At 1 billion tokens per month — reachable by a mid-size product with thousands of daily users — the cheapest models become capable of processing an enormous amount of work for under $500/month. At this volume, the most expensive frontier models would cost $15,000–$60,000/month for the same workload. Enterprise-scale AI products almost always implement model routing: using cheap small models for routine tasks and expensive frontier models only for the hardest edge cases. The economics demand it.

Prompt caching impact at 1M token scale. For every 1 million tokens processed, if 70% is repeated context (a common ratio in RAG pipelines and agent systems with static tool definitions), the cached portion costs 50–90% less. On Claude Sonnet 4.6, processing 1M tokens with 70% cache hit rate costs $0.90 (700K cached at $0.30/M + 300K standard at $3.00/M), versus $3.00 for 1M tokens uncached. That is a 70% cost reduction achieved purely through caching, without changing models.