Cost to Process 1 Million Tokens

Exact pricing for 1M tokens across 53 LLM APIs — equivalent to roughly 750,000 words or a 2,500-page book. Sorted cheapest first.

Cheapest model

Qwen3.5 Flash

$0.01/1M input tokens

Most expensive

Claude 3 Opus

$15.00/1M input tokens

Max savings

1500x

by choosing the cheapest model

Model	Provider	Input (1M)	Output (1M)	Total
Qwen3.5 Flash	qwen	$0.01	$0.05	$0.06
Llama 3.1 8B	meta	$0.02	$0.05	$0.07
Qwen3 235B	qwen	$0.06	$0.06	$0.12
Qwen3 8B	qwen	$0.05	$0.10	$0.15
Qwen3 30B	qwen	$0.10	$0.15	$0.25
Llama 4 Scout	meta	$0.17	$0.17	$0.34
Gemini 2.0 Flash-Lite	google	$0.07	$0.30	$0.38
Mistral Small 3.1	mistral	$0.10	$0.30	$0.40
Mistral Nemo	mistral	$0.10	$0.30	$0.40
GPT-4.1 Nano	openai	$0.10	$0.40	$0.50
Gemini 2.5 Flash-Lite	google	$0.10	$0.40	$0.50
Gemini 2.0 Flash	google	$0.10	$0.40	$0.50
Gemini 3 Flash-Lite	google	$0.12	$0.48	$0.60
Llama 3.3 70B	meta	$0.23	$0.40	$0.63
GPT-4o Mini	openai	$0.15	$0.60	$0.75
Grok 3 Mini	xai	$0.30	$0.50	$0.80
Codestral	mistral	$0.30	$0.90	$1.20
DeepSeek Chat	deepseek	$0.27	$1.10	$1.37
Llama 4 Maverick	meta	$0.50	$1.10	$1.60
GPT-4.1 Mini	openai	$0.40	$1.60	$2.00
o3	openai	$0.40	$1.60	$2.00
GPT-3.5 Turbo	openai	$0.50	$1.50	$2.00
Mistral Large 3	mistral	$0.50	$1.50	$2.00
Mistral Medium 3	mistral	$0.40	$2.00	$2.40
Gemini 3 Flash	google	$0.50	$2.00	$2.50
DeepSeek R1	deepseek	$0.55	$2.19	$2.74
Gemini 2.5 Flash	google	$0.30	$2.50	$2.80
GPT-5 Mini	openai	$0.60	$2.40	$3.00
DeepSeek R2	deepseek	$0.80	$3.20	$4.00
Claude 3.5 Haiku	anthropic	$0.80	$4.00	$4.80
o4-mini	openai	$1.10	$4.40	$5.50
Claude Haiku 4.5	anthropic	$1.00	$5.00	$6.00
Gemini 1.5 Pro	google	$1.25	$5.00	$6.25
Magistral Medium	mistral	$2.00	$5.00	$7.00
Llama 3.1 405B	meta	$3.50	$3.50	$7.00
GPT-4.1	openai	$2.00	$8.00	$10.00
Gemini 2.5 Pro	google	$1.25	$10.00	$11.25
GPT-4o	openai	$2.50	$10.00	$12.50
Gemini 3 Pro	google	$3.50	$14.00	$17.50
Claude Sonnet 4.6	anthropic	$3.00	$15.00	$18.00
Claude 3.5 Sonnet	anthropic	$3.00	$15.00	$18.00
Grok 3	xai	$3.00	$15.00	$18.00
Claude Opus 4.8	anthropic	$5.00	$25.00	$30.00
Claude Opus 4.7	anthropic	$5.00	$25.00	$30.00
Claude Opus 4.6	anthropic	$5.00	$25.00	$30.00
Grok 3 Fast	xai	$5.00	$25.00	$30.00
GPT-5	openai	$8.00	$32.00	$40.00
GPT-4 Turbo	openai	$10.00	$30.00	$40.00
Gemini 3 Ultra	google	$10.00	$30.00	$40.00
Claude Fable 5	anthropic	$10.00	$50.00	$60.00
o1	openai	$15.00	$60.00	$75.00
Claude Opus 4.5	anthropic	$15.00	$75.00	$90.00
Claude 3 Opus	anthropic	$15.00	$75.00	$90.00

* Total = input + output cost per 1M tokens. Also see: 100K tokens cost → | Cheapest LLM API →

What 1 Million Tokens Means in Practice

One million tokens is a concrete benchmark for understanding LLM API costs at scale. In English text, 1 million tokens is equivalent to roughly 750,000 words — about the length of a 2,500-page book, 10 full-length novels, or 500 detailed technical articles. For a typical chatbot conversation with 1,000 tokens per turn, 1 million tokens represents 1,000 conversations. For a document processing pipeline with 5,000-token documents, it is 200 documents processed.

Monthly usage milestones. A small SaaS product with 100 daily active users, each sending 5 messages of 200 tokens and receiving 300-token responses, processes about 2.5M tokens per day — 75M tokens per month. At GPT-4o Mini pricing ($0.15/M input, $0.60/M output), that workload costs approximately $45/month for input and $13.50/month for output — about $58.50 total. The same workload on GPT-4o ($2.50/M input, $10/M output) would cost $975/month. The model choice is a 17× cost difference.

Input vs output token distribution matters enormously. The “total” column in the table above assumes equal input and output tokens (500K input + 500K output per 1M total). But real workloads vary widely. A summarization pipeline might be 80% input, 20% output — heavily input-weighted. A creative writing assistant might be 30% input, 70% output — heavily output-weighted. Because output tokens cost 3–5× more, an output-heavy workload changes the ranking significantly. DeepSeek V3 at $0.27/M output may beat Gemini Flash at $0.30/M output for such cases.

Scaling to enterprise volume. At 1 billion tokens per month — reachable by a mid-size product with thousands of daily users — the cheapest models become capable of processing an enormous amount of work for under $500/month. At this volume, the most expensive frontier models would cost $15,000–$60,000/month for the same workload. Enterprise-scale AI products almost always implement model routing: using cheap small models for routine tasks and expensive frontier models only for the hardest edge cases. The economics demand it.

Prompt caching impact at 1M token scale. For every 1 million tokens processed, if 70% is repeated context (a common ratio in RAG pipelines and agent systems with static tool definitions), the cached portion costs 50–90% less. On Claude Sonnet 4.6, processing 1M tokens with 70% cache hit rate costs $0.90 (700K cached at $0.30/M + 300K standard at $3.00/M), versus $3.00 for 1M tokens uncached. That is a 70% cost reduction achieved purely through caching, without changing models.