Cost to Process 1 Million Tokens
Exact pricing for 1M tokens across 53 LLM APIs — equivalent to roughly 750,000 words or a 2,500-page book. Sorted cheapest first.
* Total = input + output cost per 1M tokens.
What 1 Million Tokens Means in Practice
One million tokens is a concrete benchmark for understanding LLM API costs at scale. In English text, 1 million tokens is roughly 750,000 words: about the length of a 2,500-page book, 10 full-length novels, or 500 detailed technical articles. For a chatbot that uses 1,000 tokens per turn, 1 million tokens covers 1,000 turns. For a document processing pipeline with 5,000-token documents, it covers 200 documents.
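The conversions above are simple ratios, and a short sketch makes them easy to rerun for other budgets. The constants below are assumptions read off the figures in this section (0.75 words per token, 300 words per page); real tokenizers and page layouts vary, and the function name is only illustrative.

```python
# Rough conversion ratios implied by the figures above (assumptions, not exact).
WORDS_PER_TOKEN = 0.75   # 1M tokens is roughly 750,000 English words
WORDS_PER_PAGE = 300     # 750,000 words is roughly 2,500 pages


def describe_tokens(tokens: int, tokens_per_unit: int) -> dict:
    """Convert a token budget into words, pages, and whole work units.

    `tokens_per_unit` is the size of one unit of work, for example
    1,000 tokens per chatbot turn or 5,000 tokens per document.
    """
    words = tokens * WORDS_PER_TOKEN
    return {
        "words": round(words),
        "pages": round(words / WORDS_PER_PAGE),
        "units": tokens // tokens_per_unit,
    }


print(describe_tokens(1_000_000, 1_000))  # chatbot turns at 1,000 tokens each
print(describe_tokens(1_000_000, 5_000))  # documents at 5,000 tokens each
```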
Monthly usage milestones. A small SaaS product with 100 daily active users, each sending 5 messages of 200 tokens and receiving 300-token responses, processes about 250,000 tokens per day, or 7.5M tokens per 30-day month (3M input, 4.5M output). At GPT-4o Mini pricing ($0.15/M input, $0.60/M output), that workload costs approximately $0.45/month for input and $2.70/month for output, about $3.15 total. The same workload on GPT-4o ($2.50/M input, $10/M output) would cost $52.50/month. The model choice is a 17× cost difference.
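The monthly estimate can be recomputed directly from the per-user figures. The following is a minimal sketch rather than a billing tool: the `Workload` class and `monthly_cost` function are illustrative names, a 30-day month is assumed, and each message is billed on its own (an application that resends conversation history with every request would use more input tokens than this).

```python
from dataclasses import dataclass


@dataclass
class Workload:
    daily_users: int
    messages_per_user: int
    input_tokens_per_message: int
    output_tokens_per_message: int
    days_per_month: int = 30  # assumption: 30-day month

    def monthly_tokens(self) -> tuple:
        """Return (input_tokens, output_tokens) per month."""
        messages = self.daily_users * self.messages_per_user * self.days_per_month
        return (
            messages * self.input_tokens_per_message,
            messages * self.output_tokens_per_message,
        )


def monthly_cost(w: Workload, input_price_per_m: float, output_price_per_m: float) -> float:
    """Monthly cost in dollars given prices per 1M input and output tokens."""
    input_tokens, output_tokens = w.monthly_tokens()
    return (
        input_tokens / 1_000_000 * input_price_per_m
        + output_tokens / 1_000_000 * output_price_per_m
    )


saas = Workload(daily_users=100, messages_per_user=5,
                input_tokens_per_message=200, output_tokens_per_message=300)

# Prices per 1M tokens as quoted in the text.
mini = monthly_cost(saas, 0.15, 0.60)
full = monthly_cost(saas, 2.50, 10.00)
print(f"GPT-4o Mini: ${mini:.2f}/month")
print(f"GPT-4o:      ${full:.2f}/month")
print(f"Ratio:       {full / mini:.1f}x")
```

Because both GPT-4o prices are the same multiple of the GPT-4o Mini prices, the ratio between the two models does not depend on the input/output split of the workload.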
Input vs output token distribution matters enormously. The “total” column in the table above assumes equal input and output tokens (500K input + 500K output per 1M total). But real workloads vary widely. A summarization pipeline might be 80% input and 20% output, which is heavily input-weighted. A creative writing assistant might be 30% input and 70% output, which is heavily output-weighted. Because output tokens typically cost 3–5× more than input tokens, an output-heavy workload changes the ranking significantly. DeepSeek V3 at $0.27/M output may beat Gemini Flash at $0.30/M output for such cases.
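To see how the split changes the effective price, the blended cost of 1M total tokens can be computed as a weighted average of the input and output prices. This sketch uses the GPT-4o Mini prices quoted earlier in this section. The two models in `HYPOTHETICAL_MODELS` are invented purely to show a ranking flip and do not correspond to any real API.

```python
def blended_cost_per_million(input_price: float, output_price: float,
                             input_share: float) -> float:
    """Cost of 1M total tokens when `input_share` of them are input tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


# GPT-4o Mini prices per 1M tokens, as quoted in the text.
for label, share in [("equal split", 0.5), ("summarization", 0.8), ("creative writing", 0.3)]:
    cost = blended_cost_per_million(0.15, 0.60, share)
    print(f"{label:>16}: ${cost:.3f} per 1M total tokens")

# Hypothetical (input, output) prices, chosen only to illustrate a ranking change.
HYPOTHETICAL_MODELS = {
    "model_a": (0.10, 0.90),  # cheap input, expensive output
    "model_b": (0.30, 0.50),  # pricier input, cheaper output
}


def cheapest(models: dict, input_share: float) -> str:
    """Name of the model with the lowest blended cost for this split."""
    return min(models, key=lambda name: blended_cost_per_million(*models[name], input_share))


print(cheapest(HYPOTHETICAL_MODELS, 0.8))  # input-heavy workload
print(cheapest(HYPOTHETICAL_MODELS, 0.3))  # output-heavy workload
```

With these made-up prices, the input-heavy workload favors the model with cheap input, and the output-heavy workload favors the other one, which is why a single "total" column cannot rank models for every workload.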
Scaling to enterprise volume. At 1 billion tokens per month, a volume reachable by a mid-size product with thousands of daily users, the cheapest models can process the entire workload for under $500/month, a blended rate below $0.50 per 1M tokens. At this volume, the most expensive frontier models would cost $15,000–$60,000/month for the same workload, or $15–$60 per 1M tokens. Enterprise-scale AI products almost always implement model routing: using cheap small models for routine tasks and expensive frontier models only for the hardest edge cases. The economics demand it.
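The effect of routing on the monthly bill can be estimated with a weighted sum. The sketch below assumes blended per-1M rates taken from the ranges in this section ($0.50 for a cheap model, $15 for a frontier model) and a hypothetical 5% of tokens routed to the frontier model. The function name and the 5% share are illustrative, not measurements.

```python
def routed_monthly_cost(total_tokens: int, cheap_price_per_m: float,
                        frontier_price_per_m: float, frontier_share: float) -> float:
    """Monthly cost when `frontier_share` of tokens go to the frontier model
    and the rest go to the cheap model. Prices are blended rates per 1M tokens."""
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    millions = total_tokens / 1_000_000
    cheap_cost = millions * (1.0 - frontier_share) * cheap_price_per_m
    frontier_cost = millions * frontier_share * frontier_price_per_m
    return cheap_cost + frontier_cost


TOKENS_PER_MONTH = 1_000_000_000  # 1 billion tokens

all_cheap = routed_monthly_cost(TOKENS_PER_MONTH, 0.50, 15.00, 0.0)
all_frontier = routed_monthly_cost(TOKENS_PER_MONTH, 0.50, 15.00, 1.0)
routed = routed_monthly_cost(TOKENS_PER_MONTH, 0.50, 15.00, 0.05)  # hypothetical 5% share

print(f"All cheap:    ${all_cheap:,.0f}/month")
print(f"All frontier: ${all_frontier:,.0f}/month")
print(f"5% routed:    ${routed:,.0f}/month")
```

Under these assumptions, most of the routed bill comes from the small fraction of traffic sent to the frontier model, so the routing threshold is the main cost lever.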
Prompt caching impact at 1M token scale. For every 1 million input tokens processed, if 70% is repeated context (a common ratio in RAG pipelines and agent systems with static tool definitions), the cached portion costs 50–90% less. On Claude Sonnet 4.6, processing 1M input tokens with a 70% cache hit rate costs $1.11 (700K cached at $0.30/M = $0.21, plus 300K standard at $3.00/M = $0.90), versus $3.00 for 1M tokens uncached. That is a 63% cost reduction achieved purely through caching, without changing models.
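The caching arithmetic is a two-part sum: cached tokens at the discounted rate plus the remaining tokens at the standard rate. This is a minimal sketch using the per-1M input prices quoted above. The function name is illustrative, and the calculation ignores any surcharge a provider may apply when content is first written to the cache.

```python
def cached_input_cost(total_tokens: int, cache_hit_rate: float,
                      standard_price_per_m: float, cached_price_per_m: float) -> float:
    """Input cost in dollars when `cache_hit_rate` of tokens are read from cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached_tokens = total_tokens * cache_hit_rate
    uncached_tokens = total_tokens - cached_tokens
    return (
        cached_tokens * cached_price_per_m
        + uncached_tokens * standard_price_per_m
    ) / 1_000_000


# Prices per 1M input tokens as quoted in the text.
with_cache = cached_input_cost(1_000_000, 0.70, 3.00, 0.30)
without_cache = cached_input_cost(1_000_000, 0.0, 3.00, 0.30)
savings = 1.0 - with_cache / without_cache

print(f"With 70% cache hits: ${with_cache:.2f}")
print(f"Uncached:            ${without_cache:.2f}")
print(f"Reduction:           {savings:.0%}")
```

The reduction is always smaller than the hit rate itself, because cached tokens are discounted rather than free and the uncached remainder is still billed at the full rate.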