AI Token Cost Calculator
Calculate and compare LLM API costs for any provider, model, and workload.
How pricing works
LLM APIs charge per token, roughly 4 characters of text. Pricing is split between input (prompt) and output (completion) tokens, is quoted in dollars per million tokens, and varies by model and provider.
cost = (
input_tokens × input_price
+ output_tokens × output_price
) / 1,000,000 × requests
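The formula above can be sketched as a small Python function. The function name `estimate_cost` and its parameter names are illustrative choices, not part of the calculator itself, and prices are assumed to be in dollars per million tokens.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    requests: int = 1,
) -> float:
    """Return the total cost in dollars.

    input_tokens / output_tokens are per-request token counts.
    input_price / output_price are assumed to be dollars per million tokens.
    """
    per_request = (
        input_tokens * input_price
        + output_tokens * output_price
    ) / 1_000_000
    return per_request * requests


# One request with 1,500 input and 400 output tokens
# at $2.50/M input and $10/M output.
print(f"${estimate_cost(1_500, 400, 2.50, 10.00):.5f}")  # $0.00775
```

Keeping the per-request cost separate from the request count makes it easy to reuse the same function for a single call or a whole month of traffic.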
How LLM API Pricing Works
Large language models charge for usage in tokens — the basic unit of text that AI models process. A token is roughly 4 characters or ¾ of a word in English. “Hello, world!” is 4 tokens. A typical paragraph is 100–150 tokens. Understanding token pricing is the single most important skill for keeping AI API costs manageable at scale.
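As a rough illustration of the 4-characters-per-token rule of thumb, the sketch below estimates a token count from text length. It is only a heuristic: real tokenizers differ by provider and by language, so the helper `estimate_tokens` (a hypothetical name) should be treated as a budgeting aid rather than an exact count.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate the token count of English text.

    Assumes roughly `chars_per_token` characters per token, which is a
    rule of thumb and not the output of any real tokenizer.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


print(estimate_tokens("Hello, world!"))  # 13 characters -> 4
```

For billing-grade numbers, use the token counts returned in the provider's API response instead of an estimate like this one.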
Every LLM API splits pricing into two categories: input tokens (the text you send — your prompt, context, and instructions) and output tokens (the text the model generates). Output tokens typically cost 3–5× more than input tokens because generation requires sequential computation, while reading your prompt can be parallelized across hardware.
Why prices vary 100× across models. A frontier reasoning model like OpenAI o3 or Claude Opus costs $15–$60 per million input tokens, while a small efficient model like DeepSeek V3 or Gemini Flash costs $0.10–$0.40 per million input tokens. The gap reflects parameter count, reasoning depth, and infrastructure costs. For most production workloads, the cheapest model that meets your quality bar is the correct default.
Estimating real monthly costs. A customer service chatbot handling 10,000 requests per day, with 1,500 input tokens and 400 output tokens per request, processes roughly 570 million tokens per 30-day month: 450 million input and 120 million output. On GPT-4o ($2.50/M input, $10/M output) that is $1,125 for input plus $1,200 for output, or about $2,325/month. The same workload on GPT-4o Mini ($0.15/M input, $0.60/M output) costs about $140/month, a difference of roughly $2,190/month from a single model swap.
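The chatbot workload can be checked by pricing input and output tokens separately, using the per-million prices quoted in the paragraph. This is a minimal sketch that assumes a 30-day month; `monthly_cost` is a hypothetical helper name.

```python
DAYS_PER_MONTH = 30  # assumption: a 30-day billing month


def monthly_cost(
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    days: int = DAYS_PER_MONTH,
) -> float:
    """Monthly cost in dollars; prices are dollars per million tokens."""
    requests = requests_per_day * days
    input_cost = requests * input_tokens * input_price / 1_000_000
    output_cost = requests * output_tokens * output_price / 1_000_000
    return input_cost + output_cost


# 10,000 requests/day, 1,500 input + 400 output tokens per request.
gpt4o = monthly_cost(10_000, 1_500, 400, 2.50, 10.00)
gpt4o_mini = monthly_cost(10_000, 1_500, 400, 0.15, 0.60)

print(f"GPT-4o:      ${gpt4o:,.2f}")               # $2,325.00
print(f"GPT-4o Mini: ${gpt4o_mini:,.2f}")          # $139.50
print(f"Difference:  ${gpt4o - gpt4o_mini:,.2f}")  # $2,185.50
```

Pricing the two token types separately matters because output tokens are billed at a higher rate than input tokens, so a single blended rate over the total token count would misstate the bill.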
Prompt caching cuts costs dramatically. When your system prompt or documents repeat across requests, providers like Anthropic (90% discount), Google (75%), and OpenAI (50%) offer cached token pricing. For RAG pipelines that inject the same document corpus repeatedly, caching often cuts the monthly bill in half. On a Claude model with a $3/M base input price, cache writes cost $3.75/M but cached reads cost only $0.30/M, so the write premium is recovered on the first cache hit: any prompt sent at least twice while the cache entry is live costs less than sending it uncached.
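The break-even point for caching can be worked out with a short sketch. It uses the $3.75/M write and $0.30/M read prices from the paragraph and assumes a $3.00/M uncached input price (the price at which a $0.30/M read is a 90% discount). It also assumes every use after the first is a cache hit, which ignores cache expiry; `cached_vs_uncached` is a hypothetical helper.

```python
def cached_vs_uncached(
    prompt_tokens: int,
    uses: int,
    base_price: float = 3.00,   # assumed uncached input price, $/M tokens
    write_price: float = 3.75,  # cache write price, $/M tokens
    read_price: float = 0.30,   # cache read price, $/M tokens
) -> tuple[float, float]:
    """Return (uncached_cost, cached_cost) in dollars for a repeated prompt.

    Assumes the first use writes the cache and every later use is a hit.
    """
    if uses < 1:
        raise ValueError("uses must be at least 1")
    uncached = prompt_tokens * uses * base_price / 1_000_000
    cached = prompt_tokens * (write_price + (uses - 1) * read_price) / 1_000_000
    return uncached, cached


for uses in (1, 2, 10):
    uncached, cached = cached_vs_uncached(1_000_000, uses)
    print(f"{uses:>2} uses: uncached ${uncached:.2f}, cached ${cached:.2f}")
```

Under these assumptions a single use costs more with caching ($3.75 versus $3.00 per million prompt tokens), while the second use already comes out ahead ($4.05 versus $6.00).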
Batch APIs for non-real-time work. OpenAI and Anthropic both offer batch processing APIs at 50% off standard pricing for jobs that can complete within 24 hours — document analysis, data extraction, offline classification. Combined with model selection and prompt caching, a well-optimized LLM pipeline typically costs 5–20× less than a naive first implementation. Use the calculator above to model your specific workload with real numbers.
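To see how much a 50% batch discount is worth, the sketch below blends real-time and batch pricing for a workload where only part of the traffic can wait. The helper `blended_cost` and the 60% batchable share are illustrative assumptions, not figures from any provider.

```python
def blended_cost(
    standard_cost: float,
    batch_fraction: float,
    batch_discount: float = 0.50,
) -> float:
    """Cost after moving `batch_fraction` of a workload to a batch API.

    standard_cost is what the whole workload would cost at standard pricing.
    batch_discount is the fractional discount on batched work (0.50 = 50% off).
    """
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0 and 1")
    realtime = standard_cost * (1 - batch_fraction)
    batched = standard_cost * batch_fraction * (1 - batch_discount)
    return realtime + batched


# Hypothetical: a $1,000/month workload where 60% of jobs can wait up to 24 hours.
print(f"${blended_cost(1_000, 0.60):,.2f}")  # $700.00
```

The saving scales linearly with the batchable share, so the first step is usually to identify which jobs genuinely need a real-time response.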