Cheapest LLM API in 2026
Ranked by input token price — the dominant cost for most workloads. Updated May 2026.
All Models Ranked by Input Price
Cheapest Model By Use Case
How to Find the Cheapest LLM for Your Actual Workload
The model with the lowest input price per million tokens is often not the cheapest model for your specific workload. Output tokens typically cost 3–5× more than input tokens, and the ratio varies significantly across models. A model priced at $0.05/M input and $0.25/M output costs more than a model priced at $0.10/M input and $0.10/M output whenever output tokens exceed one-third of input tokens: at that point the $0.15/M output premium outweighs the $0.05/M input saving. Always calculate total cost using your real input-to-output ratio.
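To make the comparison concrete, here is a minimal Python sketch of the total-cost arithmetic, using the two illustrative price points from the paragraph above. The function and constant names (`workload_cost`, `compare`, `MODEL_A`, `MODEL_B`) are made up for this example, and the token counts are placeholders rather than measurements.

```python
def workload_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Return total USD cost for a workload, given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# The two illustrative price points from the text (USD per 1M tokens).
MODEL_A = {"input": 0.05, "output": 0.25}  # lower input price, higher output price
MODEL_B = {"input": 0.10, "output": 0.10}  # higher input price, lower output price


def compare(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost on model A, cost on model B) for the same workload."""
    cost_a = workload_cost(input_tokens, output_tokens,
                           MODEL_A["input"], MODEL_A["output"])
    cost_b = workload_cost(input_tokens, output_tokens,
                           MODEL_B["input"], MODEL_B["output"])
    return cost_a, cost_b


if __name__ == "__main__":
    # Placeholder workloads: one input-heavy, one with equal input and output.
    for inp, out in [(10_000_000, 1_000_000), (1_000_000, 1_000_000)]:
        a, b = compare(inp, out)
        print(f"input={inp:,} output={out:,}  A=${a:.2f}  B=${b:.2f}")
```

With these prices, model A wins on the input-heavy workload and model B wins once output volume matches input volume, which is why the ranking by input price alone can mislead.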
Small models for simple tasks. The most cost-effective strategy is to route simple, well-defined tasks to the smallest capable model. Customer service responses, content classification, data extraction with a fixed schema, and simple Q&A can often be handled by models like DeepSeek V3, GPT-4o Mini, Claude Haiku, or Gemini Flash at $0.10–0.80/M input tokens — roughly 10–100× cheaper than frontier models. The key question is not “what is the best model?” but “what is the cheapest model that is good enough?”
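One way to operationalize the "cheapest model that is good enough" question is to score each candidate on your own evaluation set and then pick the lowest-priced one that clears a quality bar. The sketch below assumes you already have such scores; the model names, prices, and accuracy figures are hypothetical placeholders, not benchmark results.

```python
from typing import Optional

# Hypothetical candidates: names, prices, and accuracy scores are placeholders
# standing in for results you would measure on your own evaluation data.
CANDIDATES = [
    {"name": "small-model", "input_price_per_m": 0.10, "accuracy": 0.91},
    {"name": "mid-model", "input_price_per_m": 0.80, "accuracy": 0.95},
    {"name": "frontier-model", "input_price_per_m": 3.00, "accuracy": 0.98},
]


def cheapest_good_enough(candidates: list[dict],
                         min_accuracy: float) -> Optional[dict]:
    """Return the lowest-priced candidate meeting the accuracy bar, or None."""
    eligible = [c for c in candidates if c["accuracy"] >= min_accuracy]
    if not eligible:
        return None  # nothing clears the bar; revisit the threshold or the prompt
    return min(eligible, key=lambda c: c["input_price_per_m"])


if __name__ == "__main__":
    for bar in (0.90, 0.93, 0.99):
        choice = cheapest_good_enough(CANDIDATES, bar)
        print(bar, choice["name"] if choice else "no model qualifies")
```

The selection here keys on input price only for brevity; in practice you would likely rank by total workload cost at your real input-to-output ratio.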
Prompt caching changes the calculus. For applications with a large, stable system prompt or a fixed document context, prompt caching can matter more than model selection. Anthropic's Claude Sonnet bills cached reads at $0.30/M, versus $3.00/M for standard input, so for repeat-access workloads it can undercut the uncached price of nominally cheaper models. The saving scales with the share of input tokens served from cache, and cache writes are billed above the standard rate, so a prefix has to be reused to pay off. Before switching to a cheaper model, calculate whether enabling caching on your current model achieves the same cost reduction.
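The caching trade-off can be estimated with a blended input price. The sketch below uses the $3.00/M standard and $0.30/M cached-read figures quoted above; it deliberately ignores cache-write surcharges and cache expiry, so treat the result as an optimistic estimate. The function names and the $0.80/M comparison price are assumptions for illustration.

```python
def effective_input_price(standard_price_per_m: float,
                          cached_price_per_m: float,
                          cached_fraction: float) -> float:
    """Blended USD price per 1M input tokens for a given cache-hit share.

    Simplification: cache-write surcharges and cache expiry are ignored.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return (cached_fraction * cached_price_per_m
            + (1.0 - cached_fraction) * standard_price_per_m)


def breakeven_cached_fraction(standard_price_per_m: float,
                              cached_price_per_m: float,
                              target_price_per_m: float) -> float:
    """Cache-hit share at which the blended price equals the target price.

    A result above 1.0 means caching alone cannot reach the target;
    a result at or below 0.0 means the standard price already beats it.
    """
    return ((standard_price_per_m - target_price_per_m)
            / (standard_price_per_m - cached_price_per_m))


if __name__ == "__main__":
    # 90% of input tokens served from cache, at the prices quoted in the text.
    print(f"${effective_input_price(3.00, 0.30, 0.90):.2f}/M blended")
    # Cache-hit share needed to match a hypothetical $0.80/M uncached model.
    print(f"{breakeven_cached_fraction(3.00, 0.30, 0.80):.1%} cached needed")
```

Under these assumptions, a workload needs a little over 80% of its input tokens served from cache before the blended price falls below a $0.80/M alternative.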
The hidden costs of cheap models. Lower-priced models sometimes require more prompt engineering, produce more errors requiring retries, or generate outputs that need human review — all of which add real costs not visible in the per-token rate. A model that is 5× cheaper but requires 2× as many retries and 1 hour of extra prompt engineering per week may not be cheaper in practice. The use-case tables above account for token ratios, but quality-adjusted cost requires testing on your actual data.
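A rough way to fold retries and engineering time into the comparison is sketched below, using the 5× cheaper, 2× retries, one extra hour per week scenario from the paragraph above. The weekly token spend and the $75 hourly rate are hypothetical inputs chosen only to show how the outcome depends on volume; `weekly_effective_cost` is an illustrative name.

```python
def weekly_effective_cost(token_cost: float,
                          attempts_per_success: float,
                          extra_engineering_hours: float,
                          hourly_rate: float) -> float:
    """Weekly USD cost including retries and extra prompt-engineering time."""
    return (token_cost * attempts_per_success
            + extra_engineering_hours * hourly_rate)


if __name__ == "__main__":
    HOURLY_RATE = 75.0  # hypothetical engineering cost in USD per hour

    # Hypothetical weekly token spend on the current (more expensive) model.
    for baseline_spend in (100.0, 1000.0):
        current = weekly_effective_cost(baseline_spend, 1.0, 0.0, HOURLY_RATE)
        # Cheaper model: 5x lower token cost, 2x attempts, 1 extra hour per week.
        cheap = weekly_effective_cost(baseline_spend / 5, 2.0, 1.0, HOURLY_RATE)
        print(f"baseline ${baseline_spend:.0f}: current ${current:.0f}, "
              f"cheaper model ${cheap:.0f}")
```

With these placeholder numbers the cheaper model loses at $100 per week of token spend and wins comfortably at $1,000 per week, because the fixed engineering overhead is amortized over more tokens.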
Batch processing for offline workloads. If your use case does not require real-time responses (document indexing, data enrichment, offline classification), both OpenAI and Anthropic offer batch APIs at 50% off standard pricing, in exchange for asynchronous completion rather than an immediate reply. Running GPT-4o Mini via the Batch API at $0.075/M input tokens, half its $0.15/M standard rate, is among the most cost-effective options available from any major provider, pairing solid instruction following with aggressive pricing for non-real-time workloads.
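Most applications mix real-time and deferrable traffic, so the useful number is the blended cost once the deferrable share moves to batch. The sketch below assumes the 50% batch discount described above and counts input tokens only; `blended_input_cost` and the 100M-token monthly volume are illustrative assumptions.

```python
BATCH_DISCOUNT = 0.50  # 50% off standard pricing, as stated in the text


def blended_input_cost(total_input_tokens: int,
                       standard_price_per_m: float,
                       batchable_fraction: float,
                       batch_discount: float = BATCH_DISCOUNT) -> float:
    """USD input cost when a share of the traffic is sent through a batch API.

    Simplification: only input tokens are counted; output tokens would be
    handled the same way with their own price.
    """
    if not 0.0 <= batchable_fraction <= 1.0:
        raise ValueError("batchable_fraction must be between 0 and 1")
    batch_tokens = total_input_tokens * batchable_fraction
    realtime_tokens = total_input_tokens - batch_tokens
    batch_price_per_m = standard_price_per_m * (1.0 - batch_discount)
    return (realtime_tokens * standard_price_per_m
            + batch_tokens * batch_price_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical 100M input tokens per month at a $0.15/M standard rate.
    for share in (0.0, 0.5, 1.0):
        cost = blended_input_cost(100_000_000, 0.15, share)
        print(f"{share:.0%} batched: ${cost:.2f}")
```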