Cheapest LLM API in 2026

Ranked by input token price — the dominant cost for most workloads. Updated May 2026.

Cheapest Input: Qwen3.5 Flash (qwen), $0.01/1M tokens
Cheapest Output: Llama 3.1 8B (meta), $0.05/1M tokens (Qwen3.5 Flash is listed at the same output price)

All Models Ranked by Input Price

# | Model | Provider | Input /1M tokens | Output /1M tokens | Best For
--- | --- | --- | --- | --- | ---
1 | Qwen3.5 Flash | qwen | $0.01 | $0.05 | Ultra high-volume tasks
2 | Llama 3.1 8B | meta | $0.02 | $0.05 | Budget bulk processing
3 | Qwen3 8B | qwen | $0.05 | $0.1 | Cheapest simple tasks
4 | Qwen3 235B | qwen | $0.06 | $0.06 | Extreme cost efficiency
5 | Gemini 2.0 Flash-Lite | google | $0.075 | $0.3 | Ultra-cheap bulk tasks
6 | GPT-4.1 Nano | openai | $0.1 | $0.4 | Simple 1M context tasks
7 | Gemini 2.5 Flash-Lite | google | $0.1 | $0.4 | Ultra high volume
8 | Gemini 2.0 Flash | google | $0.1 | $0.4 | Fast multimodal tasks
9 | Mistral Small 3.1 | mistral | $0.1 | $0.3 | Simple high-volume tasks
10 | Mistral Nemo | mistral | $0.1 | $0.3 | Budget multilingual tasks
11 | Qwen3 30B | qwen | $0.1 | $0.15 | Everyday budget tasks
12 | Gemini 3 Flash-Lite | google | $0.12 | $0.48 | Ultra high-volume Gemini 3
13 | GPT-4o Mini | openai | $0.15 | $0.6 | High-volume chatbots
14 | Llama 4 Scout | meta | $0.17 | $0.17 | Huge context at low cost
15 | Llama 3.3 70B | meta | $0.23 | $0.4 | Open-source workloads
16 | DeepSeek Chat | deepseek | $0.27 | $1.1 | Multilingual tasks
17 | Gemini 2.5 Flash | google | $0.3 | $2.5 | Speed & efficiency
18 | Codestral | mistral | $0.3 | $0.9 | Code generation
19 | Grok 3 Mini | xai | $0.3 | $0.5 | Fast cost-sensitive tasks
20 | GPT-4.1 Mini | openai | $0.4 | $1.6 | Cost-efficient 1M context tasks
21 | o3 | openai | $0.4 | $1.6 | Complex reasoning
22 | Mistral Medium 3 | mistral | $0.4 | $2 | Budget frontier tasks
23 | GPT-3.5 Turbo | openai | $0.5 | $1.5 | Legacy chat applications
24 | Gemini 3 Flash | google | $0.5 | $2 | Real-time 1M context apps
25 | Mistral Large 3 | mistral | $0.5 | $1.5 | European data compliance
26 | Llama 4 Maverick | meta | $0.5 | $1.1 | Balanced open-source tasks
27 | DeepSeek R1 | deepseek | $0.55 | $2.19 | Math & logic reasoning
28 | GPT-5 Mini | openai | $0.6 | $2.4 | Cost-efficient frontier tasks
29 | Claude 3.5 Haiku | anthropic | $0.8 | $4 | Fast low-cost tasks
30 | DeepSeek R2 | deepseek | $0.8 | $3.2 | Advanced reasoning at low cost
31 | Claude Haiku 4.5 | anthropic | $1 | $5 | Fast responses
32 | o4-mini | openai | $1.1 | $4.4 | STEM & coding
33 | Gemini 2.5 Pro | google | $1.25 | $10 | Long-context & multimodal
34 | Gemini 1.5 Pro | google | $1.25 | $5 | Massive document analysis
35 | GPT-4.1 | openai | $2 | $8 | Coding & instruction following
36 | Magistral Medium | mistral | $2 | $5 | Math & analysis
37 | GPT-4o | openai | $2.5 | $10 | Multimodal tasks
38 | Claude Sonnet 4.6 | anthropic | $3 | $15 | Balanced performance
39 | Claude 3.5 Sonnet | anthropic | $3 | $15 | Production workloads
40 | Grok 3 | xai | $3 | $15 | Real-time web research
41 | Gemini 3 Pro | google | $3.5 | $14 | Production reasoning at scale
42 | Llama 3.1 405B | meta | $3.5 | $3.5 | Max open-source performance
43 | Claude Opus 4.8 | anthropic | $5 | $25 | Complex reasoning & coding
44 | Claude Opus 4.7 | anthropic | $5 | $25 | Agentic workflows
45 | Claude Opus 4.6 | anthropic | $5 | $25 | Complex reasoning
46 | Grok 3 Fast | xai | $5 | $25 | Low-latency flagship tasks
47 | GPT-5 | openai | $8 | $32 | Frontier tasks & agents
48 | GPT-4 Turbo | openai | $10 | $30 | Legacy GPT-4 workloads
49 | Claude Fable 5 | anthropic | $10 | $50 | Frontier reasoning & agents
50 | Gemini 3 Ultra | google | $10 | $30 | Frontier reasoning & multimodal
51 | o1 | openai | $15 | $60 | Hard reasoning problems
52 | Claude Opus 4.5 | anthropic | $15 | $75 | Complex reasoning & coding
53 | Claude 3 Opus | anthropic | $15 | $75 | Legacy complex workloads

Cheapest Model By Use Case

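High-volume chatbot (800 input tokens / 300 output tokens per request, 50,000 requests per day): Qwen3.5 Flash, $34.50/mo
RAG / Q&A (3,500 input tokens / 500 output tokens per request, 10,000 requests per day): Qwen3.5 Flash, $18.00/mo
Batch classification (600 input tokens / 100 output tokens per request, 100,000 requests per day): Qwen3.5 Flash, $33.00/mo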
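
The monthly figures above can be reproduced with a few lines of arithmetic. The sketch below assumes a 30-day month and the Qwen3.5 Flash prices from the ranking ($0.01 input, $0.05 output per 1M tokens); the function name `monthly_cost` is illustrative, not part of any provider SDK.

```python
def monthly_cost(in_tokens, out_tokens, req_per_day, in_price, out_price, days=30):
    """Estimated monthly cost in USD.

    in_tokens / out_tokens are per request; prices are USD per 1M tokens.
    days=30 is an assumption that happens to match the figures on this page.
    """
    daily_in = in_tokens * req_per_day / 1_000_000 * in_price
    daily_out = out_tokens * req_per_day / 1_000_000 * out_price
    return (daily_in + daily_out) * days


# Qwen3.5 Flash prices taken from the ranking table.
QWEN_FLASH = {"in_price": 0.01, "out_price": 0.05}

workloads = {
    "High-volume chatbot": (800, 300, 50_000),
    "RAG / Q&A": (3_500, 500, 10_000),
    "Batch classification": (600, 100, 100_000),
}

for name, (tokens_in, tokens_out, reqs) in workloads.items():
    cost = monthly_cost(tokens_in, tokens_out, reqs, **QWEN_FLASH)
    print(f"{name}: ${cost:.2f}/mo")
```

Swapping in another model's prices and your own token counts gives a like-for-like comparison for your workload.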


How to Find the Cheapest LLM for Your Actual Workload

The model with the lowest input price per million tokens is not necessarily the cheapest model for your specific workload. The reason: output tokens usually cost more than input tokens, commonly 3–5× more, and the ratio varies widely across models (from 1× to about 8× in the ranking above). A model priced at $0.05/M input and $0.25/M output costs more than a model priced at $0.10/M input and $0.10/M output whenever the workload generates more than one output token for every three input tokens. Always calculate total cost using your real input-to-output ratio.
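
To make the comparison concrete, here is a minimal sketch that computes the output-to-input token ratio at which the two example models cost the same. The helper names are hypothetical, and the formula assumes the first model has the lower input price and the higher output price.

```python
def cost_per_request(in_tokens, out_tokens, in_price, out_price):
    """Cost of one request in USD; prices are USD per 1M tokens."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000


def break_even_output_ratio(a, b):
    """Output/input token ratio above which model `a` costs more than `b`.

    `a` and `b` are (input_price, output_price) tuples. Assumes `a` has the
    cheaper input and the pricier output; otherwise there is no crossover.
    """
    a_in, a_out = a
    b_in, b_out = b
    if not (a_in < b_in and a_out > b_out):
        raise ValueError("expected a: cheaper input, pricier output than b")
    # Solve a_in + r * a_out == b_in + r * b_out for r.
    return (b_in - a_in) / (a_out - b_out)


model_a = (0.05, 0.25)  # cheap input, expensive output
model_b = (0.10, 0.10)  # flat pricing

ratio = break_even_output_ratio(model_a, model_b)
print(f"Model A is more expensive above {ratio:.2f} output tokens per input token")
```

With these prices the crossover sits at roughly one output token per three input tokens, so a 1,000-in / 500-out request favors the flat-priced model while a 1,000-in / 100-out request favors the cheap-input one.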

Small models for simple tasks. The most cost-effective strategy is to route simple, well-defined tasks to the smallest capable model. Customer service responses, content classification, data extraction with a fixed schema, and simple Q&A can often be handled by models like DeepSeek V3, GPT-4o Mini, Claude Haiku, or Gemini Flash at $0.10–0.80/M input tokens — roughly 10–100× cheaper than frontier models. The key question is not “what is the best model?” but “what is the cheapest model that is good enough?”

Prompt caching changes the calculus. For applications with a large, stable system prompt or a fixed document context, prompt caching can be more impactful than model selection. Anthropic's Claude Sonnet with prompt caching at $0.30/M for cached reads ($3.00/M standard) can undercut the uncached price of cheaper models for repeat-access workloads. Before switching to a cheaper model, calculate whether enabling caching on your current model achieves the same cost reduction.
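
A rough way to run that calculation is to blend the cached and standard input prices by the share of input tokens served from cache. The sketch below is a simplification: it ignores any surcharge for writing to the cache and ignores output-token cost, and the function names are illustrative only.

```python
def effective_input_price(standard, cached_read, cached_fraction):
    """Blended input price (USD per 1M tokens) when `cached_fraction` of
    input tokens are cache reads. Cache-write costs are not modeled."""
    return cached_fraction * cached_read + (1 - cached_fraction) * standard


def cached_fraction_to_match(standard, cached_read, target_price):
    """Share of input tokens that must be cache reads for the blended
    price to equal `target_price`."""
    return (standard - target_price) / (standard - cached_read)


# Claude Sonnet figures quoted in the text: $3.00/M standard, $0.30/M cached.
blended = effective_input_price(3.00, 0.30, cached_fraction=0.9)
print(f"Blended input price at 90% cache reads: ${blended:.2f}/M")

# Share needed to match a $0.50/M uncached model (Gemini 3 Flash in the ranking).
needed = cached_fraction_to_match(3.00, 0.30, target_price=0.50)
print(f"Cache-read share needed to match $0.50/M: {needed:.1%}")
```

Under these assumptions, caching only closes the gap on input cost when the large majority of input tokens are repeat reads; output tokens are still billed at the model's normal rate.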

The hidden costs of cheap models. Lower-priced models sometimes require more prompt engineering, produce more errors requiring retries, or generate outputs that need human review — all of which add real costs not visible in the per-token rate. A model that is 5× cheaper but requires 2× as many retries and 1 hour of extra prompt engineering per week may not be cheaper in practice. The use-case tables above account for token ratios, but quality-adjusted cost requires testing on your actual data.
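
The trade-off in that example can be sketched as a weekly cost comparison. The per-attempt cost and the $100/hour engineering rate below are hypothetical placeholders, and "2× as many retries" is modeled here as twice as many attempts per request; substitute numbers measured on your own data.

```python
def weekly_cost(requests, cost_per_attempt, attempts_per_request,
                eng_hours=0.0, hourly_rate=0.0):
    """Token spend plus engineering time for one week, in USD."""
    token_spend = requests * cost_per_attempt * attempts_per_request
    return token_spend + eng_hours * hourly_rate


BASE_COST = 0.005            # hypothetical USD per attempt on the pricier model
CHEAP_COST = BASE_COST / 5   # the "5x cheaper" model from the text

for requests in (10_000, 100_000):
    pricier = weekly_cost(requests, BASE_COST, attempts_per_request=1)
    cheaper = weekly_cost(requests, CHEAP_COST, attempts_per_request=2,
                          eng_hours=1, hourly_rate=100)  # hypothetical rate
    print(f"{requests:>7} req/week: pricier=${pricier:.2f}, cheaper=${cheaper:.2f}")
```

With these placeholder numbers the cheaper model loses at 10,000 requests per week and wins at 100,000, which is why the fixed engineering overhead matters most at low volume.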

Batch processing for offline workloads. If your use case does not require real-time responses (document indexing, data enrichment, offline classification), both OpenAI and Anthropic offer batch APIs at 50% off standard pricing. Running GPT-4o Mini via the Batch API at $0.075/M input tokens, half its $0.15/M standard rate, is among the most cost-effective options available from any major provider, combining capable instruction following with aggressive pricing for non-real-time workloads.
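
As a quick illustration, the sketch below applies the 50% batch discount to GPT-4o Mini's listed prices ($0.15 input, $0.60 output per 1M tokens) for the batch classification workload described earlier. It assumes the discount applies equally to input and output tokens and uses a 30-day month; the names are illustrative.

```python
BATCH_DISCOUNT = 0.5  # 50% off standard pricing, as stated in the text


def batch_price(standard_price, discount=BATCH_DISCOUNT):
    """Discounted price per 1M tokens for batch (non-real-time) requests."""
    return standard_price * (1 - discount)


def monthly_cost(in_tokens, out_tokens, req_per_day, in_price, out_price, days=30):
    """Monthly cost in USD; token counts are per request, prices per 1M tokens."""
    daily = (in_tokens * in_price + out_tokens * out_price) * req_per_day / 1_000_000
    return daily * days


# GPT-4o Mini standard prices from the ranking table.
std_in, std_out = 0.15, 0.60

# Batch classification workload: 600 in / 100 out / 100,000 requests per day.
standard = monthly_cost(600, 100, 100_000, std_in, std_out)
batched = monthly_cost(600, 100, 100_000, batch_price(std_in), batch_price(std_out))

print(f"Standard: ${standard:.2f}/mo, batch: ${batched:.2f}/mo")
```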