Cheapest LLM API in 2026

Ranked by input token price — the dominant cost for most workloads. Updated May 2026.

Cheapest Input

Qwen3.5 Flash

$0.01/1M

qwen

Cheapest Output

Llama 3.1 8B (tied with Qwen3.5 Flash)

$0.05/1M

All Models Ranked by Input Price

#	Model	Provider	Input /1M	Output /1M	Best For
1	Qwen3.5 Flash	qwen	$0.01	$0.05	Ultra high-volume tasks
2	Llama 3.1 8B	meta	$0.02	$0.05	Budget bulk processing
3	Qwen3 8B	qwen	$0.05	$0.1	Cheapest simple tasks
4	Qwen3 235B	qwen	$0.06	$0.06	Extreme cost efficiency
5	Gemini 2.0 Flash-Lite	google	$0.075	$0.3	Ultra-cheap bulk tasks
6	GPT-4.1 Nano	openai	$0.1	$0.4	Simple 1M context tasks
7	Gemini 2.5 Flash-Lite	google	$0.1	$0.4	Ultra high volume
8	Gemini 2.0 Flash	google	$0.1	$0.4	Fast multimodal tasks
9	Mistral Small 3.1	mistral	$0.1	$0.3	Simple high-volume tasks
10	Mistral Nemo	mistral	$0.1	$0.3	Budget multilingual tasks
11	Qwen3 30B	qwen	$0.1	$0.15	Everyday budget tasks
12	Gemini 3 Flash-Lite	google	$0.12	$0.48	Ultra high-volume Gemini 3
13	GPT-4o Mini	openai	$0.15	$0.6	High-volume chatbots
14	Llama 4 Scout	meta	$0.17	$0.17	Huge context at low cost
15	Llama 3.3 70B	meta	$0.23	$0.4	Open-source workloads
16	DeepSeek Chat	deepseek	$0.27	$1.1	Multilingual tasks
17	Gemini 2.5 Flash	google	$0.3	$2.5	Speed & efficiency
18	Codestral	mistral	$0.3	$0.9	Code generation
19	Grok 3 Mini	xai	$0.3	$0.5	Fast cost-sensitive tasks
20	GPT-4.1 Mini	openai	$0.4	$1.6	Cost-efficient 1M context tasks
21	o3	openai	$0.4	$1.6	Complex reasoning
22	Mistral Medium 3	mistral	$0.4	$2	Budget frontier tasks
23	GPT-3.5 Turbo	openai	$0.5	$1.5	Legacy chat applications
24	Gemini 3 Flash	google	$0.5	$2	Real-time 1M context apps
25	Mistral Large 3	mistral	$0.5	$1.5	European data compliance
26	Llama 4 Maverick	meta	$0.5	$1.1	Balanced open-source tasks
27	DeepSeek R1	deepseek	$0.55	$2.19	Math & logic reasoning
28	GPT-5 Mini	openai	$0.6	$2.4	Cost-efficient frontier tasks
29	Claude 3.5 Haiku	anthropic	$0.8	$4	Fast low-cost tasks
30	DeepSeek R2	deepseek	$0.8	$3.2	Advanced reasoning at low cost
31	Claude Haiku 4.5	anthropic	$1	$5	Fast responses
32	o4-mini	openai	$1.1	$4.4	STEM & coding
33	Gemini 2.5 Pro	google	$1.25	$10	Long-context & multimodal
34	Gemini 1.5 Pro	google	$1.25	$5	Massive document analysis
35	GPT-4.1	openai	$2	$8	Coding & instruction following
36	Magistral Medium	mistral	$2	$5	Math & analysis
37	GPT-4o	openai	$2.5	$10	Multimodal tasks
38	Claude Sonnet 4.6	anthropic	$3	$15	Balanced performance
39	Claude 3.5 Sonnet	anthropic	$3	$15	Production workloads
40	Grok 3	xai	$3	$15	Real-time web research
41	Gemini 3 Pro	google	$3.5	$14	Production reasoning at scale
42	Llama 3.1 405B	meta	$3.5	$3.5	Max open-source performance
43	Claude Opus 4.8	anthropic	$5	$25	Complex reasoning & coding
44	Claude Opus 4.7	anthropic	$5	$25	Agentic workflows
45	Claude Opus 4.6	anthropic	$5	$25	Complex reasoning
46	Grok 3 Fast	xai	$5	$25	Low-latency flagship tasks
47	GPT-5	openai	$8	$32	Frontier tasks & agents
48	GPT-4 Turbo	openai	$10	$30	Legacy GPT-4 workloads
49	Claude Fable 5	anthropic	$10	$50	Frontier reasoning & agents
50	Gemini 3 Ultra	google	$10	$30	Frontier reasoning & multimodal
51	o1	openai	$15	$60	Hard reasoning problems
52	Claude Opus 4.5	anthropic	$15	$75	Complex reasoning & coding
53	Claude 3 Opus	anthropic	$15	$75	Legacy complex workloads

Cheapest Model By Use Case

High-volume chatbot

800 in / 300 out / 50,000 req per day

Qwen3.5 Flash

$34.50/mo

RAG / Q&A

3,500 in / 500 out / 10,000 req per day

Qwen3.5 Flash

$18.00/mo

Batch classification

600 in / 100 out / 100,000 req per day

Qwen3.5 Flash

$33.00/mo


How to Find the Cheapest LLM for Your Actual Workload

The model with the lowest input price per million tokens is not necessarily the cheapest model for your specific workload. The reason: output tokens typically cost 3–5× more than input tokens, and the ratio varies significantly across models (from 1× to more than 8× in the table above). A model with $0.05/M input but $0.25/M output will cost more than a model with $0.10/M input and $0.10/M output whenever the workload generates more than one output token for every three input tokens. Always calculate total cost using your real input-to-output ratio.
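The calculation behind the use-case figures can be sketched in a few lines of Python. This is a minimal sketch, not the site's actual calculator: the function names are illustrative, and the 30-day month is an assumption chosen because it reproduces the monthly figures listed above.

```python
def monthly_cost(input_tokens, output_tokens, requests_per_day,
                 input_price_per_m, output_price_per_m, days=30):
    """Estimated monthly spend in dollars for one workload on one model.

    Token counts are per request; prices are dollars per 1M tokens.
    days=30 is an assumption that matches the use-case figures above.
    """
    input_millions = input_tokens * requests_per_day / 1_000_000
    output_millions = output_tokens * requests_per_day / 1_000_000
    daily = (input_millions * input_price_per_m
             + output_millions * output_price_per_m)
    return daily * days


def breakeven_output_ratio(in_a, out_a, in_b, out_b):
    """Output/input token ratio above which model A costs more than model B.

    Assumes A has the lower input price and the higher output price.
    """
    return (in_b - in_a) / (out_a - out_b)


# The two hypothetical price points from the paragraph above.
MODEL_A = (0.05, 0.25)  # cheap input, expensive output
MODEL_B = (0.10, 0.10)  # flat pricing

# Workloads from the use-case section: (tokens in, tokens out, requests/day).
WORKLOADS = {
    "High-volume chatbot": (800, 300, 50_000),
    "Batch classification": (600, 100, 100_000),
}

if __name__ == "__main__":
    print("break-even output/input ratio:",
          round(breakeven_output_ratio(*MODEL_A, *MODEL_B), 3))
    for label, workload in WORKLOADS.items():
        cost_a = monthly_cost(*workload, *MODEL_A)
        cost_b = monthly_cost(*workload, *MODEL_B)
        print(f"{label}: A=${cost_a:.2f}/mo, B=${cost_b:.2f}/mo")
```

Under these assumptions, model B comes out cheaper for the chatbot workload, where 300 output tokens against 800 input tokens is above the one-third break-even, while model A is cheaper for batch classification, where the output share is much smaller.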

Small models for simple tasks. The most cost-effective strategy is to route simple, well-defined tasks to the smallest capable model. Customer service responses, content classification, data extraction with a fixed schema, and simple Q&A can often be handled by models like DeepSeek V3, GPT-4o Mini, Claude Haiku, or Gemini Flash at $0.10–0.80/M input tokens — roughly 10–100× cheaper than frontier models. The key question is not “what is the best model?” but “what is the cheapest model that is good enough?”
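The "cheapest model that is good enough" rule can be written as a small selection function. The sketch below is illustrative only: the tier names are hypothetical, the prices are borrowed from three price points in the ranking table ($0.15/$0.60, $1/$5, $3/$15), and the accuracy values are placeholders that you would replace with scores measured on your own evaluation set.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    input_price_per_m: float   # dollars per 1M input tokens
    output_price_per_m: float  # dollars per 1M output tokens
    accuracy: float            # measured on YOUR eval set; placeholders below


def cost_per_request(c, input_tokens, output_tokens):
    """Dollar cost of a single request on candidate c."""
    return (input_tokens * c.input_price_per_m
            + output_tokens * c.output_price_per_m) / 1_000_000


def cheapest_good_enough(candidates, min_accuracy, input_tokens, output_tokens):
    """Return the lowest-cost candidate meeting the quality bar, or None."""
    eligible = [c for c in candidates if c.accuracy >= min_accuracy]
    if not eligible:
        return None
    return min(eligible,
               key=lambda c: cost_per_request(c, input_tokens, output_tokens))


# Hypothetical tiers. Accuracy numbers are PLACEHOLDERS, not benchmark results.
CANDIDATES = [
    Candidate("small", 0.15, 0.60, accuracy=0.91),
    Candidate("mid", 1.00, 5.00, accuracy=0.95),
    Candidate("large", 3.00, 15.00, accuracy=0.97),
]

if __name__ == "__main__":
    # Batch classification shape from the use-case section: 600 in / 100 out.
    for bar in (0.90, 0.94, 0.99):
        pick = cheapest_good_enough(CANDIDATES, bar, 600, 100)
        print(bar, "->", pick.name if pick else "no candidate meets the bar")
```

Ranking by cost per request rather than by input price alone keeps the selection consistent with the workload's real input-to-output ratio.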

Prompt caching changes the calculus. For applications with a large, stable system prompt or a fixed document context, prompt caching can be more impactful than model selection. With prompt caching, Anthropic's Claude Sonnet bills cached reads at $0.30/M instead of the standard $3.00/M input rate, which can undercut the uncached input price of nominally cheaper models for repeat-access workloads. Before switching to a cheaper model, calculate whether enabling caching on your current model achieves the same cost reduction.
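One way to run that check is to compute a blended input price from the share of input tokens served from cache. The sketch below is a simplification under stated assumptions: it uses only the two Claude Sonnet rates quoted above, it ignores any surcharge for writing to the cache and any cache expiry, and it leaves output pricing untouched. The function names are illustrative.

```python
def effective_input_price(standard_price, cached_price, cached_fraction):
    """Blended dollars per 1M input tokens when a share of tokens hits the cache.

    Simplification: ignores cache-write surcharges and cache expiry.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return (cached_fraction * cached_price
            + (1.0 - cached_fraction) * standard_price)


def breakeven_cached_fraction(standard_price, cached_price, target_price):
    """Cached share at which the blended price equals target_price."""
    return (standard_price - target_price) / (standard_price - cached_price)


SONNET_STANDARD = 3.00  # $/1M input, from the paragraph above
SONNET_CACHED = 0.30    # $/1M cached reads, from the paragraph above

if __name__ == "__main__":
    for share in (0.0, 0.5, 0.9):
        price = effective_input_price(SONNET_STANDARD, SONNET_CACHED, share)
        print(f"{share:.0%} cached -> ${price:.2f}/1M input")
    # Share needed to match a $1.00/M uncached model (Claude Haiku 4.5 in the table).
    print(round(breakeven_cached_fraction(SONNET_STANDARD, SONNET_CACHED, 1.00), 3))
```

With 90% of input tokens read from cache, the blended rate is about $0.57/M, below the uncached input price of several nominally cheaper models in the ranking table.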

The hidden costs of cheap models. Lower-priced models sometimes require more prompt engineering, produce more errors requiring retries, or generate outputs that need human review — all of which add real costs not visible in the per-token rate. A model that is 5× cheaper but requires 2× as many retries and 1 hour of extra prompt engineering per week may not be cheaper in practice. The use-case tables above account for token ratios, but quality-adjusted cost requires testing on your actual data.
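The retry and engineering example above can be turned into a rough quality-adjusted estimate. This is a sketch under assumptions that are not in the source: "2× as many retries" is read as two attempts per request on average, engineering time is billed at a hypothetical $100/hour, and a month is taken as 4.33 weeks.

```python
def quality_adjusted_monthly_cost(token_cost, attempts_per_request=1.0,
                                  engineering_hours_per_week=0.0,
                                  hourly_rate=0.0, weeks_per_month=4.33):
    """Monthly token spend scaled by retries, plus recurring engineering time.

    token_cost is the monthly spend if every request succeeded first time.
    """
    retries_adjusted = token_cost * attempts_per_request
    engineering = engineering_hours_per_week * weeks_per_month * hourly_rate
    return retries_adjusted + engineering


HOURLY_RATE = 100.0  # hypothetical engineering rate in dollars per hour

if __name__ == "__main__":
    # Compare a baseline model with one that is 5x cheaper per token but needs
    # 2 attempts per request and 1 hour of prompt work per week.
    for baseline in (500.0, 5000.0):
        cheap = quality_adjusted_monthly_cost(
            baseline / 5,
            attempts_per_request=2.0,
            engineering_hours_per_week=1.0,
            hourly_rate=HOURLY_RATE,
        )
        print(f"baseline ${baseline:.0f}/mo vs cheap model ${cheap:.0f}/mo")
```

Under these assumptions the cheaper model loses at a $500/month baseline (about $633 after adjustment) but wins comfortably at $5,000/month (about $2,433), because the fixed engineering overhead matters less as volume grows.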

Batch processing for offline workloads. If your use case does not require real-time responses (document indexing, data enrichment, offline classification), both OpenAI and Anthropic offer batch APIs at 50% off standard pricing. Running GPT-4o Mini via the Batch API at $0.075/M input tokens, half its $0.15/M standard rate, is among the most cost-effective options available from any major provider, pairing capable instruction following with aggressive pricing for non-real-time workloads.
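The effect of the batch discount on a concrete workload can be estimated as follows. This is a rough sketch: it assumes the 50% discount applies to both input and output tokens, uses the GPT-4o Mini rates from the ranking table, and reuses the batch classification workload and the 30-day month from the use-case section. The helper names are illustrative.

```python
BATCH_DISCOUNT = 0.50  # "50% off standard pricing", per the paragraph above


def batch_price(standard_price_per_m, discount=BATCH_DISCOUNT):
    """Discounted dollars per 1M tokens for batch (non-real-time) requests."""
    return standard_price_per_m * (1.0 - discount)


def monthly_cost(input_tokens, output_tokens, requests_per_day,
                 input_price_per_m, output_price_per_m, days=30):
    """Monthly spend in dollars; days=30 is an assumption."""
    daily = (input_tokens * input_price_per_m
             + output_tokens * output_price_per_m) * requests_per_day / 1_000_000
    return daily * days


GPT_4O_MINI = (0.15, 0.60)     # $/1M input, $/1M output, from the ranking table
WORKLOAD = (600, 100, 100_000)  # batch classification: in, out, requests/day

# Assumption: the discount applies to input and output rates alike.
standard = monthly_cost(*WORKLOAD, *GPT_4O_MINI)
batched = monthly_cost(*WORKLOAD, *(batch_price(p) for p in GPT_4O_MINI))

if __name__ == "__main__":
    print(f"standard: ${standard:.2f}/mo, batch: ${batched:.2f}/mo")
```

For this workload the estimate falls from about $450 to about $225 per month, which is worth comparing against simply moving to a lower-priced model at standard rates.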