Anthropic Claude API Pricing

Updated June 2026 · 10 models

Complete pricing for all Anthropic Claude API models. Claude supports prompt caching — save up to 90% on repeated prompts for RAG, agents, and long conversations.

Prompt Caching Available

All Claude models support prompt caching. Cached tokens cost 50–90% less — ideal for RAG pipelines, agents, and conversations with long system prompts. New: Claude Fable 5 and Opus 4.8 launched June 2026.

Claude Fable 5

Anthropic's most capable widely released model — adaptive thinking always on, for the most demanding reasoning and long-horizon agentic work

$10

input /1M

cached /1M

$50

output /1M

Claude Opus 4.8

Most capable Opus-tier model — complex reasoning, long-horizon agentic coding, and high-autonomy work with adaptive thinking

input /1M

$0.5

cached /1M

$25

output /1M

Claude Opus 4.7

Most capable Claude model — step-change improvement in agentic coding

input /1M

$0.5

cached /1M

$25

output /1M

Claude Sonnet 4.6

Optimal balance of intelligence, cost, and speed with 1M context

input /1M

$0.3

cached /1M

$15

output /1M

Claude Opus 4.6

Previous flagship — strong reasoning and extended thinking support

input /1M

$25

output /1M

Claude Opus 4.5

Most capable Claude 4 model with extended thinking — top performance on complex reasoning and coding

$15

input /1M

$1.5

cached /1M

$75

output /1M

Claude Haiku 4.5

Fastest and most cost-efficient Claude with near-frontier intelligence

input /1M

$0.1

cached /1M

output /1M

Claude 3.5 Sonnet

Previous generation Sonnet — high intelligence at moderate cost, widely used in production

input /1M

$0.3

cached /1M

$15

output /1M

Claude 3.5 Haiku

Previous generation fast model — great balance of speed and intelligence at low cost

$0.8

input /1M

$0.08

cached /1M

output /1M

Claude 3 Opus

Third-generation flagship — powerful reasoning, still used for demanding legacy workloads

$15

input /1M

$1.5

cached /1M

$75

output /1M

Compare Anthropic vs Other Providers

vs OpenAI GPT vs Google Gemini Claude Sonnet vs GPT-4o Claude Opus vs GPT-4.1 Claude Haiku vs GPT-4o Mini

Understanding Anthropic Claude API Pricing

Anthropic structures its Claude lineup around three capability tiers: Haiku (fast, cheap), Sonnet (balanced), and Opus (most capable). Each tier is roughly 3–5× the price of the tier below, which mirrors the capability gap. Claude Haiku 4.5 at $0.80/M input is designed for high-volume, latency-sensitive applications. Claude Opus is reserved for tasks where quality justifies the cost — complex research, nuanced analysis, difficult coding.

Prompt caching is Anthropic's biggest cost lever. Claude supports explicit prompt caching, where you mark specific parts of your prompt with cache control headers. The first request incurs a cache write cost ($3.75/M tokens for Sonnet) but subsequent requests that hit the same cache pay only $0.30/M — a 90% reduction from the $3.00/M standard rate. Cache entries persist for 5 minutes and reset on each hit. For a RAG pipeline injecting a 50,000-token document corpus on every request, caching that corpus pays for itself after just 2 requests.

When to use each Claude model. Claude Haiku handles customer service bots, document classification, content moderation, and any task where you need sub-second responses at scale. Claude Sonnet is the workhorse for most production AI features — code generation, content writing, data extraction, and complex Q&A. Claude Opus is best when you need the highest reasoning quality and cost is secondary: legal analysis, scientific research, complex multi-step coding projects, or evaluating outputs from other models.

Context window pricing considerations. Claude Opus supports a 200,000-token context window. Processing a 150,000-token document costs $2.25 in input tokens alone (at $15/M). If you process that same document 100 times per day uncached, you spend $225/day just on input. With prompt caching — writing once, reading 99 times — the cost drops to $3.75 (write) + $2.97 (99 cached reads at $0.30/M) = $6.72/day, a 97% reduction. For document-heavy workflows, prompt caching is not optional — it is the difference between a viable product and an unscalable one.

Batch API for 50% savings. Like OpenAI, Anthropic offers a Message Batches API at 50% off standard pricing for non-real-time workloads. Batch requests complete within 24 hours and support all Claude models. Combined with prompt caching, a well-optimized Claude pipeline for offline document analysis can run at roughly 5–10% of what a naive first implementation would cost — making frontier-quality AI economically viable even for large-scale data pipelines.