Anthropic Claude API Pricing
Updated June 2026 · 10 models
Complete pricing for all Anthropic Claude API models. Claude supports prompt caching — save up to 90% on repeated prompts for RAG, agents, and long conversations.
Compare Anthropic vs Other Providers
Understanding Anthropic Claude API Pricing
Anthropic structures its Claude lineup around three capability tiers: Haiku (fast, cheap), Sonnet (balanced), and Opus (most capable). Each tier is roughly 3–5× the price of the tier below, which mirrors the capability gap. Claude Haiku 4.5 at $0.80/M input is designed for high-volume, latency-sensitive applications. Claude Opus is reserved for tasks where quality justifies the cost — complex research, nuanced analysis, difficult coding.
Prompt caching is Anthropic's biggest cost lever. Claude supports explicit prompt caching, where you mark specific parts of your prompt with cache control headers. The first request incurs a cache write cost ($3.75/M tokens for Sonnet) but subsequent requests that hit the same cache pay only $0.30/M — a 90% reduction from the $3.00/M standard rate. Cache entries persist for 5 minutes and reset on each hit. For a RAG pipeline injecting a 50,000-token document corpus on every request, caching that corpus pays for itself after just 2 requests.
When to use each Claude model. Claude Haiku handles customer service bots, document classification, content moderation, and any task where you need sub-second responses at scale. Claude Sonnet is the workhorse for most production AI features — code generation, content writing, data extraction, and complex Q&A. Claude Opus is best when you need the highest reasoning quality and cost is secondary: legal analysis, scientific research, complex multi-step coding projects, or evaluating outputs from other models.
Context window pricing considerations. Claude Opus supports a 200,000-token context window. Processing a 150,000-token document costs $2.25 in input tokens alone (at $15/M). If you process that same document 100 times per day uncached, you spend $225/day just on input. With prompt caching — writing once, reading 99 times — the cost drops to $3.75 (write) + $2.97 (99 cached reads at $0.30/M) = $6.72/day, a 97% reduction. For document-heavy workflows, prompt caching is not optional — it is the difference between a viable product and an unscalable one.
Batch API for 50% savings. Like OpenAI, Anthropic offers a Message Batches API at 50% off standard pricing for non-real-time workloads. Batch requests complete within 24 hours and support all Claude models. Combined with prompt caching, a well-optimized Claude pipeline for offline document analysis can run at roughly 5–10% of what a naive first implementation would cost — making frontier-quality AI economically viable even for large-scale data pipelines.