Google Gemini API Pricing

Updated May 2026 · 10 models

Complete pricing for Google Gemini API models. Gemini 2.5 Pro and Flash offer up to 1 million token context windows — ideal for long documents and large codebases.

Industry-Leading Context Window
Gemini 2.5 Pro and Flash support 1M token context — fit entire codebases, books, or document collections in a single prompt. Cached pricing available for repeated context.
Prices are in USD per 1M tokens.

| Model | Context | Input | Cached | Output | Description |
|---|---|---|---|---|---|
| Gemini 3 Ultra | 2M | $10.00 | $2.50 | $30.00 | Google's most powerful model: frontier reasoning, native multimodal, 2M context window |
| Gemini 3 Pro | 1M | $3.50 | $0.875 | $14.00 | Gemini 3's balanced model: strong reasoning at a fraction of Ultra cost |
| Gemini 3 Flash | 1M | $0.50 | $0.125 | $2.00 | Fast and capable Gemini 3: ideal for real-time applications needing 1M context |
| Gemini 3 Flash-Lite | 1M | $0.12 | not listed | $0.48 | Most affordable Gemini 3 model: high-volume tasks with 1M context at near-zero cost |
| Gemini 2.5 Pro | 1M | $1.25 | $0.31 | $10.00 | Most capable Gemini model with deep reasoning and multimodal support |
| Gemini 2.5 Flash | 1M | $0.30 | $0.075 | $2.50 | Best-in-class speed and efficiency for diverse tasks |
| Gemini 2.5 Flash-Lite | 1M | $0.10 | not listed | $0.40 | Most cost-efficient Gemini model for high-volume, latency-sensitive workloads |
| Gemini 2.0 Flash | 1M | $0.10 | $0.025 | $0.40 | Previous gen workhorse: fast multimodal model with excellent price-to-performance |
| Gemini 2.0 Flash-Lite | 1M | $0.075 | not listed | $0.30 | Ultra-cheap previous gen model: suitable for high-volume simple generation tasks |
| Gemini 1.5 Pro | 2M | $1.25 | $0.31 | $5.00 | First model with 2M token context window: great for massive document analysis |
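
To make the per-million-token rates concrete, here is a minimal sketch that estimates the cost of a single request from the listed input and output prices. The dictionary keys are illustrative labels chosen for this example, not official API model identifiers, and the sketch ignores cached tokens and any context-length tiers.

```python
# USD per 1M tokens, copied from the listing above (subset of models).
# Keys are hypothetical labels for this example, not official model IDs.
RATES = {
    "gemini-3-ultra": {"input": 10.00, "output": 30.00},
    "gemini-3-flash": {"input": 0.50, "output": 2.00},
    "gemini-2.5-pro": {"input": 1.25, "output": 10.00},
    "gemini-2.5-flash": {"input": 0.30, "output": 2.50},
    "gemini-2.5-flash-lite": {"input": 0.10, "output": 0.40},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the flat listed rates."""
    rate = RATES[model]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Compare the same workload (20K tokens in, 1K tokens out) across models.
    for name in RATES:
        print(f"{name}: ${request_cost(name, 20_000, 1_000):.4f}")
```

Because output tokens cost several times more than input tokens on every model listed, workloads that generate long responses are affected by the output rate far more than the headline input price suggests.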

Understanding Google Gemini API Pricing

Google's Gemini lineup is distinguished by its massive context windows and tiered pricing model. Gemini 2.5 Pro supports a 1-million-token context window — fitting entire codebases, lengthy legal documents, or multiple book-length texts in a single prompt. Gemini 2.5 Flash offers the same 1M context at a fraction of the cost, making it the most cost-effective option for tasks that require large context but not frontier reasoning depth.

Context window pricing tiers. Gemini 2.5 Pro uses a two-tier pricing model based on context length. Prompts up to 128,000 tokens cost $1.25/M input tokens. Prompts from 128K to 1M tokens cost $2.50/M — doubling when you exceed the first threshold. This incentivizes keeping prompts compact when possible, but the ability to process a full 1M tokens in one shot eliminates the chunking complexity that plagues other models' long-document workflows.
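
The two-tier rule can be sketched as a small function. This example assumes that once a prompt exceeds the 128,000-token threshold, the whole prompt is billed at the higher rate (rather than only the tokens above the threshold); the text above suggests this reading but does not state it outright, so treat it as an assumption. The constant and function names are made up for illustration.

```python
# Figures taken from the paragraph above; names are illustrative.
TIER_THRESHOLD_TOKENS = 128_000
BASE_RATE_PER_M = 1.25   # USD per 1M input tokens, prompts up to the threshold
LONG_RATE_PER_M = 2.50   # USD per 1M input tokens, prompts above the threshold


def tiered_input_cost(prompt_tokens: int) -> float:
    """Input cost in USD, assuming the entire prompt is billed at its tier's rate."""
    if prompt_tokens <= TIER_THRESHOLD_TOKENS:
        rate = BASE_RATE_PER_M
    else:
        rate = LONG_RATE_PER_M
    return prompt_tokens * rate / 1_000_000


if __name__ == "__main__":
    for tokens in (100_000, 128_000, 128_001, 500_000, 1_000_000):
        print(f"{tokens:>9,} tokens -> ${tiered_input_cost(tokens):.4f}")
```

Under this assumption a 128,000-token prompt costs $0.16, while a prompt just one token longer costs about $0.32, so trimming a prompt that sits slightly above the threshold can halve its input cost.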

Google AI Studio free tier. Unlike OpenAI and Anthropic, Google offers a generous free tier through Google AI Studio — Gemini 2.5 Flash is free for up to 1,500 requests per day during prototyping. This makes Google's API particularly attractive for early-stage applications or developers evaluating LLM options before committing to a paid plan. The free tier uses the same model as the paid API, with rate limits as the only constraint.

Prompt caching at 75% discount. Gemini supports context caching, where you can cache a large context (minimum 32,768 tokens) and reuse it across multiple requests at a 75% discount. For a 500,000-token document corpus on Gemini 2.5 Flash, standard input pricing is $0.15 per request ($0.30/M). With caching, each subsequent read of that context costs about $0.04 ($0.075/M), a 75% reduction. Cache storage costs $1.00/M tokens per hour, or $0.50 per hour for this corpus, so the break-even point depends on request frequency: at a saving of roughly $0.11 per request, caching pays off at about five or more requests per hour.
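
The break-even point for caching can be worked out with a short calculation. The sketch below uses the Gemini 2.5 Flash rates from the listing ($0.30/M input, $0.075/M cached) together with the $1.00/M-tokens-per-hour storage figure. It is a simplification: it ignores the one-time cost of creating the cache and any uncached tokens sent with each request, and the function names are illustrative.

```python
# Rates in USD per 1M tokens (Gemini 2.5 Flash figures from the listing above).
INPUT_RATE = 0.30             # standard input
CACHED_RATE = 0.075           # cached input read
STORAGE_RATE_PER_HOUR = 1.00  # cache storage, per 1M tokens per hour


def hourly_cost(context_tokens: int, requests_per_hour: float, cached: bool) -> float:
    """USD per hour spent on the shared context alone, with or without caching."""
    millions = context_tokens / 1_000_000
    if cached:
        return millions * (CACHED_RATE * requests_per_hour + STORAGE_RATE_PER_HOUR)
    return millions * INPUT_RATE * requests_per_hour


def break_even_requests_per_hour() -> float:
    """Request rate at which per-request savings equal the storage charge."""
    return STORAGE_RATE_PER_HOUR / (INPUT_RATE - CACHED_RATE)


if __name__ == "__main__":
    print(f"break-even: {break_even_requests_per_hour():.2f} requests/hour")
    for rph in (2, 5, 20):
        plain = hourly_cost(500_000, rph, cached=False)
        cache = hourly_cost(500_000, rph, cached=True)
        print(f"{rph:>2} req/h: uncached ${plain:.4f}, cached ${cache:.4f}")
```

With these rates the break-even works out to about 4.4 requests per hour, and it does not depend on the size of the cached context, because both the savings and the storage charge scale with the same token count.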

When to choose Gemini over GPT or Claude. Gemini 2.5 Flash is the strongest value proposition in its class for tasks requiring large context at low cost: processing long contracts, analyzing entire repositories, or summarizing multi-document research. Gemini 2.5 Pro is competitive with GPT-4o and Claude Sonnet for general tasks. For workloads that frequently process 100K+ tokens, Gemini's 1M context window, which requires no context compression or chunking, is a significant advantage over alternatives with smaller context limits.