Google Gemini API Pricing
Updated May 2026 · 10 models
Complete pricing for Google Gemini API models. Gemini 2.5 Pro and Flash offer up to 1 million token context windows — ideal for long documents and large codebases.
Compare Google vs Other Providers
Understanding Google Gemini API Pricing
Google's Gemini lineup is distinguished by its massive context windows and tiered pricing model. Gemini 2.5 Pro supports a 1-million-token context window — fitting entire codebases, lengthy legal documents, or multiple book-length texts in a single prompt. Gemini 2.5 Flash offers the same 1M context at a fraction of the cost, making it the most cost-effective option for tasks that require large context but not frontier reasoning depth.
Context window pricing tiers. Gemini 2.5 Pro uses a two-tier pricing model based on context length. Prompts up to 128,000 tokens cost $1.25/M input tokens. Prompts from 128K to 1M tokens cost $2.50/M — doubling when you exceed the first threshold. This incentivizes keeping prompts compact when possible, but the ability to process a full 1M tokens in one shot eliminates the chunking complexity that plagues other models' long-document workflows.
Google AI Studio free tier. Unlike OpenAI and Anthropic, Google offers a generous free tier through Google AI Studio — Gemini 2.5 Flash is free for up to 1,500 requests per day during prototyping. This makes Google's API particularly attractive for early-stage applications or developers evaluating LLM options before committing to a paid plan. The free tier uses the same model as the paid API, with rate limits as the only constraint.
Prompt caching at 75% discount. Gemini supports context caching, where you can cache a large context (minimum 32,768 tokens) and reuse it across multiple requests at a 75% discount. For a 500,000-token document corpus, standard pricing would be $0.625 per request (Gemini 2.5 Flash at $0.125/M). With caching, subsequent requests cost $0.16 per cache read — a 75% reduction. Cache storage costs $1.00/M tokens per hour, so the break-even point depends on request frequency.
When to choose Gemini over GPT or Claude. Gemini 2.5 Flash is the strongest value proposition in its class for tasks requiring large context at low cost — processing long contracts, analyzing entire repositories, or summarizing multi-document research. Gemini 2.5 Pro is competitive with GPT-4o and Claude Sonnet for general tasks while matching their context-length capabilities. For workloads that frequently require processing 100K+ tokens, Gemini's 1M context window without context compression is a significant architectural advantage over alternatives with 8K–32K practical limits.