
RAG Pipeline Cost Calculator

Calculate costs for Retrieval-Augmented Generation (RAG) pipelines. The estimates account for the large input contexts needed to pass retrieved documents to the model.

Typical Token Breakdown

System prompt (200 tokens): RAG instructions and output format guidance

Retrieved chunks (2,500 tokens): 5 document chunks of ~500 tokens each from vector search (3-5 chunks is typical)

User query (100 tokens): the original user question

Metadata / citations (200 tokens): source documents, page numbers, and timestamps

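Total: ~4,000 tokens per request (3,500 input + 500 output). The four components above sum to 3,000 input tokens, so the 3,500-token input figure used in the estimates is 500 tokens higher than the itemized breakdown.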
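As a quick check on the breakdown, the sketch below adds up the input components. It assumes the 2,500-token chunk figure means 5 chunks of 500 tokens; the dictionary keys are illustrative names, not part of any tool.

```python
# Input-token components from the "Typical Token Breakdown" (names are illustrative).
components = {
    "system_prompt": 200,
    "retrieved_chunks": 5 * 500,  # assumes 5 chunks x ~500 tokens
    "user_query": 100,
    "metadata_citations": 200,
}
OUTPUT_TOKENS = 500  # output budget from the Total line

input_tokens = sum(components.values())
print(f"input: {input_tokens}, output: {OUTPUT_TOKENS}, total: {input_tokens + OUTPUT_TOKENS}")
```

This prints `input: 3000, output: 500, total: 3500`. Edit the dictionary to match your own prompt template before plugging the result into a cost estimate.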

Model: Gemini 2.5 Pro

Input tokens / request: 3,500

Output tokens / request: 500

Requests / day: 5,000

Per request: $0.009375

Daily: $46.88

Monthly (30 days): $1,406.25

Yearly (365 days): $17,109.38
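The estimate follows a simple formula: per-request cost is input tokens times the input price plus output tokens times the output price (prices quoted per 1M tokens), then scaled by requests per day, 30 days per month, and 365 days per year. This is a minimal sketch that assumes Gemini 2.5 Pro list prices of $1.25/1M input and $10/1M output tokens (the tier for prompts under 200k tokens); those prices reproduce the figures above, but check current pricing before relying on them. The function names are hypothetical.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one request, with prices quoted per 1M tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000

def period_costs(per_request, requests_per_day):
    """Scale a per-request cost to daily, 30-day monthly, and 365-day yearly totals."""
    daily = per_request * requests_per_day
    return {"daily": daily, "monthly": daily * 30, "yearly": daily * 365}

# Assumed Gemini 2.5 Pro prices: $1.25/1M input, $10/1M output.
per_req = request_cost(3_500, 500, 1.25, 10.00)
costs = period_costs(per_req, 5_000)

print(f"per request: ${per_req:.6f}")
for period, value in costs.items():
    print(f"{period}: ${value:,.2f}")
```

Swapping in another model's prices, or a different request volume, is just a change to the arguments.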

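Monthly cost by model (3,500 input + 500 output tokens, 5,000 requests/day)

Gemini 2.5 Pro (Best for Large Docs): $1,406.25/month

A 1M-token context can hold large document collections in a single prompt; cached-input pricing reduces the cost of repeated content.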

Claude Sonnet 4.6 (Best Accuracy): $2,700.00/month

Superior instruction following and citation accuracy, with a 1M-token context.

GPT-4.1 Mini (Best Value): $330.00/month

1M-token context at $0.40 per 1M input tokens, the lowest input price of the three models compared here.
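The three monthly figures can be reproduced with one cost function and a price table. Only the GPT-4.1 Mini input price is stated on this page; the other per-1M-token prices below are assumptions chosen because they reproduce the monthly figures above, so verify them against each provider's current pricing. `PRICES` and `monthly_cost` are hypothetical names.

```python
# Assumed list prices in $ per 1M tokens: (input, output).
# Only the GPT-4.1 Mini input price appears on this page; the rest are assumptions.
PRICES = {
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4.1 Mini": (0.40, 1.60),
}

def monthly_cost(prices, input_tokens=3_500, output_tokens=500,
                 requests_per_day=5_000, days=30):
    """Monthly dollar cost for one model under the default RAG workload."""
    input_price, output_price = prices
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_request * requests_per_day * days

# Print models from cheapest to most expensive.
for model, prices in sorted(PRICES.items(), key=lambda item: monthly_cost(item[1])):
    print(f"{model:<18} ${monthly_cost(prices):>10,.2f}/month")
```

Passing different defaults, for example `input_tokens=2_500` after trimming retrieval to 3 chunks, reruns the comparison for another workload.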

⚡ Cache your system prompt and static document chunks: RAG pipelines can see 60-80% cache hit rates.

🔍 Use a small model (Haiku, GPT-4o Mini) to pre-filter irrelevant retrieved chunks before sending them to the flagship model.

📏 Limit retrieved chunks to the 3 most relevant: going from 5 to 3 chunks cuts retrieved-chunk tokens by 40% (2,500 → 1,500), which is about 29% of a 3,500-token input.

🗜️ Compress chunk text with an embedding-aligned summarizer to reduce its token count by 30-50%.
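The effect of these tips on input tokens and input cost can be estimated with a few lines of arithmetic. This sketch uses the 3,000-token itemized breakdown as its baseline (so trimming 5 chunks to 3 shows as a 33% cut here, versus about 29% against a 3,500-token input). For caching, it assumes only the 200-token system prompt is cached (retrieved chunks change per query), a 70% hit rate from the middle of the quoted range, a hypothetical 0.1x price multiplier for cached tokens, and $1.25 per 1M input tokens; it ignores cache-write surcharges, and real cache pricing differs by provider. All names are illustrative.

```python
# Token figures from the breakdown above.
FIXED_TOKENS = 200 + 100 + 200  # system prompt + user query + metadata/citations
CHUNK_TOKENS = 500              # ~500 tokens per retrieved chunk

def input_tokens(n_chunks: int, compression: float = 0.0) -> float:
    """Input tokens per request; `compression` is the fraction of chunk tokens removed."""
    return FIXED_TOKENS + n_chunks * CHUNK_TOKENS * (1.0 - compression)

def input_cost(tokens: float, price_per_m: float,
               cached_tokens: float = 0.0, cache_multiplier: float = 1.0) -> float:
    """Dollar input cost when `cached_tokens` of the prompt are billed at
    `cache_multiplier` times the normal input price (hypothetical parameter;
    cache read/write pricing differs by provider)."""
    uncached = tokens - cached_tokens
    return (uncached + cached_tokens * cache_multiplier) * price_per_m / 1_000_000

# Tip: 5 -> 3 chunks. Chunk tokens fall 40%; total input falls by less.
base, trimmed = input_tokens(5), input_tokens(3)
print(f"chunks 5 -> 3: input {base:.0f} -> {trimmed:.0f} tokens ({1 - trimmed / base:.0%} less)")

# Tip: compress chunk text by 30-50%.
for ratio in (0.3, 0.5):
    print(f"3 chunks, {ratio:.0%} compression: {input_tokens(3, ratio):.0f} input tokens")

# Tip: cache the system prompt (assumed 70% hit rate, hypothetical 0.1x cached price).
expected_cached = 200 * 0.70  # expected cached tokens per request
plain = input_cost(trimmed, 1.25)
with_cache = input_cost(trimmed, 1.25, cached_tokens=expected_cached, cache_multiplier=0.1)
print(f"input cost/request: ${plain:.6f} -> ${with_cache:.6f}")
```

Because only the static prefix is cacheable, caching a 200-token system prompt saves far less than trimming or compressing the 1,500-2,500 tokens of retrieved chunks.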