📚
RAG Pipeline Cost Calculator
Estimate costs for Retrieval-Augmented Generation (RAG) pipelines, accounting for the large context windows needed to pass retrieved documents to the model.
Typical Token Breakdown
System prompt (200 tokens): RAG instructions and output format guidance
Retrieved chunks (2,500 tokens): 3-5 document chunks of ~500 tokens each from vector search; 2,500 corresponds to 5 chunks
User query (100 tokens): Original user question
Metadata / citations (200 tokens): Source documents, page numbers, and timestamps
Total: ~4,000 tokens per request (3,500 input + 500 output)
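The breakdown can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary keys and the `chunk_tokens` helper are hypothetical names, and treating all four listed components as input tokens is an assumption. Under that assumption the itemized components add up to 3,000 tokens, so 500 of the stated 3,500 input tokens are not itemized in the breakdown.

```python
# Per-request token budget, using the figures from the breakdown.
# Assumption: all four listed components count as input tokens.
INPUT_COMPONENTS = {
    "system_prompt": 200,
    "retrieved_chunks": 2_500,   # 5 chunks x ~500 tokens (upper end of 3-5)
    "user_query": 100,
    "metadata_citations": 200,
}

STATED_INPUT_TOKENS = 3_500   # from the stated total
STATED_OUTPUT_TOKENS = 500    # from the stated total


def chunk_tokens(num_chunks: int, tokens_per_chunk: int = 500) -> int:
    """Tokens consumed by retrieved chunks (hypothetical helper)."""
    return num_chunks * tokens_per_chunk


itemized_input = sum(INPUT_COMPONENTS.values())
unitemized_input = STATED_INPUT_TOKENS - itemized_input
total_per_request = STATED_INPUT_TOKENS + STATED_OUTPUT_TOKENS

print(f"Itemized input:    {itemized_input:,} tokens")
print(f"Unitemized input:  {unitemized_input:,} tokens")
print(f"Total per request: {total_per_request:,} tokens")
print(f"Chunk range (3-5): {chunk_tokens(3):,} to {chunk_tokens(5):,} tokens")
```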
Cost Calculator
Recommended Models
Gemini 2.5 Pro: Best for Large Docs
1M context fits entire document collections; cached pricing reduces repeat costs
$1,406.25/month
Claude Sonnet 4.6: Best Accuracy
Superior instruction following and citation accuracy with 1M context
$2,700.00/month
GPT-4.1 Mini: Best Value
1M context at $0.40 per 1M input tokens, the cheapest of the 1M-context options listed
$330.00/month
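The monthly figures can be approximated with a simple per-token formula. The sketch below is a rough reconstruction, not the calculator's actual code. It assumes 150,000 requests per month and per-million-token prices of $1.25/$10.00, $3.00/$15.00, and $0.40/$1.60 (input/output). Only the $0.40 input price for GPT-4.1 Mini is stated on the page; the other prices and the request volume are assumptions chosen because they reproduce the listed totals. `ModelPrice` and `monthly_cost` are hypothetical names, and caching discounts are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_million: float    # USD per 1M input tokens
    output_per_million: float   # USD per 1M output tokens


# Assumed prices; only the GPT-4.1 Mini input price appears on the page.
PRICES = {
    "Gemini 2.5 Pro": ModelPrice(1.25, 10.00),
    "Claude Sonnet 4.6": ModelPrice(3.00, 15.00),
    "GPT-4.1 Mini": ModelPrice(0.40, 1.60),
}

REQUESTS_PER_MONTH = 150_000   # assumed volume, not stated on the page


def monthly_cost(
    price: ModelPrice,
    requests_per_month: int,
    input_tokens: int = 3_500,
    output_tokens: int = 500,
) -> float:
    """Monthly cost in USD with no caching or batch discounts."""
    token_cost = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return token_cost * requests_per_month / 1_000_000


for name, price in PRICES.items():
    cost = monthly_cost(price, REQUESTS_PER_MONTH)
    print(f"{name}: ${cost:,.2f}/month")
```

Under these assumptions the results match the three listed monthly figures. Changing `REQUESTS_PER_MONTH` or the token defaults scales the cost linearly.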
Cost Optimization Tips
⚡ Cache your system prompt and static document chunks; RAG pipelines see 60-80% cache hit rates
🔍 Use a small model (Haiku, GPT-4o Mini) to filter out irrelevant retrieved chunks before sending the rest to the flagship model
📏 Limit retrieval to the 3 most relevant chunks: going from 5 to 3 chunks cuts retrieved-chunk tokens by 40% (2,500 to 1,500), which is about 29% of the 3,500 input tokens per request
🗜️ Compress chunk text with an embedding-aligned summarizer to reduce token count by 30-50%
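The effect of the chunk-limit and compression tips on input size can be estimated as follows. This is a minimal sketch under stated assumptions: 1,000 tokens of non-chunk input per request (the 3,500 stated input tokens minus 2,500 chunk tokens), ~500 tokens per chunk, and compression applied to chunk text only. The function names and the `compression` parameter are hypothetical.

```python
def input_tokens(
    num_chunks: int,
    tokens_per_chunk: int = 500,
    fixed_tokens: int = 1_000,   # assumed non-chunk input (3,500 - 2,500)
    compression: float = 0.0,    # fraction of chunk tokens removed, 0.0-1.0
) -> int:
    """Estimated input tokens per request after chunk limiting and compression."""
    if not 0.0 <= compression <= 1.0:
        raise ValueError("compression must be between 0.0 and 1.0")
    chunk_total = num_chunks * tokens_per_chunk * (1.0 - compression)
    return fixed_tokens + round(chunk_total)


def reduction(before: int, after: int) -> float:
    """Fractional reduction going from `before` to `after` tokens."""
    return (before - after) / before


baseline = input_tokens(5)                        # 5 chunks, no compression
three_chunks = input_tokens(3)                    # limit to 3 chunks
compressed = input_tokens(5, compression=0.5)     # upper end of 30-50%

print(f"Baseline: {baseline:,} input tokens")
print(f"3 chunks: {three_chunks:,} ({reduction(baseline, three_chunks):.0%} fewer)")
print(f"50% compression: {compressed:,} ({reduction(baseline, compressed):.0%} fewer)")
```

Because the system prompt, query, and other fixed input are unaffected, the saving on total input tokens is smaller than the saving on chunk tokens alone.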