Gemini Pricing in 2026: Gemini API vs Vertex AI (Tokens, Batch, Caching, Imagen, Veo)
Last updated: January 2026
Google offers two main ways to use Gemini and related generative models in production:
the Gemini API (Developer API via Google AI Studio) and
Vertex AI (Google Cloud).
They share many model families, but pricing, quotas, and “grounding” billing can differ,
so it helps to understand the moving parts before you budget.
Table of contents
- 1) Two pricing surfaces: Gemini API vs Vertex AI
- 2) Token billing basics (input, output, thinking, long context)
- 3) Free vs Paid vs Enterprise on the Gemini API
- 4) Batch and context caching: the big levers
- 5) Gemini API pricing highlights (Developer API)
- 6) Vertex AI pricing highlights (Google Cloud)
- 7) Media pricing: Imagen, Veo, Lyria
- 8) Embeddings pricing
- 9) Grounding (Google Search, Maps): how it is billed
- 10) Cost examples
- 11) Cost optimization checklist
- 12) FAQ
- 13) References
1) Two pricing surfaces: Gemini API vs Vertex AI
Gemini API (Developer API)
- Built for developers and product teams shipping apps quickly.
- Pricing is published as USD per 1M tokens (plus some per-image / per-second SKUs).
- Includes features like Batch API (50% cost reduction) and context caching on the paid tier for supported models.
Vertex AI (Google Cloud)
- Enterprise-grade deployment on Google Cloud with Cloud billing and SKUs.
- Many Gemini model prices are also listed per 1M tokens, with batch discounts for Gemini models.
- Additional notes apply, like how long-context thresholds are billed, currency conversion via Cloud SKUs, and how grounding is charged.
Practical rule: if you need enterprise compliance, VPC controls, org-level billing, or deep Cloud integration, Vertex AI is usually the default.
If you want a lighter-weight path from prototype to production, the Gemini API is often simpler.
2) Token billing basics (input, output, thinking, long context)
Most Gemini model pricing is token-based:
- Input tokens: everything you send (system instructions, user messages, retrieved context).
- Output tokens: everything the model returns.
- Thinking / reasoning tokens: some models explicitly note that output pricing includes thinking tokens.
- Long context thresholds: several pricing tables split pricing at a context threshold (for example, <= 200K vs > 200K input tokens). If a request crosses the threshold, the higher rate can apply to the whole request.
A simple estimator:
estimated_cost_usd =
(input_tokens / 1_000_000) * input_rate
+ (cached_input_tokens / 1_000_000) * cached_input_rate
+ (output_tokens / 1_000_000) * output_rate
+ tool_costs (if any)
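The estimator above can be sketched as a small Python helper. This is an illustration, not an official billing formula; here `input_tokens` counts only non-cached input, mirroring the separate cached term in the formula, and all rates are placeholders you would fill in from the pricing tables.

```python
PER_M = 1_000_000  # rates are quoted in USD per 1M tokens

def estimate_cost_usd(
    input_tokens: int,            # non-cached input tokens
    output_tokens: int,           # includes thinking tokens where applicable
    input_rate: float,            # USD per 1M input tokens
    output_rate: float,           # USD per 1M output tokens
    cached_input_tokens: int = 0,
    cached_input_rate: float = 0.0,
    tool_costs: float = 0.0,      # e.g. grounding, billed separately
) -> float:
    return (
        input_tokens / PER_M * input_rate
        + cached_input_tokens / PER_M * cached_input_rate
        + output_tokens / PER_M * output_rate
        + tool_costs
    )
```

For example, a 2,000-token-in / 600-token-out call at $0.30 input and $2.50 output per 1M tokens comes out to about $0.0021.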
3) Free vs Paid vs Enterprise on the Gemini API
The Gemini API describes three tiers:
- Free: generous limits for getting started, but limited access to certain models. Content can be used to improve products.
- Paid: pay-as-you-go production usage with higher rate limits, context caching, Batch API, and access to more advanced models. Content is not used to improve products.
- Enterprise: large-scale deployments powered by Vertex AI (plus enterprise security, compliance, support, and throughput options).
4) Batch and context caching: the big levers
Batch
Batch pricing is typically listed at roughly half the standard rate for eligible Gemini model calls.
Use Batch for offline jobs: backfills, nightly processing, bulk summarization, eval runs, and re-indexing.
Context caching
Context caching is a cost optimization for repeated prompt prefixes (long system prompts, shared policies, static instructions).
Many Gemini API tables show:
- a cached input token rate, and
- a storage price per hour for cached tokens (on some Gemini API tables).
Practical rule: caching helps most when you reuse a large, stable prefix across many requests.
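As a rough back-of-the-envelope check (not an official formula), you can estimate when caching a prefix pays for its storage. The rates below assume the gemini-2.5-flash snapshot quoted later in this post ($0.30 standard input, $0.03 cached input, $1.00 per 1M cached tokens per hour of storage); substitute your model's actual rates.

```python
PER_M = 1_000_000

def caching_break_even_requests_per_hour(
    prefix_tokens: int,
    input_rate: float,        # USD per 1M standard input tokens
    cached_rate: float,       # USD per 1M cached input tokens
    storage_rate_hr: float,   # USD per 1M cached tokens per hour
) -> float:
    # Savings each time the cached prefix replaces standard input billing
    savings_per_request = prefix_tokens / PER_M * (input_rate - cached_rate)
    # Fixed hourly cost of keeping the prefix cached
    storage_per_hour = prefix_tokens / PER_M * storage_rate_hr
    return storage_per_hour / savings_per_request

# A 100K-token prefix at the assumed gemini-2.5-flash rates:
be = caching_break_even_requests_per_hour(100_000, 0.30, 0.03, 1.00)
```

Note that the prefix size cancels out: at these assumed rates the break-even is roughly 3.7 requests per hour, so caching saves money whenever the cached prefix is hit more often than that.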
5) Gemini API pricing highlights (Developer API)
Units: USD per 1M tokens unless noted. Some models split rates at a <= 200K vs > 200K input-token threshold.
Top text models (selected)
| Model | Standard input | Standard output | Batch input | Batch output | Notes |
|---|---|---|---|---|---|
| gemini-3-pro-preview | $2.00 (<=200K) / $4.00 (>200K) | $12.00 (<=200K) / $18.00 (>200K) (includes thinking tokens) | $1.00 (<=200K) / $2.00 (>200K) | $6.00 (<=200K) / $9.00 (>200K) | Context caching: $0.20 / $0.40 + storage $4.50 per 1M tokens per hour |
| gemini-3-flash-preview | $0.50 (text/image/video) / $1.00 (audio) | $3.00 (includes thinking tokens) | $0.25 (text/image/video) / $0.50 (audio) | $1.50 | Context caching: $0.05 (text/image/video) / $0.10 (audio) + storage $1.00 per 1M tokens per hour |
| gemini-2.5-pro | $1.25 (<=200K) / $2.50 (>200K) | $10.00 (<=200K) / $15.00 (>200K) (includes thinking tokens) | $0.625 (<=200K) / $1.25 (>200K) | $5.00 (<=200K) / $7.50 (>200K) | Context caching: $0.125 / $0.25 + storage $4.50 per 1M tokens per hour |
| gemini-2.5-flash | $0.30 (text/image/video) / $1.00 (audio) | $2.50 (includes thinking tokens) | $0.15 (text/image/video) / $0.50 (audio) | $1.25 | Context caching: $0.03 (text/image/video) / $0.10 (audio) + storage $1.00 per 1M tokens per hour |
| gemini-2.5-flash-lite | $0.10 (text/image/video) / $0.30 (audio) | $0.40 (includes thinking tokens) | $0.05 (text/image/video) / $0.15 (audio) | $0.20 | Context caching: $0.01 (text/image/video) / $0.03 (audio) + storage $1.00 per 1M tokens per hour |
| gemini-2.0-flash | $0.10 (text/image/video) / $0.70 (audio) | $0.40 | $0.05 (text/image/video) / $0.35 (audio) | $0.20 | Shows context caching token rates; includes image generation price per image |
| gemini-2.0-flash-lite | $0.075 | $0.30 | $0.0375 | $0.15 | Lowest-cost text option in this snapshot |
Embeddings (Gemini API)
| Model | Standard | Batch | What is billed |
|---|---|---|---|
| gemini-embedding-001 | $0.15 per 1M input tokens | $0.075 per 1M input tokens | Input tokens |
6) Vertex AI pricing highlights (Google Cloud)
Vertex AI lists prices in USD and notes that if you pay in another currency,
your billed price follows the Cloud Platform SKUs for your currency.
Gemini 3 (Vertex AI)
| Model | Input / 1M (<=200K) | Input / 1M (>200K) | Cached input / 1M (<=200K) | Batch input / 1M (<=200K) | Text output / 1M (<=200K) | Batch output / 1M (<=200K) |
|---|---|---|---|---|---|---|
| Gemini 3 Pro Preview | $2 | $4 | $0.2 | $1 | $12 (response and reasoning) | $6 |
| Gemini 3 Flash Preview | $0.5 (text/image/video) $1 (audio) | $0.5 (text/image/video) $1 (audio) | $0.05 (text/image/video) $0.1 (audio) | $0.25 (text/image/video) $0.5 (audio) | $3 (response and reasoning) | $1.5 |
Vertex AI notes that Gemini 3 grounding billing starts January 5, 2026.
It also notes that input tokens provided by grounding results are not charged (grounding fees are separate).
Gemini 2.5 (Vertex AI, selected)
| Model | Input / 1M (<=200K) | Cached input / 1M (<=200K) | Batch input / 1M (<=200K) | Text output / 1M (<=200K) | Batch output / 1M (<=200K) |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $0.125 | $0.625 | $10 (response and reasoning) | $5 |
| Gemini 2.5 Flash | $0.30 (text/image/video) $1 (audio) | $0.03 (text/image/video) $0.10 (audio) | $0.15 (text/image/video) $0.5 (audio) | $2.50 (response and reasoning) | $1.25 |
| Gemini 2.5 Flash Lite | $0.10 (text/image/video) $0.3 (audio) | $0.01 (text/image/video) $0.03 (audio) | $0.05 (text/image/video) $0.15 (audio) | $0.4 (response and reasoning) | $0.2 |
Gemini 2.0 (Vertex AI, token-based snapshot)
| Model | Input / 1M | Output / 1M | Batch input / 1M | Batch output / 1M |
|---|---|---|---|---|
| Gemini 2.0 Flash | $0.15 (input) $1.00 (input audio) | $0.60 (output text) | $0.075 (input) $0.50 (input audio) | $0.30 (output text) |
| Gemini 2.0 Flash Lite | $0.075 (input) | $0.30 (output text) | $0.0375 (input) | $0.15 (output text) |
7) Media pricing: Imagen, Veo, Lyria
Imagen (Gemini API, per image)
| Model | Price | Unit |
|---|---|---|
| Imagen 4 Fast | $0.02 | per image |
| Imagen 4 Standard | $0.04 | per image |
| Imagen 4 Ultra | $0.06 | per image |
| Imagen 3 | $0.03 | per image |
Veo (Gemini API, per second)
| Model | Price | Unit |
|---|---|---|
| Veo 3.1 Standard video with audio | $0.40 | per second |
| Veo 3.1 Fast video with audio | $0.15 | per second |
| Veo 3 Standard video with audio | $0.40 | per second |
| Veo 3 Fast video with audio | $0.15 | per second |
| Veo 2 | $0.35 | per second |
Note: per the Gemini API pricing page, you are only charged if a video is successfully generated.
Vertex AI media snapshots
Vertex AI lists Imagen pricing that aligns with the Imagen 4 tiers (Ultra, Standard, Fast), plus additional features like upscaling and specialized capabilities.
Vertex AI also lists Veo 3.1 and Veo 3 price points for video and video+audio, and Lyria 2 for music generation (priced per 30 seconds).
8) Embeddings pricing
Gemini API embeddings
- gemini-embedding-001: $0.15 per 1M input tokens (Standard), $0.075 per 1M input tokens (Batch)
Vertex AI embeddings (high-level)
- Vertex AI lists Gemini Embedding as a price per 1,000 input tokens (online and batch), with no charge for output.
- It also lists non-Gemini embedding SKUs priced per 1,000 characters, plus multimodal embedding pricing for image and video inputs.
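When comparing the two platforms, watch the units: the Gemini API table quotes embeddings per 1M tokens, while Vertex AI quotes per 1,000 tokens. A trivial conversion helper avoids off-by-1,000 mistakes; the $0.15 figure below is the Gemini API standard rate quoted above.

```python
def per_1k_from_per_1m(rate_per_1m: float) -> float:
    # Convert USD per 1M tokens to USD per 1K tokens
    return rate_per_1m / 1_000

# e.g. $0.15 per 1M input tokens is $0.00015 per 1K tokens
rate_1k = per_1k_from_per_1m(0.15)
```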
9) Grounding (Google Search, Maps): how it is billed
Gemini API grounding
- For several Gemini models, Google Search and Google Maps grounding includes daily free quotas (requests per day, RPD) and then bills per 1,000 grounded prompts.
- For Gemini 3 models, Search grounding is shown as billing per 1,000 search queries, with billing starting January 5, 2026.
- Tools pricing also notes Code execution is free, and URL context is charged as input tokens based on the model’s rates.
Vertex AI grounding
- Vertex AI lists Gemini 3 grounding with monthly included search queries and bills overages per 1,000 queries, with billing starting January 5, 2026.
- For Gemini 2.5 and 2.0, Vertex AI describes “grounded prompts” and clarifies how they are charged (including cases where multiple search queries still count as one grounded prompt charge).
10) Cost examples
Example A: One request on gemini-2.5-flash (Gemini API, Standard)
Assume:
- Input tokens: 2,000
- Output tokens: 600
- Rates: input $0.30 / 1M, output $2.50 / 1M
Input: 2,000 / 1,000,000 * $0.30 = $0.00060
Output: 600 / 1,000,000 * $2.50 = $0.00150
Total: $0.00210
Example B: Same request on gemini-2.5-flash (Gemini API, Batch)
Batch rates in the table are $0.15 input and $1.25 output per 1M.
Input: 2,000 / 1,000,000 * $0.15 = $0.00030
Output: 600 / 1,000,000 * $1.25 = $0.00075
Total: $0.00105
This is roughly half the Standard cost, which is why Batch is so powerful for offline work.
Example C: Estimating Imagen 4 budget (Gemini API)
If you generate 10,000 images with Imagen 4 Standard at $0.04 per image:
10,000 * $0.04 = $400
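The three examples above can be reproduced in a few lines. The rates are the snapshot values quoted earlier in this post, not live prices.

```python
PER_M = 1_000_000

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Token cost of one request, with rates in USD per 1M tokens."""
    return input_tokens / PER_M * input_rate + output_tokens / PER_M * output_rate

standard = request_cost(2_000, 600, 0.30, 2.50)  # Example A: ~$0.00210
batch = request_cost(2_000, 600, 0.15, 1.25)     # Example B: ~$0.00105
imagen_budget = 10_000 * 0.04                    # Example C: ~$400
```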
11) Cost optimization checklist
- Pick the smallest model that meets quality: Flash and Flash-Lite are designed for scale and cost efficiency.
- Use Batch where users are not waiting: backfills, large processing, offline evals.
- Cap output: long answers and heavy reasoning raise output tokens fast.
- Cache repeated prefixes: policies, system prompts, static instructions, large boilerplate.
- Watch grounding costs: Search and Maps can add separate charges beyond model tokens.
- Measure token usage with real traffic: estimates are useful, but real prompts determine your bill.
12) FAQ
Do Gemini API and Vertex AI always have the same token rates?
Not always. Many rates are similar, but tables can differ by platform and model generation.
Always budget from the pricing table for the platform you will actually deploy on.
What is the easiest way to cut costs fast?
In most apps, the biggest wins come from (1) using Batch for offline work, (2) switching to Flash or Flash-Lite,
(3) reducing output length, and (4) caching repeated context.
When does Gemini 3 grounding billing start?
The pricing tables explicitly note that billing for Gemini 3 Grounding with Google Search starts on January 5, 2026.
13) References
- Gemini API pricing (Google AI for Developers) (Accessed: January 2026)
- Vertex AI Generative AI pricing (Google Cloud) (Accessed: January 2026)
Author update
Pricing changes quickly. I will keep this post updated with new rates and break-even examples. If you want a custom scenario modeled, share your volumes and constraints.

