Gemini Pricing in 2026: Gemini API vs Vertex AI (Tokens, Batch, Caching, Imagen, Veo)

Last updated: January 2026

Google offers two main ways to use Gemini and related generative models in production:
the Gemini API (Developer API via Google AI Studio) and
Vertex AI (Google Cloud).
They share many model families, but pricing, quotas, and “grounding” billing can differ,
so it helps to understand the moving parts before you budget.


1) Two pricing surfaces: Gemini API vs Vertex AI

Gemini API (Developer API)

  • Built for developers and product teams shipping apps quickly.
  • Pricing is published as USD per 1M tokens (plus some per-image / per-second SKUs).
  • Includes features like Batch API (50% cost reduction) and context caching on the paid tier for supported models.

Vertex AI (Google Cloud)

  • Enterprise-grade deployment on Google Cloud with Cloud billing and SKUs.
  • Many Gemini model prices are also listed per 1M tokens, with batch discounts for Gemini models.
  • Additional billing details apply, such as how long-context thresholds are billed, how currency conversion works via Cloud Platform SKUs, and how grounding is charged.

Practical rule: if you need enterprise compliance, VPC controls, org-level billing, or deep Cloud integration, Vertex AI is usually the default.
If you want a lighter-weight path from prototype to production, the Gemini API is often simpler.


2) Token billing basics (input, output, thinking, long context)

Most Gemini model pricing is token-based:

  • Input tokens: everything you send (system instructions, user messages, retrieved context).
  • Output tokens: everything the model returns.
  • Thinking / reasoning tokens: some models explicitly note that output pricing includes thinking tokens.
  • Long context thresholds: several pricing tables split pricing at a context threshold (for example, <= 200K vs > 200K input tokens). If you cross it, pricing can change for the whole request.

A simple estimator:

estimated_cost_usd =
  (input_tokens / 1_000_000) * input_rate
+ (cached_input_tokens / 1_000_000) * cached_input_rate
+ (output_tokens / 1_000_000) * output_rate
+ tool_costs (if any)
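
The same estimator can be written as a small helper. This is a sketch: `estimate_cost_usd` is a made-up name, `input_tokens` here means non-cached input tokens, and all rates must come from the pricing table for the platform you deploy on.

```python
def estimate_cost_usd(input_tokens, cached_input_tokens, output_tokens,
                      input_rate, cached_input_rate, output_rate,
                      tool_costs=0.0):
    """Estimate per-request cost. All rates are USD per 1M tokens.

    input_tokens: non-cached input tokens billed at the standard rate.
    cached_input_tokens: input tokens served from the context cache.
    tool_costs: flat add-on for grounding or other tool charges, if any.
    """
    million = 1_000_000
    return (input_tokens / million * input_rate
            + cached_input_tokens / million * cached_input_rate
            + output_tokens / million * output_rate
            + tool_costs)
```

For example, 2,000 input tokens and 600 output tokens at $0.30 / $2.50 per 1M work out to $0.0021, matching Example A later in this post.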

3) Free vs Paid vs Enterprise on the Gemini API

The Gemini API describes three tiers:

  • Free: generous limits for getting started, but limited access to certain models. Content can be used to improve products.
  • Paid: pay-as-you-go production usage with higher rate limits, context caching, Batch API, and access to more advanced models. Content is not used to improve products.
  • Enterprise: large-scale deployments powered by Vertex AI (plus enterprise security, compliance, support, and throughput options).

4) Batch and context caching: the big levers

Batch

Batch pricing is typically shown as roughly 50% cheaper than standard for eligible Gemini model calls.
Use Batch for offline jobs: backfills, nightly processing, bulk summarization, eval runs, and re-indexing.

Context caching

Context caching is a cost optimization for repeated prompt prefixes (long system prompts, shared policies, static instructions).
Many Gemini API tables show:

  • a cached input token rate, and
  • a storage price per hour for cached tokens (on some Gemini API tables).

Practical rule: caching helps most when you reuse a large, stable prefix across many requests.
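
As a rough break-even sketch using the gemini-2.5-flash Gemini API rates quoted later in this post ($0.30 standard input, $0.03 cached input, $1.00 storage per 1M cached tokens per hour), the cache pays for its storage once the prefix is reused a few times per hour:

```python
# gemini-2.5-flash Gemini API rates from the pricing table in this post.
STANDARD_INPUT = 0.30    # USD per 1M input tokens
CACHED_INPUT = 0.03      # USD per 1M cached input tokens
STORAGE_PER_HOUR = 1.00  # USD per 1M cached tokens per hour

# Per 1M cached tokens: each request that hits the cache saves the
# difference between standard and cached input rates.
savings_per_request = STANDARD_INPUT - CACHED_INPUT  # $0.27

# Requests per hour needed before savings cover the hourly storage fee.
breakeven_requests_per_hour = STORAGE_PER_HOUR / savings_per_request
print(f"{breakeven_requests_per_hour:.1f} requests/hour to break even")
```

Roughly 3.7 cache hits per hour break even, and notably this is independent of prefix size, because both the savings and the storage fee scale linearly with the number of cached tokens.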


5) Gemini API pricing highlights (Developer API)

Units: USD per 1M tokens unless noted. Some models split rates at <= 200K vs > 200K prompts.

Top text models (selected)

| Model | Standard input | Standard output | Batch input | Batch output | Notes |
| --- | --- | --- | --- | --- | --- |
| gemini-3-pro-preview | $2.00 (<=200K) / $4.00 (>200K) | $12.00 (<=200K) / $18.00 (>200K), includes thinking tokens | $1.00 (<=200K) / $2.00 (>200K) | $6.00 (<=200K) / $9.00 (>200K) | Context caching: $0.20 / $0.40, plus storage $4.50 per 1M tokens per hour |
| gemini-3-flash-preview | $0.50 (text/image/video) / $1.00 (audio) | $3.00, includes thinking tokens | $0.25 (text/image/video) / $0.50 (audio) | $1.50 | Context caching: $0.05 (text/image/video) / $0.10 (audio), plus storage $1.00 per 1M tokens per hour |
| gemini-2.5-pro | $1.25 (<=200K) / $2.50 (>200K) | $10.00 (<=200K) / $15.00 (>200K), includes thinking tokens | $0.625 (<=200K) / $1.25 (>200K) | $5.00 (<=200K) / $7.50 (>200K) | Context caching: $0.125 / $0.25, plus storage $4.50 per 1M tokens per hour |
| gemini-2.5-flash | $0.30 (text/image/video) / $1.00 (audio) | $2.50, includes thinking tokens | $0.15 (text/image/video) / $0.50 (audio) | $1.25 | Context caching: $0.03 (text/image/video) / $0.10 (audio), plus storage $1.00 per 1M tokens per hour |
| gemini-2.5-flash-lite | $0.10 (text/image/video) / $0.30 (audio) | $0.40, includes thinking tokens | $0.05 (text/image/video) / $0.15 (audio) | $0.20 | Context caching: $0.01 (text/image/video) / $0.03 (audio), plus storage $1.00 per 1M tokens per hour |
| gemini-2.0-flash | $0.10 (text/image/video) / $0.70 (audio) | $0.40 | $0.05 (text/image/video) / $0.35 (audio) | $0.20 | Shows context caching token rates; includes image generation price per image |
| gemini-2.0-flash-lite | $0.075 | $0.30 | $0.0375 | $0.15 | Lowest-cost text option in this snapshot |

Embeddings (Gemini API)

| Model | Standard | Batch | What is billed |
| --- | --- | --- | --- |
| gemini-embedding-001 | $0.15 per 1M input tokens | $0.075 per 1M input tokens | Input tokens |
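
To turn the embedding rates into a budget, here is a hypothetical corpus of 1,000,000 chunks averaging 400 tokens each. The corpus size and chunk length are assumptions; the rates are the gemini-embedding-001 Gemini API rates from the table above.

```python
# Hypothetical corpus: 1,000,000 chunks averaging 400 tokens each.
chunks = 1_000_000
avg_tokens_per_chunk = 400
total_tokens = chunks * avg_tokens_per_chunk       # 400M input tokens

# gemini-embedding-001 rates (USD per 1M input tokens) from the table.
standard_cost = total_tokens / 1_000_000 * 0.15    # ~$60
batch_cost = total_tokens / 1_000_000 * 0.075      # ~$30
```

Even at this scale the embedding bill is small next to generation costs, and Batch halves it again for offline indexing jobs.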

6) Vertex AI pricing highlights (Google Cloud)

Vertex AI lists prices in USD and notes that if you pay in another currency,
your billed price follows the Cloud Platform SKUs for your currency.

Gemini 3 (Vertex AI)

| Model | Input / 1M (<=200K) | Input / 1M (>200K) | Cached input / 1M (<=200K) | Batch input / 1M (<=200K) | Text output / 1M (<=200K) | Batch output / 1M (<=200K) |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini 3 Pro Preview | $2 | $4 | $0.20 | $1 | $12 (response and reasoning) | $6 |
| Gemini 3 Flash Preview | $0.50 (text/image/video) / $1 (audio) | $0.50 (text/image/video) / $1 (audio) | $0.05 (text/image/video) / $0.10 (audio) | $0.25 (text/image/video) / $0.50 (audio) | $3 (response and reasoning) | $1.50 |

Vertex AI notes that Gemini 3 grounding billing starts January 5, 2026.
It also notes that input tokens provided by grounding results are not charged (grounding fees are separate).

Gemini 2.5 (Vertex AI, selected)

| Model | Input / 1M (<=200K) | Cached input / 1M (<=200K) | Batch input / 1M (<=200K) | Text output / 1M (<=200K) | Batch output / 1M (<=200K) |
| --- | --- | --- | --- | --- | --- |
| Gemini 2.5 Pro | $1.25 | $0.125 | $0.625 | $10 (response and reasoning) | $5 |
| Gemini 2.5 Flash | $0.30 (text/image/video) / $1 (audio) | $0.03 (text/image/video) / $0.10 (audio) | $0.15 (text/image/video) / $0.50 (audio) | $2.50 (response and reasoning) | $1.25 |
| Gemini 2.5 Flash Lite | $0.10 (text/image/video) / $0.30 (audio) | $0.01 (text/image/video) / $0.03 (audio) | $0.05 (text/image/video) / $0.15 (audio) | $0.40 (response and reasoning) | $0.20 |

Gemini 2.0 (Vertex AI, token-based snapshot)

| Model | Input / 1M | Output / 1M | Batch input / 1M | Batch output / 1M |
| --- | --- | --- | --- | --- |
| Gemini 2.0 Flash | $0.15 (text) / $1.00 (audio) | $0.60 (text) | $0.075 (text) / $0.50 (audio) | $0.30 (text) |
| Gemini 2.0 Flash Lite | $0.075 | $0.30 (text) | $0.0375 | $0.15 (text) |

7) Media pricing: Imagen, Veo, Lyria

Imagen (Gemini API, per image)

| Model | Price | Unit |
| --- | --- | --- |
| Imagen 4 Fast | $0.02 | per image |
| Imagen 4 Standard | $0.04 | per image |
| Imagen 4 Ultra | $0.06 | per image |
| Imagen 3 | $0.03 | per image |

Veo (Gemini API, per second)

| Model | Price | Unit |
| --- | --- | --- |
| Veo 3.1 Standard (video with audio) | $0.40 | per second |
| Veo 3.1 Fast (video with audio) | $0.15 | per second |
| Veo 3 Standard (video with audio) | $0.40 | per second |
| Veo 3 Fast (video with audio) | $0.15 | per second |
| Veo 2 | $0.35 | per second |

Note: the Gemini API pricing page notes you are only charged if a video is successfully generated.

Vertex AI media snapshots

Vertex AI lists Imagen pricing that aligns with the Imagen 4 tiers (Ultra, Standard, Fast), plus additional features like upscaling and specialized capabilities.
Vertex AI also lists Veo 3.1 and Veo 3 price points for video and video+audio, and Lyria 2 for music generation (priced per 30 seconds).


8) Embeddings pricing

Gemini API embeddings

  • gemini-embedding-001: $0.15 per 1M input tokens (Standard), $0.075 per 1M input tokens (Batch)

Vertex AI embeddings (high-level)

  • Vertex AI lists Gemini Embedding as a price per 1,000 input tokens (online and batch), with no charge for output.
  • It also lists non-Gemini embedding SKUs priced per 1,000 characters, plus multimodal embedding pricing for image and video inputs.

9) Grounding (Google Search, Maps): how it is billed

Gemini API grounding

  • For several Gemini models, Google Search and Google Maps grounding includes daily free quotas (requests per day, RPD) and then bills per 1,000 grounded prompts.
  • For Gemini 3 models, Search grounding is shown as billing per 1,000 search queries, with billing starting January 5, 2026.
  • Tools pricing also notes Code execution is free, and URL context is charged as input tokens based on the model’s rates.
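
The quota-then-overage pattern behind grounding billing can be sketched as a small helper. The function name is made up, and the example values passed to it below are placeholders, not published prices; substitute the free quota and per-1,000 rate from the pricing table for your model and platform.

```python
def grounding_cost_per_day(grounded_prompts, free_per_day, rate_per_1k):
    """Daily grounding spend: free up to a quota, then a flat rate
    per 1,000 grounded prompts beyond it.

    free_per_day and rate_per_1k come from the platform's pricing table.
    """
    billable = max(0, grounded_prompts - free_per_day)
    return billable / 1_000 * rate_per_1k

# Placeholder numbers only: 5,000 grounded prompts against a
# hypothetical 1,500/day free quota at a hypothetical $35 per 1,000.
cost = grounding_cost_per_day(5_000, free_per_day=1_500, rate_per_1k=35.0)
```

The key point for budgeting: grounding is a separate line item from token charges, so it belongs in the `tool_costs` term of the estimator above.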

Vertex AI grounding

  • Vertex AI lists Gemini 3 grounding with monthly included search queries and bills overages per 1,000 queries, with billing starting January 5, 2026.
  • For Gemini 2.5 and 2.0, Vertex AI describes “grounded prompts” and clarifies how they are charged (including cases where multiple search queries still count as one grounded prompt charge).

10) Cost examples

Example A: One request on gemini-2.5-flash (Gemini API, Standard)

Assume:

  • Input tokens: 2,000
  • Output tokens: 600
  • Rates: input $0.30 / 1M, output $2.50 / 1M

Input:  2,000 / 1,000,000 * $0.30 = $0.00060
Output:   600 / 1,000,000 * $2.50 = $0.00150
Total:                              $0.00210

Example B: Same request on gemini-2.5-flash (Gemini API, Batch)

Batch rates in the table are $0.15 input and $1.25 output per 1M.

Input:  2,000 / 1,000,000 * $0.15 = $0.00030
Output:   600 / 1,000,000 * $1.25 = $0.00075
Total:                              $0.00105

This is roughly half the Standard cost, which is why Batch is so powerful for offline work.

Example C: Estimating Imagen 4 budget (Gemini API)

If you generate 10,000 images with Imagen 4 Standard at $0.04 per image:

10,000 * $0.04 = $400
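
The three examples can be re-derived in a few lines, using the rates as quoted above:

```python
MILLION = 1_000_000

# Example A: gemini-2.5-flash, Standard ($0.30 in / $2.50 out per 1M)
cost_a = 2_000 / MILLION * 0.30 + 600 / MILLION * 2.50   # $0.00210

# Example B: same request at Batch rates ($0.15 in / $1.25 out per 1M)
cost_b = 2_000 / MILLION * 0.15 + 600 / MILLION * 1.25   # $0.00105

# Example C: 10,000 Imagen 4 Standard images at $0.04 each
cost_c = 10_000 * 0.04                                   # $400.00
```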

11) Cost optimization checklist

  • Pick the smallest model that meets quality: Flash and Flash-Lite are designed for scale and cost efficiency.
  • Use Batch where users are not waiting: backfills, bulk processing, offline evals.
  • Cap output: long answers and heavy reasoning raise output tokens fast.
  • Cache repeated prefixes: policies, system prompts, static instructions, large boilerplate.
  • Watch grounding costs: Search and Maps can add separate charges beyond model tokens.
  • Measure token usage with real traffic: estimates are useful, but real prompts determine your bill.

12) FAQ

Do Gemini API and Vertex AI always have the same token rates?

Not always. Many rates are similar, but tables can differ by platform and model generation.
Always budget from the pricing table for the platform you will actually deploy on.

What is the easiest way to cut costs fast?

In most apps, the biggest wins come from (1) using Batch for offline work, (2) switching to Flash or Flash-Lite,
(3) reducing output length, and (4) caching repeated context.

When does Gemini 3 grounding billing start?

The pricing tables explicitly note that billing for Gemini 3 Grounding with Google Search starts on January 5, 2026.


13) References

  1. Gemini API pricing (Google AI for Developers) (Accessed: January 2026)
  2. Vertex AI Generative AI pricing (Google Cloud) (Accessed: January 2026)

Author update

Pricing changes quickly. I will keep this post updated with new rates and break-even examples. If you want a custom scenario modeled, share your volumes and constraints.
