OpenAI API Pricing in 2026: A Practical Guide (Models, Tokens, Tiers, Tools)

Last updated: January 2026

OpenAI’s API pricing can look confusing at first because you are not paying for “an API call”.
You are usually paying for tokens (input and output), adjusted by your processing tier (Batch, Flex, Standard, Priority),
plus add-ons such as image and audio tokenization and some built-in tool usage.
This guide gives you a clean mental model, current pricing highlights, and real cost examples.



1) How OpenAI API pricing works

Tokens are the unit of billing

Most OpenAI models are priced per 1 million tokens (1M). You typically pay:

  • Input tokens: everything you send (system instructions, messages, tool outputs you feed back, retrieved context).
  • Output tokens: everything the model returns (final answer, structured JSON, tool calls, and some internal reasoning depending on the model family).
  • Cached input tokens (if applicable): certain repeated prompt parts can be billed at a reduced cached-input rate.

A simple way to estimate text cost:

cost_usd =
  (input_tokens / 1_000_000) * input_rate
+ (cached_input_tokens / 1_000_000) * cached_input_rate
+ (output_tokens / 1_000_000) * output_rate
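That formula is easy to wrap in a helper. Below is a minimal sketch; the rates in the example are the gpt-5.1 Standard rates from the tables in section 3, but you can plug in any model's numbers:

```python
def estimate_text_cost(input_tokens, output_tokens, input_rate, output_rate,
                       cached_input_tokens=0, cached_input_rate=0.0):
    """Estimate USD cost of one request. Rates are USD per 1M tokens."""
    per_m = 1_000_000
    return ((input_tokens / per_m) * input_rate
            + (cached_input_tokens / per_m) * cached_input_rate
            + (output_tokens / per_m) * output_rate)

# Example: a 2,000-in / 600-out turn at gpt-5.1 Standard rates
print(round(estimate_text_cost(2_000, 600, input_rate=1.25, output_rate=10.00), 5))  # 0.0085
```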

APIs are not priced separately from the model

For the common text endpoints, you are billed at the chosen model’s token rates.
That means using the Responses API, Chat Completions API, or Assistants API does not add a separate “API surcharge”.


2) Processing tiers: Batch vs Flex vs Standard vs Priority

OpenAI exposes multiple processing tiers. The key trade-off is price vs latency and scheduling.

| Tier | What it is | When to use it | Typical savings or premium |
| --- | --- | --- | --- |
| Standard | Default pay-as-you-go pricing | Most user-facing apps and normal backend jobs | Baseline |
| Flex | Lower cost with higher latency | Non-urgent workloads where latency is acceptable | Cheaper than Standard |
| Priority | Higher cost for faster processing | Latency-sensitive apps and reliability-critical paths | More expensive than Standard |
| Batch | Asynchronous, completed within up to 24 hours | Large backfills, offline evaluations, nightly indexing, bulk summarization | Advertised as 50% off inputs and outputs vs Standard for eligible workloads |

A practical rule: if a user is waiting, use Standard or Priority. If a job can wait minutes or hours, Batch usually wins.
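For Batch specifically, you submit a JSON Lines file where each line is one request. A minimal sketch of building that file (the model name and prompts are illustrative; the `custom_id`/`method`/`url`/`body` fields follow the documented Batch input format):

```python
import json

# Two documents to summarize offline; contents are placeholders.
docs = {"doc-1": "First article text...", "doc-2": "Second article text..."}

lines = []
for doc_id, text in docs.items():
    lines.append(json.dumps({
        "custom_id": doc_id,                 # your key for matching results later
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5-mini",
            "messages": [{"role": "user", "content": f"Summarize: {text}"}],
            "max_tokens": 200,               # capping output also caps cost
        },
    }))

with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines))
```

You then upload this file and create a batch job against it; results come back keyed by `custom_id`.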


3) Text model pricing highlights

Below are common model rates (USD per 1M tokens). This is a curated subset so the table stays readable.
Always confirm the latest full list on the official pricing pages.

Flagship GPT-5 family (selected)

| Model | Tier | Input / 1M | Cached input / 1M | Output / 1M |
| --- | --- | --- | --- | --- |
| gpt-5.2 | Standard | $1.75 | $0.175 | $14.00 |
| gpt-5.2 | Batch | $0.875 | $0.0875 | $7.00 |
| gpt-5.2 | Priority | $3.50 | $0.35 | $28.00 |
| gpt-5.1 | Standard | $1.25 | $0.125 | $10.00 |
| gpt-5.1 | Batch | $0.625 | $0.0625 | $5.00 |
| gpt-5.1 | Priority | $2.50 | $0.25 | $20.00 |
| gpt-5-mini | Standard | $0.25 | $0.025 | $2.00 |
| gpt-5-mini | Batch | $0.125 | $0.0125 | $1.00 |
| gpt-5-mini | Priority | $0.45 | $0.045 | $3.60 |
| gpt-5-nano | Standard | $0.05 | $0.005 | $0.40 |
| gpt-5-nano | Batch | $0.025 | $0.0025 | $0.20 |
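To make the spread concrete, here is a small sketch that prices the same 2,000-token-in / 600-token-out turn at each model's Standard rates from the table above:

```python
# Standard-tier (input, output) rates in USD per 1M tokens, from the table above.
RATES = {
    "gpt-5.2":    (1.75, 14.00),
    "gpt-5.1":    (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}

def turn_cost(model, input_tokens, output_tokens):
    inp, out = RATES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Same 2,000-in / 600-out turn on each model:
for model in RATES:
    print(f"{model:<12} ${turn_cost(model, 2_000, 600):.5f}")
```

The flagship-to-nano spread is well over an order of magnitude, which is why model choice dominates most optimization work.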

Popular smaller, older, and specialist families (quick picks)

If you are cost-sensitive and do not need the heaviest reasoning, the “mini” class models often deliver the best price-to-latency balance.
If you need deeper reasoning, expect output-heavy workloads to cost more.


4) Reasoning tokens and why your bill can surprise you

Some model families internally “think” before answering. The key pricing detail is that
reasoning tokens can still consume context and are billed as output tokens, even when they are not shown explicitly in API responses.
If you see an unexpectedly high output-token line item, this is often the reason.

Practical implication: for heavy reasoning tasks, always measure and cap output tokens, and consider smaller or less reasoning-heavy models when possible.
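A sketch of why this matters, assuming the gpt-5.1 Standard output rate of $10 / 1M from the table above:

```python
def output_bill(visible_tokens, reasoning_tokens, output_rate=10.00):
    """Hidden reasoning tokens are billed at the output rate even though
    they may never appear in the response text."""
    return (visible_tokens + reasoning_tokens) / 1_000_000 * output_rate

# A 300-token visible answer that burned 2,000 reasoning tokens
# is billed like a 2,300-token answer:
print(round(output_bill(300, 2_000), 4))  # 0.023
```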


5) Image pricing: vision inputs vs image generation outputs

Vision input (sending images to a text model)

When you send an image to a text model for analysis, the image is converted into tokens and billed.
Different models convert images into tokens differently, so the same image can cost different amounts depending on the model.

Image generation (GPT Image models)

For image generation and editing, there are typically two cost components:
text tokens for prompts and any text outputs, plus image tokens for the generated image.

Image token rates (USD per 1M image tokens)

| Model | Input / 1M | Cached input / 1M | Output / 1M |
| --- | --- | --- | --- |
| gpt-image-1.5 | $8.00 | $2.00 | $32.00 |
| gpt-image-1 | $10.00 | $2.50 | $40.00 |
| gpt-image-1-mini | $2.50 | $0.25 | $8.00 |

OpenAI also publishes approximate per-image cost hints for square outputs at low, medium, and high quality.
Treat these as quick estimates, and verify with the official calculator for your chosen size and quality.
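A sketch of the arithmetic (the per-image token count here is a made-up illustration; real counts depend on size and quality):

```python
def image_gen_cost(output_image_tokens, output_rate_per_m):
    """Cost of the image-token portion of a generation; rate is USD per 1M tokens."""
    return output_image_tokens / 1_000_000 * output_rate_per_m

# Hypothetical: ~1,000 output image tokens at gpt-image-1's $40.00 / 1M rate
print(f"${image_gen_cost(1_000, 40.00):.3f} per image")  # $0.040 per image
```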


6) Audio and Realtime pricing

Audio pricing depends on whether you are using tokenized audio (Realtime) or speech-specific endpoints (TTS and transcription).
For Realtime audio, pricing is listed per 1M audio tokens for input and output.

Realtime audio tokens (USD per 1M tokens)

| Model | Input / 1M | Cached input / 1M | Output / 1M |
| --- | --- | --- | --- |
| gpt-realtime | $32.00 | $0.40 | $64.00 |
| gpt-realtime-mini | $10.00 | $0.30 | $20.00 |

For speech-to-text and text-to-speech, the pricing docs also include estimated costs per minute and per character for certain models.
If your product roadmap includes voice, use those per-minute estimates for budgeting and the token tables for measurement.


7) Video pricing (Sora)

Video generation is priced per second and varies by model and output resolution.

| Model | Resolution examples | Price per second |
| --- | --- | --- |
| sora-2 | 720×1280 or 1280×720 | $0.10 |
| sora-2-pro | 720×1280 or 1280×720 | $0.30 |
| sora-2-pro | 1024×1792 or 1792×1024 | $0.50 |
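Per-second pricing makes video costs easy to project. A small sketch using the rates above (the resolution keys are my own shorthand, not API parameters):

```python
# Price per second by (model, resolution class), from the table above.
VIDEO_RATES = {
    ("sora-2", "720p"): 0.10,
    ("sora-2-pro", "720p"): 0.30,
    ("sora-2-pro", "1024x1792"): 0.50,
}

def video_cost(model, resolution, seconds):
    """USD cost of one clip at a flat per-second rate."""
    return VIDEO_RATES[(model, resolution)] * seconds

# A 10-second sora-2-pro clip at the higher resolution:
print(video_cost("sora-2-pro", "1024x1792", 10))  # 5.0
```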

8) Fine-tuning pricing

Fine-tuning costs usually have two parts:
training (priced per 1M tokens or per hour for certain methods),
plus inference (priced per 1M tokens for the fine-tuned model when you use it).

Example fine-tuning rates (selected)

| Model | Training | Std. inference input / 1M | Std. inference cached input / 1M | Std. inference output / 1M |
| --- | --- | --- | --- | --- |
| gpt-4.1 (fine-tune) | $25.00 / 1M training tokens | $3.00 | $0.75 | $12.00 |
| gpt-4.1-mini (fine-tune) | $5.00 / 1M training tokens | $0.80 | $0.20 | $3.20 |
| gpt-4.1-nano (fine-tune) | $1.50 / 1M training tokens | $0.20 | $0.05 | $0.80 |
| o4-mini (reinforcement fine-tune) | $100.00 / training hour | $4.00 | $1.00 | $16.00 |

If you are considering fine-tuning mainly for cost, measure first.
Often you can cut cost more reliably by reducing output length, using caching, or moving a workload to Batch.
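A rough training-cost sketch for the token-priced methods (billed training tokens typically scale with dataset size times epochs; confirm the exact formula in the fine-tuning docs):

```python
def training_cost(dataset_tokens, epochs, rate_per_m_training_tokens):
    """Rough training cost: total trained tokens ~ dataset size x epochs."""
    return dataset_tokens * epochs / 1_000_000 * rate_per_m_training_tokens

# e.g. a 2M-token dataset, 3 epochs, at gpt-4.1-mini's $5.00 / 1M training rate:
print(training_cost(2_000_000, 3, 5.00))  # 30.0
```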


9) Built-in tools pricing (File Search, Web Search, Code Interpreter)

OpenAI’s platform includes built-in tools that can add separate line items beyond model tokens.
Some tools charge per call, some per storage/day, and some also bill the tokens they feed into a model.

Common tool costs

| Tool | How you are billed | Price |
| --- | --- | --- |
| Code Interpreter | Per session or per container (depends on configuration) | From $0.03 (default size); higher for larger memory containers |
| File Search storage | Per GB per day | $0.10 / GB-day (first 1 GB free) |
| File Search tool call | Per 1,000 tool calls (Responses API) | $2.50 / 1K calls |
| Web Search tool call | Per 1,000 calls, plus search content tokens in many cases | Commonly $10 / 1K calls for the main tool; preview variants vary |

Important detail for Web Search: the bill can have two parts:
(1) tool calls, and (2) search content tokens that are retrieved and included in the model prompt.
If your app does frequent search, budget for both.
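A sketch of that two-part budget, assuming the $10 / 1K call rate above and content tokens billed at gpt-5.1's $1.25 / 1M Standard input rate:

```python
def web_search_cost(calls, content_tokens, call_rate_per_k=10.00,
                    input_rate_per_m=1.25):
    """Two line items: tool calls, plus retrieved content tokens billed
    at the model's input rate (gpt-5.1 Standard assumed here)."""
    return (calls / 1_000 * call_rate_per_k
            + content_tokens / 1_000_000 * input_rate_per_m)

# 5,000 searches/month, each pulling ~3,000 tokens of content into the prompt:
print(round(web_search_cost(5_000, 5_000 * 3_000), 2))  # 68.75
```

Note that the token side ($18.75 here) is a real fraction of the call side ($50), so ignoring it understates the bill.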


10) Cost examples you can reuse

Example A: One chat turn on gpt-5.1 (Standard)

Assume:

  • Input tokens: 2,000
  • Output tokens: 600
  • Model: gpt-5.1 (Standard)

Cost estimate:

Input:  2,000 / 1,000,000 * $1.25  = $0.00250
Output:   600 / 1,000,000 * $10.00 = $0.00600
Total:                              $0.00850 per turn

At 100,000 such turns/month, you are around $850/month for model tokens alone.

Example B: The same workload on gpt-5.1 (Batch)

Input:  2,000 / 1,000,000 * $0.625 = $0.00125
Output:   600 / 1,000,000 * $5.00  = $0.00300
Total:                              $0.00425 per turn

This is roughly half the Standard cost, which matches the “50% off” positioning for Batch-eligible work.

Example C: Reusing a long system prompt with cached input

Suppose 10,000 tokens of your prompt are cached and you add 500 new input tokens per request.
Cached input is billed at a lower rate than normal input, so repeated context can get much cheaper.
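Using the gpt-5.1 Standard rates from section 3 ($1.25 input, $0.125 cached input per 1M), the per-request input cost looks like this:

```python
def request_input_cost(new_tokens, cached_tokens,
                       input_rate=1.25, cached_rate=0.125):
    """Per-request input cost with a cached prefix. Rates are USD per 1M tokens."""
    return (new_tokens * input_rate + cached_tokens * cached_rate) / 1_000_000

with_cache    = request_input_cost(500, 10_000)   # 10k-token prefix served from cache
without_cache = request_input_cost(10_500, 0)     # same prompt, nothing cached
print(f"${with_cache:.6f} vs ${without_cache:.6f} per request")  # 7x cheaper
```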


11) Cost optimization checklist

  • Choose the smallest model that meets your quality bar. Mini models often cut cost and latency together.
  • Control output length: set max output tokens; keep JSON lean; avoid verbose prompts.
  • Use Batch for offline work: backfills, eval runs, bulk summarization, indexing jobs.
  • Use caching for repeated context: system prompts, policy blocks, long instructions, shared docs.
  • Watch tool line items: File Search storage, Web Search calls, Code Interpreter sessions/containers.
  • Measure with real traffic: token usage is workload-dependent; prototypes lie.

12) Budgets, alerts, and what is billed separately

Set budgets and alerts

The OpenAI API platform allows monthly budgets and email notification thresholds so you can cap or monitor spend.
Budget enforcement can have a delay, so treat it as a safety rail, not a guarantee.

ChatGPT subscriptions are separate from API usage

API usage is billed separately from ChatGPT plans. If you are paying for ChatGPT Plus, Business, Enterprise, or Edu,
that does not include API token usage by default.


13) FAQ

Is the Responses API more expensive than Chat Completions?

No. The endpoint itself is not priced separately. You pay the chosen model’s input and output token rates,
plus any tool costs you use.

Why did my output-token bill spike on a reasoning model?

Some models use internal reasoning tokens that still count as output tokens for billing, even if they are not shown.
Cap output tokens and measure with tracing.

Is Batch always cheaper?

Batch is designed for non-urgent work and is positioned as a large discount on input and output.
If a user is waiting, Batch is usually the wrong choice even if it is cheaper.

How do I estimate image costs?

Use the official pricing calculator for your target size and quality. Images are tokenized, and image generation
models also have separate image token rates.

Do built-in tools bill tokens too?

Often yes. Many tools bill their own per-call or per-storage costs, and also bill tokens when tool outputs are fed into a model.


14) References

  1. OpenAI API Pricing (official) (Accessed: January 2026)
  2. OpenAI Platform Docs: Pricing (official) (Accessed: January 2026)

Author update

Pricing changes quickly. I will keep this post updated with new rates and break-even examples. If you want a custom scenario modeled, share your volumes and constraints.
