How is API cost calculated?

Most text and embedding APIs charge per token, quoted per million tokens. Cost = (input tokens / 1,000,000 × input price) + (output tokens / 1,000,000 × output price). Image models charge per image or per image-token, and video models charge per second of output.

What are cached input tokens?

Providers like OpenAI and Anthropic offer prompt caching: repeated context (a long system prompt or document) is billed at a fraction of the normal input price on cache hits — often around 10%. Enter your cached token count to see the discount applied.

Are these prices current?

Prices were gathered from official provider pages in July 2026. AI API pricing changes frequently, so always confirm on the provider’s live pricing page (linked below) before making decisions.

Why do output tokens cost more than input?

Generating tokens is more compute-intensive than reading them, so most providers charge 3–5× more for output than input. This is why verbose responses cost disproportionately more.

How many tokens is my text?

A rough rule of thumb is 1 token ≈ 4 characters ≈ 0.75 words in English. A 500-word answer is roughly 650–700 tokens. Use each provider’s tokenizer for exact counts.
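The rule of thumb above can be turned into a quick estimator. This is only a sketch of the 4-characters / 0.75-words heuristic for English text; the function names are illustrative, and a provider's real tokenizer will give different counts.

```python
import math


def estimate_tokens_from_words(word_count: int) -> int:
    """Approximate token count using 1 token ~ 0.75 English words."""
    return math.ceil(word_count / 0.75)


def estimate_tokens_from_chars(char_count: int) -> int:
    """Approximate token count using 1 token ~ 4 characters."""
    return math.ceil(char_count / 4)


# A 500-word answer lands inside the 650-700 token range quoted above.
print(estimate_tokens_from_words(500))   # 667
print(estimate_tokens_from_chars(2000))  # 500
```

Treat the result as a budgeting figure, not a billing figure: code, non-English text, and unusual formatting can tokenize quite differently.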

AI API Cost Calculator

Estimate and compare API costs for popular AI models across providers — text, image, video, and embeddings — by input/output token pricing.


Prices were gathered in July 2026 and are per 1M tokens unless noted. API pricing changes often, so always confirm on the provider’s official page before relying on a figure.

Model, Input tokens / request, Output tokens / request, Cached input tokens (optional), Number of requests

Cost / request: $0.0100

Total (1,000 requests): $10.00

Selected model: GPT-5.4

All text models: cost for your inputs (cheapest first)

Model	Provider	Cost / request	Total × 1,000
Mistral Small	Mistral	$0.000250	$0.2500
DeepSeek V4 Flash (chat/reasoner)	DeepSeek	$0.000280	$0.2800
GPT-4.1 nano	OpenAI	$0.000300	$0.3000
Gemini 2.5 Flash-Lite	Google (Gemini)	$0.000300	$0.3000
GPT-4o mini	OpenAI	$0.000450	$0.4500
Grok 4 Fast	xAI (Grok)	$0.000450	$0.4500
GPT-5.4 nano	OpenAI	$0.000825	$0.8250
GPT-4.1 mini	OpenAI	$0.001200	$1.20
Gemini 2.5 Flash	Google (Gemini)	$0.001550	$1.55
Gemini 3 Flash	Google (Gemini)	$0.002000	$2.00
GPT-5.4 mini	OpenAI	$0.003000	$3.00
Claude Haiku 4.5	Anthropic (Claude)	$0.003500	$3.50
Mistral Large	Mistral	$0.005000	$5.00
GPT-4.1	OpenAI	$0.006000	$6.00
Gemini 3.5 Flash	Google (Gemini)	$0.006000	$6.00
Gemini 2.5 Pro (higher rate of $2.50/$15 above 200k input tokens)	Google (Gemini)	$0.006250	$6.25
Claude Sonnet 5 (introductory pricing of $2/$10 through Aug 31, 2026; then $3/$15)	Anthropic (Claude)	$0.007000	$7.00
GPT-4o	OpenAI	$0.007500	$7.50
Cohere Command A	Cohere	$0.007500	$7.50
Gemini 3.1 Pro (higher rate of $4/$18 above 200k input tokens)	Google (Gemini)	$0.008000	$8.00
GPT-5.4	OpenAI	$0.0100	$10.00
Claude Sonnet 4.6	Anthropic (Claude)	$0.0105	$10.50
Grok 4	xAI (Grok)	$0.0105	$10.50
Claude Opus 4.8	Anthropic (Claude)	$0.0175	$17.50
GPT-5.5	OpenAI	$0.0200	$20.00
Claude Fable 5	Anthropic (Claude)	$0.0350	$35.00
Claude Opus 4.1 (deprecated)	Anthropic (Claude)	$0.0525	$52.50
GPT-5.5 Pro	OpenAI	$0.1200	$120.00

Official pricing pages

OpenAI Anthropic (Claude) Google (Gemini) DeepSeek Mistral xAI (Grok) Cohere Black Forest Labs (FLUX) Stability AI Ideogram Recraft Runway Kling (Kuaishou) MiniMax (Hailuo) Alibaba (Wan)

How AI API Pricing Works: Tokens, Input vs Output, and Estimating Real Costs

Almost every large language model API is billed by the token rather than by the request. A token is a chunk of text — roughly four characters or about three-quarters of an English word — and both the text you send (the input, or prompt) and the text the model generates (the output, or completion) are counted. Providers quote prices per million tokens, which makes headline numbers look small until you multiply them by the volume of a real application. This calculator turns those per-million rates into the numbers that actually matter: cost per request, and cost across thousands or millions of requests.

The single most important thing to understand is that input and output tokens are priced differently. Generating text is far more compute-intensive than reading it, so output almost always costs several times more than input — commonly three to five times. A model advertised at a cheap input rate can still be expensive if your workload produces long, verbose answers. When you compare models in the table above, watch the output column closely, because for chat and generation workloads it usually dominates the bill.
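To see how strongly output can dominate, the sketch below computes the share of a request's cost that comes from output tokens. The $3/$15 rates are the post-introductory Claude Sonnet 5 prices noted in the table; the token counts are assumed for illustration, and the function name is hypothetical.

```python
def output_share(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Fraction of a request's cost attributable to output tokens.

    Prices are USD per 1M tokens; the per-million scaling cancels out.
    """
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    return output_cost / (input_cost + output_cost)


# Assumed workload: 1,000 input tokens and 500 output tokens at $3 / $15.
share = output_share(1_000, 500, 3.00, 15.00)
print(f"Output accounts for {share:.0%} of the request cost")  # 71%
```

Even though the output here is half the length of the input, it makes up most of the bill, which is why trimming response length is often the first optimisation to try.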

The Token Cost Formula

The math for a single text request is straightforward once prices are normalised to a per-token basis:

cost = (input ÷ 1,000,000 × input_price) + (output ÷ 1,000,000 × output_price)

For a batch job you simply multiply that per-request cost by the number of requests. Embedding models charge on input tokens only, image models charge per generated image (or per image token), and video models charge per second of output — this tool applies the right formula automatically for each modality.
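As a minimal sketch, the formula and the batch multiplication can be written as two small functions. The $2/$10 rates are the introductory Claude Sonnet 5 prices noted in the table; the 1,000 input and 500 output tokens are assumed values, chosen because they appear to reproduce that model's table row.

```python
def text_request_cost(input_tokens: int, output_tokens: int,
                      input_price: float, output_price: float) -> float:
    """Cost in USD of one text request; prices are USD per 1M tokens."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)


def batch_cost(per_request_cost: float, requests: int) -> float:
    """Cost of a batch job: per-request cost times number of requests."""
    return per_request_cost * requests


# Assumed workload: 1,000 input and 500 output tokens at $2 / $10 per 1M.
per_request = text_request_cost(1_000, 500, 2.00, 10.00)
total = batch_cost(per_request, 1_000)
print(f"${per_request:.6f} per request, ${total:.2f} per 1,000 requests")
# $0.007000 per request, $7.00 per 1,000 requests
```

This covers text requests only; image and video pricing would need per-image or per-second variants of the same idea.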

Prompt Caching Can Cut Costs Dramatically

Several providers, including OpenAI and Anthropic, offer prompt caching. When you repeatedly send the same large context — a long system prompt, a knowledge-base document, or a running conversation history — the cached portion is billed at a steep discount on cache hits, often around 10% of the normal input price.

For agents and chatbots that resend a big fixed prompt on every turn, a cached rate of around 10% means the cached portion of the input costs roughly 90% less on cache hits. Enter your cached token count in the text calculator to see the discount reflected in the estimate.
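A rough sketch of how a cache discount changes input cost is shown below. It assumes cached tokens are a subset of the input tokens and are billed at 10% of the input price, the typical figure mentioned above; the exact rate, and any charge for writing to the cache, varies by provider and is not modelled here. The function name and the example numbers are illustrative.

```python
def input_cost_with_cache(input_tokens: int, cached_tokens: int,
                          input_price: float,
                          cached_fraction: float = 0.10) -> float:
    """Input cost in USD when part of the prompt is served from cache.

    input_price is USD per 1M tokens; cached_fraction is the assumed share
    of the normal input price charged for cache hits.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    billable = uncached_tokens * input_price
    billable += cached_tokens * input_price * cached_fraction
    return billable / 1_000_000


# Assumed request: 10,000 input tokens, 8,000 of them cached, at $2 per 1M.
without_cache = input_cost_with_cache(10_000, 0, 2.00)
with_cache = input_cost_with_cache(10_000, 8_000, 2.00)
saving = 1 - with_cache / without_cache
print(f"${without_cache:.4f} -> ${with_cache:.4f} ({saving:.0%} saved)")
```

The saving on the whole input is smaller than 90% because the uncached 2,000 tokens are still billed at the full rate.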

Choosing a Model: Balancing Cost, Quality, and Latency

The cheapest model is rarely the right default, and the most expensive one is rarely necessary. A practical approach is to route by task difficulty: use a small, low-cost model (such as a mini or nano tier, Gemini Flash-Lite, or DeepSeek) for classification, extraction, and simple summarisation, and reserve a flagship model (GPT-5.5, Claude Opus, Gemini Pro) for complex reasoning, long-form generation, or high-stakes output. Because the price gap between tiers is often 10× to 50×, moving routine traffic to a cheaper model is usually the biggest single lever on an AI bill.
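Routing by task difficulty can be sketched as a simple lookup. The task labels, tier names, and the 80/20 traffic split below are hypothetical; the two per-request costs are the GPT-4.1 nano and GPT-5.5 figures from the comparison table, used only to illustrate the size of the gap.

```python
# Hypothetical task labels considered routine enough for a small model.
SIMPLE_TASKS = {"classification", "extraction", "simple_summarisation"}

# Per-request costs taken from the table above (small vs flagship tier).
COST_PER_REQUEST = {"small": 0.0003, "flagship": 0.02}


def pick_tier(task_type: str) -> str:
    """Send routine tasks to the small tier, everything else to flagship."""
    return "small" if task_type in SIMPLE_TASKS else "flagship"


def blended_cost(task_types: list[str]) -> float:
    """Average cost per request for a mix of task types."""
    total = sum(COST_PER_REQUEST[pick_tier(t)] for t in task_types)
    return total / len(task_types)


# Assumed traffic mix: 80% routine classification, 20% complex reasoning.
traffic = ["classification"] * 8 + ["complex_reasoning"] * 2
routed = blended_cost(traffic)
print(f"Routed: ${routed:.5f} per request vs "
      f"${COST_PER_REQUEST['flagship']:.5f} all-flagship")
```

A real router would also need a way to classify incoming requests and a fallback when the small model's answer is not good enough; this sketch only shows the cost side.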

Estimate before you build, not after. Plug your realistic average token counts and monthly request volume into the calculator, compare the top few candidates in the table, and you will often find that two models with similar quality differ enormously in cost. For high-volume or non-urgent workloads, also check whether a provider offers a batch tier (frequently around a 50% discount) — those rates are not shown here but are listed on each provider's official pricing page linked above.
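The comparison described above can be approximated in a few lines. The per-request costs are copied from the table; the monthly volume is hypothetical, and the 50% batch discount is only the typical figure mentioned in the text, not a rate confirmed for any of these models.

```python
# Per-request costs from the comparison table above.
candidates = {
    "GPT-5.4": 0.0100,
    "Claude Sonnet 4.6": 0.0105,
    "Gemini 3.1 Pro": 0.0080,
}

REQUESTS_PER_MONTH = 100_000  # hypothetical volume
BATCH_DISCOUNT = 0.50         # assumed; check each provider's pricing page


def monthly_cost(per_request: float, requests: int,
                 discount: float = 0.0) -> float:
    """Monthly cost in USD, optionally reduced by a fractional discount."""
    return per_request * requests * (1 - discount)


# Rank candidates cheapest-first, as the table does.
ranked = sorted(candidates.items(), key=lambda item: item[1])
for name, per_request in ranked:
    standard = monthly_cost(per_request, REQUESTS_PER_MONTH)
    batch = monthly_cost(per_request, REQUESTS_PER_MONTH, BATCH_DISCOUNT)
    print(f"{name}: ${standard:,.2f}/month standard, ${batch:,.2f} batch")
```

Running the same comparison with your own token counts and volume is usually more informative than comparing headline per-million prices.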

Disclaimer: This AI API Cost Calculator is provided for general estimation and educational purposes. Prices were compiled from official provider pages in July 2026 and can change at any time; regional endpoints, data-residency options, batch and priority tiers, fine-tuning, and server-side tools (such as web search) may carry different or additional charges. INR figures are approximate conversions from USD, which is the billing currency for these APIs. Always confirm current pricing on the provider's official page before making purchasing or architectural decisions.

How to Use

Pick a modality: text/chat, image, video, or embeddings.

Choose a model, then enter your token counts (or images/seconds) and number of requests.

See cost per request and total, plus a comparison table of every model sorted cheapest-first.

Cross-check the number on the provider’s official pricing page before you rely on it.

Features

Text, image, video, and embedding pricing in one place

Compare every model side by side, sorted by cost

Handles input, output, and cached (prompt-cache) token pricing

Covers OpenAI, Claude, Gemini, DeepSeek, Mistral, Grok, and Cohere

100% client-side — nothing is sent to a server

FAQ

Use this free AI API cost calculator to estimate and compare pricing for large language models and other AI APIs. Enter your input and output token counts to see the cost per request and monthly total for OpenAI GPT, Anthropic Claude, Google Gemini, DeepSeek, Mistral, xAI Grok, and Cohere. It also covers image generation cost per image, video generation cost per second, and embedding cost per token — all calculated in your browser with no signup.

About AI API Cost Calculator

Estimate and compare the cost of AI API calls across every popular provider and model. Enter input and output token counts to see the cost per request and monthly total for OpenAI GPT, Anthropic Claude, Google Gemini, DeepSeek, Mistral, xAI Grok, and Cohere. Includes prompt-cache pricing, image generation cost per image, video generation cost per second, and embedding cost per token, with a comparison table sorted cheapest-first. All calculations run in your browser.

Processing Note

AI API Cost Calculator runs in your browser, so the input you enter is processed locally on this page and is not uploaded to a ToolMintX account.

Tool Limits

IT tools provide quick diagnostics and transformations. They cannot see every private network, deployment setting, proxy, firewall, or production edge case.

Related Tools

AI VRAM Calculator

Estimate GPU VRAM for LLM inference and training using model, quantization, users, and context length.

Client-side

API Key and .env Secret Generator

Generate secure .env secrets plus selectable Hugging Face, OpenAI, JWT, database, and webhook variables.

Client-side

Subnet Calculator

Free IP Subnet Calculator to instantly calculate network subnets, CIDR, broadcast addresses, and IP ranges online.

Client-side

IPv4 to IPv6 Converter

Instantly convert IPv4 addresses to IPv6 mapped and transition formats online for free.

Client-side
