Anthropic API Pricing in 2026: Complete Guide — Models, Caching, Batch & Optimization
Anthropic API Pricing at a Glance (2026)
Quick answer: Anthropic API pricing in 2026 is per million tokens (MTok), billed separately for input and output. Claude Opus 4.6 costs $5.00/$25.00 per MTok. Claude Sonnet 4.6 costs $3.00/$15.00. Claude Haiku 4.5 costs $1.00/$5.00. Batch processing is 50% cheaper across all models. Prompt caching cuts cached input cost by 90%. Opus 4.6 and Sonnet 4.6 support 1M token context at flat rates with no surcharge.
Prices verified April 12, 2026 from official Anthropic documentation.
Why Anthropic API Pricing Matters in 2026
AI API spend has become one of the fastest-growing and least-governed line items in engineering budgets. Anthropic's Claude powers chatbots, coding assistants, agentic workflows, and data-intensive pipelines across industries — and because pricing is based on tokens, usage can escalate quickly without the right controls in place.
Understanding how Anthropic charges — and where the levers are — is essential for FinOps practitioners, engineering leaders, and product teams managing AI at scale. This guide covers every current model, every pricing mechanism, and the practical optimizations that make the biggest difference in production.
Anthropic API Pricing by Model
Anthropic charges per million tokens (MTok) with separate rates for input tokens (what you send) and output tokens (what the model returns). Output tokens are consistently more expensive, reflecting the additional compute required to generate responses.
| Model | Input ($/MTok) | Output ($/MTok) | Batch Input | Batch Output | Context | Generation |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Flagship) | $5.00 | $25.00 | $2.50 | $12.50 | 1M tokens | Claude 4.6 |
| Claude Sonnet 4.6 (Balanced) | $3.00 | $15.00 | $1.50 | $7.50 | 1M tokens | Claude 4.6 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $1.50 | $7.50 | 1M tokens (surcharge above 200K) | Claude 4.5 |
| Claude Haiku 4.5 (Budget) | $1.00 | $5.00 | $0.50 | $2.50 | 200K tokens | Claude 4.5 |
| Claude Haiku 3.5 (Legacy) | $0.80 | $4.00 | $0.40 | $2.00 | 200K tokens | Claude 3.5 |
| Claude Sonnet 3.7 (Legacy) | $3.00 | $15.00 | $1.50 | $7.50 | 200K tokens | Claude 3.7 |
| Claude Opus 3 (Legacy) | $15.00 | $75.00 | — | — | 200K tokens | Claude 3 |
All prices per million tokens (MTok). Batch pricing requires the Message Batches API. Legacy models remain available but Anthropic recommends migrating to Claude 4.x.
Claude Opus 3 is still available but costs 3× as much as Opus 4.6 ($15.00 vs $5.00 per MTok input). If you are still using it for any production workload, migrating to Opus 4.6 is the single highest-ROI change you can make to your Anthropic bill today.
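These per-token mechanics reduce to simple arithmetic. A minimal sketch of a cost estimator, with rates copied from the table above (the short model keys are illustrative, not the full API IDs):

```python
# Illustrative price table (USD per million tokens), copied from the
# pricing table above. Keys are shorthand, not full API model IDs.
PRICES = {
    "claude-opus-4-6":   {"input": 5.00, "output": 25.00},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4-5":  {"input": 1.00, "output": 5.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 batch: bool = False) -> float:
    """Estimate the USD cost of one request. Batch halves both rates."""
    p = PRICES[model]
    cost = (input_tokens / 1_000_000) * p["input"] \
         + (output_tokens / 1_000_000) * p["output"]
    return cost / 2 if batch else cost

# A 2,000-token prompt with a 500-token reply on Sonnet 4.6:
# 0.002 * $3.00 + 0.0005 * $15.00 ≈ $0.0135
print(round(request_cost("claude-sonnet-4-6", 2_000, 500), 4))
```

Multiplying the per-request figure by daily volume is the quickest way to sanity-check the scenario tables later in this guide.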
For historical reference, earlier-generation list pricing:

| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Claude 3 Haiku | $0.25 | $1.25 |
| Claude 3.5 / 3.7 Sonnet | $3 | $15 |
| Claude 3 Sonnet | $3 | $15 |
| Claude 3 Opus | $15 | $75 |
| Claude 2.0 / 2.1 | $8 | $24 |
Claude 3.7 Sonnet, introduced in 2025, blended fast responses with deeper reasoning while keeping the $3/$15 rate. Claude 4 and 4.1 (Opus and Sonnet variants) launched later in 2025 with enhanced coding and reasoning, continuing the same pricing tiers.
Model Profiles: Capabilities & When to Use Each
Claude Opus 4.6 — Flagship
API ID: claude-opus-4-6 | Price: $5.00/$25.00 per MTok | Context: 1M tokens, no surcharge
Anthropic's most capable broadly available model, with exceptional performance in coding, complex reasoning, and agentic workflows. Supports extended thinking and adaptive thinking. Max output: 128K tokens on the synchronous API, up to 300K on the Batch API with the beta header.
Best for: Complex multi-step reasoning, agentic pipelines, nuanced writing, and tasks where output quality directly impacts revenue. Reserve for workloads where Sonnet 4.6 genuinely falls short — Opus costs 67% more on input.
Claude Sonnet 4.6 — Best Balance
API ID: claude-sonnet-4-6 | Price: $3.00/$15.00 per MTok | Context: 1M tokens, no surcharge
The recommended default for most production use cases. Delivers near-Opus quality at faster latency and significantly lower cost. Supports extended and adaptive thinking. Max output: 64K tokens.
Best for: The majority of production workloads — coding, analysis, writing, customer-facing applications, RAG pipelines. Start here and only upgrade to Opus if quality testing shows a meaningful gap.
Claude Haiku 4.5 — Budget
API ID: claude-haiku-4-5-20251001 | Price: $1.00/$5.00 per MTok | Context: 200K tokens
Near-frontier intelligence at the lowest price in the current generation. The fastest model in the Claude 4.x family. Supports extended thinking. Max output: 64K tokens.
Best for: High-volume, latency-sensitive, or cost-constrained workloads — classification, routing, extraction, summarization, and moderation. At $0.10/MTok on cache hits, extremely cost-effective for RAG applications with reused context.
Prompt Caching: The Biggest Cost Lever Available
Prompt caching is Anthropic's most impactful pricing feature. It lets you store frequently reused content — system prompts, documents, examples, tool definitions — so subsequent API calls can read from cache instead of reprocessing the full input. Cache hits cost 90% less than standard input tokens.
How Prompt Caching Works
Add cache_control: { type: "ephemeral" } to the content blocks you want cached — system prompt, documents, tool definitions. On the first request, Anthropic processes and stores those blocks. The write costs 1.25× standard input for a 5-minute TTL, or 2.0× for a 1-hour TTL. Any subsequent request within the TTL window that includes the same content pays only 0.10× standard input — a 90% discount. Accessing a cached block resets its TTL, so high-frequency applications rarely pay write costs after the initial warm-up.
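In request terms, the marker attaches to individual content blocks. A minimal sketch of a Messages API request body (the knowledge-base text is a placeholder; the `ttl` field selects the 1-hour write described above):

```python
# Sketch of a Messages API request body using prompt caching.
# The document text below is a placeholder; in a real call this body
# is sent to the Messages endpoint via the SDK or HTTP.
request_body = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<50K-token knowledge base goes here>",
            # Marks this block as cacheable. "ttl": "1h" requests the
            # 1-hour cache write (2.0x input) instead of the 5-minute
            # default (1.25x input).
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    "messages": [
        {"role": "user", "content": "What does the policy say about refunds?"}
    ],
}
```

Only the blocks carrying `cache_control` are cached; the per-turn user message below them is billed at standard rates as usual.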
Prompt Caching Pricing by Model
| Model | Standard Input | 5-min Cache Write | 1-hr Cache Write | Cache Hit (Read) | Hit Discount |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $6.25 (1.25×) | $10.00 (2.0×) | $0.50 | −90% |
| Claude Sonnet 4.6 | $3.00 | $3.75 | $6.00 | $0.30 | −90% |
| Claude Sonnet 4.5 | $3.00 | $3.75 | $6.00 | $0.30 | −90% |
| Claude Haiku 4.5 | $1.00 | $1.25 | $2.00 | $0.10 | −90% |
| Claude Haiku 3.5 | $0.80 | $1.00 | $1.60 | $0.08 | −90% |
Caching Cost Example: RAG Application
Scenario: A RAG app with a 50K-token knowledge base in the system prompt, queried 1,000 times per day, on Sonnet 4.6:
- Without caching: 50K tokens × 1,000 queries × $3.00/MTok = ~$150/day on the knowledge base alone (~$4,500/month)
- With a 1-hour-TTL cache: steady traffic keeps the cache warm (each hit resets the TTL), so after the initial $6.00/MTok write, hits at $0.30/MTok cost ~$15/day (~$450/month)
- Net saving: roughly 90% of knowledge-base input cost
Key insight: Prompt caching is not just an optimization — for any application with a large, reused system prompt or document context, it is the single most impactful change you can make to your Anthropic bill. Applications that query the same knowledge base repeatedly should always use caching.
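The arithmetic behind that scenario generalizes to any reused prompt. A hedged sketch of the comparison, with defaults taken from the Sonnet 4.6 row of the caching table (1-hour TTL):

```python
def daily_cache_savings(prompt_tokens, queries_per_day,
                        input_rate=3.00, hit_rate=0.30, write_rate=6.00,
                        writes_per_day=1):
    """Compare daily cost of a reused prompt with and without caching.

    Rates are USD per million tokens. Defaults model Sonnet 4.6 with a
    1-hour TTL: writes at 2.0x standard input, hits at 0.10x. With
    steady traffic the TTL keeps resetting, so one write/day is a
    reasonable assumption; adjust writes_per_day for bursty traffic.
    """
    mtok = prompt_tokens / 1_000_000
    uncached = mtok * input_rate * queries_per_day
    cached = mtok * write_rate * writes_per_day \
           + mtok * hit_rate * (queries_per_day - writes_per_day)
    return uncached, cached

uncached, cached = daily_cache_savings(50_000, 1_000)
# 50K tokens x 1,000 queries/day: ~$150 uncached vs ~$15 cached
```

The break-even point is low: with a 90% hit discount, the cache pays for its 2.0× write after just a handful of hits within the TTL window.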
Batch Processing: 50% Off for Async Workloads
The Anthropic Message Batches API processes requests asynchronously and returns results within 24 hours, at exactly 50% off standard token prices. There is no quality difference between batch and real-time responses — only timing.
| Model | Standard Input | Batch Input | Standard Output | Batch Output | Saving |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 | 50% |
| Claude Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 | 50% |
| Claude Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 | 50% |
Best workloads for batch processing: document processing pipelines, data enrichment at scale, nightly analytics jobs, offline evaluations, content generation queues, and any task where a few hours of latency is acceptable. A team processing 500K documents per month could save $750–$2,250/month simply by switching to batch.
Note on batch output limits: On the Message Batches API, Claude Opus 4.6 and Sonnet 4.6 support up to 300K output tokens per request using the output-300k-2026-03-24 beta header — significantly more than the synchronous 128K/64K limits. This makes batch ideal for long-form generation workloads.
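In practice, a batch is a list of requests, each pairing a `custom_id` (used to match results back to inputs) with ordinary Messages params. A minimal sketch of the request list (model ID and prompts are placeholders; submission and result retrieval go through the Message Batches endpoints):

```python
# Sketch of a Message Batches submission payload. Each entry carries a
# custom_id for matching results plus standard Messages API params.
# Documents and model ID here are placeholders.
documents = ["First document text...", "Second document text..."]

batch_requests = [
    {
        "custom_id": f"doc-{i}",
        "params": {
            "model": "claude-haiku-4-5",
            "max_tokens": 512,
            "messages": [
                {"role": "user",
                 "content": f"Summarize this document:\n\n{doc}"}
            ],
        },
    }
    for i, doc in enumerate(documents)
]
```

Results arrive asynchronously (within 24 hours), keyed by `custom_id`, with every token billed at the 50%-off batch rate.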
Long-Context Pricing: Which Models Have Surcharges
Not all Claude models handle large context windows at flat rates. Understanding the surcharge rules before choosing a model for long-context workloads can prevent significant unexpected costs.
| Model | Context Window | Surcharge Threshold | Surcharge |
|---|---|---|---|
| Claude Opus 4.6 | 1M tokens | None | Flat rate throughout |
| Claude Sonnet 4.6 | 1M tokens | None | Flat rate throughout |
| Claude Sonnet 4.5 | 1M tokens (beta) | 200K tokens | 2× input, 1.5× output above 200K (entire session) |
| Claude Haiku 4.5 | 200K tokens | N/A | No surcharge (200K max) |
Sonnet 4.5 long-context warning: If you are using Claude Sonnet 4.5 with prompts exceeding 200K tokens via the 1M-token context beta, the entire session is billed at 2× input and 1.5× output. A 300K-token prompt on Sonnet 4.5 does not cost the same as on Sonnet 4.6. Migrate to Sonnet 4.6 for large-context work — the pricing is the same at standard rates, but without the surcharge risk.
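The surcharge math can be sketched as follows (rates from the table above; once input exceeds 200K tokens, the entire request is billed at the higher multipliers):

```python
SONNET_45_INPUT, SONNET_45_OUTPUT = 3.00, 15.00  # USD per MTok, standard

def sonnet_45_cost(input_tokens, output_tokens):
    """Sonnet 4.5 long-context billing on the 1M-token beta: above
    200K input tokens the whole request is billed at 2x input and
    1.5x output, not just the tokens past the threshold."""
    in_mult, out_mult = (2.0, 1.5) if input_tokens > 200_000 else (1.0, 1.0)
    return (input_tokens / 1e6) * SONNET_45_INPUT * in_mult \
         + (output_tokens / 1e6) * SONNET_45_OUTPUT * out_mult

# A 300K-token prompt with 2K output:
# Sonnet 4.5: 0.3 * $3 * 2 + 0.002 * $15 * 1.5 = $1.845
# Sonnet 4.6 (flat rate): 0.3 * $3 + 0.002 * $15 = $0.93
```

The same 300K-token request on Sonnet 4.6 costs roughly half, which is the whole argument for migrating large-context workloads.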
Fast Mode (Opus 4.6 — Research Preview)
Fast Mode is a research-preview feature on Claude Opus 4.6 only. It delivers significantly faster output at a 6× price premium:
| Mode | Input | Output | vs Standard |
|---|---|---|---|
| Standard (Opus 4.6) | $5.00 | $25.00 | — |
| Fast Mode (Opus 4.6) | $30.00 | $150.00 | 6× premium |
Never use Fast Mode as a default. At $30/$150 per MTok, a single 1M-token context query costs $30 in input alone. Reserve Fast Mode only for latency-critical, genuinely time-sensitive scenarios where the speed premium is justified by a concrete business outcome. Always benchmark whether standard Sonnet 4.6 (which is faster than standard Opus 4.6) meets your latency requirements first.
Consumer & Team Subscriptions vs API Billing
Anthropic offers subscription plans for individuals and teams alongside the API. These are separate products — subscriptions provide access to claude.ai and desktop apps, not the API. All API usage is always billed per token.
| Plan | Price | For | API Access? |
|---|---|---|---|
| Free | $0 | Casual users, trials | No |
| Pro | $20/mo ($17/mo annual) | Individual productivity (claude.ai) | No |
| Max | From $100/mo | Power users — 5× or 20× Pro usage | No |
| Team (Standard seat) | $25/seat/mo ($20 annual) | Teams up to 150 people | No |
| Team (Premium seat) | $125/seat/mo ($100 annual) | Power users on team — 5× usage | No |
| Enterprise | $20/seat + API rates | Large orgs with compliance needs | Yes (billed per token) |
| API (direct) | Per token | Developers building on Claude | Yes |
Key distinction: If you are building a product or automation that calls Claude programmatically, you are using the API and paying per token — regardless of whether you also have a Pro or Max subscription. Subscription plans do not provide API credits or reduce API costs.
Rate Limits & Usage Tiers
Anthropic enforces two complementary sets of limits:
- Rate limits — caps on requests per minute (RPM), tokens per minute (TPM), and tokens per day (TPD) per model. New accounts start at lower limits and can request increases as usage grows through the console.
- Usage tiers — monthly spend caps that require pre-authorization or deposits to increase. Moving between tiers unlocks higher rate limits and spend capacity.
Rate limits vary by model — Haiku has higher default limits than Opus due to lower cost per request. Teams operating at scale should request limit increases proactively, before reaching capacity during peak periods. Being throttled in production costs more in engineering time and SLA impact than the deposit required to raise a tier.
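Until a limit increase lands, client-side retry with backoff keeps 429 responses from cascading into outages. A minimal sketch of the standard pattern (the function name is illustrative; when the API returns a retry-after header, honor it instead):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retry number `attempt` (0-indexed):
    exponential backoff with full jitter, capped. A common pattern for
    handling HTTP 429 rate-limit errors; jitter prevents a fleet of
    clients from retrying in lockstep."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Pair this with a maximum retry count and alerting on sustained 429s, so throttling surfaces as a capacity-planning signal rather than silent latency.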
Additional hidden costs to plan for: Web search tool calls via the Anthropic API cost $10 per 1,000 searches, on top of token costs. US-only data residency via the inference_geo parameter adds a 10% premium on token costs for Opus 4.6 and newer models. Neither is reflected in base model pricing.
Real-World Cost Scenarios
Scenario 1: High-Volume Customer Support Chatbot (10,000 conversations/day)
~2,000 input tokens, ~500 output tokens per conversation. System prompt (5K tokens) cached with 1-hour TTL.
| Model | Daily Cost | Monthly Cost | Notes |
|---|---|---|---|
| Sonnet 4.6 (cache hits) | ~$18 | ~$540 | Cached system prompt, rest standard |
| Haiku 4.5 (no cache) | ~$13 | ~$390 | Cheapest model wins outright for volume |
| Opus 4.6 (no cache) | ~$115 | ~$3,450 | Unnecessary for support — over 6× Sonnet cost |
Scenario 2: Bulk Document Processing (100,000 docs/month, 5K tokens each)
Async batch, no caching. Documents processed overnight.
| Model + Mode | Per-doc Cost | Monthly Cost | Notes |
|---|---|---|---|
| Haiku 4.5 Batch | ~$0.0015 | ~$150 | Best value for bulk processing |
| Sonnet 4.6 Batch | ~$0.0090 | ~$900 | Use if quality requires it |
| Opus 4.6 Batch | ~$0.019 | ~$1,875 | Rarely justified for batch extraction |
Scenario 3: Agentic Coding Assistant (500 sessions/day, ~20K tokens each)
Complex multi-step reasoning, code generation. Real-time. No meaningful caching (each session is unique).
| Model | Daily Cost | Monthly Cost |
|---|---|---|
| Opus 4.6 | ~$75 | ~$2,250 |
| Sonnet 4.6 | ~$45 | ~$1,350 |
| Haiku 4.5 | ~$15 | ~$450 |
For agentic coding, routing by task complexity delivers the best outcome: use Haiku for simple completions and quick lookups, Sonnet for most code tasks, and Opus only for the most complex architectural reasoning. A well-implemented router can bring blended costs close to Haiku rates while maintaining Opus quality where it counts.
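A complexity router can be as simple as a lookup. An illustrative sketch (the tier labels and routing thresholds are assumptions to replace with your own evaluation results; model keys are shorthand):

```python
def pick_model(task_complexity: str) -> str:
    """Illustrative complexity-based model router. In production the
    complexity label would come from a cheap classifier or heuristics
    (diff size, file count, presence of cross-module changes)."""
    return {
        "simple":   "claude-haiku-4-5",   # completions, quick lookups
        "standard": "claude-sonnet-4-6",  # most code tasks
        "complex":  "claude-opus-4-6",    # architectural reasoning
    }[task_complexity]
```

Even a crude router shifts the blended cost curve: if 70% of sessions route to Haiku, the average per-session price falls well below flat-rate Sonnet while Opus stays available for the hard cases.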
Best Practices: How to Optimize Anthropic API Spend
- Choose the right model tier for every task: Use Haiku 4.5 for classification, routing, extraction, and high-volume simple queries. Use Sonnet 4.6 for most production tasks — it delivers near-Opus quality at 40% lower input cost. Reserve Opus 4.6 for genuinely complex reasoning, agentic workflows, or when output quality directly impacts revenue.
- Implement prompt caching on any reused content: If your system prompt, knowledge base, or document context exceeds 1K tokens and is shared across many requests, caching is the single highest-ROI optimization available. A 50K-token system prompt cached at $6.00/MTok write and $0.30/MTok hit costs 85–90% less than processing it fresh with every request.
- Route async work through the Batch API: Any workload that can tolerate a few hours of latency — document processing, data enrichment, nightly reports, evaluation runs — should use the Message Batches API. The 50% discount is automatic and applies to every token with no quality trade-off.
- Write lean, structured prompts: Every unnecessary token in your system prompt costs money at scale. Audit prompts for redundancy — verbose instructions, repeated context, examples that aren't improving output quality. Structured formats (numbered steps, clear headers) reduce ambiguity and often shorten outputs, cutting both input and output costs.
- Monitor and allocate spend proactively: Pull token and cost data via the Anthropic usage API. Break down spend by model, by feature, and by team. Anomalies are much cheaper to catch early than to absorb on a monthly bill. FinOps platforms like Finout automate this across your full AI and cloud stack.
- Plan capacity upgrades before you need them: Request higher usage tiers and rate limit increases proactively — before peak periods, not after you hit a wall. Being throttled in production is far more expensive than the deposit required to raise a tier.
- Don't default to Opus for routine workloads: Claude Opus 4.6 costs 67% more than Sonnet 4.6 on both input and output. Using it for support, classification, or summarization that Sonnet handles equally well inflates your bill by two-thirds with no quality benefit.
- Don't confuse consumer subscriptions with API access: A Pro or Max subscription gives access to claude.ai — not the API. API usage is always billed per token. Teams building on Claude pay for both independently.
- Don't use Fast Mode by default: Opus 4.6 Fast Mode costs $30/$150 per MTok — 6× standard rates. Use it only for genuinely latency-critical flows, and only after confirming standard Sonnet 4.6 (which is inherently faster than standard Opus) doesn't already meet your speed requirements.
- Don't use Sonnet 4.5 for prompts exceeding 200K tokens: The 1M-context beta on Sonnet 4.5 triggers a 2× input / 1.5× output surcharge above 200K tokens for the entire session. Migrate to Sonnet 4.6 — same base price, same capability, no surcharge at any context length up to 1M.
Frequently Asked Questions
What is Anthropic API pricing in 2026?
Anthropic API pricing is per million tokens. Current rates: Claude Opus 4.6 at $5.00/$25.00 per MTok, Claude Sonnet 4.6 at $3.00/$15.00 per MTok, Claude Haiku 4.5 at $1.00/$5.00 per MTok. Batch processing halves all token costs. Prompt caching reduces cached input by 90%.
How much does Claude Opus 4.6 cost per million tokens?
Claude Opus 4.6 costs $5.00 per million input tokens and $25.00 per million output tokens at standard rates. Batch: $2.50/$12.50. Cache hit: $0.50/MTok input (90% off). Supports 1M token context at flat rates with no surcharge. Fast Mode (research preview) costs $30.00/$150.00 per MTok — a 6× premium.
What is the cheapest Claude model available via API?
Claude Haiku 4.5 at $1.00/$5.00 per MTok is the cheapest current-generation model. With batch processing it drops to $0.50/$2.50. The legacy Haiku 3.5 is available at $0.80/$4.00, but Haiku 4.5 is recommended — it is faster, more capable, and only marginally more expensive.
How does Anthropic prompt caching work?
You mark content blocks with cache_control: { type: "ephemeral" } in your API request. The first request writes the cache at 1.25× input cost (5-min TTL) or 2.0× input cost (1-hour TTL). Subsequent requests that hit the cache pay only 0.10× input — a 90% discount. Accessing a cached block resets its TTL.
Does Anthropic charge extra for long context windows?
Claude Opus 4.6 and Sonnet 4.6 both support 1M token contexts at completely flat rates — no surcharge. Claude Sonnet 4.5 on the 1M-token beta applies a 2× input / 1.5× output surcharge above 200K tokens. Claude Haiku 4.5 has a 200K context window with no surcharge.
Is there a free Anthropic API tier?
No. Free access is only available via the claude.ai web and mobile interface. All API usage is billed per token regardless of whether you have a consumer subscription. There is no free tier for direct API access.
How does Anthropic API pricing compare to OpenAI in 2026?
At standard rates, OpenAI is generally cheaper — GPT-5.4 at $2.50/$15.00 vs Claude Opus 4.6 at $5.00/$25.00. However, both providers now offer ~90% caching discounts, making effective costs competitive for cache-heavy workloads. Anthropic leads on long-context flat-rate pricing (Opus 4.6 and Sonnet 4.6 at 1M tokens, no surcharge) and on complex reasoning quality. See our full OpenAI vs Anthropic pricing comparison for a complete breakdown.
What's the best way to monitor and control Anthropic API costs?
Use Anthropic's built-in usage API to pull token and cost data. For teams operating at scale across multiple providers, a FinOps platform like Finout provides unified visibility, cost allocation by team or feature, and real-time anomaly detection across your full AI and cloud spend — without requiring custom instrumentation.
The Bottom Line
Anthropic's API pricing in 2026 is meaningfully more nuanced than it was a year ago. The three-tier model lineup (Haiku → Sonnet → Opus) is stable, but the cost levers — prompt caching, batch processing, long-context surcharge avoidance, and model selection — create a wide range of effective costs for the same workload depending on how well you optimize.
The highest-impact changes for most teams, in order of ROI:
- Implement prompt caching on any reused system prompt or document context — 85–90% reduction on cached input
- Audit model tier usage — downgrade from Opus to Sonnet wherever quality is equal; from Sonnet to Haiku for high-volume simple tasks
- Move async workloads to the Batch API — 50% off with no quality penalty
- Migrate off Sonnet 4.5 for large-context work — avoid the 200K surcharge by moving to Sonnet 4.6
- Monitor and allocate spend — you cannot optimize what you cannot see