Quick answer: Anthropic API pricing in 2026 is per million tokens (MTok), billed separately for input and output. Claude Opus 4.6 costs $5.00/$25.00 per MTok. Claude Sonnet 4.6 costs $3.00/$15.00. Claude Haiku 4.5 costs $1.00/$5.00. Batch processing is 50% cheaper across all models. Prompt caching cuts cached input cost by 90%. Opus 4.6 and Sonnet 4.6 support 1M token context at flat rates with no surcharge.
Prices verified April 12, 2026 from official Anthropic documentation.
AI API spend has become one of the fastest-growing and least-governed line items in engineering budgets. Anthropic's Claude powers chatbots, coding assistants, agentic workflows, and data-intensive pipelines across industries — and because pricing is based on tokens, usage can escalate quickly without the right controls in place.
Understanding how Anthropic charges — and where the levers are — is essential for FinOps practitioners, engineering leaders, and product teams managing AI at scale. This guide covers every current model, every pricing mechanism, and the practical optimizations that make the biggest difference in production.
Anthropic charges per million tokens (MTok) with separate rates for input tokens (what you send) and output tokens (what the model returns). Output tokens are consistently more expensive, reflecting the additional compute required to generate responses.
| Model | Input ($/MTok) | Output ($/MTok) | Batch Input | Batch Output | Context | Generation |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Flagship) | $5.00 | $25.00 | $2.50 | $12.50 | 1M tokens | Claude 4.6 |
| Claude Sonnet 4.6 (Balanced) | $3.00 | $15.00 | $1.50 | $7.50 | 1M tokens | Claude 4.6 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $1.50 | $7.50 | 1M tokens (surcharge above 200K) | Claude 4.5 |
| Claude Haiku 4.5 (Budget) | $1.00 | $5.00 | $0.50 | $2.50 | 200K tokens | Claude 4.5 |
| Claude Haiku 3.5 (Legacy) | $0.80 | $4.00 | $0.40 | $2.00 | 200K tokens | Claude 3.5 |
| Claude Sonnet 3.7 (Legacy) | $3.00 | $15.00 | $1.50 | $7.50 | 200K tokens | Claude 3.7 |
| Claude Opus 3 (Legacy) | $15.00 | $75.00 | — | — | 200K tokens | Claude 3 |
All prices per million tokens (MTok). Batch pricing requires the Message Batches API. Legacy models remain available but Anthropic recommends migrating to Claude 4.x.
Claude Opus 3 is still available but costs three times as much as Opus 4.6 ($15.00 vs $5.00 per MTok input; $75.00 vs $25.00 output). If you are still using it for any production workload, migrating to Opus 4.6 is the single highest-ROI change you can make to your Anthropic bill today.
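To put that migration in concrete terms, here is a quick sketch of the monthly difference. The workload size (20M input and 5M output tokens per month) is a hypothetical example; the per-MTok rates come from the pricing table above.

```python
# Monthly cost comparison: legacy Claude Opus 3 vs Claude Opus 4.6.
# The workload volume (20 MTok input / 5 MTok output per month) is a
# hypothetical example; the $/MTok rates are from the pricing table.

def monthly_cost(input_mtok: float, output_mtok: float,
                 input_rate: float, output_rate: float) -> float:
    """Dollar cost for a month of usage at the given $/MTok rates."""
    return input_mtok * input_rate + output_mtok * output_rate

opus_3 = monthly_cost(20, 5, input_rate=15.00, output_rate=75.00)
opus_4_6 = monthly_cost(20, 5, input_rate=5.00, output_rate=25.00)

print(f"Opus 3:   ${opus_3:,.2f}")    # $675.00
print(f"Opus 4.6: ${opus_4_6:,.2f}")  # $225.00
print(f"Savings:  ${opus_3 - opus_4_6:,.2f} ({1 - opus_4_6 / opus_3:.0%})")
```

At these rates the savings scale linearly with volume, so the larger the legacy workload, the stronger the case for migrating first.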
For historical reference, earlier Claude generations were priced as follows:

| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Claude 3 Haiku | $0.25 | $1.25 |
| Claude 3.5 / 3.7 Sonnet | $3 | $15 |
| Claude 3 Sonnet | $3 | $15 |
| Claude 3 Opus | $15 | $75 |
| Claude 2.0 / 2.1 | $8 | $24 |
Claude 3.7 Sonnet, introduced in 2025, blended fast responses with deeper reasoning while keeping the $3/$15 rate. Claude 4 and 4.1 (Opus and Sonnet variants) launched later in 2025 with enhanced coding and reasoning, continuing the same pricing tiers.
API ID: claude-opus-4-6 | Price: $5.00/$25.00 per MTok | Context: 1M tokens, no surcharge
Anthropic's most capable broadly available model, with exceptional performance in coding, complex reasoning, and agentic workflows. Supports extended thinking and adaptive thinking. Max output: 128K tokens on the synchronous API, up to 300K on the Batch API with the beta header.
Best for: Complex multi-step reasoning, agentic pipelines, nuanced writing, and tasks where output quality directly impacts revenue. Reserve for workloads where Sonnet 4.6 genuinely falls short — Opus costs 67% more on input.
API ID: claude-sonnet-4-6 | Price: $3.00/$15.00 per MTok | Context: 1M tokens, no surcharge
The recommended default for most production use cases. Delivers near-Opus quality at faster latency and significantly lower cost. Supports extended and adaptive thinking. Max output: 64K tokens.
Best for: The majority of production workloads — coding, analysis, writing, customer-facing applications, RAG pipelines. Start here and only upgrade to Opus if quality testing shows a meaningful gap.
API ID: claude-haiku-4-5-20251001 | Price: $1.00/$5.00 per MTok | Context: 200K tokens
Near-frontier intelligence at the lowest price in the current generation. The fastest model in the Claude 4.x family. Supports extended thinking. Max output: 64K tokens.
Best for: High-volume, latency-sensitive, or cost-constrained workloads — classification, routing, extraction, summarization, and moderation. At $0.10/MTok on cache hits, extremely cost-effective for RAG applications with reused context.
Prompt caching is Anthropic's most impactful pricing feature. It lets you store frequently reused content — system prompts, documents, examples, tool definitions — so subsequent API calls can read from cache instead of reprocessing the full input. Cache hits cost 90% less than standard input tokens.
Add cache_control: { type: "ephemeral" } to the content blocks you want cached — system prompt, documents, tool definitions. On the first request, Anthropic processes and stores those blocks. The write costs 1.25× standard input for a 5-minute TTL, or 2.0× for a 1-hour TTL. Any subsequent request within the TTL window that includes the same content pays only 0.10× standard input — a 90% discount. Accessing a cached block resets its TTL, so high-frequency applications rarely pay write costs after the initial warm-up.
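The request shape can be sketched as follows. This is a minimal sketch of a Messages API payload with caching enabled; `KNOWLEDGE_BASE` and the support-assistant prompt are placeholder content, not part of any official example.

```python
# Sketch of a Messages API request with prompt caching enabled.
# The large, stable context block carries the cache_control marker;
# KNOWLEDGE_BASE stands in for your own reused document content.

KNOWLEDGE_BASE = "...your product docs, examples, tool context..."

request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "You are a support assistant."},
        {
            "type": "text",
            "text": KNOWLEDGE_BASE,
            # Everything up to and including this block becomes the
            # cached prefix. A 1-hour TTL can be requested with
            # {"type": "ephemeral", "ttl": "1h"}.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "messages": [{"role": "user", "content": "How do I reset my password?"}],
}

# With the official Python SDK this payload would be sent as:
#   client.messages.create(**request)
```

Only the marked prefix is cached; the per-user message after it is billed as standard input on every call.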
| Model | Standard Input | 5-min Cache Write | 1-hr Cache Write | Cache Hit (Read) | Hit Discount |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $6.25 (1.25×) | $10.00 (2.0×) | $0.50 | −90% |
| Claude Sonnet 4.6 | $3.00 | $3.75 | $6.00 | $0.30 | −90% |
| Claude Sonnet 4.5 | $3.00 | $3.75 | $6.00 | $0.30 | −90% |
| Claude Haiku 4.5 | $1.00 | $1.25 | $2.00 | $0.10 | −90% |
| Claude Haiku 3.5 | $0.80 | $1.00 | $1.60 | $0.08 | −90% |
Scenario: A RAG app with a 50K-token knowledge base in the system prompt, queried 1,000 times per day on Sonnet 4.6.
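A back-of-the-envelope calculation for this scenario, assuming steady traffic keeps the 5-minute cache warm all day so only the first request pays the write rate:

```python
# Daily cost of the 50K-token knowledge base on Sonnet 4.6, with and
# without caching. Assumes the 5-minute cache stays warm all day, so
# only the first request of the day pays the cache-write rate.

KB_TOKENS = 50_000
QUERIES_PER_DAY = 1_000

INPUT = 3.00   # $/MTok, standard input
WRITE = 3.75   # $/MTok, 5-minute cache write (1.25x)
HIT = 0.30     # $/MTok, cache hit (0.10x)

no_cache = KB_TOKENS * QUERIES_PER_DAY * INPUT / 1_000_000
with_cache = (KB_TOKENS * WRITE                              # one write
              + KB_TOKENS * (QUERIES_PER_DAY - 1) * HIT) / 1_000_000

print(f"Without caching: ${no_cache:,.2f}/day")   # $150.00/day
print(f"With caching:    ${with_cache:,.2f}/day")  # $15.17/day
```

That is roughly a 90% reduction on the knowledge-base portion of input spend, before counting the per-query user message and output tokens.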
Key insight: Prompt caching is not just an optimization — for any application with a large, reused system prompt or document context, it is the single most impactful change you can make to your Anthropic bill. Applications that query the same knowledge base repeatedly should always use caching.
The Anthropic Message Batches API processes requests asynchronously and returns results within 24 hours, at exactly 50% off standard token prices. There is no quality difference between batch and real-time responses — only timing.
| Model | Standard Input | Batch Input | Standard Output | Batch Output | Saving |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 | 50% |
| Claude Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 | 50% |
| Claude Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 | 50% |
Best workloads for batch processing: document processing pipelines, data enrichment at scale, nightly analytics jobs, offline evaluations, content generation queues, and any task where a few hours of latency is acceptable. A team processing 500K documents per month could save $750–$2,250/month simply by switching to batch.
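The savings range quoted above can be reproduced with a simple estimate. The per-document token counts (~2K input, ~200 output) are assumptions for illustration, not official figures:

```python
# Estimated monthly savings from moving a 500K-document pipeline to
# the Message Batches API (flat 50% off all token costs). Per-document
# token counts are assumed for illustration.

DOCS_PER_MONTH = 500_000
IN_TOK, OUT_TOK = 2_000, 200   # assumed tokens per document

def monthly(in_rate: float, out_rate: float) -> float:
    """Monthly dollar cost at the given $/MTok rates."""
    return DOCS_PER_MONTH * (IN_TOK * in_rate + OUT_TOK * out_rate) / 1e6

for name, in_rate, out_rate in [("Haiku 4.5", 1.00, 5.00),
                                ("Sonnet 4.6", 3.00, 15.00)]:
    std = monthly(in_rate, out_rate)
    batch = monthly(in_rate / 2, out_rate / 2)
    print(f"{name}: ${std:,.0f} -> ${batch:,.0f} (saves ${std - batch:,.0f}/mo)")
```

Under these assumptions the monthly savings land at $750 on Haiku 4.5 and $2,250 on Sonnet 4.6, matching the range above.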
Note on batch output limits: On the Message Batches API, Claude Opus 4.6 and Sonnet 4.6 support up to 300K output tokens per request using the output-300k-2026-03-24 beta header — significantly more than the synchronous 128K/64K limits. This makes batch ideal for long-form generation workloads.
Not all Claude models handle large context windows at flat rates. Understanding the surcharge rules before choosing a model for long-context workloads can prevent significant unexpected costs.
| Model | Context Window | Surcharge Threshold | Surcharge |
|---|---|---|---|
| Claude Opus 4.6 | 1M tokens | None | Flat rate throughout |
| Claude Sonnet 4.6 | 1M tokens | None | Flat rate throughout |
| Claude Sonnet 4.5 | 1M tokens (beta) | 200K tokens | 2× input, 1.5× output above 200K (entire session) |
| Claude Haiku 4.5 | 200K tokens | N/A | No surcharge (200K max) |
Sonnet 4.5 long-context warning: If you are using Claude Sonnet 4.5 with prompts exceeding 200K tokens via the 1M-token context beta, the entire session is billed at 2× input and 1.5× output. A 300K-token prompt on Sonnet 4.5 does not cost the same as on Sonnet 4.6. Migrate to Sonnet 4.6 for large-context work — the pricing is the same at standard rates, but without the surcharge risk.
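The surcharge is easy to quantify. Here is the per-call cost of a 300K-token prompt on each model; the 2K-token output is an assumed figure for illustration:

```python
# Cost of a single 300K-token prompt (2K output tokens assumed) on
# Sonnet 4.5 via the 1M-token beta vs Sonnet 4.6 at flat rates.
# Above 200K, Sonnet 4.5 bills the ENTIRE session at 2x input and
# 1.5x output; Sonnet 4.6 applies no surcharge.

IN_TOK, OUT_TOK = 300_000, 2_000

sonnet_45 = (IN_TOK * 3.00 * 2.0 + OUT_TOK * 15.00 * 1.5) / 1e6
sonnet_46 = (IN_TOK * 3.00 + OUT_TOK * 15.00) / 1e6

print(f"Sonnet 4.5: ${sonnet_45:.3f} per call")  # $1.845
print(f"Sonnet 4.6: ${sonnet_46:.3f} per call")  # $0.930
```

The same prompt costs roughly twice as much on Sonnet 4.5, which is why long-context workloads should default to Sonnet 4.6.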
Fast Mode is a research-preview feature on Claude Opus 4.6 only. It delivers significantly faster output at a 6× price premium:
| Mode | Input | Output | vs Standard |
|---|---|---|---|
| Standard (Opus 4.6) | $5.00 | $25.00 | — |
| Fast Mode (Opus 4.6) | $30.00 | $150.00 | 6× premium |
Never use Fast Mode as a default. At $30/$150 per MTok, a single 1M-token context query costs $30 in input alone. Reserve Fast Mode only for latency-critical, genuinely time-sensitive scenarios where the speed premium is justified by a concrete business outcome. Always benchmark whether standard Sonnet 4.6 (which is faster than standard Opus 4.6) meets your latency requirements first.
Anthropic offers subscription plans for individuals and teams alongside the API. These are separate products — subscriptions provide access to claude.ai and desktop apps, not the API. All API usage is always billed per token.
| Plan | Price | For | API Access? |
|---|---|---|---|
| Free | $0 | Casual users, trials | No |
| Pro | $20/mo ($17/mo annual) | Individual productivity (claude.ai) | No |
| Max | From $100/mo | Power users — 5× or 20× Pro usage | No |
| Team (Standard seat) | $25/seat/mo ($20 annual) | Teams up to 150 people | No |
| Team (Premium seat) | $125/seat/mo ($100 annual) | Power users on team — 5× usage | No |
| Enterprise | $20/seat + API rates | Large orgs with compliance needs | Yes (billed per token) |
| API (direct) | Per token | Developers building on Claude | Yes |
Key distinction: If you are building a product or automation that calls Claude programmatically, you are using the API and paying per token — regardless of whether you also have a Pro or Max subscription. Subscription plans do not provide API credits or reduce API costs.
Anthropic enforces two complementary sets of limits: spend caps tied to your usage tier, and per-model rate limits on requests and tokens per minute.
Rate limits vary by model — Haiku has higher default limits than Opus due to lower cost per request. Teams operating at scale should request limit increases proactively, before reaching capacity during peak periods. Being throttled in production costs more in engineering time and SLA impact than the deposit required to raise a tier.
Additional hidden costs to plan for: web search tool calls via the Anthropic API cost $10 per 1,000 searches, on top of token costs. US-only data residency via the inference_geo parameter adds a 10% premium on token costs for Opus 4.6 and later models. Neither is reflected in base model pricing.
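These add-ons compound on top of token spend. A sketch for a hypothetical Opus 4.6 workload (10 MTok input, 2 MTok output, 20K web searches per month, US-only residency enabled; all volumes are assumptions):

```python
# Effect of the two add-ons on an assumed Opus 4.6 workload:
# web search at $10 per 1,000 searches, plus the 10% US-only data
# residency premium applied to token costs.

token_cost = 10 * 5.00 + 2 * 25.00        # 10 MTok in, 2 MTok out
searches = 20_000
search_cost = searches / 1_000 * 10.00    # $10 per 1,000 searches
geo_premium = token_cost * 0.10           # 10% on token costs only

total = token_cost + search_cost + geo_premium
print(f"Tokens ${token_cost:.0f} + search ${search_cost:.0f} "
      f"+ residency ${geo_premium:.0f} = ${total:.0f}/mo")
```

In this example the web search line item ($200) is twice the residency premium ($10) plus, and exceeds, what many teams would guess, so tool usage deserves the same monitoring as token volume.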
~2,000 input tokens, ~500 output tokens per conversation. System prompt (5K tokens) cached with 1-hour TTL.
| Model | Daily Cost | Monthly Cost | Notes |
|---|---|---|---|
| Sonnet 4.6 (cache hits) | ~$18 | ~$540 | Cached system prompt, rest standard |
| Haiku 4.5 (no cache) | ~$13 | ~$390 | Cheapest model wins outright for volume |
| Opus 4.6 (no cache) | ~$115 | ~$3,450 | Unnecessary for support, over 6× the cached Sonnet cost |
Async batch, no caching. Documents processed overnight.
| Model + Mode | Per-doc Cost | Monthly Cost | Notes |
|---|---|---|---|
| Haiku 4.5 Batch | ~$0.0015 | ~$150 | Best value for bulk processing |
| Sonnet 4.6 Batch | ~$0.0090 | ~$900 | Use if quality requires it |
| Opus 4.6 Batch | ~$0.019 | ~$1,875 | Rarely justified for batch extraction |
Complex multi-step reasoning, code generation. Real-time. No meaningful caching (each session is unique).
| Model | Daily Cost | Monthly Cost |
|---|---|---|
| Opus 4.6 | ~$75 | ~$2,250 |
| Sonnet 4.6 | ~$45 | ~$1,350 |
| Haiku 4.5 | ~$15 | ~$450 |
For agentic coding, routing by task complexity delivers the best outcome: use Haiku for simple completions and quick lookups, Sonnet for most code tasks, and Opus only for the most complex architectural reasoning. A well-implemented router can bring blended costs close to Haiku rates while maintaining Opus quality where it counts.
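Such a router can be sketched as a simple heuristic. Production routers typically use a small classifier model rather than keywords; the keywords and routing rules below are illustrative assumptions, and the Haiku ID includes its dated snapshot as shown in the model card above.

```python
# Minimal complexity router for agentic coding, sketched as a keyword
# heuristic. Keywords and routing rules are illustrative assumptions;
# real systems usually score tasks with a small classifier instead.

HAIKU = "claude-haiku-4-5-20251001"
SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-6"

def route(task: str) -> str:
    """Pick the cheapest model likely to handle the task well."""
    t = task.lower()
    if any(k in t for k in ("architecture", "design a system", "migration plan")):
        return OPUS      # deep, multi-step architectural reasoning
    if any(k in t for k in ("implement", "debug", "write a function")):
        return SONNET    # most day-to-day coding work
    return HAIKU         # lookups, renames, quick completions

print(route("rename this variable"))              # claude-haiku-4-5-20251001
print(route("implement pagination for the API"))  # claude-sonnet-4-6
```

Because most agentic sessions are dominated by simple steps, even a crude router like this shifts the bulk of token volume onto Haiku rates while keeping Opus available for the calls that need it.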
Anthropic API pricing is per million tokens. Current rates: Claude Opus 4.6 at $5.00/$25.00 per MTok, Claude Sonnet 4.6 at $3.00/$15.00 per MTok, Claude Haiku 4.5 at $1.00/$5.00 per MTok. Batch processing halves all token costs. Prompt caching reduces cached input by 90%.
Claude Opus 4.6 costs $5.00 per million input tokens and $25.00 per million output tokens at standard rates. Batch: $2.50/$12.50. Cache hit: $0.50/MTok input (90% off). Supports 1M token context at flat rates with no surcharge. Fast Mode (research preview) costs $30.00/$150.00 per MTok — a 6× premium.
Claude Haiku 4.5 at $1.00/$5.00 per MTok is the cheapest current-generation model. With batch processing it drops to $0.50/$2.50. The legacy Haiku 3.5 is available at $0.80/$4.00, but Haiku 4.5 is recommended — it is faster, more capable, and only marginally more expensive.
You mark content blocks with cache_control: { type: "ephemeral" } in your API request. The first request writes the cache at 1.25× input cost (5-min TTL) or 2.0× input cost (1-hour TTL). Subsequent requests that hit the cache pay only 0.10× input — a 90% discount. Accessing a cached block resets its TTL.
Claude Opus 4.6 and Sonnet 4.6 both support 1M token contexts at completely flat rates — no surcharge. Claude Sonnet 4.5 on the 1M-token beta applies a 2× input / 1.5× output surcharge above 200K tokens. Claude Haiku 4.5 has a 200K context window with no surcharge.
No. Free access is only available via the claude.ai web and mobile interface. All API usage is billed per token regardless of whether you have a consumer subscription. There is no free tier for direct API access.
At standard rates, OpenAI is generally cheaper — GPT-5.4 at $2.50/$15.00 vs Claude Opus 4.6 at $5.00/$25.00. However, both providers now offer ~90% caching discounts, making effective costs competitive for cache-heavy workloads. Anthropic leads on long-context flat-rate pricing (Opus 4.6 and Sonnet 4.6 at 1M tokens, no surcharge) and on complex reasoning quality. See our full OpenAI vs Anthropic pricing comparison for a complete breakdown.
Use Anthropic's built-in usage API to pull token and cost data. For teams operating at scale across multiple providers, a FinOps platform like Finout provides unified visibility, cost allocation by team or feature, and real-time anomaly detection across your full AI and cloud spend — without requiring custom instrumentation.
Anthropic's API pricing in 2026 is meaningfully more nuanced than it was a year ago. The three-tier model lineup (Haiku → Sonnet → Opus) is stable, but the cost levers — prompt caching, batch processing, long-context surcharge avoidance, and model selection — create a wide range of effective costs for the same workload depending on how well you optimize.
The highest-impact changes for most teams, in order of ROI:

1. Migrate any remaining legacy workloads (especially Claude Opus 3) to Claude 4.x models.
2. Enable prompt caching for every large, reused system prompt or document context.
3. Move latency-tolerant workloads to the Message Batches API for a flat 50% discount.
4. Route requests by task complexity: Haiku for simple tasks, Sonnet as the default, Opus only where quality testing shows a meaningful gap.
5. Use Sonnet 4.6 rather than Sonnet 4.5 for long-context work to avoid the 2× input / 1.5× output surcharge above 200K tokens.