Quick answer: Anthropic API pricing in 2026 is per million tokens (MTok), billed separately for input and output. Claude Opus 4.8 (launched May 28, 2026) costs $5.00/$25.00 per MTok — same rate as Opus 4.7, with Fast Mode now at $10/$50 per MTok (down from $30/$150 on Opus 4.7). Claude Sonnet 4.6 costs $3.00/$15.00. Claude Haiku 4.5 costs $1.00/$5.00. Batch processing is 50% cheaper across all models. Prompt caching cuts cached input cost by 90%. Opus 4.8, Opus 4.7, and Sonnet 4.6 support 1M token context at flat rates with no surcharge.
Note on Opus 4.7 tokenizer (still relevant for 4.6 → 4.8 migrations): Opus 4.7 introduced a new tokenizer that can generate up to 35% more tokens for the same input text compared to Opus 4.6. Per-token prices are unchanged, but effective cost per request can increase by up to 35% for teams migrating from 4.6. The 4.7 → 4.8 migration carries no additional tokenizer penalty — it is a config-only change.
Prices verified May 28, 2026 from official Anthropic documentation.
Current generation — standard API rates per million tokens
Claude Opus 4.8 (current flagship, released May 28, 2026) — $5.00 input / $25.00 output per million tokens
Claude Sonnet 4.6 — $3.00 input / $15.00 output per million tokens
Claude Haiku 4.5 — $1.00 input / $5.00 output per million tokens
AI API spend has become one of the fastest-growing and least-governed line items in engineering budgets. Anthropic's Claude powers chatbots, coding assistants, agentic workflows, and data-intensive pipelines across industries — and because pricing is based on tokens, usage can escalate quickly without the right controls in place.
Understanding how Anthropic charges — and where the levers are — is essential for FinOps practitioners, engineering leaders, and product teams managing AI at scale. This guide covers every current model, every pricing mechanism, and the practical optimizations that make the biggest difference in production.
Tracking Claude spend across teams? Finout's Anthropic integration pulls token-level costs into your MegaBill alongside AWS, GCP, and Kubernetes — so you can see which team or product is driving spend.
Anthropic charges per million tokens (MTok) with separate rates for input tokens (what you send) and output tokens (what the model returns). Output tokens are consistently more expensive, reflecting the additional compute required to generate responses.
| Model | Input ($/MTok) | Output ($/MTok) | Batch Input | Batch Output | Context | Generation |
|---|---|---|---|---|---|---|
| Claude Opus 4.8 Flagship | $5.00 | $25.00 | $2.50 | $12.50 | 1M tokens | Claude 4.8 |
| Claude Opus 4.7 Previous Gen | $5.00 | $25.00 | $2.50 | $12.50 | 1M tokens | Claude 4.7 |
| Claude Opus 4.6 Legacy | $5.00 | $25.00 | $2.50 | $12.50 | 1M tokens | Claude 4.6 |
| Claude Sonnet 4.6 Balanced | $3.00 | $15.00 | $1.50 | $7.50 | 1M tokens | Claude 4.6 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $1.50 | $7.50 | 1M tokens (surcharge above 200K) | Claude 4.5 |
| Claude Haiku 4.5 Budget | $1.00 | $5.00 | $0.50 | $2.50 | 200K tokens | Claude 4.5 |
| Claude Haiku 3.5 Legacy | $0.80 | $4.00 | $0.40 | $2.00 | 200K tokens | Claude 3.5 |
| Claude Sonnet 3.7 Legacy | $3.00 | $15.00 | $1.50 | $7.50 | 200K tokens | Claude 3.7 |
| Claude Opus 3 Legacy | $15.00 | $75.00 | — | — | 200K tokens | Claude 3 |
All prices per million tokens (MTok). Batch pricing requires the Message Batches API. Legacy models remain available but Anthropic recommends migrating to Claude 4.x.
Claude Opus 3 is still available but costs 3× more than Opus 4.8 ($15.00 vs $5.00 per MTok input). If you are still using it for any production workload, migrating to Opus 4.8 is the single highest-ROI change you can make to your Anthropic bill today.
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Claude 3.5 / 3.7 Haiku | $0.25 | $1.25 |
| Claude 3.5 / 3.7 Sonnet | $3 | $15 |
| Claude 3 Sonnet | $3 | $15 |
| Claude 3 Opus | $15 | $75 |
| Claude 2.0 / 2.1 | $8 | $24 |
Claude 3.7 Sonnet introduced in 2025 blends fast responses with deeper reasoning, but keeps the $3/$15 rate. Claude 4 and 4.1 (Opus and Sonnet variants) launched later in 2025 with enhanced coding and reasoning, continuing the same pricing tiers.
Already using Claude at scale? See how Finout tracks it
API ID: claude-opus-4-8 | Price: $5.00/$25.00 per MTok | Context: 1M tokens | Released: May 28, 2026
Anthropic's most capable model as of May 2026. Significant advances in agentic coding (SWE-bench Pro 69.2%, up from 64.3% on Opus 4.7), long-horizon task coherence, and terminal automation (+8.5 pts on Terminal-Bench). Hybrid reasoning model with adaptive thinking. Max output: 128K tokens synchronous, up to 300K on the Batch API.
Fast Mode is now $10/$50 per MTok — a 3× reduction from the $30/$150 rate on Opus 4.7. This makes speed-optimized Opus pricing viable for latency-sensitive workflows that previously couldn't justify the premium.
Migration from Opus 4.7: No new tokenizer penalty. The 4.7 → 4.8 upgrade is a config-only change — same API surface, same context window, same token pricing. The tokenizer warning below applies only to teams still on Opus 4.6.
Best for: Autonomous coding agents, long-horizon agentic tasks, complex multi-step reasoning, and any workflow where maximum quality directly impacts revenue. For most production workloads (support, RAG, analysis, writing), Sonnet 4.6 remains the better cost/quality choice.
API ID: claude-opus-4-7 | Price: $5.00/$25.00 per MTok | Context: 1M tokens, no surcharge
Anthropic's prior flagship, now superseded by Opus 4.8. Still capable of state-of-the-art performance on most tasks. SWE-bench Pro: 64.3%. Max output: 128K tokens synchronous, up to 300K on the Batch API.
Tokenizer note: Opus 4.7 ships with a tokenizer that generates up to 35% more tokens for the same input text compared to Opus 4.6. Per-token prices are unchanged, but effective cost per request can be higher. Benchmark before migrating from 4.6. The 4.7 → 4.8 migration does not carry an additional tokenizer penalty.
Best for: Teams already on Opus 4.7 who haven't yet migrated to 4.8. Upgrading to 4.8 is recommended — same price, better benchmarks.
API ID: claude-sonnet-4-6 | Price: $3.00/$15.00 per MTok | Context: 1M tokens, no surcharge
The recommended default for most production use cases. Delivers near-Opus quality at faster latency and significantly lower cost. Best for: the majority of production workloads — coding, analysis, writing, customer-facing applications, RAG pipelines.
API ID: claude-haiku-4-5-20251001 | Price: $1.00/$5.00 per MTok | Context: 200K tokens
Near-frontier intelligence at the lowest price in the current generation. Best for: high-volume, latency-sensitive workloads — classification, routing, extraction, summarization, and moderation.
Prompt caching is Anthropic's most impactful pricing feature. It lets you store frequently reused content — system prompts, documents, examples, tool definitions — so subsequent API calls can read from cache instead of reprocessing the full input. Cache hits cost 90% less than standard input tokens.
Add cache_control: { type: "ephemeral" } to the content blocks you want cached — system prompt, documents, tool definitions. On the first request, Anthropic processes and stores those blocks. The write costs 1.25× standard input for a 5-minute TTL, or 2.0× for a 1-hour TTL. Any subsequent request within the TTL window that includes the same content pays only 0.10× standard input — a 90% discount. Accessing a cached block resets its TTL, so high-frequency applications rarely pay write costs after the initial warm-up.
| Model | Standard Input | 5-min Cache Write | 1-hr Cache Write | Cache Hit (Read) | Hit Discount |
|---|---|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $6.25 (1.25×) | $10.00 (2.0×) | $0.50 | −90% |
| Claude Opus 4.7 | $5.00 | $6.25 (1.25×) | $10.00 (2.0×) | $0.50 | −90% |
| Claude Sonnet 4.6 | $3.00 | $3.75 | $6.00 | $0.30 | −90% |
| Claude Sonnet 4.5 | $3.00 | $3.75 | $6.00 | $0.30 | −90% |
| Claude Haiku 4.5 | $1.00 | $1.25 | $2.00 | $0.10 | −90% |
| Claude Haiku 3.5 | $0.80 | $1.00 | $1.60 | $0.08 | −90% |
Scenario: A RAG app with a 50K-token knowledge base in the system prompt, queried 1,000 times per day. On Sonnet 4.6:
Without caching — 1,000 queries/day at 50K tokens each:
With caching — same workload, high cache hit rate:
Key insight: Prompt caching is not just an optimization — for any application with a large, reused system prompt or document context, it is the single most impactful change you can make to your Anthropic bill. Applications that query the same knowledge base repeatedly should always use caching.
The Anthropic Message Batches API processes requests asynchronously and returns results within 24 hours, at exactly 50% off standard token prices. There is no quality difference between batch and real-time responses — only timing.
| Model | Standard Input | Batch Input | Standard Output | Batch Output | Saving |
|---|---|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $2.50 | $25.00 | $12.50 | 50% |
| Claude Opus 4.7 | $5.00 | $2.50 | $25.00 | $12.50 | 50% |
| Claude Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 | 50% |
| Claude Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 | 50% |
Best workloads for batch processing: document processing pipelines, data enrichment at scale, nightly analytics jobs, offline evaluations, content generation queues, and any task where a few hours of latency is acceptable. A team processing 500K documents per month could save $750–$2,250/month simply by switching to batch.
Note on batch output limits: On the Message Batches API, Claude Opus 4.7/4.6 and sonnet 4.6 support up to 300K output tokens per request using the output-300k-2026-03-24 beta header — significantly more than the synchronous 128K/64K limits. This makes batch ideal for long-form generation workloads.
Not all Claude models handle large context windows at flat rates. Understanding the surcharge rules before choosing a model for long-context workloads can prevent significant unexpected costs.
| Model | Context Window | Surcharge Threshold | Surcharge |
|---|---|---|---|
| Claude Opus 4.8 | 1M tokens | None | Flat rate throughout |
| Claude Opus 4.7 | 1M tokens | None | Flat rate throughout |
| Claude Sonnet 4.6 | 1M tokens | None | Flat rate throughout |
| Claude Sonnet 4.5 | 1M tokens (beta) | 200K tokens | 2× input, 1.5× output above 200K (entire session) |
| Claude Haiku 4.5 | 200K tokens | N/A | No surcharge (200K max) |
Sonnet 4.5 long-context warning: If you are using Claude Sonnet 4.5 with prompts exceeding 200K tokens via the 1M-token context beta, the entire session is billed at 2× input and 1.5× output. A 300K-token prompt on Sonnet 4.5 does not cost the same as on Sonnet 4.6. Migrate to Sonnet 4.6 for large-context work — the pricing is the same at standard rates, but without the surcharge risk.
Fast Mode delivers significantly faster output in exchange for a price premium. The pricing changed materially with Opus 4.8.
| Mode | Input | Output | vs Standard |
|---|---|---|---|
| Standard (Opus 4.8) | $5.00 | $25.00 | — |
| Fast Mode (Opus 4.8) | $10.00 | $50.00 | 2× premium |
| Fast Mode (Opus 4.7) | $30.00 | $150.00 | 6× premium |
Fast Mode on Opus 4.8 runs at approximately 2.5× standard output speed and costs $10/$50 per MTok — a 3× reduction from the $30/$150 rate on Opus 4.7. For latency-sensitive agentic workloads that previously couldn't justify the Opus 4.7 fast tier, this materially changes the calculus.
Still: benchmark standard Sonnet 4.6 first. Sonnet 4.6 is inherently faster than standard Opus at 60% lower cost. If your workload's latency requirement is met by Sonnet 4.6, that remains the right choice. Reserve Opus 4.8 Fast Mode for workflows where maximum model quality and low latency are both required — user-facing coding assistants, live incident triage, real-time agentic loops.
Never use Fast Mode as a default. At $10/$50 per MTok, a 1M-token context input query alone costs $10. Reserve Fast Mode for genuinely latency-critical scenarios where the speed premium is justified by a concrete business outcome.
Anthropic offers subscription plans for individuals and teams alongside the API. These are separate products — subscriptions provide access to claude.ai and desktop apps, not the API. All API usage is always billed per token.
| Plan | Price | For | API Access? |
|---|---|---|---|
| Free | $0 | Casual users, trials | No |
| Pro | $20/mo ($17/mo annual) | Individual productivity (claude.ai) | No |
| Max | From $100/mo | Power users — 5× or 20× Pro usage | No |
| Team (Standard seat) | $25/seat/mo ($20 annual) | Teams up to 150 people | No |
| Team (Premium seat) | $125/seat/mo ($100 annual) | Power users on team — 5× usage | No |
| Enterprise | $20/seat + API rates | Large orgs with compliance needs | Yes (billed per token) |
| API (direct) | Per token | Developers building on Claude | Yes |
Key distinction: If you are building a product or automation that calls Claude programmatically, you are using the API and paying per token — regardless of whether you also have a Pro or Max subscription. Subscription plans do not provide API credits or reduce API costs.
Anthropic enforces two complementary sets of limits:
Rate limits vary by model — Haiku has higher default limits than Opus due to lower cost per request. Teams operating at scale should request limit increases proactively, before reaching capacity during peak periods. Being throttled in production costs more in engineering time and SLA impact than the deposit required to raise a tier.
Additional hidden costs to plan for: Web search tool calls via the Anthropic API cost $10 per 1,000 searches, on top of token costs. US-only data residency via the inference_geo parameter adds a 10% premium on all Opus 4.7 and above token costs. These are not reflected in base model pricing.
~2,000 input tokens, ~500 output tokens per conversation. System prompt (5K tokens) cached with 1-hour TTL.
| Model | Daily Cost | Monthly Cost | Notes |
|---|---|---|---|
| Sonnet 4.6 (cache hits) | ~$18 | ~$540 | Cached system prompt, rest standard |
| Haiku 4.5 (no cache) | ~$13 | ~$390 | Cheapest model wins outright for volume |
| Opus 4.7 (no cache) | ~$115 | ~$3,450 | Unnecessary for support — 9× Sonnet cost |
Async batch, no caching. Documents processed overnight.
| Model + Mode | Per-doc Cost | Monthly Cost | Notes |
|---|---|---|---|
| Haiku 4.5 Batch | ~$0.0015 | ~$150 | Best value for bulk processing |
| Sonnet 4.6 Batch | ~$0.0090 | ~$900 | Use if quality requires it |
| Opus 4.7 Batch | ~$0.019 | ~$1,875 | Rarely justified for batch extraction |
Complex multi-step reasoning, code generation. Real-time. No meaningful caching (each session is unique).
| Model | Daily Cost | Monthly Cost |
|---|---|---|
| Opus 4.8 | ~$75 | ~$2,250 |
| Sonnet 4.6 | ~$45 | ~$1,350 |
| Haiku 4.5 | ~$15 | ~$450 |
For agentic coding, routing by task complexity delivers the best outcome: use Haiku for simple completions and quick lookups, Sonnet for most code tasks, and Opus only for the most complex architectural reasoning. A well-implemented router can bring blended costs close to Haiku rates while maintaining Opus quality where it counts.
Anthropic API pricing is per million tokens. Current rates: Claude Opus 4.8 at $5.00/$25.00 per MTok, Claude Sonnet 4.6 at $3.00/$15.00 per MTok, Claude Haiku 4.5 at $1.00/$5.00 per MTok. Batch processing halves all token costs. Prompt caching reduces cached input by 90%.
Claude Opus 4.8 costs $5.00 per million input tokens and $25.00 per million output tokens at standard rates — identical to Opus 4.7. Fast Mode is $10/$50 per MTok (down from $30/$150 on Opus 4.7). Batch: $2.50/$12.50. Cache hit: $0.50/MTok (90% off). Supports 1M context at flat rates with no surcharge. The 4.7 → 4.8 migration carries no new tokenizer penalty.
Claude Haiku 4.5 at $1.00/$5.00 per MTok is the cheapest current-generation model. With batch processing it drops to $0.50/$2.50.
You mark content blocks with cache_control: { type: "ephemeral" } in your API request. The first request writes the cache at 1.25× input cost (5-min TTL) or 2.0× input cost (1-hour TTL). Subsequent requests that hit the cache pay only 0.10× input — a 90% discount. Accessing a cached block resets its TTL.
Claude Opus 4.8, Opus 4.7, and Sonnet 4.6 all support 1M token contexts at completely flat rates — no surcharge. Claude Sonnet 4.5 on the 1M-token beta applies a 2× input / 1.5× output surcharge above 200K tokens. Claude Haiku 4.5 has a 200K context window with no surcharge.
No. Free access is only available via the claude.ai web and mobile interface. All API usage is billed per token regardless of whether you have a consumer subscription. There is no free tier for direct API access.
At standard rates, OpenAI is generally cheaper — GPT-5.4 at $2.50/$15.00 vs Claude Opus 4.8 at $5.00/$25.00. However, both providers now offer ~90% caching discounts, making effective costs competitive for cache-heavy workloads. Anthropic leads on long-context flat-rate pricing (Opus 4.8, Opus 4.7, and Sonnet 4.6 at 1M tokens, no surcharge) and on complex reasoning quality. See our full OpenAI vs Anthropic pricing comparison for a complete breakdown.
Use Anthropic's built-in usage API to pull token and cost data. For teams operating at scale across multiple providers, a FinOps platform like Finout provides unified visibility, cost allocation by team or feature, and real-time anomaly detection across your full AI and cloud spend — without requiring custom instrumentation.
Anthropic's API pricing in 2026 is meaningfully more nuanced than it was a year ago. The three-tier model lineup (Haiku → Sonnet → Opus) is stable, but the cost levers — prompt caching, batch processing, long-context surcharge avoidance, and model selection — create a wide range of effective costs for the same workload depending on how well you optimize.
The highest-impact changes for most teams, in order of ROI: