Claude Opus 4.8 Pricing 2026: Everything You Need to Know

Written by Finout Team | May 28, 2026 6:05:53 PM

Anthropic released Claude Opus 4.8 today, and the headline is familiar: pricing unchanged. $5 per million input tokens, $25 per million output tokens, the same rate card that has held since Opus 4.5. If you stop reading there, you will miss the part that matters most to anyone running Opus at scale.

Two things shifted the real cost math with this release. First, Fast Mode is now three times cheaper than it was for Opus 4.7, making a 2.5x-speed version of the frontier model accessible at $10/$50 per million tokens. Second, the same tokenizer that quietly raised effective costs by up to 35% for teams migrating from Opus 4.6 to 4.7 is still in play for anyone who skipped that migration. This post breaks down every pricing layer, runs the numbers on four realistic workloads, covers Fast Mode and effort control, compares Opus 4.8 against GPT-5.5 and Gemini 3.1 Pro on cost-per-quality, and tells you exactly when to stay, upgrade, or route traffic elsewhere.

Claude Opus 4.8 Pricing at a Glance

Mode	Input (per 1M tokens)	Output (per 1M tokens)
Standard	$5	$25
Fast Mode	$10	$50
Batch (async)	$2.50	$12.50
Prompt cache read	~$0.50	—
Prompt cache write	Standard input rate	—

Fast Mode is new pricing territory. Previous Opus fast modes were priced at roughly $30/$150 per million tokens. Dropping it to $10/$50 while running at 2.5x the speed is a structural change in what agentic workloads can cost.

The Real Pricing Story: Three Layers

Layer 1 — Sticker price (unchanged for three generations)

$5 input, $25 output. This has not moved since Opus 4.5. Anthropic is signaling that frontier pricing at this tier is stable while the underlying capability climbs. For budget planning, that is a gift. For effective cost analysis, it is a starting point, not an ending point.

Layer 2 — The 4.7 tokenizer (still in play for teams on 4.6)

Opus 4.7 introduced a new tokenizer that produces up to 35% more tokens for the same input. That change carries forward into 4.8 — the tokenizer did not change between 4.7 and 4.8. If you migrated from 4.6 to 4.7, you already absorbed that impact. If you are migrating directly from 4.6 to 4.8, the same 0–35% effective cost increase applies on input, and potentially more on output if the model is more thorough by default. Measure before committing.

The 4.7-to-4.8 migration does not carry a new tokenizer penalty, which makes it one of the cleaner version upgrades in the Claude 4.x cycle.

Layer 3 — Fast Mode (the new lever)

Fast Mode runs Opus 4.8 at approximately 2.5x standard speed. The pricing is double the standard rate ($10/$50), but that math only matters if you compare it to standard speed. Compare it to what fast mode cost before — roughly $30/$150 per million tokens for prior Opus versions — and it is 3x cheaper for equivalent speed. For latency-sensitive agentic workloads that previously could not justify fast mode, this changes the calculus entirely.

Benchmark Performance vs. Cost: How Opus 4.8 Stacks Up

These are not just capability numbers. They directly determine whether paying Opus prices is justified for a given workload.

Benchmark	Opus 4.8	Opus 4.7	GPT-5.5	Gemini 3.1 Pro
SWE-bench Pro (coding)	69.2%	64.3%	58.6%	54.2%
OSWorld-Verified (computer use)	83.4%	82.3%	78.7%	76.2%
Online-Mind2Web (browser agent)	84%	—	—	—
Humanity's Last Exam (no tools)	49.8%	—	—	—
Humanity's Last Exam (with tools)	57.9%	—	—	—
Legal Agent Benchmark (all-pass)	First to break 10%	—	—	—

The SWE-bench jump from 64.3% to 69.2% is not cosmetic. In production coding agents, a 5-point lift at the high end of the capability curve tends to show up as fewer failed runs, fewer human interventions, and lower end-to-end cost per completed task even if the per-token cost stays flat.

Cursor tested this directly: "Claude Opus 4.8 exceeds prior Opus models across every effort level. Tool calling is meaningfully more efficient, using fewer steps for the same intelligence, and it carries end-to-end tasks through." Fewer tool calls for the same outcome means fewer tokens consumed. Better efficiency at constant pricing is a real price cut, just not the kind that shows up on a rate card.
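To make that "price cut" concrete, here is a rough sketch of cost per completed task. It assumes each run succeeds independently with a fixed probability and is retried until it succeeds, so the expected number of runs is 1 / success rate. Treating SWE-bench Pro scores as production success rates, and the $0.50 per-run cost, are illustrative assumptions rather than figures from Anthropic or Cursor.

```python
def cost_per_completed_task(cost_per_run: float, success_rate: float) -> float:
    """Expected spend per successful task when failed runs are retried.

    Assumes independent attempts, so expected attempts = 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_run / success_rate


# Hypothetical per-run cost; success rates borrowed from the SWE-bench Pro row.
COST_PER_RUN = 0.50
opus_47 = cost_per_completed_task(COST_PER_RUN, 0.643)
opus_48 = cost_per_completed_task(COST_PER_RUN, 0.692)

print(f"Opus 4.7: ${opus_47:.3f} per completed task")
print(f"Opus 4.8: ${opus_48:.3f} per completed task")
print(f"Reduction at flat per-token pricing: {1 - opus_48 / opus_47:.1%}")
```

Swap in your own measured per-run cost and task success rate; the benchmark scores are only a stand-in.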

The Full Claude Lineup, Priced for Comparison

Model	Input ($/1M)	Output ($/1M)	Context	Best for
Claude Opus 4.8	$5	$25	1M tokens	Frontier coding, agents, high-stakes reasoning, legal, finance
Claude Opus 4.7	$5	$25	1M tokens	Still capable; consider migrating
Claude Sonnet 4.6	$3	$15	1M tokens	Default for most production inference
Claude Haiku 4.5	$1	$5	200K tokens	High-volume, low-latency, simple tasks

Sonnet 4.6 is 40% cheaper per token than Opus on both input and output. For most production inference — classification, RAG responses, content generation, routine tool use — Sonnet remains the cost-effective default. Opus 4.8 is a premium SKU for workloads where quality differentiates revenue or where the agent task requires sustained multi-step reasoning that lighter models cannot complete reliably.

Haiku 4.5 is 5x cheaper than Opus on both dimensions. For extraction, routing, moderation, or annotation at volume, Haiku is often the right answer regardless of what Opus can do.

Cost Projections: Four Realistic Workloads

These numbers are illustrative, built from the pricing structure above. Use them to calibrate your own workload, not as quotes.

Workload 1 — Coding agent, standard pace, 1M input / 200K output per day

Opus 4.7 (current):

Input: 1M × $5 = $5.00
Output: 0.2M × $25 = $5.00
Daily: $10.00 / Monthly: ~$300

Opus 4.8 (upgrade from 4.7, same tokenizer):

Same calculation. No tokenizer delta between 4.7 and 4.8.
Daily: $10.00 / Monthly: ~$300

Opus 4.8 Fast Mode (latency-sensitive version of same workload):

Input: 1M × $10 = $10.00
Output: 0.2M × $50 = $10.00
Daily: $20.00 / Monthly: ~$600 — but you get 2.5x the throughput

Opus 4.8 Batch (async, overnight runs):

Input: 1M × $2.50 = $2.50
Output: 0.2M × $12.50 = $2.50
Daily: $5.00 / Monthly: ~$150 — half the standard cost

For this workload, the migration from 4.7 to 4.8 at standard pricing is cost-neutral. The decision is whether to redirect overnight runs to Batch (saves $150/month) or whether the new Fast Mode throughput justifies doubling spend.
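The Workload 1 arithmetic can be reproduced with a small lookup, sketched below. The names `RATES` and `daily_cost` are illustrative and not part of any SDK; the rates are copied from the pricing table, and the 30-day month matches the monthly figures used in this post.

```python
# USD per 1M tokens (input, output), copied from the "at a glance" table.
RATES = {
    "standard": (5.00, 25.00),
    "fast": (10.00, 50.00),
    "batch": (2.50, 12.50),
}
DAYS_PER_MONTH = 30  # the monthly figures in this post assume 30 days


def daily_cost(mode: str, input_mtok: float, output_mtok: float) -> float:
    """Daily cost in USD; token volumes are given in millions of tokens."""
    input_rate, output_rate = RATES[mode]
    return input_mtok * input_rate + output_mtok * output_rate


# Workload 1: 1M input / 200K output per day.
for mode in RATES:
    day = daily_cost(mode, input_mtok=1.0, output_mtok=0.2)
    print(f"{mode:<8} daily ${day:.2f}  monthly ${day * DAYS_PER_MONTH:.0f}")
```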

Workload 2 — RAG assistant, 5M input / 500K output per day, 70% cache hit

Opus 4.8 input:

Cached 3.5M tokens × ~$0.50 = $1.75
Uncached 1.5M tokens × $5 = $7.50
Input subtotal: $9.25

Opus 4.8 output:

0.5M × $25 = $12.50

Daily: $21.75 / Monthly: ~$652

Same workload on Sonnet 4.6:

Cached 3.5M × ~$0.30 = $1.05
Uncached 1.5M × $3 = $4.50
Output: 0.5M × $15 = $7.50
Daily: ~$13.05 / Monthly: ~$392

Opus vs. Sonnet on this workload: Opus costs ~66% more per month. Unless your evaluation shows a clear quality lift on RAG response accuracy that translates to better user outcomes, stay on Sonnet 4.6. Most teams that default to Opus for RAG are paying a ~$260/month premium per workload unit for a quality difference that their users cannot perceive.
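A minimal sketch of the cache-aware comparison above, useful for plugging in your own cache hit rate. The function name and parameters are illustrative. Like the worked figures, it prices cache reads at the approximate rates quoted in this post and does not model cache writes separately.

```python
def daily_cost_with_cache(
    input_mtok: float,
    output_mtok: float,
    cache_hit: float,
    input_rate: float,
    cache_read_rate: float,
    output_rate: float,
) -> float:
    """Daily USD cost when a fraction `cache_hit` of input is served from cache.

    Token volumes are in millions of tokens; rates are USD per 1M tokens.
    """
    cached = input_mtok * cache_hit
    uncached = input_mtok - cached
    return cached * cache_read_rate + uncached * input_rate + output_mtok * output_rate


# Workload 2: 5M input / 500K output per day, 70% cache hit.
opus = daily_cost_with_cache(5.0, 0.5, 0.70, input_rate=5.00, cache_read_rate=0.50, output_rate=25.00)
sonnet = daily_cost_with_cache(5.0, 0.5, 0.70, input_rate=3.00, cache_read_rate=0.30, output_rate=15.00)

print(f"Opus 4.8:   ${opus:.2f}/day")
print(f"Sonnet 4.6: ${sonnet:.2f}/day")
print(f"Opus premium: ${(opus - sonnet) * 30:.0f}/month ({opus / sonnet - 1:.0%} more)")
```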

Workload 3 — Autonomous SWE agent, 10M input / 2M output per day, no caching

Opus 4.8 standard:

Input: 10M × $5 = $50.00
Output: 2M × $25 = $50.00
Daily: $100 / Monthly: ~$3,000

Opus 4.8 standard + 30% tokenizer inflation (if migrating from Opus 4.6):

Effective input: 13M × $5 = $65.00
Effective output: 2.6M × $25 = $65.00
Daily: $130 / Monthly: ~$3,900

Opus 4.8 Batch (async pipeline):

Daily: ~$50 / Monthly: ~$1,500

Opus 4.8 with prompt caching (system prompt + tool definitions, 50% cache hit):

Cached 5M × $0.50 = $2.50
Uncached 5M × $5 = $25.00
Output: 2M × $25 = $50.00
Daily: $77.50 / Monthly: ~$2,325

The spread between worst case (4.6 migration, no caching: ~$3,900/month) and best case (Batch: ~$1,500/month) on the same workload is $2,400/month. That is the range that caching and batch architecture control.
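The four Workload 3 scenarios can be laid side by side with one parameterized function, sketched here. It follows this post's simplification of applying tokenizer inflation equally to input and output; the function and constant names are illustrative.

```python
INPUT_RATE, OUTPUT_RATE = 5.00, 25.00  # USD per 1M tokens, Opus 4.8 standard
CACHE_READ_RATE = 0.50                 # approximate, per the pricing table
BATCH_DISCOUNT = 0.50
DAYS_PER_MONTH = 30


def monthly_cost(input_mtok, output_mtok, *, inflation=0.0, cache_hit=0.0, batch=False):
    """Monthly USD cost for daily volumes given in millions of tokens."""
    scale = 1.0 + inflation  # tokenizer inflation, applied to input and output alike
    inp, out = input_mtok * scale, output_mtok * scale
    cached = inp * cache_hit
    daily = cached * CACHE_READ_RATE + (inp - cached) * INPUT_RATE + out * OUTPUT_RATE
    if batch:
        daily *= 1.0 - BATCH_DISCOUNT
    return daily * DAYS_PER_MONTH


scenarios = {
    "standard": monthly_cost(10, 2),
    "from 4.6, +30% tokens": monthly_cost(10, 2, inflation=0.30),
    "batch": monthly_cost(10, 2, batch=True),
    "50% cache hit": monthly_cost(10, 2, cache_hit=0.50),
}
for name, cost in scenarios.items():
    print(f"{name:<24} ${cost:,.0f}/month")
print(f"spread: ${max(scenarios.values()) - min(scenarios.values()):,.0f}/month")
```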

Workload 4 — Legal / financial document analysis, Fast Mode, 2M input / 500K output per day

This workload is time-sensitive (partners waiting on analysis) and previously could not justify old fast mode pricing. With Fast Mode at $10/$50:

Opus 4.8 Fast Mode:

Input: 2M × $10 = $20.00
Output: 0.5M × $50 = $25.00
Daily: $45.00 / Monthly: ~$1,350

Equivalent workload at old fast mode pricing (~$30/$150):

Input: 2M × $30 = $60.00
Output: 0.5M × $150 = $75.00
Daily: $135.00 / Monthly: ~$4,050

Savings from new Fast Mode pricing: ~$2,700/month on this workload. For latency-critical professional services workloads, the new Fast Mode is the biggest cost story in this release.

Effort Control: The New Cost Dial

Opus 4.8 introduces user-level effort control (low, high, extra, max), and the API now surfaces xhigh in Claude Code. This is not just a user experience feature — it is a cost lever.

The default is high, which Anthropic says uses a similar number of tokens to Opus 4.7 default but with better performance. Moving to extra or max will increase token consumption in exchange for higher quality on difficult tasks. Moving to low reduces both tokens and latency, useful when an Opus-class model is needed for capability but not necessarily for depth on a specific call.

Practical cost implications:

For long-running async agents, extra effort on difficult reasoning steps is worth the token cost because errors are expensive.
For document routing or light triage inside an Opus-powered pipeline, low effort cuts consumption without dropping to a cheaper model (which might require prompt reengineering).
For evaluation sweeps or nightly batch runs, effort control gives you a finer-grained cost knob than model selection alone.

There is no separate pricing tier for effort levels — they consume more or fewer tokens at the same per-token rate.

Dynamic Workflows: What They Cost

Dynamic workflows (research preview, available on Enterprise, Team, and Max plans) let Claude Code plan a task and spin up hundreds of parallel subagents in a single session. A codebase migration across hundreds of thousands of lines of code — from kickoff to merge, verified against the existing test suite — is now a single Claude Code invocation.

The cost math: each subagent consumes tokens at the same Opus 4.8 rate. If a migration involves 200 parallel subagents each processing 50K input tokens and producing 10K output tokens, that is:

Input: 200 × 50K = 10M tokens × $5 = $50
Output: 200 × 10K = 2M tokens × $25 = $50
Total: $100 for a codebase-scale migration run

Comparable work billed at developer hourly rates would typically cost far more. The economics of dynamic workflows depend on the task, but for deterministic, high-volume operations (migrations, test generation, documentation, refactoring) the per-run cost is often a fraction of equivalent engineering time.

The risk: runaway subagent trees. A misconfigured plan that spins 500 subagents instead of 50 is a 10x cost spike. Set budget limits in your harness before enabling dynamic workflows in production.
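One way to express such a limit is a pre-flight check that prices a plan before any subagent starts. The sketch below is a hypothetical guard for your own harness, not a Claude Code feature; `check_plan`, `BudgetExceeded`, and the per-subagent token estimates are all assumptions you would replace with your own.

```python
INPUT_RATE = 5.00    # USD per 1M input tokens, Opus 4.8 standard
OUTPUT_RATE = 25.00  # USD per 1M output tokens


class BudgetExceeded(RuntimeError):
    """Raised when a planned run is projected to cost more than the budget."""


def projected_cost(subagents: int, input_tokens_each: int, output_tokens_each: int) -> float:
    """Projected USD cost if every subagent uses its estimated token volume."""
    total_in = subagents * input_tokens_each
    total_out = subagents * output_tokens_each
    return (total_in * INPUT_RATE + total_out * OUTPUT_RATE) / 1_000_000


def check_plan(subagents: int, input_tokens_each: int, output_tokens_each: int, budget_usd: float) -> float:
    """Return the projected cost, or raise BudgetExceeded before anything runs."""
    cost = projected_cost(subagents, input_tokens_each, output_tokens_each)
    if cost > budget_usd:
        raise BudgetExceeded(f"projected ${cost:.2f} exceeds budget ${budget_usd:.2f}")
    return cost


# The 200-subagent migration above fits a $150 budget.
print(check_plan(200, 50_000, 10_000, budget_usd=150.0))  # 100.0

# A misconfigured 500-subagent plan is rejected before it spends anything.
try:
    check_plan(500, 50_000, 10_000, budget_usd=150.0)
except BudgetExceeded as err:
    print(f"blocked: {err}")
```

A projection only bounds the plan; a production harness would also want to track actual token usage while the run is in flight.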

Prompt Caching and Batch: The Two Levers That Matter Most

These two discounts interact with every workload above and are worth understanding precisely.

Prompt caching is priced at approximately $0.50 per million tokens on cache reads (10% of the standard input rate). Cache writes happen at the standard input rate, so writing to the cache costs no more than sending the same tokens uncached, and every subsequent read of that content saves roughly 90% of its input cost. Caching therefore pays off from the first cache hit, which makes it an easy win for system prompts, tool definitions, and shared document context in multi-turn agents.

Cache hit ratio is the metric to watch after any migration. The 4.7 tokenizer changes can shift token boundaries and invalidate old cache entries on the first run. Opus 4.8 keeps the same tokenizer as 4.7, so 4.7 cache entries should survive the upgrade — validate this on your specific prompts before assuming continuity.

Batch processing gives a flat 50% discount on all tokens in exchange for async execution (minutes to hours). It stacks on top of, not instead of, caching. A nightly batch job with 70% cache hit rates is running at roughly 5% of list price on the cached portion and 50% on the uncached portion. For pipelines that can tolerate latency, Batch is the single highest-leverage cost control available.
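The stacking arithmetic can be checked with a short sketch. It assumes, as described above, that the batch discount multiplies the cache-read discount; the constant and function names are illustrative.

```python
CACHE_READ_MULTIPLIER = 0.10  # cache reads at roughly 10% of the input rate
BATCH_MULTIPLIER = 0.50       # flat 50% batch discount


def effective_input_multiplier(cache_hit: float, batch: bool) -> float:
    """Fraction of list input price actually paid, blended across cached and uncached tokens."""
    if not 0.0 <= cache_hit <= 1.0:
        raise ValueError("cache_hit must be between 0 and 1")
    blended = cache_hit * CACHE_READ_MULTIPLIER + (1.0 - cache_hit)
    return blended * (BATCH_MULTIPLIER if batch else 1.0)


# Nightly batch job with a 70% cache hit rate.
share = effective_input_multiplier(0.70, batch=True)
print(f"Input tokens cost {share:.1%} of list price")
```

With a 70% hit rate in Batch, the blended input cost comes to 0.70 × 5% + 0.30 × 50% = 18.5% of list price. Output tokens only receive the batch discount.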

What goes in Batch: nightly summarization, document analysis, evaluation sweeps, training data generation, red-team runs, backfill operations, and any agent task where the user is not waiting synchronously.

Migrating From Opus 4.7 to 4.8

This is the cleanest upgrade in the 4.x cycle from a cost perspective. No new tokenizer, no new discounting structure. The checklist:

Before you migrate:

Replay a representative traffic sample through Opus 4.8 and confirm that token counts are comparable to 4.7. They should be — same tokenizer — but verify with your specific content mix.
Check that effort level defaults meet your expectations. Opus 4.8 defaults to high, which Anthropic calibrates at similar token volume to Opus 4.7 default, but with better performance. Monitor output length — a more capable model that produces longer, more accurate outputs will still increase your bill per call.
Validate that prompt cache entries survive the migration. Tokenizer is unchanged, so they should. Confirm on your actual prompts before cutting production traffic over.

After migration:

Monitor for quality improvements that change your routing logic. If Opus 4.8 completes tasks Opus 4.7 failed, you may be able to reduce human intervention steps or retry loops that were previously baked in — those are real cost savings that do not appear on the per-token line.
Evaluate whether new Fast Mode is worth enabling for any latency-sensitive workflows. At $10/$50, it is 3x cheaper than what fast mode used to cost, so re-run the math even if you dismissed it before.
Consider dynamic workflows for any large-scale agentic task that currently requires manual orchestration.

Migrating From Opus 4.6 to 4.8 (Skipping 4.7)

If you are still on Opus 4.6, the 4.7 tokenizer inflation is still the primary cost risk to model before migrating. The same 0–35% effective cost increase on input — with the upper end concentrated on code, JSON, and non-English text — applies when jumping to 4.8 directly.

Replay 100–1,000 production requests through both models, compare token counts and costs, then decide whether the benchmark gains (SWE-bench Pro: 53.4% on Opus 4.6 → 69.2% on Opus 4.8) justify the effective cost delta. For autonomous coding workloads, that 15.8-point SWE-bench gap typically translates to fewer failed runs and less human intervention, so the per-task cost can be lower even if per-token cost is higher.
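Once the replay is done, the comparison reduces to a ratio of total token counts. The sketch below assumes you have already collected per-request token counts for the same prompts on both models, for example from your provider's usage reporting; the sample numbers are made up for illustration.

```python
def token_inflation(old_counts: list[int], new_counts: list[int]) -> float:
    """Relative change in total tokens across paired requests (0.25 means 25% more)."""
    if not old_counts or len(old_counts) != len(new_counts):
        raise ValueError("need two non-empty lists of equal length")
    return sum(new_counts) / sum(old_counts) - 1.0


# Hypothetical input-token counts for the same five prompts on each model.
opus_46_tokens = [1200, 800, 4000, 1500, 2500]
opus_48_tokens = [1500, 900, 5200, 1700, 3200]

inflation = token_inflation(opus_46_tokens, opus_48_tokens)
print(f"Measured token inflation: {inflation:.0%}")
```

Run it separately for input and output tokens, and per content type if your traffic mixes prose with code or JSON, since the upper end of the range is concentrated there.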

Claude Opus 4.8 vs. GPT-5.5 and Gemini 3.1 Pro: Cost-Adjusted

	Opus 4.8	GPT-5.5	Gemini 3.1 Pro
Input pricing	$5/1M	~$10/1M	~$7/1M
Output pricing	$25/1M	~$40/1M	~$30/1M
SWE-bench Pro	69.2%	58.6%	54.2%
OSWorld-Verified	83.4%	78.7%	76.2%
Context window	1M tokens	128K	2M tokens
Fast/speed tier	Yes ($10/$50)	Yes	Yes

GPT-5.5 and Gemini 3.1 Pro pricing based on publicly available rates at time of writing; verify current rates directly.

Opus 4.8 is priced meaningfully below GPT-5.5 at the input level and competitive with Gemini 3.1 Pro, while leading on the two benchmarks that matter most for agentic and coding workloads. For teams evaluating across providers: Opus 4.8 offers the best SWE-bench Pro and OSWorld scores at lower list price than GPT-5.5. Gemini 3.1 Pro's 2M context window is relevant for ultra-long document workloads, but Opus 4.8's 1M context covers the vast majority of production use cases.
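One crude way to put a number on "cost-adjusted" is blended price per benchmark point, sketched below. The metric itself and the 5:1 input-to-output token mix are assumptions made for illustration; the prices and scores are copied from the table, and the competitor prices are approximate.

```python
# (input $/1M, output $/1M, SWE-bench Pro %), copied from the table above.
MODELS = {
    "Opus 4.8": (5.0, 25.0, 69.2),
    "GPT-5.5": (10.0, 40.0, 58.6),
    "Gemini 3.1 Pro": (7.0, 30.0, 54.2),
}
INPUT_SHARE = 5 / 6  # assumed 5:1 input-to-output token mix


def blended_price(input_rate: float, output_rate: float, input_share: float = INPUT_SHARE) -> float:
    """Average USD per 1M tokens for the assumed input/output mix."""
    return input_rate * input_share + output_rate * (1.0 - input_share)


def price_per_point(model: str) -> float:
    """Blended USD per 1M tokens divided by the SWE-bench Pro score."""
    input_rate, output_rate, score = MODELS[model]
    return blended_price(input_rate, output_rate) / score


ranking = sorted(MODELS, key=price_per_point)
for model in ranking:
    print(f"{model:<15} ${price_per_point(model):.3f} per 1M tokens per benchmark point")
```

The ordering is sensitive to the token mix and to which benchmark you divide by, so treat it as a starting point for your own evals rather than a verdict.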

What the Honesty Improvements Mean for Cost

Opus 4.8 is approximately four times less likely than Opus 4.7 to allow flaws in written code to pass unremarked. This is an alignment improvement that has direct cost implications.

In a production coding agent, a missed bug that passes review costs more than the token spend of the original run — it costs a human engineer's time to debug, a re-run of tests, a context reload, and potentially a customer-visible incident. An agent that proactively flags uncertainty costs more tokens per run (it generates more cautionary output) but typically costs less per successfully completed task.

The Databricks team put it well: Opus 4.8 "proactively flags issues with the inputs and outputs of an analysis, something other models routinely missed and left to the users to catch." If your current agent architecture has human review baked in specifically because the model is unreliable, the honesty improvements may let you reduce that step — which is a real cost reduction that does not show up on the Anthropic invoice.

The Mythos Context: Why This Release Matters Beyond 4.8

Anthropic confirmed today that Mythos-class models — currently available in limited preview to a small number of cybersecurity organizations via Project Glasswing — are expected to reach general availability "in the coming weeks." Opus 4.8 already achieves Mythos-level alignment scores on misaligned behavior metrics.

What does this mean for pricing? Mythos-class models will almost certainly carry a higher price point than the $5/$25 Opus structure. Teams that can solve their workloads within Opus 4.8 capability today should lock in those architectures now, rather than waiting and incurring the migration cost later when a new, more expensive tier arrives. Conversely, if your use case genuinely needs more intelligence than Opus 4.8 provides, Mythos access is worth watching.

Should You Upgrade From Opus 4.7 to 4.8?

Upgrade if:

You run autonomous coding agents. The SWE-bench jump from 64.3% to 69.2% is material for production agent reliability.
You have latency-sensitive workflows that previously could not justify fast mode. At $10/$50, re-run the math.
You want alignment improvements — specifically, if you are relying on human review to catch model errors, Opus 4.8's honesty improvements may let you reduce that step.
You use Claude Code and want dynamic workflows for large-scale agentic tasks.
You care about computer use or browser automation. OSWorld-Verified 83.4% and Online-Mind2Web 84% are the strongest scores currently available.

Stay on 4.7, or consider Sonnet 4.6, if:

Your workload is cost-sensitive and evals do not show a meaningful quality delta between 4.7 and 4.8 for your specific task.
You are running RAG or simple content generation, where Sonnet 4.6 delivers comparable results at 40% lower per-token cost.
You are processing high-volume, low-complexity tasks where Haiku 4.5 is sufficient.

Frequently Asked Questions

Is Claude Opus 4.8 more expensive than Opus 4.7? Per-token sticker prices are identical: $5 input, $25 output. If you are migrating from 4.7 to 4.8, effective cost should be nearly flat — the tokenizer is unchanged between these versions. If you are migrating from 4.6 to 4.8, the 4.7-era tokenizer change can add 0–35% to effective per-request costs depending on your content.

What is Fast Mode and how much does it cost? Fast Mode runs Opus 4.8 at approximately 2.5x the standard speed, priced at $10 per million input tokens and $50 per million output tokens. That is double the standard rate per token, but three times cheaper than fast mode pricing on previous Opus models. If your workload previously used standard mode because fast mode was too expensive, re-evaluate — the economics changed materially.

How does Opus 4.8 pricing compare to GPT-5.5? Opus 4.8 is priced at $5/$25 per million tokens. GPT-5.5 is priced at approximately $10/$40. For comparable or better performance on coding and agentic benchmarks, Opus 4.8 is meaningfully cheaper.

Does prompt caching still work with Opus 4.8? Yes. Cache reads are discounted by approximately 90% (to roughly $0.50/M on input). Because the tokenizer did not change between 4.7 and 4.8, existing cache entries from Opus 4.7 workloads should survive the migration. Validate on your specific prompts before assuming continuity.

What are dynamic workflows and do they cost extra? Dynamic workflows are a Claude Code feature (research preview, Enterprise/Team/Max plans) that lets Claude plan a task and run hundreds of parallel subagents. There is no premium pricing — subagents consume tokens at standard Opus 4.8 rates. The cost scales with the number of subagents and their token consumption, which is why setting budget guardrails before enabling the feature in production is important.

What is effort control and does it affect pricing? Effort control (low, high, extra, max) lets users and developers tune how deeply Opus 4.8 reasons through a task. Higher effort consumes more tokens at the same per-token rate. There is no separate price per effort tier.

When will Opus 4.7 be deprecated? Anthropic has not published a deprecation date for Opus 4.7. Opus rate limits are pooled across versions, so you can mix 4.7 and 4.8 traffic during a gradual migration.

What is Claude Mythos and should I wait for it? Mythos Preview is currently available only to a small number of cybersecurity organizations via Project Glasswing. Anthropic expects to bring Mythos-class models to general availability "in the coming weeks." Pricing has not been announced, but a model class above Opus will likely carry a higher price point. If Opus 4.8 solves your workload, build on it now.

How do I track actual Opus 4.8 costs across teams and workloads? Per-token rate cards do not tell you which team, feature, or customer is driving spend. You need cost allocation at the call level: which model, which workflow, which product, with unit economics per resolved task or generated output. Finout's Anthropic integration pulls this into MegaBill alongside your cloud infrastructure spend.

How to Track Claude Opus 4.8 Costs with Finout

The Opus 4.8 pricing story has more variables than the rate card shows: standard vs. Fast Mode vs. Batch, tokenizer density, effort level, cache hit rate, and subagent count in dynamic workflows. Each variable can move your effective cost per request significantly. A fixed rate card and a monthly invoice tell you what you spent. They do not tell you why, which team drove it, or whether you are getting the unit economics you planned for.

Finout's Anthropic integration pulls Claude API spend into MegaBill alongside AWS, GCP, Azure, Kubernetes, Snowflake, and the rest of the stack. One view, one source of truth, call-level granularity.

What that unlocks for an Opus 4.8 rollout:

Effective cost per call, not headline cost per token. When Fast Mode is enabled for some workflows and standard mode for others, you need to see both — and the effective cost per completed task, not just tokens consumed.

Allocation by team, feature, or product line. Virtual Tags map every Claude call to the business object that drove it, without waiting on engineering to re-tag resources. The coding agent team owns Opus 4.8 standard. The legal analysis team owns Opus 4.8 Fast Mode. The RAG pipeline owns Sonnet 4.6. Accountability lands where decisions are made.

Cache hit rate visibility, per workload. If a migration silently drops cache hit rates and costs climb 30%, anomaly detection flags it the same day — not at close of month.

Unit economics for every AI workload. Cost per resolved ticket, per completed migration, per generated report. This is how you determine whether Opus 4.8's quality improvements justify the rate compared to Sonnet 4.6 — with measurement, not intuition.

Subagent cost tracking in dynamic workflows. A dynamic workflow that spawns 500 subagents instead of 50 is a 10x cost event. Real-time visibility into agent token consumption is the control layer that prevents those surprises from becoming month-end shocks.

The Bottom Line

Claude Opus 4.8 launched today at the same base price as Opus 4.7, with three meaningful changes to the real cost equation. Fast Mode is now 3x cheaper than it was, making 2.5x-speed inference accessible at $10/$50 for workloads where it previously made no financial sense. Dynamic workflows introduce a parallel subagent architecture with no premium pricing — just token consumption at scale, which needs budget guardrails. And the honesty improvements, while not a line on the rate card, reduce the real cost of agent errors in production.

For teams migrating from 4.7, the upgrade is cost-neutral at the token level and the benchmark gains on coding and agentic tasks are real. For teams still on 4.6, the 4.7-era tokenizer change is still the first thing to measure before committing. For teams defaulting to Opus for everything, the honest answer remains: most workloads belong on Sonnet 4.6 or Haiku 4.5, and Opus 4.8 is the right choice specifically when autonomous reasoning, code quality, or legal/financial accuracy is what you are paying for.
