Anthropic released Claude Opus 4.8 today, and the headline is familiar: pricing unchanged. $5 per million input tokens, $25 per million output tokens, the same rate card that has held since Opus 4.5. If you stop reading there, you will miss the part that matters most to anyone running Opus at scale.
Two things shifted the real cost math with this release. First, Fast Mode is now three times cheaper than it was for Opus 4.7, making a 2.5x-speed version of the frontier model accessible at $10/$50 per million tokens. Second, the same tokenizer that quietly raised effective costs by up to 35% for teams migrating from Opus 4.6 to 4.7 is still in play for anyone who skipped that migration. This post breaks down every pricing layer, runs the numbers on four realistic workloads, covers Fast Mode and effort control, compares Opus 4.8 against GPT-5.5 and Gemini 3.1 Pro on cost-per-quality, and tells you exactly when to stay, upgrade, or route traffic elsewhere.
| Mode | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Standard | $5 | $25 |
| Fast Mode | $10 | $50 |
| Batch (async) | $2.50 | $12.50 |
| Prompt cache read | ~$0.50 | n/a |
| Prompt cache write | Standard input rate | n/a |
Fast Mode is new pricing territory. Previous Opus fast modes were priced at roughly $30/$150 per million tokens. Dropping it to $10/$50 while running at 2.5x the speed is a structural change in what agentic workloads can cost.
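A minimal Python sketch can turn the rate card into per-request numbers by multiplying token counts by the listed rates. The `RATES` dictionary only restates the table above. The function name and the 20K-input/2K-output request size are illustrative assumptions, and cache reads are left out for simplicity.

```python
# List prices in USD per million tokens, restated from the table above.
RATES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
    "batch": {"input": 2.50, "output": 12.50},
}


def request_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Estimated USD cost of one Opus 4.8 request (cache reads not modeled)."""
    rate = RATES[mode]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 20K input tokens, 2K output tokens.
    for mode in RATES:
        print(f"{mode:>8}: ${request_cost(20_000, 2_000, mode):.4f}")
```

Fast Mode comes out at exactly 2x standard per request and Batch at exactly half, whatever the input/output mix, because each mode scales the input and output rates by the same factor.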
$5 input, $25 output. This has not moved since Opus 4.5. Anthropic is signaling that frontier pricing at this tier is stable while the underlying capability climbs. For budget planning, that is a gift. For effective cost analysis, it is a starting point, not an ending point.
Opus 4.7 introduced a new tokenizer that produces up to 35% more tokens for the same input. That change carries forward into 4.8 — the tokenizer did not change between 4.7 and 4.8. If you migrated from 4.6 to 4.7, you already absorbed that impact. If you are migrating directly from 4.6 to 4.8, the same 0–35% effective cost increase applies on input, and potentially more on output if the model is more thorough by default. Measure before committing.
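The inflation lands mainly on input tokens, so the effective increase on a whole request is smaller than the headline 35% whenever output makes up a meaningful share of spend. The sketch below applies an input-only multiplier at standard rates. The 1M-input/100K-output workload is hypothetical, and any extra output verbosity is deliberately left unmodeled.

```python
INPUT_RATE = 5.00    # USD per 1M input tokens (standard)
OUTPUT_RATE = 25.00  # USD per 1M output tokens (standard)


def cost_after_tokenizer_change(input_tokens: int, output_tokens: int,
                                input_inflation: float) -> float:
    """USD cost when the same input text tokenizes to (1 + inflation)x as many tokens.

    Output tokens are held constant; any change in output length is not modeled.
    """
    inflated_input = input_tokens * (1 + input_inflation)
    return (inflated_input * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000


baseline = cost_after_tokenizer_change(1_000_000, 100_000, 0.0)
for inflation in (0.0, 0.15, 0.35):
    cost = cost_after_tokenizer_change(1_000_000, 100_000, inflation)
    print(f"+{inflation:.0%} input tokens -> ${cost:.2f} ({cost / baseline - 1:+.1%} overall)")
```

In this example, 35% input inflation raises the total by about 23%, because one third of the baseline spend is output.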
The 4.7-to-4.8 migration does not carry a new tokenizer penalty, which makes it one of the cleaner version upgrades in the Claude 4.x cycle.
Fast Mode runs Opus 4.8 at approximately 2.5x standard speed. The per-token price is double the standard rate ($10/$50 versus $5/$25), but that is the right comparison only if you would otherwise run at standard speed. Measured against what fast mode cost before (roughly $30/$150 per million tokens for prior Opus versions), it is 3x cheaper for equivalent speed. For latency-sensitive agentic workloads that previously could not justify fast mode, this changes the calculus entirely.
These are not just capability numbers. They directly determine whether paying Opus prices is justified for a given workload.
| Benchmark | Opus 4.8 | Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Pro (coding) | 69.2% | 64.3% | 58.6% | 54.2% |
| OSWorld-Verified (computer use) | 83.4% | 82.3% | 78.7% | 76.2% |
| Online-Mind2Web (browser agent) | 84% | n/a | n/a | n/a |
| Humanity's Last Exam (no tools) | 49.8% | n/a | n/a | n/a |
| Humanity's Last Exam (with tools) | 57.9% | n/a | n/a | n/a |
| Legal Agent Benchmark (all-pass) | First to break 10% | n/a | n/a | n/a |
The SWE-bench Pro jump from 64.3% to 69.2% is not cosmetic. In production coding agents, a roughly 5-point lift at the high end of the capability curve tends to show up as fewer failed runs, fewer human interventions, and lower end-to-end cost per completed task, even if the per-token cost stays flat.
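The cost-per-completed-task argument can be made concrete with a simple model: assume every attempt costs the same and succeeds independently with probability p, so the expected number of attempts is 1/p. The sketch below uses the SWE-bench Pro scores as stand-in success rates purely for illustration, since production success rates will differ. The $2.00 per-attempt cost is hypothetical.

```python
def expected_cost_per_completed_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one success when retrying until success (geometric model)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


ATTEMPT_COST = 2.00  # hypothetical USD per agent run, identical for both models
for name, rate in [("Opus 4.7", 0.643), ("Opus 4.8", 0.692)]:
    print(f"{name}: ${expected_cost_per_completed_task(ATTEMPT_COST, rate):.2f} per completed task")
```

Under these assumptions, the higher pass rate alone cuts expected cost per completed task by about 7% ($3.11 versus $2.89). That figure does not yet count any reduction in per-attempt cost from more efficient tool calling.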
Cursor tested this directly: "Claude Opus 4.8 exceeds prior Opus models across every effort level. Tool calling is meaningfully more efficient, using fewer steps for the same intelligence, and it carries end-to-end tasks through." Fewer tool calls for the same outcome means fewer tokens consumed. Better efficiency at constant pricing is a real price cut, just not the kind that shows up on a rate card.
| Model | Input ($/1M) | Output ($/1M) | Context | Best for |
|---|---|---|---|---|
| Claude Opus 4.8 | $5 | $25 | 1M tokens | Frontier coding, agents, high-stakes reasoning, legal, finance |
| Claude Opus 4.7 | $5 | $25 | 1M tokens | Still capable; consider migrating |
| Claude Sonnet 4.6 | $3 | $15 | 1M tokens | Default for most production inference |
| Claude Haiku 4.5 | $1 | $5 | 200K tokens | High-volume, low-latency, simple tasks |
Sonnet 4.6 is 40% cheaper per token than Opus on both input and output. For most production inference — classification, RAG responses, content generation, routine tool use — Sonnet remains the cost-effective default. Opus 4.8 is a premium SKU for workloads where quality differentiates revenue or where the agent task requires sustained multi-step reasoning that lighter models cannot complete reliably.
Haiku 4.5 is 5x cheaper than Opus on both dimensions. For extraction, routing, moderation, or annotation at volume, Haiku is often the right answer regardless of what Opus can do.
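The tiering guidance above can be written as a simple routing rule. This is a hypothetical sketch: the task-category names are placeholders for your own taxonomy, and `eval_shows_opus_lift` stands in for whatever evaluation evidence you use to justify Opus on a given workload.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "Claude Haiku 4.5"
    SONNET = "Claude Sonnet 4.6"
    OPUS = "Claude Opus 4.8"


# Hypothetical task categories; replace with your own workload taxonomy.
HIGH_VOLUME_SIMPLE = {"extraction", "routing", "moderation", "annotation"}
FRONTIER = {"autonomous_coding", "multi_step_agent", "legal_analysis", "financial_analysis"}


def pick_tier(task_type: str, eval_shows_opus_lift: bool = False) -> Tier:
    """Route to the cheapest tier that the guidance says is adequate."""
    if task_type in HIGH_VOLUME_SIMPLE:
        return Tier.HAIKU
    if task_type in FRONTIER or eval_shows_opus_lift:
        return Tier.OPUS
    # Classification, RAG, content generation, routine tool use: Sonnet by default.
    return Tier.SONNET


print(pick_tier("moderation").value)         # Claude Haiku 4.5
print(pick_tier("rag_answer").value)         # Claude Sonnet 4.6
print(pick_tier("autonomous_coding").value)  # Claude Opus 4.8
```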
These numbers are illustrative, built from the pricing structure above. Use them to calibrate your own workload, not as quotes.
Opus 4.7 (current):
Opus 4.8 (upgrade from 4.7, same tokenizer):
Opus 4.8 Fast Mode (latency-sensitive version of same workload):
Opus 4.8 Batch (async, overnight runs):
For this workload, the migration from 4.7 to 4.8 at standard pricing is cost-neutral. The decision is whether to redirect overnight runs to Batch (saves $150/month) or whether the new Fast Mode throughput justifies doubling spend.
Opus 4.8 input:
Opus 4.8 output:
Daily: $21.75 / Monthly: ~$652
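Same workload on Sonnet 4.6: ~$391/month (Sonnet's input and output rates are both 60% of Opus's, so the same token volume costs 60% as much)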
Opus vs. Sonnet on this workload: Opus costs ~66% more per month. Unless your evaluation shows a clear quality lift on RAG response accuracy that translates to better user outcomes, stay on Sonnet 4.6. Teams that default to Opus for RAG without that evidence are paying a ~$260/month premium on a workload of this size for a quality difference their users may not perceive.
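Sonnet 4.6's input and output rates are both exactly 60% of Opus's ($3 vs. $5 and $15 vs. $25), so this comparison does not depend on the workload's input/output split. The short check below starts from the $21.75 daily Opus spend and assumes a 30-day month.

```python
DAILY_OPUS = 21.75      # USD/day for this RAG workload on Opus 4.8
DAYS = 30               # assumed billing month
SONNET_TO_OPUS = 3 / 5  # equals 15 / 25, so it applies to input and output alike

opus_monthly = DAILY_OPUS * DAYS                 # 652.50
sonnet_monthly = opus_monthly * SONNET_TO_OPUS   # 391.50
premium = opus_monthly - sonnet_monthly          # 261.00
opus_markup = opus_monthly / sonnet_monthly - 1  # about 0.667, i.e. ~66% more

print(f"Opus ${opus_monthly:.2f}, Sonnet ${sonnet_monthly:.2f}, "
      f"premium ${premium:.2f}, markup {opus_markup:.1%}")
```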
Opus 4.8 standard: ~$3,000/month
Opus 4.8 standard + 30% tokenizer inflation (if migrating from Opus 4.6): ~$3,900/month
Opus 4.8 Batch (async pipeline): ~$1,500/month
Opus 4.8 with prompt caching (system prompt + tool definitions, 50% cache hit):
The spread between the worst case (4.6 migration, no caching: ~$3,900/month) and the best case (Batch: ~$1,500/month) on the same workload is $2,400/month. Your migration path and your caching and batch architecture determine where in that range you land.
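The worst and best cases above both point to the same standard-rate baseline of about $3,000/month ($3,900 / 1.3 and $1,500 / 0.5). The sketch below reproduces the spread from that baseline. As in the scenario label, the 30% inflation is applied to the whole bill. The 50%-cache-hit scenario is left out because it depends on the input/output split, which is not given here.

```python
def scenario_costs(standard_monthly: float,
                   tokenizer_inflation: float = 0.30,
                   batch_discount: float = 0.50) -> dict:
    """Monthly USD cost of the same workload under the scenarios above."""
    return {
        "standard": standard_monthly,
        "4.6 migration, no caching": standard_monthly * (1 + tokenizer_inflation),
        "batch": standard_monthly * (1 - batch_discount),
    }


costs = scenario_costs(3_000.0)
spread = max(costs.values()) - min(costs.values())
for name, value in costs.items():
    print(f"{name:>26}: ${value:,.0f}")
print(f"{'spread':>26}: ${spread:,.0f}")
```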
This workload is time-sensitive (partners waiting on analysis) and previously could not justify old fast mode pricing. With Fast Mode at $10/$50:
Opus 4.8 Fast Mode: ~$1,350/month
Equivalent workload at old fast mode pricing (~$30/$150): ~$4,050/month
Savings from new Fast Mode pricing: ~$2,700/month on this workload. For latency-critical professional services workloads, the new Fast Mode is the biggest cost story in this release.
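The old fast-mode rates ($30/$150) are exactly 3x the new ones on both input and output. Old spend is therefore 3x new spend for any token mix, and the savings equal twice the new bill. The sketch below works backward from the ~$2,700/month figure. The token-mix sanity check uses a hypothetical 100M-input/20M-output month.

```python
NEW_FAST = (10.0, 50.0)   # USD per 1M (input, output), Opus 4.8 Fast Mode
OLD_FAST = (30.0, 150.0)  # USD per 1M (input, output), prior Opus fast mode (approx.)


def monthly_cost(input_m: float, output_m: float, rates: tuple) -> float:
    """USD for a month of traffic, token volumes given in millions."""
    return input_m * rates[0] + output_m * rates[1]


# The price ratio is identical for input and output, so it holds for any mix.
ratio = OLD_FAST[0] / NEW_FAST[0]
assert ratio == OLD_FAST[1] / NEW_FAST[1]

savings = 2_700.0
new_monthly = savings / (ratio - 1)  # savings = old - new = (ratio - 1) * new
old_monthly = new_monthly * ratio
print(f"new ${new_monthly:,.0f}/month vs old ${old_monthly:,.0f}/month")

# Sanity check on a hypothetical mix: 100M input, 20M output tokens.
print(monthly_cost(100, 20, OLD_FAST) / monthly_cost(100, 20, NEW_FAST))
```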
Opus 4.8 introduces user-level effort control (low, high, extra, max), and the API now surfaces xhigh in Claude Code. This is not just a user experience feature — it is a cost lever.
The default is high, which Anthropic says uses a similar number of tokens to Opus 4.7 default but with better performance. Moving to extra or max will increase token consumption in exchange for higher quality on difficult tasks. Moving to low reduces both tokens and latency, useful when an Opus-class model is needed for capability but not necessarily for depth on a specific call.
Practical cost implications:
There is no separate pricing tier for effort levels — they consume more or fewer tokens at the same per-token rate.
Dynamic workflows (research preview, available on Enterprise, Team, and Max plans) let Claude Code plan a task and spin up hundreds of parallel subagents in a single session. A codebase migration across hundreds of thousands of lines of code — from kickoff to merge, verified against the existing test suite — is now a single Claude Code invocation.
The cost math: each subagent consumes tokens at the same Opus 4.8 rate. If a migration involves 200 parallel subagents, each processing 50K input tokens and producing 10K output tokens, that is 10M input tokens ($50) plus 2M output tokens ($50), or roughly $100 per run at standard rates.
Comparable work billed at developer hourly rates would typically cost far more. The economics of dynamic workflows depend on the task, but for deterministic, high-volume operations (migrations, test generation, documentation, refactoring) the per-run cost is often a fraction of equivalent engineering time.
The risk: runaway subagent trees. A misconfigured plan that spins up 500 subagents instead of 50 is a 10x cost spike. Set budget limits in your harness before enabling dynamic workflows in production.
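A budget limit can be as simple as reserving each subagent's estimated spend before launching it and refusing once a dollar cap or an agent count is reached. This is a hypothetical harness-side sketch, not a Claude Code feature or API. `SubagentBudget`, its caps, and the per-agent token estimates are assumptions. A real harness would also reconcile estimates against actual usage.

```python
INPUT_RATE = 5.00 / 1_000_000    # USD per input token, Opus 4.8 standard
OUTPUT_RATE = 25.00 / 1_000_000  # USD per output token, Opus 4.8 standard


class BudgetExceeded(RuntimeError):
    """Raised when spawning another subagent would break a configured limit."""


class SubagentBudget:
    """Hypothetical guard that reserves estimated spend before each spawn."""

    def __init__(self, max_usd: float, max_subagents: int):
        self.max_usd = max_usd
        self.max_subagents = max_subagents
        self.reserved_usd = 0.0
        self.spawned = 0

    def reserve(self, est_input_tokens: int, est_output_tokens: int) -> float:
        cost = est_input_tokens * INPUT_RATE + est_output_tokens * OUTPUT_RATE
        if self.spawned >= self.max_subagents:
            raise BudgetExceeded(f"subagent cap of {self.max_subagents} reached")
        if self.reserved_usd + cost > self.max_usd:
            raise BudgetExceeded(f"${self.max_usd:.2f} budget would be exceeded")
        self.spawned += 1
        self.reserved_usd += cost
        return cost


# The 200-subagent example: 50K input + 10K output each, about $0.50 per agent.
budget = SubagentBudget(max_usd=150.0, max_subagents=250)
for _ in range(200):
    budget.reserve(50_000, 10_000)
print(f"{budget.spawned} subagents, ${budget.reserved_usd:.2f} reserved")

# A misconfigured 500-agent plan stops at the cap instead of spending ~$250.
try:
    for _ in range(300):
        budget.reserve(50_000, 10_000)
except BudgetExceeded as err:
    print(f"stopped after {budget.spawned} subagents: {err}")
```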
These two discounts interact with every workload above and are worth understanding precisely.
Prompt caching is priced at approximately $0.50 per million tokens on cache reads (10% of the standard input rate). Cache writes are billed at the standard input rate, so writing to the cache costs no more than sending the same tokens uncached. Every later read of that content saves roughly 90% on those tokens. Under this rate card, caching pays for itself from the first cache hit, and system prompts, tool definitions, and shared document context in multi-turn agents almost always get far more hits than that.
Cache hit ratio is the metric to watch after any migration. The 4.7 tokenizer changes can shift token boundaries and invalidate old cache entries on the first run. Opus 4.8 keeps the same tokenizer as 4.7, so 4.7 cache entries should survive the upgrade — validate this on your specific prompts before assuming continuity.
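Cache hit ratio can be computed from the `usage` object returned with each Messages API response. Its `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` fields together make up the prompt. The sketch below works over plain dictionaries, for example ones collected from your request logs. The sample records are illustrative, and the 5-point drop threshold is an arbitrary assumption.

```python
def cache_hit_ratio(usages: list[dict]) -> float:
    """Share of prompt tokens served from cache across a set of responses."""
    read = sum(u.get("cache_read_input_tokens") or 0 for u in usages)
    written = sum(u.get("cache_creation_input_tokens") or 0 for u in usages)
    uncached = sum(u.get("input_tokens") or 0 for u in usages)
    total = read + written + uncached
    return read / total if total else 0.0


def hit_ratio_dropped(before: list[dict], after: list[dict], tolerance: float = 0.05) -> bool:
    """True if the post-migration hit ratio fell by more than `tolerance`."""
    return cache_hit_ratio(after) < cache_hit_ratio(before) - tolerance


before = [  # illustrative: first call writes the cache, second call reads it
    {"input_tokens": 200, "cache_creation_input_tokens": 800, "cache_read_input_tokens": 0},
    {"input_tokens": 200, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 800},
]
after = [  # illustrative: cached prefix is rewritten on every call after the switch
    {"input_tokens": 200, "cache_creation_input_tokens": 800, "cache_read_input_tokens": 0},
    {"input_tokens": 200, "cache_creation_input_tokens": 800, "cache_read_input_tokens": 0},
]
print(cache_hit_ratio(before), cache_hit_ratio(after), hit_ratio_dropped(before, after))
```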
Batch processing gives a flat 50% discount on all tokens in exchange for async execution (minutes to hours). It stacks on top of, not instead of, caching. A nightly batch job with 70% cache hit rates is running at roughly 5% of list price on the cached portion and 50% on the uncached portion. For pipelines that can tolerate latency, Batch is the single highest-leverage cost control available.
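The stacking arithmetic can be expressed as an input-side multiplier on list price. The sketch below assumes, as the paragraph above does, that the Batch discount also applies to cache reads. It treats cache writes as ordinary uncached input, since the rate card prices them at the standard input rate.

```python
def input_cost_fraction(cache_hit_rate: float, batch: bool = True,
                        cache_read_fraction: float = 0.10,
                        batch_discount: float = 0.50) -> float:
    """Input spend as a fraction of standard list price.

    cache_read_fraction: cache reads cost 10% of the standard input rate.
    batch_discount: Batch halves every token price, cached or not.
    """
    base = (1 - batch_discount) if batch else 1.0
    cached = cache_hit_rate * cache_read_fraction * base
    uncached = (1 - cache_hit_rate) * base
    return cached + uncached


print(f"{input_cost_fraction(0.70):.1%} of list")               # 18.5% of list
print(f"{input_cost_fraction(0.70, batch=False):.1%} of list")  # 37.0% of list
```

On these assumptions, the nightly job in the example pays roughly 18.5% of list on input overall. Output tokens are never cached, so they get only the 50% Batch discount.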
What goes in Batch: nightly summarization, document analysis, evaluation sweeps, training data generation, red-team runs, backfill operations, and any agent task where the user is not waiting synchronously.
This is the cleanest upgrade in the 4.x cycle from a cost perspective. No new tokenizer, no new discounting structure. The checklist:
Before you migrate:
After migration:
If you are still on Opus 4.6, the 4.7 tokenizer inflation is still the primary cost risk to model before migrating. The same 0–35% effective cost increase on input — with the upper end concentrated on code, JSON, and non-English text — applies when jumping to 4.8 directly.
Replay 100–1,000 production requests through both models, compare token counts and costs, then decide whether the benchmark gains (SWE-bench Pro: 53.4% on Opus 4.6 → 69.2% on Opus 4.8) justify the effective cost delta. For autonomous coding workloads, that 15.8-point SWE-bench gap typically translates to fewer failed runs and less human intervention, so the per-task cost can be lower even if per-token cost is higher.
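Once the replay has run, the comparison reduces to per-request token counts from each model, for example taken from the `usage` object on each response. The sketch below assumes both models bill at the unchanged $5/$25 rate. The two sample requests are made up to show the shape of the output.

```python
from statistics import mean

INPUT_RATE, OUTPUT_RATE = 5.00, 25.00  # USD per 1M tokens, same for Opus 4.6 and 4.8


def cost(tokens: tuple[int, int]) -> float:
    """USD cost of one request given (input_tokens, output_tokens)."""
    input_tokens, output_tokens = tokens
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000


def compare_replay(pairs):
    """pairs: [((old_in, old_out), (new_in, new_out)), ...] for the same requests."""
    input_inflation = mean(new[0] / old[0] - 1 for old, new in pairs)
    old_total = sum(cost(old) for old, _ in pairs)
    new_total = sum(cost(new) for _, new in pairs)
    return {
        "mean_input_inflation": input_inflation,
        "old_cost": old_total,
        "new_cost": new_total,
        "cost_delta": new_total / old_total - 1,
    }


sample = [((1_000, 200), (1_300, 200)), ((2_000, 100), (2_200, 100))]  # made-up replay data
print(compare_replay(sample))
```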
| | Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|
| Input pricing | $5/1M | ~$10/1M | ~$7/1M |
| Output pricing | $25/1M | ~$40/1M | ~$30/1M |
| SWE-bench Pro | 69.2% | 58.6% | 54.2% |
| OSWorld-Verified | 83.4% | 78.7% | 76.2% |
| Context window | 1M tokens | 128K tokens | 2M tokens |
| Fast/speed tier | Yes ($10/$50) | Yes | Yes |
GPT-5.5 and Gemini 3.1 Pro pricing based on publicly available rates at time of writing; verify current rates directly.
Opus 4.8 is priced meaningfully below GPT-5.5 on both input and output, and below Gemini 3.1 Pro as well, while leading on the two benchmarks that matter most for agentic and coding workloads. For teams evaluating across providers, Opus 4.8 offers the best SWE-bench Pro and OSWorld-Verified scores of the three at the lowest list price. Gemini 3.1 Pro's 2M context window is relevant for ultra-long document workloads, but Opus 4.8's 1M context covers the vast majority of production use cases.
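To see what the list-price gap means per call, the sketch below prices one hypothetical agent turn (30K input, 3K output tokens) at each model's rates from the table. The GPT-5.5 and Gemini 3.1 Pro figures inherit the table's approximations and should be re-checked before use.

```python
# USD per 1M tokens (input, output), from the comparison table (approximate for non-Anthropic).
PRICES = {
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (10.00, 40.00),
    "Gemini 3.1 Pro": (7.00, 30.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call at the model's list rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


for model in PRICES:
    per_call = call_cost(model, 30_000, 3_000)
    print(f"{model:>16}: ${per_call:.3f}/call, ${per_call * 1_000:,.0f} per 1,000 calls")
```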
Opus 4.8 is approximately four times less likely than Opus 4.7 to allow flaws in written code to pass unremarked. This is an alignment improvement that has direct cost implications.
In a production coding agent, a missed bug that passes review costs more than the token spend of the original run — it costs a human engineer's time to debug, a re-run of tests, a context reload, and potentially a customer-visible incident. An agent that proactively flags uncertainty costs more tokens per run (it generates more cautionary output) but typically costs less per successfully completed task.
The Databricks team put it well: Opus 4.8 "proactively flags issues with the inputs and outputs of an analysis, something other models routinely missed and left to the users to catch." If your current agent architecture has human review baked in specifically because the model is unreliable, the honesty improvements may let you reduce that step — which is a real cost reduction that does not show up on the Anthropic invoice.
Anthropic confirmed today that Mythos-class models — currently available in limited preview to a small number of cybersecurity organizations via Project Glasswing — are expected to reach general availability "in the coming weeks." Opus 4.8 already achieves Mythos-level alignment scores on misaligned behavior metrics.
What does this mean for pricing? Mythos-class models will almost certainly carry a higher price point than the $5/$25 Opus structure. Teams that can solve their workloads within Opus 4.8 capability today should lock in those architectures now, rather than waiting and incurring the migration cost later when a new, more expensive tier arrives. Conversely, if your use case genuinely needs more intelligence than Opus 4.8 provides, Mythos access is worth watching.
Upgrade if:
Stay on 4.7, or consider Sonnet 4.6, if:
Is Claude Opus 4.8 more expensive than Opus 4.7? Per-token sticker prices are identical: $5 input, $25 output. If you are migrating from 4.7 to 4.8, effective cost should be nearly flat — the tokenizer is unchanged between these versions. If you are migrating from 4.6 to 4.8, the 4.7-era tokenizer change can add 0–35% to effective per-request costs depending on your content.
What is Fast Mode and how much does it cost? Fast Mode runs Opus 4.8 at approximately 2.5x the standard speed, priced at $10 per million input tokens and $50 per million output tokens. That is double the standard rate per token, but three times cheaper than fast mode pricing on previous Opus models. If your workload previously used standard mode because fast mode was too expensive, re-evaluate — the economics changed materially.
How does Opus 4.8 pricing compare to GPT-5.5? Opus 4.8 is priced at $5/$25 per million tokens. GPT-5.5 is priced at approximately $10/$40. For comparable or better performance on coding and agentic benchmarks, Opus 4.8 is meaningfully cheaper.
Does prompt caching still work with Opus 4.8? Yes. Cache reads are discounted by approximately 90% (to roughly $0.50/M on input). Because the tokenizer did not change between 4.7 and 4.8, existing cache entries from Opus 4.7 workloads should survive the migration. Validate on your specific prompts before assuming continuity.
What are dynamic workflows and do they cost extra? Dynamic workflows are a Claude Code feature (research preview, Enterprise/Team/Max plans) that lets Claude plan a task and run hundreds of parallel subagents. There is no premium pricing — subagents consume tokens at standard Opus 4.8 rates. The cost scales with the number of subagents and their token consumption, which is why setting budget guardrails before enabling the feature in production is important.
What is effort control and does it affect pricing? Effort control (low, high, extra, max) lets users and developers tune how deeply Opus 4.8 reasons through a task. Higher effort consumes more tokens at the same per-token rate. There is no separate price per effort tier.
When will Opus 4.7 be deprecated? Anthropic has not published a deprecation date for Opus 4.7. Opus rate limits are pooled across versions, so you can mix 4.7 and 4.8 traffic during a gradual migration.
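With rate limits pooled, a gradual migration can use a deterministic percentage split keyed on something stable, like a request or customer ID. A given caller then stays on one model while you compare costs. The model identifier strings in this sketch are placeholders, not confirmed API model IDs.

```python
import hashlib

OLD_MODEL = "opus-4-7-placeholder"  # hypothetical ID; use the real model string from the docs
NEW_MODEL = "opus-4-8-placeholder"  # hypothetical ID


def route_model(routing_key: str, rollout_pct: int) -> str:
    """Send `rollout_pct`% of keys to the new model, stably across calls."""
    bucket = int(hashlib.sha256(routing_key.encode("utf-8")).hexdigest(), 16) % 100
    return NEW_MODEL if bucket < rollout_pct else OLD_MODEL


share = sum(route_model(f"customer-{i}", 25) == NEW_MODEL for i in range(10_000)) / 10_000
print(f"{share:.1%} of sample keys routed to {NEW_MODEL}")
```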
What is Claude Mythos and should I wait for it? Mythos Preview is currently available only to a small number of cybersecurity organizations via Project Glasswing. Anthropic expects to bring Mythos-class models to general availability "in the coming weeks." Pricing has not been announced, but a model class above Opus will likely carry a higher price point. If Opus 4.8 solves your workload, build on it now.
How do I track actual Opus 4.8 costs across teams and workloads? Per-token rate cards do not tell you which team, feature, or customer is driving spend. You need cost allocation at the call level: which model, which workflow, which product, with unit economics per resolved task or generated output. Finout's Anthropic integration pulls this into MegaBill alongside your cloud infrastructure spend.
The Opus 4.8 pricing story has more variables than the rate card shows: standard vs. Fast Mode vs. Batch, tokenizer density, effort level, cache hit rate, and subagent count in dynamic workflows. Each variable can move your effective cost per request significantly. A fixed rate card and a monthly invoice tell you what you spent. They do not tell you why, which team drove it, or whether you are getting the unit economics you planned for.
Finout's Anthropic integration pulls Claude API spend into MegaBill alongside AWS, GCP, Azure, Kubernetes, Snowflake, and the rest of the stack. One view, one source of truth, call-level granularity.
What that unlocks for an Opus 4.8 rollout:
Effective cost per call, not headline cost per token. When Fast Mode is enabled for some workflows and standard mode for others, you need to see both — and the effective cost per completed task, not just tokens consumed.
Allocation by team, feature, or product line. Virtual Tags map every Claude call to the business object that drove it, without waiting on engineering to re-tag resources. The coding agent team owns Opus 4.8 standard. The legal analysis team owns Opus 4.8 Fast Mode. The RAG pipeline owns Sonnet 4.6. Accountability lands where decisions are made.
Cache hit rate visibility, per workload. If a migration silently drops cache hit rates and costs climb 30%, anomaly detection flags it the same day — not at close of month.
Unit economics for every AI workload. Cost per resolved ticket, per completed migration, per generated report. This is how you determine whether Opus 4.8's quality improvements justify the rate compared to Sonnet 4.6 — with measurement, not intuition.
Subagent cost tracking in dynamic workflows. A dynamic workflow that spawns 500 subagents instead of 50 is a 10x cost event. Real-time visibility into agent token consumption is the control layer that prevents those surprises from becoming month-end shocks.
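Call-level allocation comes down to attaching a business tag and an outcome to each call, pricing the call at its model's rate, and grouping the results. The sketch below is a generic illustration of that idea, not Finout's API or data model. The team names, records, and rate keys are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# USD per 1M tokens (input, output) for the model/mode combinations in use.
RATES = {
    ("opus-4.8", "standard"): (5.00, 25.00),
    ("opus-4.8", "fast"): (10.00, 50.00),
    ("sonnet-4.6", "standard"): (3.00, 15.00),
}


@dataclass
class Call:
    team: str             # business tag used for allocation
    model: str
    mode: str
    input_tokens: int
    output_tokens: int
    completed_task: bool  # did this call finish a unit of work (ticket, report, ...)?


def allocate(calls: list[Call]) -> dict:
    """Spend and cost per completed task, grouped by team."""
    spend: dict[str, float] = defaultdict(float)
    completed: dict[str, int] = defaultdict(int)
    for c in calls:
        in_rate, out_rate = RATES[(c.model, c.mode)]
        spend[c.team] += (c.input_tokens * in_rate + c.output_tokens * out_rate) / 1_000_000
        completed[c.team] += int(c.completed_task)
    return {
        team: {
            "spend_usd": spend[team],
            "cost_per_completed_task": spend[team] / completed[team] if completed[team] else None,
        }
        for team in spend
    }


calls = [  # hypothetical records
    Call("coding-agent", "opus-4.8", "standard", 100_000, 10_000, False),
    Call("coding-agent", "opus-4.8", "standard", 100_000, 10_000, True),
    Call("legal-analysis", "opus-4.8", "fast", 20_000, 4_000, True),
    Call("rag", "sonnet-4.6", "standard", 8_000, 500, True),
]
for team, stats in allocate(calls).items():
    print(team, stats)
```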
Claude Opus 4.8 launched today at the same base price as Opus 4.7, with three meaningful changes to the real cost equation. Fast Mode is now 3x cheaper than it was, making 2.5x-speed inference accessible at $10/$50 for workloads where it previously made no financial sense. Dynamic workflows introduce a parallel subagent architecture with no premium pricing — just token consumption at scale, which needs budget guardrails. And the honesty improvements, while not a line on the rate card, reduce the real cost of agent errors in production.
For teams migrating from 4.7, the upgrade is cost-neutral at the token level and the benchmark gains on coding and agentic tasks are real. For teams still on 4.6, the 4.7-era tokenizer change is still the first thing to measure before committing. For teams defaulting to Opus for everything, the honest answer remains: most workloads belong on Sonnet 4.6 or Haiku 4.5, and Opus 4.8 is the right choice specifically when autonomous reasoning, code quality, or legal/financial accuracy is what you are paying for.