Claude Opus 4.7 Pricing: The Real Cost Story Behind the “Unchanged” Price Tag

Apr 16th, 2026

Anthropic released Claude Opus 4.7 on April 16, 2026, and the official headline is simple: prices are unchanged from Opus 4.6. $5 per million input tokens, $25 per million output tokens, up to 90% savings with prompt caching, and 50% with batch processing. If you stop reading the release notes there, you will miss the part that matters most to anyone running Opus at scale.

Opus 4.7 ships with a new tokenizer that can produce up to 35% more tokens for the same input text. Your real bill per request can go up even though the rate card did not. This post breaks down the pricing mechanics, runs the math on three realistic workloads, and compares Opus 4.7 to Sonnet 4.6, Haiku 4.5, and the older Opus models, so you can decide whether to migrate, stay, or split traffic across models.

Claude Opus 4.7 pricing at a glance

Opus 4.7 keeps the same sticker price as Opus 4.6, 4.5, and 4.1. What changed is how text turns into tokens.

  • Input: $5 per 1M tokens
  • Output: $25 per 1M tokens
  • Prompt caching: up to 90% discount on cache reads
  • Batch processing: 50% discount on async workloads
  • Context window: 1M tokens, same as Opus 4.6, with improved long-context retrieval
  • New tokenizer: up to 35% more tokens for the same input

If you already run Opus 4.6 workloads, your most likely outcome is a cost increase between 0% and 35% per request on the same prompts, driven entirely by the tokenizer change. Anthropic did not raise prices. Your bill may still grow.

The tokenizer change, and what it actually means

Tokenization converts raw text and images into the numerical units the model charges for. Opus 4.7 uses a new tokenizer that Anthropic says contributes to the model’s accuracy and instruction-following gains. The tradeoff is density. The same paragraph of English prose, the same Python function, the same JSON payload, can break into more tokens in 4.7 than it did in 4.6. The public estimate is a 1.0x to 1.35x multiplier, with the upper end showing up most often on code, structured data, and non-English text.

Three practical implications:

  • A 4.6 request that cost $0.10 could cost anywhere from $0.10 to $0.135 in 4.7, depending on content mix.
  • Cache-hit ratios carry over, so the 90% caching discount still applies. Caching remains the single biggest lever for controlling Opus cost.
  • Output token growth matters more than input growth, because output is priced 5x higher. If 4.7 is more thorough by default, your output tokens per response can climb on two axes at once: density plus verbosity.

Before migrating a production workload, replay real traffic side by side and measure the effective cost delta. Do not trust the 35% ceiling as a flat estimate, and do not trust 0% either.
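That measurement can start from logs alone. A minimal sketch, assuming you have per-request input/output token counts from replaying the same traffic through both versions; the function name and example numbers are illustrative, not Anthropic's:

```python
# Sketch: percent change in effective cost when the same requests are
# replayed on the old and new tokenizers. Rates are $ per million tokens.

def effective_cost_delta(old_counts, new_counts,
                         input_rate=5.0, output_rate=25.0):
    """old_counts / new_counts: lists of (input_tokens, output_tokens)
    pairs, one per replayed request. Returns percent cost change."""
    def total_cost(counts):
        return sum(i * input_rate + o * output_rate for i, o in counts) / 1e6
    old, new = total_cost(old_counts), total_cost(new_counts)
    return (new - old) / old * 100

# Example: two requests that tokenize uniformly 20% denser on 4.7.
delta = effective_cost_delta([(10_000, 2_000), (50_000, 8_000)],
                             [(12_000, 2_400), (60_000, 9_600)])
# delta ≈ +20%
```

On real traffic the per-request ratios will scatter between roughly 1.0x and 1.35x, which is exactly why a single flat multiplier is not a safe planning number.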

Opus 4.7 vs the rest of the Claude lineup

The full Claude model family, priced for comparison:

| Model | Input ($/1M) | Output ($/1M) | Context | Best for |
|---|---|---|---|---|
| Claude Opus 4.7 | $5 | $25 | 1M tokens | Frontier coding, agents, high-res vision |
| Claude Opus 4.6 | $5 | $25 | 1M tokens | Still-capable coding, lower effective cost/request |
| Claude Sonnet 4.6 | $3 | $15 | 1M tokens | Default for most production inference |
| Claude Haiku 4.5 | $1 | $5 | 200K tokens | High-volume, low-latency, simpler tasks |

Three things jump out. Sonnet 4.6 is 40% cheaper per input token and 40% cheaper per output token than Opus, and for most production inference (classification, RAG responses, content generation, basic tool use) it remains the right default. Opus 4.7 is a premium model priced for a specific use case: autonomous coding agents, long-horizon tasks, and work where quality differentiates revenue. Haiku 4.5 is 5x cheaper than Opus on both input and output, and remains the right call for extraction, routing, or moderation at volume.

Cost projections: three realistic workloads

These numbers are illustrative, not quotes. They exist to show the shape of the cost surface, which now has three axes: sticker price (flat), tokenizer density (up to +35%), and discount tier (caching or batch).
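The per-workload arithmetic below can be reproduced with a small cost model. This is a sketch under the rate-card assumptions above (cache reads at 10% of the input rate, 50% batch discount); the function name and parameterization are mine:

```python
def daily_cost(input_m, output_m, *, input_rate=5.0, output_rate=25.0,
               cache_hit=0.0, cache_read_rate=None, batch=False,
               density=1.0):
    """Daily cost in dollars for volumes given in millions of tokens.

    cache_hit: fraction of input tokens served from cache.
    cache_read_rate: $/1M for cache reads (defaults to 10% of input_rate).
    batch: apply the 50% batch-processing discount.
    density: tokenizer inflation multiplier (1.0-1.35 for Opus 4.7).
    """
    if cache_read_rate is None:
        cache_read_rate = input_rate * 0.10
    input_m *= density
    output_m *= density
    cost = (input_m * cache_hit * cache_read_rate        # cached reads
            + input_m * (1 - cache_hit) * input_rate     # uncached input
            + output_m * output_rate)                    # output
    return cost * (0.5 if batch else 1.0)
```

With these rates, `daily_cost(1, 0.2)` reproduces Workload 1's $10/day, and `daily_cost(5, 0.5, cache_hit=0.7)` reproduces Workload 2's $21.75/day.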

Workload 1: Coding agent, 1M input / 200K output per day

  • Opus 4.6: (1M × $5) + (0.2M × $25) = $10/day, ~$300/month
  • Opus 4.7 at 35% token inflation: ~$13.50/day, ~$405/month
  • Delta: +$105/month, +35% on the same underlying work

Workload 2: RAG assistant, 5M input / 500K output per day, 70% cache hit ratio

  • Opus 4.7 input: cached 3.5M × ~$0.50/1M (the 90%-discounted cache-read rate) + uncached 1.5M × $5/1M = $1.75 + $7.50 = $9.25
  • Opus 4.7 output: 0.5M × $25 = $12.50
  • Daily: $21.75, monthly: ~$652
  • Same workload on Sonnet 4.6 (same caching assumptions): ~$13.05/day, ~$392/month. Savings: ~40%.

For RAG, stay on Sonnet unless quality evaluation shows a clear Opus-justified lift. Most teams overpay by defaulting to Opus here.

Workload 3: Autonomous SWE agent, 10M input / 2M output per day, no caching

  • Opus 4.7 baseline: (10M × $5) + (2M × $25) = $100/day, $3,000/month
  • Add 35% token inflation on same work: ~$135/day, ~$4,050/month
  • Batch-eligible async equivalent (50% off): ~$67.50/day, ~$2,025/month

Batch is the single biggest discount available for teams that can tolerate minutes-to-hours latency.

Prompt caching and batch processing, the levers that matter

The 35% tokenizer penalty is mostly recoverable. Cache reads are priced at roughly 10% of the standard input rate, which means a workload with long, stable system prompts or reused document context can absorb the tokenizer change and still come out ahead versus naive usage. Two patterns pay off consistently:

  • Cache your system prompt, tool definitions, and any static reference material. Anything above ~1K tokens that repeats across calls is a candidate.
  • Send long, stable conversation history as a cached prefix. If agents chain 10 turns deep, caching that prefix is the difference between an affordable product and a money pit.
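As a concrete shape, a cached-prefix request can look like the following. This is a hedged sketch following the Messages API prompt-caching convention of marking the last static block with `cache_control`; the model ID and system prompt are placeholders:

```python
# Sketch of a request body with a cacheable system-prompt prefix.
# Anything before (and including) the cache_control block is cacheable.

LONG_SYSTEM_PROMPT = "You are a code-review agent..."  # >1K tokens in practice

def build_request(user_message: str) -> dict:
    return {
        "model": "claude-opus-4-7",   # placeholder model ID
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Later reads of this prefix bill at ~10% of the input rate.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```

The payload above is what varies per call; the cached prefix stays byte-identical across calls, which is what makes the hit ratio hold.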

Batch processing stacks the 50% discount on top of the standard rate and removes rate-limit pressure from real-time traffic. If you run nightly summarization, backfills, evaluation sweeps, red-team runs, or anything where a minutes-to-hours SLA is acceptable, route it through the Batch API.
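A sketch of what batch submission looks like, assuming the Message Batches request format (a `custom_id` plus per-request `params`); the model ID and prompts are illustrative:

```python
# Sketch: wrap one summarization call per document for batch submission.

def to_batch_requests(documents: list[str]) -> list[dict]:
    return [
        {
            "custom_id": f"summarize-{i}",   # your key for matching results
            "params": {
                "model": "claude-opus-4-7",  # placeholder model ID
                "max_tokens": 512,
                "messages": [
                    {"role": "user", "content": f"Summarize:\n\n{doc}"}
                ],
            },
        }
        for i, doc in enumerate(documents)
    ]

# Submitted via the Batch API (e.g. client.messages.batches.create(...));
# each request bills at 50% of the standard rate.
```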

Rate limits, availability, and migration notes

Opus 4.7 is available on the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, plus the consumer Claude apps and GitHub Copilot (replacing Opus 4.5 and 4.6 in the model picker over the coming weeks). Your Opus rate limit is pooled across Opus 4.7, 4.6, 4.5, 4.1, and 4, so adding 4.7 traffic will not bypass your existing quota. Plan a gradual cutover.

Migration checklist:

  • Replay a representative traffic sample through 4.7 and measure token counts and quality side by side.
  • Recheck caching hit ratios after cutover. Prompt caches are model-specific, so the first 4.7 runs start cold and hit ratios will dip before recovering.
  • Confirm your observability tools report Opus 4.7 distinctly and do not roll up with other Opus versions in a way that hides cost regressions.
  • Decide which slice of traffic justifies Opus pricing at all. Most teams over-assign to Opus.
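The last checklist item can be enforced mechanically rather than by habit. A minimal sketch of a tier-based model router; the tier names, model IDs, and fallback choice are my assumptions, not an official taxonomy:

```python
# Sketch: route each task tier to the cheapest model that clears your evals.

MODEL_FOR_TIER = {
    "agentic-coding": "claude-opus-4-7",   # frontier work that justifies Opus
    "rag": "claude-sonnet-4-6",            # default production inference
    "routing": "claude-haiku-4-5",         # high-volume, simple tasks
}

def pick_model(tier: str) -> str:
    # Fall back to Sonnet rather than Opus: over-assigning to Opus is the
    # common failure mode the checklist warns about.
    return MODEL_FOR_TIER.get(tier, "claude-sonnet-4-6")
```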

Should you upgrade from Opus 4.6 to 4.7?

Upgrade if you run autonomous coding agents, vision-heavy workflows (4.7 triples the pixel budget to 3.75 MP, meaningful for UI and document understanding), or tasks where 4.6 currently leaves quality on the table. The SWE-bench Pro jump from 53.4% to 64.3% is not cosmetic. CursorBench went from 58% to 70%. These gains show up on real production tasks, not just benchmarks.

Stay on 4.6, or move to Sonnet 4.6, if your workload is cost-sensitive, you have already tuned prompts for 4.6 density, or your evals do not show a clear quality delta that justifies the effective cost increase. The honest test: if your product’s revenue per call is $0.50 or more and Opus 4.7 lifts quality by 3 percentage points, the extra cost is almost certainly worth it. If per-call revenue is much lower and the lift is marginal, it probably is not.
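The honest test reduces to one comparison. A back-of-envelope sketch, where every input is your own measurement and the conversion assumption (quality lift in percentage points treated as the fraction of calls converted to revenue-bearing outcomes) is mine:

```python
# Sketch: does the measured quality lift pay for the extra per-call cost?

def upgrade_pays_off(revenue_per_call: float, quality_lift_pp: float,
                     extra_cost_per_call: float) -> bool:
    """True when expected extra revenue per call exceeds the extra cost."""
    extra_revenue = revenue_per_call * quality_lift_pp / 100
    return extra_revenue > extra_cost_per_call

# $0.50 revenue per call, 3pp lift, 1 cent of tokenizer-driven extra cost:
upgrade_pays_off(0.50, 3, 0.01)   # True
```

Run it with your measured numbers, not these; the point is that the decision is arithmetic once you have replayed traffic and eval deltas in hand.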

Frequently asked questions

Is Claude Opus 4.7 more expensive than Opus 4.6?

Per-token prices are identical. Because Opus 4.7 uses a new tokenizer, the same text can produce up to 35% more tokens, which means the effective cost per request can rise even though the rate card did not change.

How much does Claude Opus 4.7 cost per million tokens?

$5 for input and $25 for output. Prompt caching offers up to 90% savings on cache reads, and batch processing offers a 50% discount on async workloads.

Is Opus 4.7 cheaper than Sonnet 4.6?

No. Sonnet 4.6 is $3 input and $15 output, 40% cheaper per token than Opus. For most production inference, Sonnet remains the cost-effective default.

Does prompt caching still work with Opus 4.7?

Yes. Cache reads are still discounted by up to 90%, and caching is the most reliable way to offset the tokenizer change.

When will Opus 4.6 be deprecated?

Anthropic has not published a sunset date for Opus 4.6 as of the 4.7 launch. Opus rate limits are pooled across versions, so you can mix traffic during migration.

What changed in Opus 4.7 besides pricing dynamics?

SWE-bench Pro jumped from 53.4% to 64.3%, CursorBench from 58% to 70%, vision resolution tripled to 3.75 MP, and long-context retrieval improved. Context window stays at 1M tokens.

The bottom line

Anthropic kept the rate card stable on purpose, and that is a gift to anyone building a budget. But “pricing unchanged” is not the same as “cost unchanged.” The 35% tokenizer ceiling is the real pricing story for Opus 4.7. Measure it on your own traffic before you migrate, lean hard on caching and batch to claw back the difference, and be honest about which workloads actually need a frontier model at all. For many teams, the right answer after reading this is not “upgrade to 4.7,” it is “move half the traffic to Sonnet.”

