Anthropic released Claude Sonnet 5 today. The framing is friendly: it's the most agentic Sonnet yet, it gets close to Opus 4.8 quality, and it launches at an introductory price below what the previous Sonnet cost. Anthropic even calls the move to Sonnet 5 "roughly cost-neutral."
That word, neutral, is doing a lot of work, and it has an expiration date. Sonnet 5 is a real price-performance step forward. It also carries two cost mechanics that won't show up in a quick read of the launch post: a price step-up on September 1 and a new tokenizer that can turn the same text into up to 35% more billable tokens. Together, they mean a workload that looks cheaper this week can land above your old baseline in nine weeks, with a sticker price that still reads "unchanged." Here's the good, the bad, and what to do before August 31.
Two numbers matter most. The standard price ($3/$15) is identical to what Sonnet 4.6 cost — so the rate card looks flat. And the jump from intro to standard is a clean +50% on every token ($2→$3, $10→$15). The tokenizer change rides on top of both.
Tracking Claude spend across teams? Finout's Anthropic integration pulls token-level cost into your MegaBill alongside AWS, GCP, and Kubernetes, so the September 1 step-up shows up as a line you can see, not a surprise at month-end.
Sonnet 5 performs close to Opus 4.8 on agentic work — planning, tool use, coding, knowledge tasks — at a fraction of the price. Opus 4.8 is $5/$25. Sonnet 5 is 40% cheaper on both at standard rates, and 60% cheaper during the intro window. For most teams the biggest lever in this release isn't adopting Sonnet 5 for new work — it's demoting existing Opus traffic to Sonnet 5.
Worked example — an Opus 4.8 agent (5M input / 500K output per day):
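The arithmetic can be sketched as follows. This is a minimal illustration, assuming a 30-day month and identical token counts on every model (the new tokenizer would raise the Sonnet 5 figures). The dictionary keys are illustrative labels, not API model IDs.

```python
# Prices in USD per 1M tokens (input, output), taken from the rate card in this article.
# The keys are illustrative labels, not API model identifiers.
PRICES = {
    "opus-4.8": (5.0, 25.0),
    "sonnet-5-standard": (3.0, 15.0),
    "sonnet-5-intro": (2.0, 10.0),
}


def monthly_cost(model, input_mtok_per_day, output_mtok_per_day, days=30):
    """Monthly spend in USD for a fixed daily volume, given in millions of tokens."""
    in_price, out_price = PRICES[model]
    return (input_mtok_per_day * in_price + output_mtok_per_day * out_price) * days


# The agent from the example: 5M input and 500K output tokens per day.
opus = monthly_cost("opus-4.8", 5.0, 0.5)
for model in PRICES:
    cost = monthly_cost(model, 5.0, 0.5)
    delta = (cost / opus - 1) * 100
    print(f"{model:18s} ${cost:8,.2f}/month  ({delta:+.0f}% vs Opus 4.8)")
```

Under those assumptions the 40% and 60% discounts carry straight through to the monthly bill, because both the input and output rates drop by the same ratio.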
The catch: those savings are only real if quality holds on your tasks. Replay a representative traffic sample, hold output quality constant, and move only the slice that passes — then bank the rest as a deadline-bound discount by front-loading batch jobs, eval sweeps, and backfills before August 31.
Anthropic set the intro price low enough that even with the extra tokens from the new tokenizer, moving from Sonnet 4.6 to Sonnet 5 is roughly cost-neutral today. That's true — and temporary. Watch one unchanged workload (the same source text that ran 5M input / 500K output per day on Sonnet 4.6):
| Stage | Effective monthly cost | vs. Sonnet 4.6 baseline |
|---|---|---|
| Sonnet 4.6 (baseline) | ~$675 | n/a |
| Sonnet 5, intro price, +20% tokens | ~$540 | −20% (looks like a win) |
| Sonnet 5, intro price, +35% tokens | ~$608 | −10% |
| Sonnet 5, standard price, +20% tokens | ~$810 | +20% |
| Sonnet 5, standard price, +35% tokens | ~$911 | +35% |
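The figures in the table can be reproduced with a short sketch. It assumes a 30-day month and that the tokenizer inflates input and output tokens by the same factor, which is a simplification; `effective_monthly_cost` and its parameters are names made up for this illustration.

```python
# Baseline workload on Sonnet 4.6: (input, output) in millions of tokens per day.
BASELINE_DAILY_MTOK = (5.0, 0.5)


def effective_monthly_cost(in_price, out_price, token_inflation=0.0, days=30):
    """Monthly USD cost when the same text yields (1 + token_inflation) times the tokens.

    Assumes input and output tokens inflate by the same factor.
    """
    in_mtok, out_mtok = BASELINE_DAILY_MTOK
    daily = in_mtok * in_price + out_mtok * out_price
    return daily * (1.0 + token_inflation) * days


baseline = effective_monthly_cost(3.0, 15.0)  # Sonnet 4.6 at $3/$15, no inflation

stages = [
    ("Sonnet 5, intro, +20% tokens", 2.0, 10.0, 0.20),
    ("Sonnet 5, intro, +35% tokens", 2.0, 10.0, 0.35),
    ("Sonnet 5, standard, +20% tokens", 3.0, 15.0, 0.20),
    ("Sonnet 5, standard, +35% tokens", 3.0, 15.0, 0.35),
]
for label, in_price, out_price, inflation in stages:
    cost = effective_monthly_cost(in_price, out_price, inflation)
    change = (cost / baseline - 1) * 100
    print(f"{label:34s} ${cost:8.2f}  ({change:+.0f}% vs baseline)")
```

Because the standard rate equals the Sonnet 4.6 rate, the percentage change at standard pricing is exactly the token inflation.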
The trap is the shape of the curve. Migrate in July and your Claude bill drops — finance sees a win, nobody flags it. Then on September 1 the per-token rate rises 50% on the exact same traffic, and because the tokenizer is already inflating your token count, effective cost lands 20–35% above where you started — while the rate card still says $3/$15, the same as Sonnet 4.6. "Pricing unchanged" is not "cost unchanged," and "cost-neutral" had a date on it. Replay real traffic through claude-sonnet-5, measure the token delta on your own content mix, and budget at September pricing, not July's.
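One way to turn a replay into a budget number is sketched below. It assumes you have already recorded, for each replayed request, the tokens billed on Sonnet 4.6 and on Sonnet 5; how you collect those counts is up to your own logging. The `ReplaySample` type, the function names, and the sample values are all hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class ReplaySample:
    old_tokens: int  # tokens billed for this request on Sonnet 4.6
    new_tokens: int  # tokens billed for the same request replayed on Sonnet 5


def token_inflation(samples):
    """Volume-weighted token delta: total new tokens / total old tokens - 1."""
    old = sum(s.old_tokens for s in samples)
    new = sum(s.new_tokens for s in samples)
    if old == 0:
        raise ValueError("need at least one sample with old_tokens > 0")
    return new / old - 1.0


def september_budget(baseline_monthly_usd, inflation):
    """Budget at standard Sonnet 5 pricing.

    Standard rates match Sonnet 4.6 rates, so only the token delta moves the bill.
    """
    return baseline_monthly_usd * (1.0 + inflation)


# Placeholder counts for three replayed requests; substitute your own.
samples = [
    ReplaySample(old_tokens=1_000, new_tokens=1_180),
    ReplaySample(old_tokens=4_000, new_tokens=5_200),
    ReplaySample(old_tokens=500, new_tokens=520),
]
inflation = token_inflation(samples)
print(f"measured token delta: {inflation:+.1%}")
print(f"budget at standard pricing: ${september_budget(675.0, inflation):,.2f}")
```

Weighting by volume rather than averaging per-request ratios keeps a handful of tiny requests from skewing the estimate. A fuller version would track input and output tokens separately, since they are priced differently.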
The tokenizer is the visible mechanic; the quieter one is behavioral. Sonnet 5 is built to run autonomously — it plans, drives browsers and terminals, checks its own output without being asked, and pushes through tasks where older models stopped short. That's why it's good, and also why per-task token consumption becomes variable and harder to forecast. Anthropic raised rate limits across Chat, Cowork, Claude Code, and the Platform specifically to absorb the higher token usage at higher effort levels — higher usage is designed in, not an edge case.
The effort dial cuts both ways: run an easy task at xhigh and you pay for reasoning it didn't need; run a hard one too low and you pay twice when it retries. The cost of an agent run is now effort × autonomy × tokenizer density — three moving parts, none on the rate card. The discipline this demands isn't "use Sonnet 5 less," it's measuring cost per completed task, per agent and per feature. A model that finishes in fewer steps can be cheaper overall even at a higher per-token rate — but only the unit economics tell you which way it nets.
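Cost per completed task can be computed along these lines. This is a sketch under stated assumptions: the `AgentRun` record and the token counts are hypothetical, and "completed" stands for whatever acceptance check your own evals apply.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    input_tokens: int
    output_tokens: int
    completed: bool  # did this run finish the task acceptably?


def cost_per_completed_task(runs, in_price_per_mtok, out_price_per_mtok):
    """Total spend across all runs, failures and retries included, per completed task."""
    spend = sum(
        r.input_tokens / 1e6 * in_price_per_mtok
        + r.output_tokens / 1e6 * out_price_per_mtok
        for r in runs
    )
    done = sum(1 for r in runs if r.completed)
    if done == 0:
        return float("inf")  # money spent, nothing finished
    return spend / done


# Hypothetical comparison at $3/$15: a low-effort attempt that fails once and
# retries, versus a single higher-effort run that uses more tokens but succeeds.
low_effort = [AgentRun(200_000, 20_000, False), AgentRun(200_000, 20_000, True)]
high_effort = [AgentRun(250_000, 40_000, True)]

print(f"low effort:  ${cost_per_completed_task(low_effort, 3.0, 15.0):.2f} per task")
print(f"high effort: ${cost_per_completed_task(high_effort, 3.0, 15.0):.2f} per task")
```

With these made-up numbers the higher-effort run is cheaper per completed task despite burning more tokens per run, which is the case the retry warning describes. Real traffic may net out the other way.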
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Best for |
|---|---|---|---|
| Claude Opus 4.8 | $5 | $25 | Highest-accuracy agentic work, hardest coding |
| Claude Sonnet 5 (standard) | $3 | $15 | Near-frontier agentic work at production scale |
| Claude Sonnet 5 (intro, through Aug 31) | $2 | $10 | Same, at a temporary discount; front-load batch/eval here |
| Claude Sonnet 4.6 | $3 | $15 | Predecessor; same sticker price, older tokenizer (fewer tokens for the same text) |
| Claude Haiku 4.5 | $1 | $5 | High-volume, low-latency, simpler tasks |
Treat the effort dial and the model tiers as a single cost-performance range: Sonnet 5 (at low effort where that suffices) for the bulk of agentic work, Opus 4.8 reserved for the slice where accuracy directly drives revenue, and Haiku 4.5 for extraction and routing at volume.
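To see what a given split does to the average rate, a blended price can be sketched as below. The traffic shares are hypothetical, the keys are illustrative labels rather than API model IDs, and the calculation ignores tokenizer differences between models.

```python
# Standard prices in USD per 1M tokens (input, output), from the table above.
RATES = {
    "haiku-4.5": (1.0, 5.0),
    "sonnet-5": (3.0, 15.0),
    "opus-4.8": (5.0, 25.0),
}


def blended_rates(mix):
    """Weighted-average (input, output) price per 1M tokens.

    `mix` maps a model label to its share of token volume; shares must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    in_rate = sum(share * RATES[model][0] for model, share in mix.items())
    out_rate = sum(share * RATES[model][1] for model, share in mix.items())
    return in_rate, out_rate


# Hypothetical split: most traffic on Sonnet 5, a small slice on Opus 4.8.
mix = {"haiku-4.5": 0.3, "sonnet-5": 0.6, "opus-4.8": 0.1}
in_rate, out_rate = blended_rates(mix)
print(f"blended input:  ${in_rate:.2f} per 1M tokens")
print(f"blended output: ${out_rate:.2f} per 1M tokens")
```

Re-running this with different shares shows how much of the Opus slice has to move down before the blended rate changes meaningfully.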
A tokenizer shift and a dated price step-up are exactly the kind of changes that hide inside a "flat" rate card and surface on a future invoice. Finout's Anthropic integration pulls Claude spend into MegaBill alongside AWS, GCP, Azure, Kubernetes, Snowflake, and Databricks, so you get:
For a lot of teams the right move after this release is to shift Opus traffic down to Sonnet 5; that's where the 40–60% savings live. But take the "cost-neutral" framing at face value and you'll budget July's number for a September reality. The intro price is a deadline, the new tokenizer is a multiplier, and an agentic model's token burn is a variable, not a constant. The question was never whether the bill will move. It's whether you'll know why, per request and per team, before the invoice lands.