
Anthropic released Claude Fable 5 today — a publicly available version of its Mythos-class model, previously restricted to government cybersecurity partners. The API price is $10 per million input tokens and $50 per million output tokens, which is 2x the cost of Claude Opus 4.8 and 2x GPT-5.5's input price on sticker.

Whether that premium is justified depends on your task type. This post covers the pricing structure, what the benchmark data actually says, and a model selection framework for teams already paying for Claude or evaluating the switch from OpenAI.


Fable 5 vs. Mythos 5: What's the Difference

They are the same underlying model. The distinction is access control.

Claude Mythos 5 has no safety classifiers on cybersecurity and biology queries. It's restricted to approved Glasswing partners (US government cyber defenders) and, soon, select biomedical research organizations.

Claude Fable 5 has safety classifiers layered on top. When a query matches the cybersecurity, biology/chemistry, or distillation classifier, the request is automatically routed to Claude Opus 4.8 instead. Anthropic notifies users when this happens. Based on early data, it triggers in fewer than 5% of sessions.

Both models are priced identically. The fallback to Opus 4.8 is billed at Opus 4.8 rates.


Claude Fable 5 Pricing vs. Competitors

Model | Input ($/M tokens) | Output ($/M tokens)
Claude Fable 5 / Mythos 5 | $10 | $50
Claude Opus 4.8 (standard) | $5 | $25
Claude Opus 4.8 (fast mode) | $10 | $50
Claude Sonnet 4.6 | ~$3 | ~$15
GPT-5.5 (standard) | $5 | $30
GPT-5.5 (batch/flex) | $2.50 | $15
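
To turn the table into per-request numbers, here is a minimal sketch. The prices are copied from the table above (the Sonnet 4.6 figures are approximate there). The dictionary keys and the `request_cost` helper are hypothetical names chosen for illustration, not official API model identifiers or SDK functions.

```python
# List prices in USD per million tokens, copied from the pricing table.
# Keys are illustrative labels, not official API model identifiers.
PRICES = {
    "fable-5": (10.00, 50.00),
    "opus-4.8": (5.00, 25.00),
    "opus-4.8-fast": (10.00, 50.00),
    "sonnet-4.6": (3.00, 15.00),  # approximate in the source table
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5-flex": (2.50, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD for a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example request shape (an assumption): 20k input tokens, 4k output tokens.
    for name in PRICES:
        print(f"{name:14s} ${request_cost(name, 20_000, 4_000):.4f}")
```

Plugging in your own typical input and output sizes shows how much of the bill comes from output tokens, which are priced 5x to 6x higher than input on every row.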

What's good about the pricing

It's one-third the price of Claude Mythos Preview, which ran at approximately $30/$150 per million tokens. That predecessor was available only to restricted partners, so the comparison is academic for most teams, but the direction matters: Anthropic is lowering the cost of frontier-tier inference as capacity scales.

Token efficiency partially offsets the sticker gap vs GPT-5.5. One early customer found Fable 5 completed a frontier physics research task in 36 hours using one-third the reasoning tokens; GPT-5.5 needed four days to match the result. Fable 5 costs 2x GPT-5.5 on input ($10/M vs $5/M) and about 1.67x on output ($50/M vs $30/M), so if the one-third reduction holds across billed tokens, its effective cost on that class of task is roughly 56% to 67% of GPT-5.5's. This efficiency advantage applies to complex multi-step reasoning, not to simple or short-context queries.
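
The offset described above is simple arithmetic, sketched here under one simplifying assumption: the token reduction applies uniformly to whichever token type dominates the bill. The function names are hypothetical; the prices come from the pricing table.

```python
def relative_cost(price_a: float, price_b: float, tokens_a_over_b: float) -> float:
    """Cost of model A relative to model B.

    price_a, price_b: USD per million tokens for the same token type.
    tokens_a_over_b: tokens A uses divided by tokens B uses on the same task.
    A result below 1.0 means A is cheaper despite the higher list price.
    """
    return (price_a / price_b) * tokens_a_over_b


def break_even_token_ratio(price_a: float, price_b: float) -> float:
    """Token ratio (A over B) at which both models cost the same."""
    return price_b / price_a


if __name__ == "__main__":
    # Fable 5 vs GPT-5.5 at one-third the tokens.
    print("input :", round(relative_cost(10, 5, 1 / 3), 3))
    print("output:", round(relative_cost(50, 30, 1 / 3), 3))
    # Fable 5 must use fewer than this fraction of GPT-5.5's tokens to win.
    print("break-even (input) :", break_even_token_ratio(10, 5))
    print("break-even (output):", break_even_token_ratio(50, 30))
```

The break-even ratios are the useful numbers for your own evals: Fable 5 comes out ahead on cost only if it uses less than half the input tokens, or less than 60% of the output tokens, that GPT-5.5 needs for the same result.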

Subscription users have a free eval window until June 22. Pro, Max, Team, and seat-based Enterprise plans include Fable 5 at no extra cost through June 22. After that it moves to usage credits, with no committed date for re-inclusion in flat plans.

What's bad about the pricing

The 2x premium over Opus 4.8 is hard to justify for most tasks. For high-volume, well-defined workflows (classification, summarization, structured extraction, RAG), Opus 4.8 at $5/$25 or Sonnet 4.6 at ~$3/$15 will usually produce output of comparable quality. The gap scales linearly with volume: 10 million output tokens cost $500 on Fable 5 versus $250 on Opus 4.8, and it takes about 5 billion output tokens a year to reach $250,000 versus $125,000. At that volume, the difference is not a rounding error.

The fallback classifier creates cost unpredictability. When a query hits the cybersecurity or bio/chemistry classifier, it reroutes to Opus 4.8 and bills at Opus 4.8 rates. For most teams the 5% rate is negligible. For bio, chem, or security-adjacent workflows, the classifier is deliberately broad and the effective fallback rate will be higher — Anthropic acknowledged this explicitly and has committed to narrowing it.
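
One way to reason about the fallback's effect on spend is a blended price, sketched below. This assumes the fallback rate is measured as a share of tokens, whereas the figure quoted above is per session, so treat the result as an approximation. `blended_price` is a hypothetical helper, and the 40% rate in the example is an invented illustration, not a reported number.

```python
def blended_price(primary: float, fallback: float, fallback_rate: float) -> float:
    """Expected price per million tokens when some traffic is rerouted.

    primary: price of the requested model (USD per million tokens).
    fallback: price of the model that serves rerouted requests.
    fallback_rate: share of tokens served by the fallback model, 0.0 to 1.0.
    """
    if not 0.0 <= fallback_rate <= 1.0:
        raise ValueError("fallback_rate must be between 0 and 1")
    return (1.0 - fallback_rate) * primary + fallback_rate * fallback


if __name__ == "__main__":
    # Output prices: Fable 5 at $50/M, Opus 4.8 at $25/M.
    print(blended_price(50, 25, 0.05))  # general workload, about 5% fallback
    print(blended_price(50, 25, 0.40))  # hypothetical bio/chem-heavy workload
```

Because the fallback model is cheaper, a higher fallback rate lowers the bill. The practical cost is that a larger share of the work is done by Opus 4.8 rather than the model you selected.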

The subscription cut-off on June 23 has no firm timeline for reversal. Anthropic's stated intent is to restore Fable 5 as a standard plan feature "when sufficient capacity allows." That's an open-ended commitment. Individual power users on Pro or Max plans should factor this in before building workflows that depend on flat-rate access.

There is no batch or async pricing tier equivalent to GPT-5.5's $2.50/$15 flex pricing. For offline processing at scale, OpenAI retains a meaningful cost advantage.


Benchmark Data

Anthropic ran most of these evaluations, and the early-access companies ran their own. Independent third-party benchmarks are still limited given the same-day launch. The numbers below are from vendor-published results and stated customer evaluations — treat them accordingly.

Benchmark | Claude Fable 5 | Claude Opus 4.8 | GPT-5.5
SWE-Bench Pro (real GitHub issue resolution) | 80.3% | 69.2% | 58.6%
Hex Analytics Benchmark (long-form complex tasks) | 90%+ | ~80% | -
Hebbia Finance Benchmark (senior analyst tasks) | #1 | - | -
FrontierCode (prod-quality code, token efficiency) | #1 among frontier | - | -
Frontier physics research (token efficiency) | 3x fewer tokens vs GPT-5.5 baseline | - | baseline (4 days)

The SWE-Bench Pro numbers are the most independently meaningful because the benchmark methodology is public and has been applied to multiple models. The 22-point gap between Fable 5 (80.3%) and GPT-5.5 (58.6%) is substantial, and Opus 4.8 (69.2%) sits between them — meaning Fable 5 represents a real step up from Opus, not just a marginal one.

Victor Taelin (open-source researcher) publicly questioned whether the pre-launch leaked benchmarks were designed to flatter. His skepticism is worth noting, and independent evals will matter more as they emerge. The customer evaluations from Stripe, Cursor, GitHub, Hex, and IMC are harder to dismiss because these are paying production customers, not Anthropic partners — but they're still testimonials, not audited results.


Key Capabilities vs Previous Models

Software engineering

Stripe reported Fable 5 completed a codebase-wide migration across a 50-million-line Ruby codebase in one day, estimated at two months of manual team work. Cursor reports it opened "a class of long-horizon problems that were out of reach for earlier models." Hex reported a 10-point benchmark jump over Opus 4.8 on complex analytical tasks, with Fable 5 breaking 90% on their core benchmark for the first time.

Vision

Fable 5 completed Pokémon FireRed from a minimal vision-only harness (raw game screenshots, no maps or state information). Earlier Claude models required additional scaffolding to make any progress on the same task. It can also reconstruct web app source code from screenshots and extract precise values from scientific figures.

Long-context and memory retention

In a controlled evaluation using the deck-building game Slay the Spire, giving Fable 5 access to persistent file-based memory improved its performance three times more than the same setup improved Opus 4.8. Fable 5 also reached the game's final act three times more often. For long-running agentic workflows where the model builds on prior state, the performance delta over Opus 4.8 grows over time.

Life sciences (Mythos 5 only)

Mythos 5 matched or exceeded skilled human operators on protein design tasks across 9 of 14 targets, completing work that normally requires a scientist to choose binding sites, run design tools, and handle failures autonomously. In genomics, Mythos 5 ran over a week of largely autonomous research, training a custom ML model that outperformed a recent Science-published model while being 100x smaller. These capabilities are restricted to Mythos 5 under the trusted access program — Fable 5's bio/chemistry classifier will intercept many such queries.


The Safeguard System

The classifier fallback is the most operationally significant difference between Fable 5 and every other generally available model, and it's worth understanding concretely.

What triggers it: Cybersecurity queries (exploit development, offensive operations, vulnerability discovery), biology and chemistry queries (broadly defined, deliberately conservative), and requests Anthropic's classifiers flag as large-scale distillation attempts.

What happens: The request is served by Claude Opus 4.8 instead of Fable 5. The user is notified. The response is billed at Opus 4.8 rates.
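
Since the fallback rate depends heavily on workload, it is worth measuring on your own traffic. The sketch below assumes you log, for each request, which model you asked for and which model actually served it. How the notification is surfaced is not described here, so the log shape and the served-model label are hypothetical.

```python
def fallback_rate(records):
    """Fraction of requests served by a different model than requested.

    `records` is an iterable of (requested_model, served_model) pairs taken
    from your own request logs; this log shape is hypothetical.
    """
    total = 0
    fell_back = 0
    for requested, served in records:
        total += 1
        if served != requested:
            fell_back += 1
    return fell_back / total if total else 0.0


if __name__ == "__main__":
    sample = [
        ("claude-fable-5", "claude-fable-5"),
        ("claude-fable-5", "claude-opus-4.8"),  # hypothetical served-model label
        ("claude-fable-5", "claude-fable-5"),
        ("claude-fable-5", "claude-fable-5"),
    ]
    print(f"fallback rate: {fallback_rate(sample):.0%}")
```

Tracking this per workflow rather than globally makes it easier to spot the bio, chem, or security-adjacent pipelines where the rate runs well above the overall average.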

Robustness: External red-teaming across 1,000+ hours found no universal jailbreaks. The UK AISI made early progress toward one in a short initial window — Anthropic disclosed this. A universal jailbreak that bypasses classifiers across all session types has not been demonstrated publicly.

Known false-positive problem: Anthropic explicitly stated the bio/chemistry classifier is tuned conservatively and will catch benign requests. Medicinal chemistry, genomics, and even some general biology questions may hit it. Anthropic has committed to narrowing this, with no specific timeline.

Data retention change: All Mythos-class traffic now has a mandatory 30-day retention period. Anthropic states it won't use this for training and will delete it after 30 days. Human access to retained data is logged. If your organization has data residency or retention constraints, this is a compliance consideration that doesn't exist for GPT-5.5.


Fable 5 vs. GPT-5.5: Where Each Wins

Fable 5 advantages:

  • SWE-Bench Pro: 80.3% vs 58.6% — a 22-point lead on real-world software engineering
  • Token efficiency on complex reasoning tasks (demonstrated in physics research eval)
  • Long-context memory retention for agentic workflows
  • Financial and analytical benchmarks (Hebbia, IMC evaluations)
  • Vision tasks requiring precise extraction or code reconstruction from screenshots

GPT-5.5 advantages:

  • Half the input price ($5/M vs $10/M) and 40% cheaper output ($30/M vs $50/M)
  • Batch/flex tier available at $2.50/$15 per million tokens — no Anthropic equivalent
  • No safety-classifier fallbacks or restricted query categories to plan around
  • No mandatory data retention policy
  • Simpler cost modeling for teams running predictable volume

Where the gap closes on cost: If your workload is complex enough that Fable 5 uses significantly fewer tokens to reach the same output quality, the effective cost gap narrows or inverts. This is documented for long-horizon reasoning tasks. It does not apply to high-volume, short-context, or well-defined tasks.


Which Claude Model to Use

Use case | Recommended model | Why
Long agentic coding runs, codebase migrations | Fable 5 | 22-point SWE-Bench Pro lead, compounding advantage on multi-step tasks
Complex financial/analytical research | Fable 5 | Documented wins at Hebbia, IMC; senior analyst-grade reasoning
Multi-modal tasks requiring vision | Fable 5 | Stronger vision extraction and reconstruction than prior Claude models
General-purpose coding, Q&A, document tasks | Opus 4.8 | Half the price, still ahead of GPT-5.5 on SWE-Bench Pro (69.2% vs 58.6%)
High-volume classification, summarization, RAG | Sonnet 4.6 | ~$3/$15 per million tokens; marginal capability loss on well-defined tasks
Bio, chem, or cybersecurity workflows | Opus 4.8 (for now) | Fable 5 classifier fallback will intercept many domain-relevant queries
Large-scale offline/batch processing | GPT-5.5 flex | $2.50/$15 with no Anthropic equivalent tier

The principle: use the cheapest model that reliably clears your quality bar. Fable 5's premium is justified when task complexity is high, context is long, or the work compounds over multiple steps. For everything else, Opus 4.8 or Sonnet 4.6 are the better economics.
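
That principle can be written as a small selection rule. The sketch below assumes you have already run each model on your own eval set and measured an average cost per task. The `Candidate` type, the `cheapest_passing` function, and every number in the example are hypothetical placeholders, not benchmark results.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    name: str
    cost_per_task: float  # USD, measured on your own workload
    eval_score: float     # fraction of your eval set passed, 0.0 to 1.0


def cheapest_passing(
    candidates: Sequence[Candidate], quality_bar: float
) -> Optional[str]:
    """Return the cheapest model whose eval score meets the bar, or None."""
    passing = [c for c in candidates if c.eval_score >= quality_bar]
    if not passing:
        return None
    return min(passing, key=lambda c: c.cost_per_task).name


if __name__ == "__main__":
    # Placeholder numbers for illustration only; substitute your own evals.
    measured = [
        Candidate("sonnet-4.6", 0.03, 0.82),
        Candidate("opus-4.8", 0.05, 0.91),
        Candidate("fable-5", 0.10, 0.97),
    ]
    print(cheapest_passing(measured, quality_bar=0.90))
```

Using measured cost per task rather than list price per token matters here, because a model that needs fewer tokens can be cheaper per task even at a higher list price.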


Availability

  • API (claude-fable-5): Available today on the Claude API and consumption-based Enterprise plans.
  • Subscription plans (Pro, Max, Team, seat-based Enterprise): Included at no extra cost through June 22, 2026. Usage credits required from June 23. Anthropic's stated intent is to restore it as a standard plan feature when capacity allows, with no committed date.
  • Claude Mythos 5: Restricted to existing Glasswing partners today. Trusted access expansion planned for biology researchers and, eventually, a broader cybersecurity program.

Summary

Claude Fable 5 is priced at $10/$50 per million tokens — double Opus 4.8, double GPT-5.5 on input. The benchmark evidence, particularly on software engineering (SWE-Bench Pro: 80.3% vs GPT-5.5's 58.6%) and token efficiency on complex tasks, supports a real performance advantage for the right workload type. For high-volume or well-defined tasks, the premium is not defensible against Opus 4.8 or Sonnet 4.6.

The two-week free window on subscription plans is the right time to run your own evals before committing to the pricing.


Sources: