Anthropic Claude is a family of large language models (LLMs) designed for natural language understanding, reasoning, coding, and content generation. Available as a consumer chat app (claude.ai), a developer API, and an enterprise platform, Claude's model tiers — Haiku (fastest), Sonnet (balanced), and Opus (most capable) — each carry distinct Claude pricing that reflects their capability level, with multiple generations available across each tier.
As of April 2026, the current recommended models are Claude Opus 4.6, Claude Sonnet 4.6, and Claude Haiku 4.5. Claude stands out through strong benchmark performance on reasoning and coding, Anthropic's Constitutional AI safety approach, and a context window of up to 1 million tokens — making it especially well-suited for document-heavy and agentic workloads where Claude pricing scales with the complexity of the task.
Read Finout's CPO Article about the newest Claude pricing changes here
See how to show Claude ROI in practice — explore Finout

Anthropic offers individual Claude pricing across three tiers on claude.ai (all prices in USD). The Free plan requires no credit card and covers web, iOS, Android, and desktop access with text, image, and code generation, web search, and desktop extensions — subject to daily usage limits.
Pro runs $20/month [annual rate TBC] and adds Claude Code in the terminal, file creation and code execution, unlimited projects, Google Workspace integration, remote MCP connectors, and extended reasoning models — the right tier for developers and power users. Max starts at $100/month for 5x more usage than Pro, or $200/month for 20x more, and adds priority access to new features and models.
Anthropic's organizational plans add admin controls, collaboration features, and enterprise-grade security on top of the individual plan capabilities. All Team plans require a minimum of five members.
| Plan | Price | Key Features |
| --- | --- | --- |
| Team Standard | $25/user/mo (annual), $30/user/mo billed monthly | SSO, domain capture, centralized billing, Microsoft 365 & Slack integrations, admin controls, org-wide search, enterprise desktop deployment |
| Team Premium | $150/user/mo | All Standard features plus Claude Code access and early access to new collaboration features; suited to technical teams |
| Enterprise | Custom pricing | All Team features plus: expanded context window, role-based access control, SCIM, audit logging, compliance API, custom data retention, Google Docs catalog, Claude Code for premium users |
The Team Standard seat covers most organizational collaboration needs. Team Premium adds Claude Code, making it the right choice for engineering teams building with or on top of Claude. Enterprise is tailored for organizations with governance, compliance, or data residency requirements — pricing is available on request from Anthropic's sales team.
Claude API pricing is based on token consumption — charged separately for input tokens (your prompts and context) and output tokens (Claude's responses). All prices are per million tokens (MTok) in USD.
| Model | Input (≤200K) | Output (≤200K) | Input (>200K) | Output (>200K) | Cache Write | Cache Read |
| --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 | $5 | $25 | $10 | $37.50 | $6.25 | $0.50 |
| Claude Sonnet 4.6 | $3 | $15 | $6 | $22.50 | $3.75 | $0.30 |
| Claude Sonnet 4.5 | $3 | $15 | $6 | $22.50 | $3.75 | $0.30 |
| Claude Haiku 4.5 | $1 | $5 | $2 | $7.50 | $1.25 | $0.10 |
Earlier-generation models remain available at the following per-MTok rates:

| Model | Input | Output | Cache Write | Cache Read |
| --- | --- | --- | --- | --- |
| Claude Opus 4.5 | $5 | $25 | $6.25 | $0.50 |
| Claude Opus 4.1 | $15 | $75 | $18.75 | $1.50 |
| Claude Opus 4 | $15 | $75 | $18.75 | $1.50 |
| Claude Sonnet 4 | $3 | $15 | $3.75 | $0.30 |
| Claude Sonnet 3.7 | $3 | $15 | $3.75 | $0.30 |
| Claude Haiku 3.5 | $0.80 | $4 | $1 | $0.08 |
| Claude Haiku 3 | $0.25 | $1.25 | $0.30 | $0.03 |
Beyond per-token model pricing, several tools and options carry their own charges:

| Tool | Pricing | Notes |
| --- | --- | --- |
| Web Search | $10 per 1,000 searches | Server-side tool, charged per search regardless of token usage |
| Code Execution | $0.05 per container-hour | 50 free hours per org per day; billed after that |
| Opus 4.6 Fast Mode | 6x standard rates | Beta: significantly faster output for latency-sensitive workloads |
| US-Only Inference | 1.1x multiplier on all tokens | Applies to Opus 4.6+ via the inference_geo parameter; global routing is standard price |
Two features offer the most significant cost reductions for production API usage: the Batch API and prompt caching. The Batch API delivers a flat 50% discount on all token costs for asynchronous workloads. Prompt caching reduces repeated input costs by up to 90%. Used together on eligible workloads, the combined savings can reach 95% compared to standard on-demand pricing.
Batch API: 50% off asynchronous workloads
Anthropic's Batch API processes requests asynchronously within a 24-hour window in exchange for a flat 50% discount on all input and output tokens. This applies to every Claude model without exception. It's ideal for content generation, data classification, document analysis, and any workload where real-time responses aren't required. The trade-off is simple: if your task can wait, you pay half.
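To make the queuing pattern concrete, a batch submission can be sketched as a list of per-request payloads, one entry per document. The field names below follow the general shape of Anthropic's Messages Batches API but are illustrative, and the model ID string is a placeholder — verify both against the current API docs before relying on them.

```python
# Build a list of batch requests as plain dicts. Each entry pairs a
# custom_id (used to match results when the batch completes) with
# standard Messages-style parameters. Field names are illustrative.
documents = ["Quarterly report text...", "Support transcript text..."]

batch_requests = [
    {
        "custom_id": f"doc-{i}",
        "params": {
            "model": "claude-sonnet-4-6",  # placeholder model ID
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": f"Classify this document:\n{doc}"}
            ],
        },
    }
    for i, doc in enumerate(documents)
]

# Every token in these requests is billed at 50% of on-demand rates.
print(len(batch_requests))
```

The key architectural point is that each request carries its own ID, so results can arrive in any order within the 24-hour window.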
Prompt caching: up to 90% off repeated context
Prompt caching stores previously processed portions of a prompt — a system prompt, a large document, or conversation history — so subsequent requests can read from cache rather than reprocess the same tokens. Cache reads are charged at roughly 10% of the standard input rate. For applications that reuse the same large context across many requests, this is the most impactful single optimization available.
Anthropic supports two caching modes: automatic caching (a single cache_control field at the request level) and explicit cache breakpoints for fine-grained control. Automatic caching is the recommended starting point for most use cases.
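The effect of caching on input spend is easy to estimate. The sketch below blends the standard input rate with the cache-read rate by hit ratio, using Sonnet 4.6's table rates; it is simple arithmetic, not an API call.

```python
def cached_input_cost(total_mtok, cache_hit_rate, input_rate, cache_read_rate):
    """Blended input cost ($) when a fraction of input tokens is read from cache."""
    fresh = total_mtok * (1 - cache_hit_rate) * input_rate
    cached = total_mtok * cache_hit_rate * cache_read_rate
    return fresh + cached

# Sonnet 4.6 rates from the table above: $3/MTok input, $0.30/MTok cache read.
no_cache = cached_input_cost(10, 0.0, 3.00, 0.30)
with_cache = cached_input_cost(10, 0.9, 3.00, 0.30)
print(round(no_cache, 2), round(with_cache, 2))  # 30.0 5.7
```

At a 90% hit rate, 10 MTok of input drops from $30 to $5.70 — roughly an 80% reduction on the input line item, approaching 90% as the hit rate nears 100%.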
Extended thinking tokens
Extended thinking — available on Opus 4.6, Sonnet 4.6, and several earlier models — lets Claude perform internal reasoning before generating a final response. This improves quality on complex tasks but generates additional tokens. Extended thinking tokens are billed as standard output tokens at the model's normal rate, not as a separate pricing tier. Set a thinking token budget appropriate to the task complexity and monitor actual usage to avoid unexpected cost increases.
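Because thinking tokens land on the same line item as regular output, budgeting for them is plain arithmetic, as in this sketch:

```python
def output_bill(visible_mtok, thinking_mtok, output_rate):
    # Extended thinking tokens are billed at the model's standard output
    # rate, so they simply add to the output-token line item.
    return (visible_mtok + thinking_mtok) * output_rate

# Opus 4.6 at $25/MTok output: 0.5 MTok of final answers plus
# 1.5 MTok of internal thinking triples the output bill.
print(output_bill(0.5, 1.5, 25))  # 50.0
```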
A startup integrates Claude Sonnet 4.6 into a support chatbot. Monthly usage: 5 million input tokens, 2 million output tokens, with prompt caching (1M cache write, 3M cache reads).
Input: 5 × $3 = $15
Output: 2 × $15 = $30
Cache write: 1 × $3.75 = $3.75
Cache read: 3 × $0.30 = $0.90
Total: $49.65/month
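The arithmetic above generalizes to any model and usage profile. A minimal cost function, checked against this example's $49.65 total:

```python
def monthly_cost(rates, usage_mtok):
    """Sum of per-MTok rates times monthly usage in millions of tokens."""
    return sum(rates[item] * usage_mtok[item] for item in usage_mtok)

# Sonnet 4.6 rates from the pricing table above (per MTok, <=200K context).
sonnet_46 = {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_read": 0.30}
usage = {"input": 5, "output": 2, "cache_write": 1, "cache_read": 3}
print(round(monthly_cost(sonnet_46, usage), 2))  # 49.65
```

Swapping in another model's rate dict reprices the same workload, which makes this a handy way to compare tiers before migrating.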
A large enterprise migrates from Opus 4.1 to Opus 4.6 for its internal knowledge assistant. Monthly usage: 10 million input tokens, 4 million output tokens, with caching (2M write, 5M read).
Before (Opus 4.1):
Input: 10 × $15 = $150
Output: 4 × $75 = $300
Caching: $37.50 + $7.50 = $45
Total: $495/month
After (Opus 4.6):
Input: 10 × $5 = $50
Output: 4 × $25 = $100
Caching: $12.50 + $2.50 = $15
Total: $165/month — saving $330/month (67%)
A content agency runs SEO content generation using Haiku 4.5 with the Batch API. Monthly usage: 20 million input tokens, 10 million output tokens.
Standard cost: (20 × $1) + (10 × $5) = $70
With 50% Batch API discount: $35/month
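The batch discount is a flat multiplier, so the comparison reduces to one line of arithmetic, reproduced here with Haiku 4.5's table rates:

```python
def token_cost(input_mtok, output_mtok, input_rate, output_rate, batch=False):
    """On-demand token cost ($), halved when submitted via the Batch API."""
    standard = input_mtok * input_rate + output_mtok * output_rate
    return standard * (0.5 if batch else 1.0)

# Haiku 4.5: $1 input / $5 output per MTok.
print(token_cost(20, 10, 1, 5))              # 70.0
print(token_cost(20, 10, 1, 5, batch=True))  # 35.0
```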
A research team regularly processes documents exceeding 200K tokens using Sonnet 4.6. Monthly usage: 8 million input tokens (all above the 200K threshold), 3 million output tokens, with caching (2M write, 4M read). Above the 200K threshold, cache rates double along with input and output rates, to $7.50 write and $0.60 read for Sonnet 4.6.
Input: 8 × $6 = $48
Output: 3 × $22.50 = $67.50
Cache write: 2 × $7.50 = $15
Cache read: 4 × $0.60 = $2.40
Total: $132.90/month
A team uses Claude's code execution tool for test automation alongside Sonnet 4.6 for 1 million input tokens and 500K output tokens. Monthly container usage: 1,500 paid hours (after 50 free hours/day).
Tokens: (1 × $3) + (0.5 × $15) = $10.50
Code execution: 1,500 × $0.05 = $75
Total: $85.50/month
Audit and migrate legacy model usage
The highest-impact single action for most organizations in 2026 is identifying any remaining Opus 4 or Opus 4.1 usage and migrating to Opus 4.6. The 67% price reduction from $15/$75 to $5/$25 per million tokens is dramatic, and the newer model is broadly more capable. Similarly, review whether workloads currently on Sonnet or Opus actually require that tier — many can be served by Haiku 4.5 at a third of Sonnet's per-token cost, and a fifth of Opus's.
Implement model routing by task complexity
Route tasks to the cheapest model that meets the quality bar. A common pattern is Haiku 4.5 for classification, triage, and simple generation; Sonnet 4.6 for most production workloads; and Opus 4.6 only for tasks requiring maximum reasoning depth. A 70/20/10 split (Haiku/Sonnet/Opus) instead of all-Sonnet cuts per-token costs by roughly 40% at current rates, with larger savings the further the mix tilts toward Haiku.
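As a rough sanity check on blended pricing, the sketch below prices a 70/20/10 traffic mix against all-Sonnet using the table rates above. Actual savings depend on per-task token volumes, which this deliberately ignores by assuming a uniform workload.

```python
# Per-MTok input/output rates from the pricing table above (<=200K context).
RATES = {
    "haiku-4.5":  {"input": 1.0, "output": 5.0},
    "sonnet-4.6": {"input": 3.0, "output": 15.0},
    "opus-4.6":   {"input": 5.0, "output": 25.0},
}

def blended_cost(mix, input_mtok, output_mtok):
    """Cost ($) of a traffic mix, e.g. {'haiku-4.5': 0.7, ...} summing to 1."""
    return sum(
        share * (input_mtok * RATES[m]["input"] + output_mtok * RATES[m]["output"])
        for m, share in mix.items()
    )

# 10 MTok input / 4 MTok output per month, uniformly distributed.
all_sonnet = blended_cost({"sonnet-4.6": 1.0}, 10, 4)
routed = blended_cost({"haiku-4.5": 0.7, "sonnet-4.6": 0.2, "opus-4.6": 0.1}, 10, 4)
print(all_sonnet, round(routed, 2))  # 90.0 54.0
```

Here the routed mix costs $54 versus $90 for all-Sonnet, a 40% reduction even with a tenth of the traffic promoted to Opus.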
Use the Batch API for non-real-time workloads
Any task that doesn't require an immediate response — document processing, content generation, data classification, batch analysis — is a candidate for the Batch API's 50% discount. Structuring your pipeline to queue these workloads asynchronously is a straightforward architectural change with immediate cost impact.
Enable prompt caching for repeated context
If your application sends the same system prompt, document, or conversation history with each request, prompt caching is the most effective optimization available. Cache reads cost roughly 90% less than standard input tokens. Start with automatic caching by adding a cache_control field to your request, then tune from there.
Monitor token usage per model, team, and application
Token-level visibility is the prerequisite for all optimization decisions. Track per-model, per-application, and per-team usage in real time — not just aggregate monthly spend. This visibility surfaces anomalies early, identifies which teams or applications are driving cost growth, and creates the accountability loop that keeps AI spend manageable as usage scales.
Set explicit output token limits
Output tokens are typically 3–5x more expensive than input tokens. Setting appropriate max_tokens limits on each request prevents runaway output generation and keeps responses focused. Audit high-traffic prompt templates for verbosity — unnecessarily long outputs inflate cost without improving outcomes.
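A capped request can be sketched as below. The payload is shaped like Anthropic's Messages API, but the field names and model ID string are illustrative; confirm them against the current API reference.

```python
# Request body with an explicit output cap. max_tokens hard-limits how many
# output tokens this single call can generate, bounding worst-case spend.
request = {
    "model": "claude-sonnet-4-6",  # placeholder model ID
    "max_tokens": 300,             # cap output at 300 tokens for this call
    "messages": [
        {"role": "user", "content": "Summarize this ticket in two sentences: ..."},
    ],
}

# Worst-case output spend for one request at Sonnet 4.6's $15/MTok:
worst_case = request["max_tokens"] / 1_000_000 * 15
print(f"${worst_case:.4f}")  # $0.0045
```

Multiplying that per-request bound by expected request volume gives a hard ceiling on the output line item, which is useful for budgeting high-traffic endpoints.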
For individual developers and small teams, monitoring Claude costs through Anthropic's console is sufficient. But as Claude usage scales across engineering teams, products, and use cases, the console quickly becomes insufficient — it shows total spend, not who is spending what, on which model, for which product or customer.
Finout's AI Cost Management ingests Claude and Anthropic API billing data alongside AWS, GCP, Azure, Kubernetes, and SaaS spend into a single MegaBill allocation layer. This means token-level costs from Claude can be attributed to specific teams, products, or customers using the same Virtual Tag allocation logic used for the rest of your infrastructure — without maintaining a separate reporting system for AI spend.
Practically, this enables FinOps and engineering teams to answer the questions that matter: which team's Claude usage spiked this week? Which product feature is driving the most Opus 4.6 spend? What is our cost per inference for each AI-powered product line? And are our optimization efforts — model routing, prompt caching, batch processing — actually reducing cost per unit of value over time?