
Anthropic API pricing 2026: cost calculator

// figure — Anthropic API per-model cost tiers: horizontal bar chart comparing input and output cost per million tokens. Haiku 3.5: $0.80 in / $4.00 out · Sonnet 4.5: $3.00 in / $15.00 out · Opus 4: $15.00 in / $75.00 out.
// FILED Anthropic · API pricing // SOURCE Septim Labs // PERMALINK /blog/claude-api-pricing-calculator-2026.html
By the Septim Labs team
Published April 28, 2026
TL;DR
  • Anthropic API pricing is per-token, billed separately for input (what you send) and output (what Claude returns). Prices vary by model — Haiku is ~19x cheaper per input token than Opus.
  • For most production workloads, Sonnet 4.5 is the right default: it costs $3.00/M input tokens and $15.00/M output tokens, versus $15/$75 for Opus 4. Use Opus only when the quality difference is measurable and the cost increase is in the budget.
  • The three patterns that cause unexpected bills: long context windows on expensive models, high output-to-input ratios, and sub-agent loops that multiply per-call cost by task count.

Current pricing — verified April 2026

These are the published rates from anthropic.com/api as of April 2026. All prices are per million tokens. Anthropic adjusts pricing periodically; verify against the current pricing page before committing to a budget.

Model             | Input / M tokens | Output / M tokens | Context window
Claude Haiku 3.5  | $0.80            | $4.00             | 200K
Claude Sonnet 4.5 | $3.00            | $15.00            | 200K
Claude Opus 4     | $15.00           | $75.00            | 200K

The ratio between input and output cost is exactly 1:5 for every model — output tokens are five times more expensive than input tokens. This asymmetry matters for workloads that generate long responses, like code generation or document drafting.
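As a sanity check on the arithmetic used throughout this article, here is a minimal cost estimator in Python. The `RATES` table mirrors the pricing table above (April 2026 rates); the function and its names are illustrative helpers, not part of any official SDK.

```python
# Per-million-token rates from the table above (USD, April 2026).
# Verify against the live pricing page before budgeting.
RATES = {
    "haiku-3.5":  {"input": 0.80,  "output": 4.00},
    "sonnet-4.5": {"input": 3.00,  "output": 15.00},
    "opus-4":     {"input": 15.00, "output": 75.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# A scoped Claude Code task: ~18K input, ~3.5K output on Sonnet 4.5
print(round(estimate_cost("sonnet-4.5", 18_000, 3_500), 2))  # prints 0.11
```

The same function reproduces every workload figure in the next section; only the token counts and the model change.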

There is also a prompt caching tier for all models. Cached input tokens (content that appears at the same position in repeated requests) cost significantly less: $0.08/M for Haiku cache reads, $0.30/M for Sonnet cache reads, $1.50/M for Opus cache reads. Prompt caching is a meaningful lever for applications that send the same system prompt or context repeatedly.
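A rough sketch of what cache reads are worth, using the cache-read rates above. This ignores cache-write surcharges and assumes every request after the first is a full cache hit, so treat it as an upper bound on savings; the function and its names are illustrative.

```python
# Cache-read vs. base input rates from above (USD per million tokens).
# Cache-write costs are deliberately ignored in this rough sketch.
CACHE_READ = {"haiku-3.5": 0.08, "sonnet-4.5": 0.30, "opus-4": 1.50}
BASE_INPUT = {"haiku-3.5": 0.80, "sonnet-4.5": 3.00, "opus-4": 15.00}

def caching_savings(model: str, cached_tokens: int, requests: int) -> float:
    """USD saved if `cached_tokens` of shared prefix are served from
    cache on every request after the first (which writes the cache)."""
    hits = max(requests - 1, 0)
    per_token = (BASE_INPUT[model] - CACHE_READ[model]) / 1_000_000
    return cached_tokens * hits * per_token

# 100K-token shared system prompt, 50 requests/day on Opus 4
print(round(caching_savings("opus-4", 100_000, 50), 2))
```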

Cost calculator — five common workloads

The table above is the raw rate. What follows is what those rates actually mean for real workloads. All calculations use Sonnet 4.5 unless otherwise noted.

Workload 1: Single Claude Code task (scoped)

A typical scoped task: read 3 files (avg 200 lines each), conversation with 4 back-and-forth turns, output 150 lines of code.

Input tokens ~18,000 3 files + conversation history
Output tokens ~3,500 code + explanations
Total cost $0.11 Sonnet 4.5

Workload 2: Long Claude Code session (2 hours)

An extended session working across a codebase: 20+ file reads, multiple tasks, accumulated conversation history. No /compact.

Input tokens ~180,000 files + accumulated history
Output tokens ~25,000 code + analysis
Total cost $0.92 Sonnet 4.5 · /compact helps

Workload 3: PR review automation (per PR)

Automated PR review: system prompt, diff of ~400 lines, output a structured review with inline comments.

Input tokens ~8,000 system + diff
Output tokens ~1,200 review comments
Total cost $0.04 per PR · Sonnet 4.5

Workload 4: Sub-agent parallel task (5 agents)

Five parallel sub-agents, each with workload equivalent to Workload 1. Context is not shared — each agent carries its own copy.

Per-agent cost $0.11 same as Workload 1
Multiplier ×5 independent contexts
Total cost $0.55 for same output as $0.11 sequential

Workload 5: Same as Workload 2, but on Opus 4

The two-hour extended session, switching from Sonnet 4.5 to Opus 4 without changing the workload.

Input tokens ~180,000 identical workload
Output tokens ~25,000 identical workload
Total cost $4.58 Opus 4 · 5× Sonnet cost

Septim Drills — $29 · cost calibration exercises

Twelve structured exercises including a cost-projection drill: you estimate workload cost before running it, then compare against the Anthropic console. The delta closes fast. Includes the sub-agent budget worksheet and the prompt-caching setup guide.

Get Septim Drills — $29 →

The three patterns that cause unexpected bills

1. Opus 4 on tasks that Sonnet handles equivalently

The most common mistake: a developer sets their default model to Opus 4 because it is the most capable model, then runs it on workloads where Sonnet 4.5 produces identical results. Code formatting, documentation generation, test writing, and most code review tasks do not benefit from Opus 4's additional capability. At $15/$75 per million tokens versus $3/$15, this costs five times as much for the same output.

The correct default: start with Sonnet 4.5 and measure whether Opus 4 produces meaningfully better results on your specific workload before paying for it.

2. Long context windows with expensive models

A single request to Opus 4 that fills the 200K context window costs $3.00 in input tokens alone. If you are running dozens of these requests daily — document analysis, codebase review, large refactors — the cost compounds quickly. The context management guide covers the techniques for keeping context lean.

Prompt caching helps significantly here for repeated contexts: a 100K-token system prompt cached and reused costs $1.50/M on Opus versus $15/M uncached. If your application sends the same large context on every request, caching is likely your highest-leverage cost lever.

3. Sub-agent loops without a budget ceiling

Claude Code can spawn sub-agents. An agentic workflow that spawns 10 agents to work in parallel on a large codebase multiplies your single-session cost by 10. Without an explicit task budget defined in your CLAUDE.md, this is not a configuration error — it is Claude doing what you asked. The fix is explicit task scoping: define what each agent should read, what it should produce, and how many turns it is allowed.
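One way to make that ceiling explicit in your own harness is a small budget guard that accumulates per-call cost and refuses further agent calls once a limit is crossed. This is a hypothetical sketch at Sonnet 4.5 rates, not a Claude Code feature; the class and its names are illustrative.

```python
# Hypothetical budget guard for agent fan-out. Names and thresholds
# are illustrative, not part of Claude Code itself.
class BudgetCeiling:
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               in_rate: float = 3.00, out_rate: float = 15.00) -> None:
        """Record one agent call (default: Sonnet 4.5 rates) and raise
        once the cumulative session cost crosses the ceiling."""
        self.spent_usd += (input_tokens * in_rate +
                           output_tokens * out_rate) / 1_000_000
        if self.spent_usd > self.limit_usd:
            raise RuntimeError(
                f"budget exceeded: ${self.spent_usd:.2f} > ${self.limit_usd:.2f}")

# Five parallel agents, each roughly a scoped task (~18K in / ~3.5K out)
guard = BudgetCeiling(limit_usd=0.50)
try:
    for _ in range(5):
        guard.charge(18_000, 3_500)
except RuntimeError as e:
    print(e)  # fires on the 5th call, ~$0.53 cumulative
```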

If you have already had a Tokenocalypse-style spike, Septim Rescue ($299) covers emergency remote intervention to diagnose the source and implement ceiling controls on your workloads.

Token estimation without running the call

Rough estimation rules that hold to within about 20% for English-language content:

  • 1 token ≈ 4 characters of English text.
  • 1 token ≈ 0.75 words, so 1,000 words ≈ 1,300 tokens.
  • Source code runs denser: roughly 8–12 tokens per line is a workable planning number.
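The common rules of thumb (1 token ≈ 4 characters, or ≈ 0.75 English words) can be wrapped into quick estimator functions. These are generic heuristics, not Anthropic's actual tokenizer, so expect errors in the ~20% range either way.

```python
# Rough token estimators for English text. The 4-chars-per-token and
# 0.75-words-per-token ratios are common heuristics, not an exact
# tokenizer; use them for budgeting, not billing.
def tokens_from_chars(text: str) -> int:
    return max(1, len(text) // 4)

def tokens_from_words(text: str) -> int:
    return max(1, round(len(text.split()) / 0.75))

sample = "Estimate the cost of this request before sending it."
print(tokens_from_chars(sample), tokens_from_words(sample))
```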

For production applications, use the usage field in every API response to track actual token consumption. Log it from day one — reconstructing cost history from aggregated logs is much harder than collecting it in real time.
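A minimal sketch of day-one usage logging, assuming the Python SDK's response `usage` object (`input_tokens`, `output_tokens`). The CSV helper below is illustrative; rates are hard-coded to Sonnet 4.5 and should be parameterized per model in practice.

```python
import csv
import datetime

def log_usage(path: str, model: str,
              input_tokens: int, output_tokens: int) -> float:
    """Append one row of token usage to a CSV and return the call cost.
    Rates here are Sonnet 4.5's; parameterize per model in practice."""
    cost = (input_tokens * 3.00 + output_tokens * 15.00) / 1_000_000
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.datetime.now(datetime.timezone.utc).isoformat(),
            model, input_tokens, output_tokens, f"{cost:.6f}"])
    return cost

# With the Anthropic SDK, the counts come from the response's `usage`
# field, e.g.:
#   log_usage("usage.csv", msg.model,
#             msg.usage.input_tokens, msg.usage.output_tokens)
```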

Claude Code vs. direct API: which costs more

Claude Code (the CLI tool) uses the same underlying API but adds overhead: the system prompt, tool descriptions, and the conversation management layer all consume tokens you do not pay for when making direct API calls. In practice, a Claude Code session costs roughly 15–25% more per unit of useful output than an equivalent direct API call optimized for the same task.

That overhead is the price of the agentic loop — the ability to iterate, read files, run commands, and course-correct. For structured, predictable API calls (classification, extraction, generation of a known format), the direct API is cheaper. For open-ended development tasks, Claude Code's overhead is worth it.

Septim Drills — $29 · 12 cost-aware exercises

Structured exercises covering model selection, context compression, prompt caching setup, and sub-agent budgeting. Each exercise includes a cost target and a verification step. One-time purchase; GitHub repo invite on purchase.

Buy Septim Drills — $29 → Cost already out of control? Septim Rescue ($299) →