
Anthropic API pricing 2026: cost calculator

Figure: Per-1M-token cost tiers, input and output by model


Published 28 April 2026 · Septim Labs · 9 min read

TL;DR

Anthropic API pricing is per-token, with input (what you send) and output (what Claude returns) billed separately. Prices vary by model: per input token, Haiku is roughly 19 times cheaper than Opus.
For most production workloads, Sonnet 4.5 is the right default: $3.00/M input tokens and $15.00/M output tokens, versus $15/$75 for Opus 4. Use Opus only when the quality difference is measurable and the cost increase fits your budget.
Three patterns produce unexpected bills: long context windows on expensive models, high output-to-input ratios, and sub-agent loops that multiply per-call cost by task count.

Current pricing, verified April 2026

These are the published rates from anthropic.com/api as of April 2026. All prices are per million tokens. Anthropic adjusts pricing periodically; verify against the current pricing page before committing to a budget.

Model	Input / M tokens	Output / M tokens	Context window
Claude Haiku 3.5	$0.80	$4.00	200K
Claude Sonnet 4.5	$3.00	$15.00	200K
Claude Opus 4	$15.00	$75.00	200K

Across all models, the ratio of input to output cost is 1:5: output tokens cost five times as much as input tokens. This asymmetry matters for workloads that generate long responses, such as code generation or document drafting.
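The per-request arithmetic behind the table can be sketched in a few lines. This is a minimal illustration, not an official calculator: the dictionary keys are shorthand labels chosen for this example (not API model identifiers), and `request_cost` is a hypothetical helper.

```python
# Per-million-token rates (USD) copied from the table above.
# Keys are shorthand labels for this sketch, not API model identifiers.
RATES = {
    "haiku-3.5": (0.80, 4.00),
    "sonnet-4.5": (3.00, 15.00),
    "opus-4": (15.00, 75.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the table rates."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# A response-heavy request on Sonnet 4.5: 2,000 tokens in, 4,000 tokens out.
# Output is two thirds of the tokens but about 91% of the bill.
total = request_cost("sonnet-4.5", 2_000, 4_000)
output_only = request_cost("sonnet-4.5", 0, 4_000)
print(f"total ${total:.4f}, output share {output_only / total:.0%}")
```

Because output is priced at five times input, the output share of the bill rises much faster than the output share of the tokens.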

All models also have a prompt caching tier. Cached input tokens (content that appears at the same position across repeated requests) are billed at a much lower rate, one tenth of the standard input rate: $0.08/M for Haiku cache reads, $0.30/M for Sonnet cache reads, and $1.50/M for Opus cache reads. For applications that repeatedly send the same system prompt or context, prompt caching is a major lever.

Cost calculator: five common workloads

The table above gives raw rates. What follows shows what those rates mean on real workloads. Unless noted, all calculations use Sonnet 4.5.

Workload 1: a scoped Claude Code task

A typical scoped task: reading 3 files (averaging 200 lines each), a conversation of 4 back-and-forth turns, and 150 lines of code as output.

Input tokens: ~18,000 (3 files + conversation history)

Output tokens: ~3,500 (code + explanations)

Total cost: $0.11 (Sonnet 4.5)

Workload 2: long Claude Code session (2 hours)

An extended session on one codebase: 20+ file reads, multiple tasks, accumulated conversation history. No /compact.

Input tokens: ~180,000 (files + accumulated history)

Output tokens: ~25,000 (code + analysis)

Total cost: $0.92 (Sonnet 4.5; /compact helps)
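One way to see why /compact helps: the Messages API is stateless, so each turn resends the conversation so far, and billed input grows with every turn. The sketch below is a toy model under that assumption. The function name, its parameters, and the numbers in the usage lines are illustrative, not measurements from Claude Code, and the model ignores prompt caching.

```python
def session_input_tokens(base_tokens, growth_per_turn, turns,
                         compact_every=None, summary_tokens=0):
    """Total input tokens billed over a session (toy model).

    Every turn sends the base context plus all history so far. If
    `compact_every` is set, history is cut down to `summary_tokens`
    after every that-many turns (a stand-in for compaction).
    """
    total = 0
    history = 0
    for turn in range(1, turns + 1):
        total += base_tokens + history
        history += growth_per_turn
        if compact_every and turn % compact_every == 0:
            history = min(history, summary_tokens)
    return total


# Illustrative numbers only: 2,000-token base, 1,000 new tokens per turn.
print(session_input_tokens(2_000, 1_000, turns=10))  # 65000
print(session_input_tokens(2_000, 1_000, turns=10,
                           compact_every=5, summary_tokens=500))  # 42500
```

Without compaction the total grows roughly with the square of the turn count, which is why long sessions cost disproportionately more than short ones.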

Workload 3: PR review automation (per PR)

Automated PR review: a system prompt, a diff of ~400 lines, and a structured review with inline comments as output.

Input tokens: ~8,000 (system + diff)

Output tokens: ~1,200 (review comments)

Total cost: $0.04 (per PR, Sonnet 4.5)

Workload 4: Sub-agent parallel task (5 agents)

Five parallel sub-agents, each with a workload equal to Workload 1. Context is not shared: each agent carries its own copy.

Per-agent cost: $0.11 (same as Workload 1)

Multiplier: ×5 (independent contexts)

Total cost: $0.55 (the same output costs $0.11 when run sequentially)

Workload 5: same as Workload 2, but on Opus 4

A two-hour extended session, switched from Sonnet 4.5 to Opus 4 with no change in workload.

Input tokens: ~180,000 (same workload)

Output tokens: ~25,000 (same workload)

Total cost: $4.58 (Opus 4; 5× the Sonnet cost)
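The five totals above can be reproduced from the token counts and the published rates. The sketch below is a minimal check, with shorthand model labels and a hypothetical `cost` helper. Note that Workload 4 comes out at about $0.53 when computed from unrounded values; the $0.55 figure multiplies the already-rounded $0.11 by five.

```python
# Per-million-token rates (USD) from the pricing table; keys are shorthand labels.
RATES = {"sonnet-4.5": (3.00, 15.00), "opus-4": (15.00, 75.00)}


def cost(model, input_tokens, output_tokens, copies=1):
    """USD cost of `copies` independent runs of the same request."""
    input_rate, output_rate = RATES[model]
    return copies * (input_tokens * input_rate
                     + output_tokens * output_rate) / 1_000_000


# (label, model, input tokens, output tokens, independent copies)
WORKLOADS = [
    ("1: scoped task", "sonnet-4.5", 18_000, 3_500, 1),
    ("2: long session", "sonnet-4.5", 180_000, 25_000, 1),
    ("3: PR review", "sonnet-4.5", 8_000, 1_200, 1),
    ("4: five sub-agents", "sonnet-4.5", 18_000, 3_500, 5),
    ("5: long session on Opus 4", "opus-4", 180_000, 25_000, 1),
]

for label, model, inp, out, copies in WORKLOADS:
    print(f"Workload {label}: ${cost(model, inp, out, copies):.2f}")
```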

Septim Drills · $29 · cost calibration exercises

Twelve structured exercises, including a cost-projection drill: you estimate a workload's cost before running it, then compare against the Anthropic console. The gap narrows quickly. Includes a sub-agent budget worksheet and a prompt-caching setup guide.


Three patterns that produce unexpected bills

1. Opus 4 on tasks that Sonnet handles equally well

The most common mistake: a developer sets Opus 4 as the default model because it is the most capable, then runs it on workloads where Sonnet 4.5 produces the same results. Code formatting, documentation generation, test writing, and most code review tasks gain nothing from Opus 4's additional capability. At $15/$75 per million tokens versus $3/$15, that is five times the cost for the same output.

The right default: start with Sonnet 4.5, then measure on your specific workload whether Opus 4 actually produces better results, before you pay for it.

2. Long context windows with expensive models

A single request on Opus 4 that fills the 200K context window costs $3.00 in input tokens alone. If you run dozens of such requests a day (document analysis, codebase review, large refactors), the cost adds up fast. The Context management guide covers techniques that keep context lean.

For repeated contexts, prompt caching helps considerably here: a 100K-token system prompt that is cached and reused is billed at $1.50/M on Opus, versus $15/M uncached. That is $0.15 instead of $1.50 per request for the prompt alone. If your application sends the same large context on every request, caching is probably your biggest cost lever.
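The caching comparison scales linearly with request volume, which a short sketch makes concrete. This is a simplification: it bills every request at the cache-read rate and ignores any cache-write charge and cache expiry, so real savings will be somewhat smaller. `prompt_cost` is a hypothetical helper and the request counts are illustrative.

```python
# Opus 4 input rates quoted in this article (USD per million tokens).
OPUS_INPUT = 15.00
OPUS_CACHE_READ = 1.50


def prompt_cost(prompt_tokens, requests, cached):
    """Input cost of sending the same prompt `requests` times.

    Simplification: when `cached` is True, every request is billed at the
    cache-read rate. Cache-write charges and cache expiry are ignored.
    """
    rate = OPUS_CACHE_READ if cached else OPUS_INPUT
    return prompt_tokens * requests * rate / 1_000_000


for requests in (1, 100):
    uncached = prompt_cost(100_000, requests, cached=False)
    cached = prompt_cost(100_000, requests, cached=True)
    print(f"{requests:>3} requests: uncached ${uncached:.2f}, cached ${cached:.2f}")
```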

3. Sub-agent loops without a budget ceiling

Claude Code can spawn sub-agents. An agentic workflow that runs 10 agents in parallel over a large codebase multiplies your single-session cost by 10. Without an explicit task budget defined in your CLAUDE.md, this is not a configuration error: it is Claude doing exactly what you asked. The fix is explicit task scoping: define what each agent reads, what it produces, and how many turns it is allowed.
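Before launching a parallel run, the scoping limits can be turned into a worst-case dollar figure and compared against a ceiling. The sketch below is a pre-flight estimate only, under stated assumptions: `AgentBudget`, `worst_case_cost`, and `check_ceiling` are hypothetical names, nothing here is read or enforced by Claude Code, and the default rates are the Sonnet 4.5 figures from the pricing table.

```python
from dataclasses import dataclass


@dataclass
class AgentBudget:
    """Hypothetical per-agent scope: token caps per turn and a turn limit."""
    max_input_tokens_per_turn: int
    max_output_tokens_per_turn: int
    max_turns: int


def worst_case_cost(budget, agents, input_rate=3.00, output_rate=15.00):
    """Upper bound in USD if every agent uses its full budget on every turn."""
    per_turn = (budget.max_input_tokens_per_turn * input_rate
                + budget.max_output_tokens_per_turn * output_rate)
    return agents * budget.max_turns * per_turn / 1_000_000


def check_ceiling(budget, agents, ceiling_usd):
    """Return the projected cost, or raise if it exceeds the ceiling."""
    projected = worst_case_cost(budget, agents)
    if projected > ceiling_usd:
        raise ValueError(
            f"projected ${projected:.2f} exceeds ceiling ${ceiling_usd:.2f}")
    return projected


# Illustrative scope: 20K tokens read and 2K produced per turn, 4 turns, 10 agents.
budget = AgentBudget(20_000, 2_000, 4)
print(f"${check_ceiling(budget, agents=10, ceiling_usd=5.00):.2f}")
```

Running the check with a lower ceiling, or with more agents, raises before any tokens are spent, which is the point of defining the budget up front.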

If you have already been through a Tokenocalypse-style spike, Septim Rescue ($299) offers emergency remote intervention to diagnose the source and implement ceiling controls on your workloads.

Estimating tokens without running a call

Rough estimation rules, accurate to within about 20% for English-language content:

1,000 words of English prose ≈ 1,300–1,500 tokens
100 lines of TypeScript ≈ 800–1,000 tokens
100 lines of Python ≈ 700–900 tokens
A 200-line diff ≈ 1,600–2,000 tokens
For an exact count, the Anthropic tokenizer is at console.anthropic.com/tokenizer
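The rules of thumb above translate directly into a small estimator. This is a rough sketch: the per-unit factors are simply the listed ranges divided through, and `estimate_tokens` and its category names are hypothetical.

```python
# Tokens per unit as (low, high), derived from the rules of thumb above.
TOKENS_PER_UNIT = {
    "english_words": (1.3, 1.5),      # 1,000 words: 1,300 to 1,500 tokens
    "typescript_lines": (8.0, 10.0),  # 100 lines: 800 to 1,000 tokens
    "python_lines": (7.0, 9.0),       # 100 lines: 700 to 900 tokens
    "diff_lines": (8.0, 10.0),        # 200 lines: 1,600 to 2,000 tokens
}


def estimate_tokens(kind, units):
    """Return a (low, high) token estimate for `units` words or lines."""
    low, high = TOKENS_PER_UNIT[kind]
    return round(units * low), round(units * high)


# The ~400-line diff from Workload 3.
print(estimate_tokens("diff_lines", 400))  # (3200, 4000)
```

Feeding the low and high ends into a cost calculation gives a budget range rather than a single number, which is usually the more honest output for an estimate of this precision.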

For production applications, track actual token consumption from the usage field of every API response. Log it from day one: reconstructing cost history from aggregated logs is far harder than collecting it in real time.
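A per-request log line can be as simple as one JSON object per response. The sketch below assumes the usage data has already been pulled out of the response into a plain dict with `input_tokens` and `output_tokens` keys (the field names the Messages API reports); `usage_record`, `append_record`, and the file path are hypothetical, and the rates are passed in explicitly so the log stays correct if pricing changes.

```python
import json
import time


def usage_record(usage, model, input_rate, output_rate, now=None):
    """Build one log record from a usage dict and per-million-token rates."""
    input_tokens = usage["input_tokens"]
    output_tokens = usage["output_tokens"]
    return {
        "ts": time.time() if now is None else now,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": (input_tokens * input_rate
                     + output_tokens * output_rate) / 1_000_000,
    }


def append_record(path, record):
    """Append the record as one line of JSON (JSONL) to the file at `path`."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


# Example with the Workload 3 token counts at Sonnet 4.5 rates.
record = usage_record({"input_tokens": 8_000, "output_tokens": 1_200},
                      "sonnet-4.5", input_rate=3.00, output_rate=15.00)
print(json.dumps(record))
```

Storing raw token counts alongside the computed cost keeps the history usable even after rates change, since the cost can be recomputed from the counts.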

Claude Code vs. direct API: which costs more

Claude Code (the CLI tool) uses the same underlying API but adds overhead: the system prompt, tool descriptions, and the conversation management layer all consume tokens that you do not pay for in direct API calls. In practice, a Claude Code session costs roughly 15–25% more per unit of useful output than a direct API call optimized for the same task.

That overhead is the price of the agentic loop: the ability to iterate, read files, run commands, and course-correct. For structured, predictable API calls (classification, extraction, generation in a known format), the direct API is cheaper. For open-ended development tasks, Claude Code's overhead is worth it.

Septim Drills · $29 · 12 cost-aware exercises

Structured exercises covering model selection, context compression, prompt caching setup, and sub-agent budgeting. Each exercise has a cost target and a verification step. One-time payment, with a GitHub repo invite on purchase.
