· billing anomaly · april 2026 ·

Claude Code invisible token burn (April 2026): what's happening and how to detect it

// figure Token counter with invisible inflation

// FILED Cost & Ops // SOURCE Septim Labs // PERMALINK /blog/claude-code-invisible-token-burn-april-2026.html

Published April 20, 2026 · by Septim Labs · 7 min read

TL;DR

Multiple users on r/ClaudeAI and the Anthropic community forum describe billing logs showing roughly 20,000 extra tokens per request that they never sent.
This inflates session costs without any visible change in the Claude Code interface: your /context readout and your actual bill tell different stories.
Four runnable detection patterns are below. If your API-side token counts diverge from local ccusage estimates, you are likely affected.

1. What's happening

Starting in early April 2026, a cluster of developers began noticing that their Anthropic API spend was climbing faster than their actual work could explain. The divergence wasn't random — it was consistent, roughly proportional to request volume, and invisible in the Claude Code UI.

Coverage from Efficienist (April 2026) described the pattern as server-side token inflation: requests were arriving at Anthropic's API carrying a significantly larger token footprint than the client-side session data accounted for. The publication noted that affected users could reproduce the gap reliably by comparing their local ccusage totals against their Anthropic dashboard over the same 15-minute window.

DevClass (April 1, 2026) reported that developers were observing "consistent billing anomalies tied to request counts rather than context length," suggesting the inflation was applied per call rather than scaling with actual prompt size. This per-call characteristic is what makes it particularly expensive for agentic workflows with high tool-call frequency.

The sharpest user signal came from r/ClaudeAI, where one thread accumulated significant upvotes around a single observation:

"Every request appears to carry around 20,000 extra tokens that the user never sent."

— r/ClaudeAI, April 2026

Separately, threads on the Anthropic community forum documented users tying the rate change to a specific point in time: costs spiked not when context grew, but after a specific date in early April, and then held roughly flat at the elevated rate regardless of prompt complexity. Multiple users in these threads noted that the issue appeared after an Anthropic server-side update and was not reproducible on API clients that bypassed the Claude Code CLI layer.

To be direct about what we do and don't know: Anthropic has not issued a public statement confirming this as an infrastructure bug. What we have is a consistent pattern across independent user reports. The mechanism (system prompt inflation, caching metadata, or something in the tool-use scaffolding) is not yet publicly documented. We're reporting what's observable, not what's confirmed.

2. How to detect it in your own logs

Four runnable patterns. Each takes under five minutes. Together they tell you whether you're affected and, if so, roughly by how much.

Pattern 1: ccusage diff against the Anthropic dashboard

Install ccusage if you haven't already (npm i -g ccusage). Run a short Claude Code session (a single file edit, a question, anything with at least five tool calls), then immediately compare:

# Local estimate from Claude Code's session files (--since takes YYYYMMDD)
ccusage daily --since $(date +%Y%m%d)

# Then open: console.anthropic.com → Usage → filter to the last 30 minutes

A healthy session shows ccusage within 5–10% of the dashboard (the gap is normal caching overhead). If the dashboard shows 1.5x or more of what ccusage estimates, you have a divergence worth investigating. Users in the r/ClaudeAI thread described multipliers of 1.3x to 2.1x on affected accounts.
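If you repeat this check across several sessions, it helps to apply the same thresholds every time. A minimal sketch, assuming you read both token totals off ccusage and the console by hand; the function names and the "borderline" band between roughly 1.1x and 1.5x are our own illustration, not part of ccusage.

```python
# Sketch: classify the gap between ccusage's local estimate and the
# Anthropic console figure for the same time window. Both numbers are
# entered by hand; nothing here calls ccusage or the console.


def divergence_ratio(dashboard_tokens: int, ccusage_tokens: int) -> float:
    """Return dashboard / ccusage for the same window."""
    if ccusage_tokens <= 0:
        raise ValueError("ccusage_tokens must be positive")
    return dashboard_tokens / ccusage_tokens


def classify(ratio: float) -> str:
    # Within ~10% is treated as normal caching overhead; 1.5x or more is
    # worth investigating. Anything in between is ambiguous.
    if abs(ratio - 1.0) <= 0.10:
        return "healthy"
    if ratio >= 1.5:
        return "investigate"
    return "borderline"


if __name__ == "__main__":
    # Hypothetical example numbers, not real measurements.
    r = divergence_ratio(dashboard_tokens=180_000, ccusage_tokens=100_000)
    print(f"{r:.2f}x -> {classify(r)}")  # prints: 1.80x -> investigate
```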

Pattern 2: /context inspection per request

During an active Claude Code session, type /context to see the current context-window usage and note the token count. Then run a single simple tool call (e.g., read one small file) and run /context again immediately after. The delta should approximate file tokens + response tokens + modest overhead.

# Before tool call
/context
# → note "X tokens used"

# Run: read one 50-line file
# After tool call
/context
# → new count should be X + ~300–500 for a 50-line file

If the delta is 20,000+ tokens on a trivial file read, you're looking at the same inflation pattern described in community reports. The /context view reflects client-side state: if it is also inflated, the overhead is being injected before the context window is assembled on your machine.
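If you take several of these readings, a small helper keeps the comparison consistent. This is a minimal sketch. It assumes roughly four characters per token, which is a common rule of thumb for English text and not Claude's tokenizer. The defaults for response length and overhead are made up. The function names are ours.

```python
# Sketch: turn two /context readings into a rough inflation check.
# CHARS_PER_TOKEN and the response/overhead defaults are assumptions,
# so the expected value is only a ballpark.
CHARS_PER_TOKEN = 4


def expected_delta(file_chars: int, response_tokens: int = 150, overhead: int = 100) -> int:
    """Approximate context growth from reading one file and replying."""
    return file_chars // CHARS_PER_TOKEN + response_tokens + overhead


def looks_inflated(before: int, after: int, file_chars: int, excess: int = 20_000) -> bool:
    """True if the observed delta exceeds the expected one by at least `excess` tokens."""
    return (after - before) - expected_delta(file_chars) >= excess


if __name__ == "__main__":
    # Hypothetical readings: 40,000 before and 61,000 after reading a ~2,000-char file.
    print(looks_inflated(before=40_000, after=61_000, file_chars=2_000))  # prints: True
```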

Pattern 3: API-side token count via direct comparison

Run the identical prompt twice — once via the Claude Code CLI, once directly via the Anthropic API using the same model. Compare usage.input_tokens in the raw API response.

# Direct API call (Python, uses anthropic SDK)
import anthropic
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=100,
    messages=[{"role": "user", "content": "Say hello."}]
)
print(response.usage)
# → input_tokens: ~12 (the actual prompt)

# Now run the equivalent prompt in Claude Code, then read the input token
# count for that session from ccusage:
# ccusage session

A significant gap between direct API input_tokens and Claude Code session input_tokens for an equivalent prompt points to overhead being added by the CLI layer's scaffolding — system prompt, tool definitions, or session context injected server-side.
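You don't need to generate a reply to get the direct figure. The anthropic Python SDK exposes `client.messages.count_tokens`, which returns an `input_tokens` count for a request. The sketch below assumes that method. The helper names are ours. The `client` parameter accepts any object with that method, so the functions can be exercised without an API key. The model name is carried over from the example above.

```python
from typing import Any


def direct_input_tokens(client: Any, model: str, prompt: str) -> int:
    """Input tokens for one bare user message, counted without generating a reply."""
    result = client.messages.count_tokens(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return result.input_tokens


def scaffolding_overhead(direct_tokens: int, cli_input: int,
                         cli_cache_write: int = 0, cli_cache_read: int = 0) -> int:
    """Tokens Claude Code sent beyond the bare prompt.

    With prompt caching, much of the system prompt and tool definitions is
    billed as cache writes/reads rather than plain input, so all three
    session counts are summed before subtracting the bare prompt.
    """
    return cli_input + cli_cache_write + cli_cache_read - direct_tokens


# Usage (needs `pip install anthropic` and ANTHROPIC_API_KEY set):
#   import anthropic
#   client = anthropic.Anthropic()
#   direct = direct_input_tokens(client, "claude-sonnet-4-6", "Say hello.")
#   # Fill the cli_* values from your session's usage figures:
#   print(scaffolding_overhead(direct, cli_input, cli_cache_write, cli_cache_read))
```

Summing the cache columns matters. If you compare plain `input_tokens` alone, a heavily cached Claude Code session can look deceptively small.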

Pattern 4: Rate-change timestamp correlation

Pull your usage from the Anthropic billing API or export your usage CSV from the console. Plot daily token spend per session (not per message) against calendar date. If you see a step-change in cost per session that doesn't correlate with a change in your own prompting behavior, and that step-change occurred in the first week of April 2026, it's consistent with the server-side update window described in community reports.

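# From Anthropic console: Usage → Export CSV
# The column names below ("date", "input_tokens") are assumptions;
# match them to your export's header row.

# In your terminal (requires csvkit):
csvstat -c input_tokens usage_export.csv

# Or in Python (requires pandas and matplotlib):
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("usage_export.csv", parse_dates=["date"])
df.groupby("date")["input_tokens"].sum().plot()
plt.show()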

A clean step-change on or around April 1–3, 2026, with no corresponding change in your codebase or prompting patterns, is the clearest signal that the inflation is external to your workflow.
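Eyeballing the plot works. If you want a number instead, one simple approach is to try every split date and compare the mean cost per session before and after it. This stdlib-only sketch assumes you have already reduced your export to one tokens-per-session value per day. That reduction depends on your export's columns, which we haven't verified. `best_step` is our own name, and the method is a basic mean-shift search, not a formal change-point test.

```python
from datetime import date
from statistics import mean


def best_step(series: list[tuple[date, float]], min_side: int = 3) -> tuple[date, float]:
    """Return (first day of the higher regime, mean-after / mean-before ratio).

    Tries every split that leaves at least `min_side` days on each side and
    keeps the one with the largest after/before ratio.
    """
    points = sorted(series)
    if len(points) < 2 * min_side:
        raise ValueError("need at least 2 * min_side days of data")
    values = [v for _, v in points]
    best_day, best_ratio = points[min_side][0], 0.0
    for i in range(min_side, len(values) - min_side + 1):
        before, after = mean(values[:i]), mean(values[i:])
        if before > 0 and after / before > best_ratio:
            best_day, best_ratio = points[i][0], after / before
    return best_day, best_ratio


if __name__ == "__main__":
    # Hypothetical tokens-per-session values, not real billing data.
    days = [date(2026, 3, 28 + i) for i in range(4)] + [date(2026, 4, 1 + i) for i in range(4)]
    values = [100_000, 98_000, 102_000, 100_000, 160_000, 158_000, 162_000, 160_000]
    day, ratio = best_step(list(zip(days, values)))
    print(day, round(ratio, 2))  # prints: 2026-04-01 1.6
```

A ratio near 1.0 means no meaningful step. A large ratio whose split date falls in early April matches the pattern described above.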

Running agents overnight? The token burn compounds fast.

Septim Subagent Cost Guard is a PreToolUse hook that hard-halts agents mid-flight when spend crosses your threshold. $29 founding price, 50 seats, ships May 2026. Zero-dollar reservation now.

Reserve Cost Guard →

3. What to do about it right now

Three actions you can take today. None requires waiting for Anthropic to issue a fix.

Action 1: Lower concurrency on multi-agent sessions

If the overhead is per-request, the most direct cost lever is request volume. Reduce the number of parallel subagents in your current workflows. Running 4 agents in parallel instead of 12 doesn't cut output by two-thirds, since most of that parallelism is latency optimization rather than throughput. It does cut the number of inflated requests in flight at any moment by two-thirds.
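If each subagent issues roughly the same number of requests, total overhead scales with the number of subagents. The back-of-the-envelope sketch below assumes the ~20,000-token figure from community reports. It also uses a placeholder input price of $3 per million tokens, so substitute the current rate for your model. The sketch ignores prompt caching, which bills repeated context at a lower rate, so treat the result as a rough ceiling rather than a bill forecast.

```python
def overhead_cost(requests: int, overhead_tokens: int = 20_000, usd_per_mtok: float = 3.0) -> float:
    """Extra USD spend from a flat per-request token overhead.

    usd_per_mtok is a placeholder input price; caching discounts are ignored.
    """
    return requests * overhead_tokens * usd_per_mtok / 1_000_000


if __name__ == "__main__":
    # Hypothetical: each subagent issues 40 requests over the session.
    for agents in (12, 4):
        print(agents, "agents:", f"${overhead_cost(agents * 40):.2f}")
```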

In your CLAUDE.md, add a temporary directive:

# Temporary cost constraint (April 2026)
Due to active token inflation reports, limit parallel subagent spawns
to a maximum of 3 concurrent at any time. Prefer sequential tool calls
over parallel ones where latency allows.

Action 2: Disable autocompact during the investigation window

Claude Code's autocompact feature summarizes long conversations to keep context manageable. When autocompact fires, it generates a new summary request — which, under current conditions, carries the same per-request overhead. Disabling it prevents those compaction requests from adding to the inflated bill while the situation is active.

In your Claude Code settings (~/.claude/settings.json), or toggle it interactively with /config:

{
  "autoCompact": false
}

Note: disabling autocompact means long sessions may run into context limits. Break long sessions manually rather than relying on autocompact to rescue them.

Action 3: Set an hourly spend alert in the Anthropic console

Anthropic's billing alerts default to monthly thresholds. That's too slow a feedback loop for an active per-request inflation issue. Set a tight hourly alert to catch runaway spend before it compounds:

1. Go to console.anthropic.com → Settings → Billing → Usage alerts.
2. Create a new alert at a threshold well above your normal hourly usage. If you typically spend $5/day, an alert at $1 fires when a single hour consumes a fifth of a normal day's budget, roughly 5x your average hourly rate if that $5 is spread across the whole day.
3. Set the notification to email AND, if available, SMS or webhook. Email alone can arrive 5–20 minutes after the threshold fires.
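The "$1 on $5/day" example only works out to a strong signal if your spend is spread across the day. A small helper makes that dependency explicit. `active_hours` is our own parameter, not a console setting.

```python
def hourly_multiple(alert_usd: float, daily_spend_usd: float, active_hours: float = 24.0) -> float:
    """How many times your average hourly spend a one-hour alert threshold represents."""
    return alert_usd / (daily_spend_usd / active_hours)


if __name__ == "__main__":
    print(round(hourly_multiple(1.0, 5.0), 1))     # prints: 4.8 (spend spread over 24 hours)
    print(round(hourly_multiple(1.0, 5.0, 8), 1))  # prints: 1.6 (spend spread over an 8-hour workday)
```

If your usage is concentrated in a workday, the same $1 alert sits only about 1.6x above normal. In that case it may fire on ordinary days, and a higher threshold is the better trade-off.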

This doesn't stop the burn, but it cuts the discovery window from "morning coffee" to "within the hour." On an active agentic session, that difference can be several hundred dollars.

4. Septim's perspective

We can't fix Anthropic's infrastructure, and we're not going to pretend otherwise. If the token inflation is server-side, only Anthropic's engineering team can resolve it. What we can do, and what we're actively building, is close the gap on the agent-side multiplier: the part of this problem that lives on your machine, in your session files, and in the number of requests your agents fire per hour.

Septim Subagent Cost Guard puts a hard ceiling on that number. It runs as a PreToolUse hook, reads your local session state, and halts the agent before the next request goes out if your cumulative spend has crossed your threshold. It won't un-inflate a token that's already been billed, but it will stop the 11th request from compounding on the first 10. The founding price is $29 at septimlabs.com/subagent-cost-guard.

If what you need right now is a human to diagnose your specific billing logs and triage your agent configuration, that's the Septim Rescue engagement at septimlabs.com/septim-rescue: $299 for one working session, a written diagnosis, and a concrete fix plan.
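For readers curious about the general mechanism, here is a rough sketch of a budget-checking PreToolUse hook. It is hypothetical and is not Cost Guard's implementation. It rests on three assumptions. First, Claude Code passes hook input as JSON on stdin, including a `transcript_path` field. Second, the transcript is JSONL with `message.usage` on assistant entries. Third, exit code 2 blocks the pending tool call. Verify all three against the hooks documentation for your version. The `SESSION_TOKEN_BUDGET` variable is an invented name, and the sketch uses a raw token total as a stand-in for dollar spend.

```python
#!/usr/bin/env python3
# Hypothetical PreToolUse hook sketch (not Septim's implementation).
# Assumes: JSON on stdin with "transcript_path"; JSONL transcript with
# message.usage on assistant entries; exit code 2 blocks the tool call.
import json
import os
import sys

# Hypothetical env var name; token total is used as a proxy for spend.
TOKEN_BUDGET = int(os.environ.get("SESSION_TOKEN_BUDGET", "2000000"))
USAGE_KEYS = ("input_tokens", "output_tokens",
              "cache_creation_input_tokens", "cache_read_input_tokens")


def session_tokens(lines) -> int:
    """Sum usage tokens over assistant entries, counting each message id once."""
    total, seen = 0, set()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank, partial, or malformed lines
        if not isinstance(entry, dict) or entry.get("type") != "assistant":
            continue
        msg = entry.get("message")
        if not isinstance(msg, dict) or not isinstance(msg.get("usage"), dict):
            continue
        msg_id = msg.get("id")
        if msg_id is not None:
            if msg_id in seen:
                continue  # assumed: repeated entries for one message share its usage
            seen.add(msg_id)
        usage = msg["usage"]
        total += sum(int(usage.get(k) or 0) for k in USAGE_KEYS)
    return total


def main() -> int:
    if sys.stdin.isatty():
        return 0  # run by hand rather than by Claude Code: nothing to check
    raw = sys.stdin.read()
    if not raw.strip():
        return 0
    path = json.loads(raw).get("transcript_path")
    if not path or not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as f:
        used = session_tokens(f)
    if used >= TOKEN_BUDGET:
        print(f"Token budget reached ({used} >= {TOKEN_BUDGET}); halting.", file=sys.stderr)
        return 2  # assumed to block the pending tool call
    return 0


if __name__ == "__main__":
    status = main()
    if status:
        sys.exit(status)
```

Deduplicating by message id is a design choice. It guards against transcripts that log one response across several lines, which would otherwise double-count that response's usage.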

Bill already spiked? Septim Rescue.

One focused session: we audit your Claude Code billing logs, identify the inflation source, and give you a written fix plan. $299. Booked and completed within one business day.

Book Septim Rescue →

FAQ

Has Anthropic acknowledged this as a bug?

Not publicly, as of April 20, 2026. What exists is a consistent body of user reports across r/ClaudeAI, the Anthropic community forum, and independent coverage from Efficienist and DevClass describing the same pattern: billing that diverges from local session estimates starting in early April. We will update this post if and when an official statement is released.

Could this just be system prompt or tool-definition overhead I wasn't accounting for?

Possibly, for some users. Claude Code does inject a system prompt and tool definitions with every request. That's expected overhead, typically 2,000–5,000 tokens. The 20,000-token figure in community reports is well above that baseline, and users who had previously benchmarked their overhead describe the gap as new behavior rather than overhead they had simply miscounted. Run Pattern 3 above (direct API comparison) to separate Claude Code's scaffolding from your own prompt content.
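Once you have a per-request overhead figure (for example, from the Pattern 3 comparison), you can place it against the two reference points quoted here. This is a minimal sketch. Both thresholds come from this post's figures, not from Anthropic documentation. The labels are our own.

```python
EXPECTED_SCAFFOLDING_MAX = 5_000  # upper end of typical system prompt + tool definitions
REPORTED_EXTRA = 20_000           # per-request figure from community reports


def interpret_overhead(overhead_tokens: int) -> str:
    """Place a measured per-request overhead relative to the two reference points."""
    if overhead_tokens <= EXPECTED_SCAFFOLDING_MAX:
        return "expected scaffolding"
    if overhead_tokens >= REPORTED_EXTRA:
        return "consistent with reported inflation"
    return "elevated, inconclusive"


if __name__ == "__main__":
    for measured in (3_500, 12_000, 24_000):  # hypothetical measurements
        print(measured, "->", interpret_overhead(measured))
```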

Will Anthropic refund the extra spend?

We have no information on Anthropic's refund policy for this situation. If you believe you've been overbilled, the appropriate first step is to open a support ticket at support.anthropic.com with your usage export attached and the specific date range flagged. Keep your ccusage output from the same period as supporting documentation.

Does this affect the API directly, or only Claude Code CLI users?

Based on community reports, the pattern appears most pronounced for Claude Code CLI users. Direct API users who crafted their own minimal-context requests did not describe the same divergence in the threads we reviewed. This is consistent with the theory that the overhead originates in the CLI's scaffolding layer rather than in the Anthropic API itself, but that's a hypothesis based on user reports, not a confirmed diagnosis.