· incident postmortem ·

The Tokenocalypse: why your Claude subagents burned $47,000 (and how to stop it)

// figure: subagent loop cost spiral

// FILED Cost & Ops // SOURCE Septim Labs // PERMALINK /blog/tokenocalypse-subagent-cost-guide.html

Published April 20, 2026 · by Septim Labs · 9 min read

On the morning of April 3, 2026, a financial-services engineering team at a mid-size firm logged in to a $47,000 Anthropic bill for the previous 72 hours. Twenty-three Claude Code subagents, spun up for a large codebase refactor, had run effectively unattended for three days. None of them crashed. None of them failed. They just kept going.

That incident became the seed of what the community now calls the Tokenocalypse. The root GitHub discussion, anthropics/claude-code#41930, has accumulated 200+ comments in two weeks from developers with similar stories: a dev left a branch-summarization agent running overnight and woke up to a $12K bill; a solo founder hit their monthly workspace limit in 4 hours; a test-generation agent recursively spawned itself until the API started throttling the entire org.

This post is a technical postmortem of what actually happened, an honest audit of what the existing cost tools caught and what they missed, and a concrete description of the one category of tooling that doesn’t exist on the market yet but needs to.

What actually happened

The short version: Claude Code’s subagent pattern is a productivity multiplier that accidentally doubles as a cost multiplier. A single orchestrator agent calls N child subagents in parallel. Each child can call M child subagents of its own. With any value of N or M above 2, a ten-minute prompt can turn into a thirty-minute run that pays for itself, or a twelve-hour run that doesn’t.
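To make the multiplier concrete, here is a small sketch of the head count for a two-level fan-out. The function name `agent_count` is illustrative, and the model assumes every child spawns exactly M subagents, which real runs will not do uniformly.

```python
def agent_count(n: int, m: int) -> int:
    """Orchestrator + n children + m grandchildren under each child."""
    return 1 + n + n * m


# A few fan-out shapes; (187, 1) mirrors a one-subagent-per-file run
# where each child triggers a single follow-up subagent.
for n, m in [(2, 2), (4, 4), (187, 1)]:
    print(f"N={n}, M={m}: {agent_count(n, m)} agents")
```

The count grows with the product N × M, so a modest increase in either factor multiplies the number of billable contexts.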

The ingredients

Parallel tool use: Claude 4.7 Sonnet is particularly good at kicking off multiple tool calls in a single turn. Great for latency. Expensive if each call spawns a subagent.
Long context: a codebase of 500k+ tokens, loaded into every child subagent, means every tool call incurs the full context cost.
Overnight runs: the main orchestrator doesn’t actually need to finish before you go to bed. It’ll wait. But the tokens keep billing.
No local cost gate: Anthropic’s workspace limits exist but trigger on a 15–30 minute window, not per tool call. By the time they fire, you’ve already spent the budget.
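The long-context ingredient is simple arithmetic, sketched below. The price of $3 per million input tokens is a placeholder chosen for illustration, not a quoted Anthropic rate, and the sketch ignores prompt caching and output tokens, both of which change the real number.

```python
def context_cost_usd(context_tokens: int, calls: int,
                     usd_per_million_input: float) -> float:
    """Input-side cost when every call re-sends the same context."""
    return context_tokens / 1_000_000 * usd_per_million_input * calls


# ASSUMPTION: 3.0 USD per million input tokens is a placeholder price.
cost = context_cost_usd(context_tokens=500_000, calls=10,
                        usd_per_million_input=3.0)
print(f"one subagent, 10 tool calls: ${cost:.2f}")
```

Multiply that per-subagent figure by the fan-out and the overnight duration, and the spiral follows directly.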

The April 1-3 timeline

Mar 31 · 18:00

Dev starts a "refactor this service to TypeScript strict mode" agent session. Session contains 1 orchestrator + 4 subagent types (analysis, migration, test-gen, PR-prep). Goes home.

Mar 31 · 23:40

Analysis agent finishes. Orchestrator fans out to migration subagents, one per file, 187 files. 187 subagents launch in waves.

Apr 1 · 03:15

Each migration subagent completes and triggers a test-gen subagent. Context bloat: each test-gen pulls the full module plus the test harness plus the sibling modules it imports.

Apr 1 · 09:20

Developer arrives at work. Sees agents still running. Assumes "almost done." Doesn’t check the dashboard.

Apr 1 · 14:00

First Anthropic rate-limit warnings fire at org level. Assumed to be a transient issue. Throttled for 10 minutes, then the run resumes.

Apr 2 · 06:00

PR-prep agents start firing. Each one re-reads the full changeset. Cost per PR-prep subagent: ~$180.

Apr 2 · 20:00

187 PR-prep subagents run. Cost to date: ~$34,000. No human has checked the dashboard.

Apr 3 · 10:00

Finance notices the anomaly in the morning billing-API sync. Pages engineering. Agents are halted manually.

Apr 3 · 10:30

Final bill for the 3-day window: $47,320. The completed work is ~40% of what the orchestrator intended to ship.

That’s one team. Multiply it by the 200+ self-reports in the GitHub thread, and the Tokenocalypse starts to look less like a bug and more like a structural gap in how Claude Code’s subagent pattern interacts with how dev teams actually use it overnight.
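The timeline figures can be sanity-checked with a few lines of arithmetic. This sketch uses only numbers quoted in the timeline; the variable names are illustrative, and the timeline does not break out the cost of the earlier analysis, migration, and test-gen phases.

```python
from datetime import datetime

run_start = datetime(2026, 3, 31, 18, 0)   # dev starts the session
run_halt = datetime(2026, 4, 3, 10, 30)    # final bill recorded
run_hours = (run_halt - run_start).total_seconds() / 3600

pr_prep_subtotal = 187 * 180               # 187 subagents at ~$180 each
final_bill = 47_320
pr_prep_share = pr_prep_subtotal / final_bill

print(f"run length: {run_hours:.1f} h")
print(f"PR-prep subtotal: ${pr_prep_subtotal:,}")
print(f"PR-prep share of final bill: {pr_prep_share:.0%}")
```

The PR-prep subtotal of $33,660 lands close to the ~$34,000 cost-to-date entry, which suggests the re-read of the full changeset by each PR-prep subagent was the dominant line item.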

What the existing tools caught (and what they missed)

There are four categories of tooling a developer has right now to monitor Claude Code API spend. All four are useful. None of them solved the Tokenocalypse pattern.

Tool	What it does	Why it missed the Tokenocalypse
Anthropic Workspace Limits	Per-workspace monthly cap. Blocks new requests once cap is exceeded.	Monthly window. Firms with high-limit workspaces can burn $50K in 3 days and still be under cap.
ccusage (OSS)	CLI that reads local ~/.claude/projects/*.jsonl and computes spend.	Post-mortem tool. Reports spend after it happens. Does not halt or alert mid-flight.
Claude Code Usage Monitor	Dashboard that polls local session files every ~15 min and visualizes burn rate.	15-minute polling window. A runaway subagent wave can burn $8K in 15 min before the next poll.
Anthropic billing alerts	Email alert when monthly spend crosses configurable thresholds.	Email-based. Arrives 5–20 min after threshold. By the time you read it, damage is done.

Two things are true at once: every one of these tools is useful for general budget awareness, and none of them is designed to halt a runaway subagent mid-flight. That’s not a criticism; it’s a structural observation. The existing tools live at the observation layer. The Tokenocalypse needs a tool at the enforcement layer.

The category that doesn’t exist yet

Call it a mid-flight cost gate. The requirements are concrete:

Runs as a PreToolUse hook, not a post-execution reporter. Every tool call passes through the gate before it leaves the machine.
Reads local session state (~/.claude/projects/**/*.jsonl) to compute cumulative cost in-process. No external API call, no dashboard latency.
Per-session, per-run, and per-day budget ceilings. A session can be capped at $10. A single-agent run can be capped at $2. A daily total can be capped at $50. Any breach halts the agent.
Hard halt with a logged reason. Not a warning. The agent exits. The reason is surfaced to the user in Claude Code’s status line.
Per-agent override. You can configure a higher cap for a specific named agent when you really do need that $50 analysis run.
Optional out-of-band notification via Slack webhook, email, or generic HTTP callback. Nice-to-have, not required.

Notice what this list does not include: a dashboard, a daemon, a SaaS subscription, an API call to a vendor. The entire gate runs inside the PreToolUse hook machinery Claude Code already supports. All the data it needs is on your laptop. All the decisions it makes are local. Shipped right, it’s a 300-line Python script or Rust binary plus a YAML config.
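A minimal sketch of the per-session piece of such a gate, in Python. It assumes the hook receives a JSON payload on stdin that includes a `transcript_path` field, that transcript lines may carry a `message.usage` object with token counts, and that exiting with code 2 blocks the pending tool call; verify all three against the Claude Code hooks documentation for your version. The values in `PRICES` are placeholders rather than Anthropic's rates, and `session_cost_usd` and `run_hook` are hypothetical names, not part of any shipped tool.

```python
import json
import sys
from pathlib import Path

# ASSUMPTION: placeholder USD prices per million tokens, not real rates.
PRICES = {
    "input_tokens": 3.00,
    "output_tokens": 15.00,
    "cache_creation_input_tokens": 3.75,
    "cache_read_input_tokens": 0.30,
}
SESSION_CEILING_USD = 10.00  # the per-session cap from the list above


def session_cost_usd(transcript: Path) -> float:
    """Price the token usage recorded in one session transcript.

    ASSUMPTION: each JSONL line may hold a `message.usage` dict whose
    keys match PRICES; lines without one are skipped.
    """
    total = 0.0
    if not transcript.exists():
        return total
    with transcript.open(encoding="utf-8") as fh:
        for line in fh:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate blank or partially written lines
            if not isinstance(entry, dict):
                continue
            message = entry.get("message")
            usage = message.get("usage") if isinstance(message, dict) else None
            if not isinstance(usage, dict):
                continue
            for key, price in PRICES.items():
                total += (usage.get(key) or 0) / 1_000_000 * price
    return total


def run_hook(stdin_text: str, ceiling: float = SESSION_CEILING_USD) -> int:
    """Return the hook exit code: 0 allows the tool call, 2 blocks it."""
    payload = json.loads(stdin_text)
    spent = session_cost_usd(Path(payload["transcript_path"]))
    if spent >= ceiling:
        print(f"budget-exceeded: session at ${spent:.2f}, cap ${ceiling:.2f}",
              file=sys.stderr)
        return 2
    return 0


# Entry point when installed as a hook script:
#     sys.exit(run_hook(sys.stdin.read()))
```

This covers only the cumulative per-session ceiling and only blocks the next tool call. Per-run and per-day ceilings, per-agent overrides, a full agent halt, and notifications would layer on top of the same read-and-compare loop.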

We’re building exactly this at Septim Labs. It’s called Septim Subagent Cost Guard. The launch list is open right now at a $29 founding price for the first 50 seats, with standard pricing at $49 after. The reservation form costs nothing now: we ship May 2026, and you get a single email with the Stripe link the moment the binary is ready. No autobill, no drip campaign, no credit card at reservation.

Reserve your Cost Guard seat

Kills runaway subagents mid-flight. PreToolUse hook, reads local session files, hard-halts on budget breach. $29 founding price for the first 50 buyers, $49 after. Pay $0 now.

Reserve now →

What you can do tonight (before Cost Guard ships)

Even without Cost Guard, you can reduce your Tokenocalypse exposure to near-zero with three practices that take about an hour to set up:

1. Cap your session token budget in your CLAUDE.md

Add a directive to your project CLAUDE.md:

# Budget constraint
Every subagent spawned in this project must check
`~/.claude/projects/$CURRENT_SESSION.jsonl` before making its 10th tool
call, compute cumulative cost, and halt if cumulative cost exceeds $5.
Report the halt to the orchestrator with the reason "budget-exceeded".

This isn’t a reliable enforcement mechanism: Claude is good at following it, but not perfect. It buys you one layer. Your next overnight run stops at $5 per subagent instead of $180.

2. Never leave a multi-subagent session unattended overnight

If a run is expected to take more than four hours, break it into smaller runs you can resume in the morning. Claude Code’s session-resume feature means interrupting and continuing loses no completed work, although a resumed session may re-read its context at normal input rates once the prompt cache has expired.

3. Poll ccusage every 10 minutes during long runs

Install ccusage and wrap it in a tiny loop:

watch -n 600 'ccusage daily | tail -5'

This is the observation-layer workaround for the missing enforcement layer. You’ll notice a runaway within 10 minutes of it starting. That’s not great, but it cuts detection lag roughly 36x compared with the 6-hour "notice at morning coffee" pattern that cost one team $47K.
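What a shorter detection lag buys can be estimated from the incident's own numbers. The sketch below assumes a constant burn rate equal to the incident average ($47,320 over the 64.5 hours from Mar 31 18:00 to Apr 3 10:30); real runs burn in waves, so peak exposure will be higher, and `exposure_usd` is an illustrative name.

```python
# Average burn rate over the incident window, in USD per hour.
AVG_BURN_USD_PER_HOUR = 47_320 / 64.5


def exposure_usd(detection_lag_minutes: float,
                 burn_usd_per_hour: float = AVG_BURN_USD_PER_HOUR) -> float:
    """Spend that accrues before anyone notices the runaway."""
    return burn_usd_per_hour * detection_lag_minutes / 60


for label, minutes in [("10-minute poll", 10), ("6-hour gap", 360)]:
    print(f"{label}: ~${exposure_usd(minutes):,.2f} before detection")
```

At the average rate, a 10-minute poll leaves on the order of a hundred dollars exposed per detection window, against several thousand for a 6-hour gap.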

What we’re not fixing

A few things worth calling out as out of scope for any cost-gate tool:

We can’t predict cost before the tool call fires. LLM costs are token-weighted, and the Claude API doesn’t return a pre-execution estimate. A cost gate can only enforce cumulative ceilings, not per-call predictions.
We don’t stop Anthropic’s own billing. Once a tool call has left the machine and hit the Anthropic API, it’s billed. A cost gate prevents the next call, not the in-flight one.
We don’t replace Anthropic’s workspace limits. Those are the last-line enforcement and should stay on. A cost gate is upstream of them, not a substitute.
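Because a cumulative gate only checks before each call, the ceiling is a soft bound. Here is a sketch of the worst case, assuming the gate passes a batch of parallel calls while spend sits just under the ceiling. The $1.50 per-call figure and the function name are illustrative assumptions.

```python
def worst_case_spend(ceiling_usd: float, max_call_cost_usd: float,
                     parallel_calls: int = 1) -> float:
    """Upper bound on spend when the gate checks only before each call.

    Calls admitted just under the ceiling still run to completion, so
    the overshoot is at most one batch of in-flight calls.
    """
    return ceiling_usd + max_call_cost_usd * parallel_calls


# ASSUMPTION: $1.50 is a made-up maximum cost for a single tool call.
bound = worst_case_spend(ceiling_usd=10.0, max_call_cost_usd=1.5,
                         parallel_calls=4)
print(f"$10 ceiling, 4 parallel calls: spend can reach ${bound:.2f}")
```

Setting the configured ceiling a little below the real budget absorbs that overshoot.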

Closing

The Tokenocalypse was predictable. Recursive subagent patterns + long context + overnight runs + no local enforcement = runaway spend. The surprise isn’t that it happened on April 1, 2026. The surprise is that a structural fix didn’t exist before the incident.

If you’ve had your own Tokenocalypse moment and want a heads-up the second we ship Cost Guard, reserve your seat at /subagent-cost-guard. If you haven’t (yet), the three tonight-level practices above will keep you off the GitHub issue.

Tell us what bit you

The Cost Guard default thresholds are being tuned on real Tokenocalypse postmortems. If you’ve got a story, even an anonymous one, email SeptimLabs@gmail.com with "Tokenocalypse" in the subject. We use buyer intel to ship better defaults.