Is Claude getting worse? What we learned when Anthropic admitted it.
For six weeks, developers said Claude got dumber. Anthropic said it didn't. On April 23, Anthropic published an internal post-mortem confirming three engineering changes degraded coding output between March 4 and April 16, then quietly pushed fixes and reset everyone's usage limits as a concession. Here is what changed, who got hit, and what to do when your business runs on one AI vendor.
What developers were saying
Starting in early March, threads on Hacker News and r/ClaudeAI began compiling the same observation: Claude Code was producing shorter answers, missing obvious steps, and abandoning context mid-task. Some users blamed their own prompt drift. Others insisted the model itself had quietly changed.
Anthropic's public stance for the next six weeks was that no model swap had occurred. The status page showed green. Internal benchmarks reportedly came back flat. The community kept producing receipts — side-by-side outputs, identical prompts, demonstrably worse responses — and Anthropic kept saying nothing was wrong.
The status page is a function of what you measure. If your benchmark is API uptime and the regression is in the shape of the output, you can be honestly green and quietly broken at the same time.
— Septim Labs · April 25, 2026
What actually changed
The April 23 post-mortem identified three independent changes that compounded over six weeks. None was a model retrain. All three were operational adjustments that lived above the model layer.
Default reasoning budget for Claude Code dropped from high to medium as a cost-control measure. On simple tasks, the difference was indistinguishable. On multi-step coding work, the model started skipping intermediate planning steps and going straight to output.
A bug in the session-state handler cleared mid-session thinking history at certain context-window thresholds. The model would lose its own work-in-progress reasoning halfway through a task and have to reconstruct it from output history alone — visibly worse on long PRs.
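Anthropic hasn't published the code, so here is a hypothetical sketch of the class of bug described above: a session-state handler that clears accumulated reasoning outright when the context crosses a threshold, rather than compacting it. Every name and number is invented.

```python
CONTEXT_THRESHOLD = 100_000  # assumed token threshold, not Anthropic's real value

class SessionState:
    def __init__(self):
        self.output_history = []    # what the user sees; always kept
        self.thinking_history = []  # work-in-progress reasoning

    def append(self, thinking: str, output: str):
        self.thinking_history.append(thinking)
        self.output_history.append(output)
        if self._context_tokens() > CONTEXT_THRESHOLD:
            # The bug: clearing instead of summarizing or compacting.
            # From here on, the model must reconstruct its plan from
            # output history alone.
            self.thinking_history.clear()

    def _context_tokens(self) -> int:
        # Crude token estimate: roughly one token per four characters.
        chars = sum(len(s) for s in self.thinking_history + self.output_history)
        return chars // 4
```

On short tasks the threshold is never hit, which is why the regression only showed up on long PRs.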
A latency-improvement feature added an aggressive truncation rule — a cap of roughly 25 words on a category of intermediate responses Claude uses internally to plan. Output quality on synthesis tasks collapsed because the model ran out of room to think before answering.
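A word-cap truncation rule of this kind is easy to picture. The 25-word figure comes from the post-mortem; the function below is purely illustrative, not Anthropic's implementation.

```python
WORD_CAP = 25  # figure from the post-mortem; everything else here is invented

def truncate_plan(plan: str, cap: int = WORD_CAP) -> str:
    """Hard-cap an intermediate planning response at `cap` words."""
    words = plan.split()
    return plan if len(words) <= cap else " ".join(words[:cap])
```

A 40-step plan loses steps 26 through 40 before the model ever acts on them, which is exactly the "ran out of room to think" failure described above.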
Each one was a reasonable engineering tradeoff. Together they produced a six-week regression that Anthropic could not detect on its own benchmarks because each change individually fell within the noise floor.
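To see how changes that each pass a rollout gate can still compound past a benchmark's noise floor, here is a back-of-envelope illustration. The noise floor and per-change drops are invented numbers, not figures from the post-mortem.

```python
# Assumed: a benchmark that cannot resolve quality swings under 3%,
# and three changes that each regress quality by less than that.
NOISE_FLOOR = 0.03
per_change_drop = [0.02, 0.025, 0.02]

# Each change alone sits inside the noise floor and passes its gate.
assert all(drop < NOISE_FLOOR for drop in per_change_drop)

# Stacked, quality falls well past the noise floor.
quality = 1.0
for drop in per_change_drop:
    quality *= 1.0 - drop

total_drop = 1.0 - quality  # roughly 0.064, over twice the noise floor
```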
What Anthropic did next
The fix landed in three rolling deploys between April 16 and April 23. Reasoning effort returned to high by default. The session-state bug was patched. The 25-word cap was removed. As a goodwill move, Anthropic reset all subscriber usage limits for the billing month — every Pro and Max user got their May allotments back, plus April's. No one asked for this; it was a concession.
"Three changes shipped between March 4 and March 22 collectively degraded Claude Code output quality on multi-step tasks. We did not catch it because each change passed our individual rollout gates. Our aggregate quality benchmark did not flag the trend until a community-published reproducer surfaced on April 16."
The exact text is mirrored in the April 24 Fortune piece and the April 22 Register write-up, both linked in sources.
Why this matters if Claude is your only AI vendor
If your stack runs entirely through Anthropic — Claude Code in your editor, Claude API in your product, the Anthropic dashboard for billing — six weeks of degradation is six weeks of degradation. Every PR your reviewer agent looked at. Every client-facing reply your assistant generated. Every test your debug agent diagnosed.
The community caught it. The reproducer that broke the silence was a community artifact, not Anthropic's internal QA. That is the structural lesson here — and it generalizes to every vendor, not just Anthropic. When your business depends on one model behaving consistently, you need the same posture you would have toward any infrastructure dependency: independent verification, version-pinned reproducibility, an exit ramp.
Three concrete things we changed at Septim Labs after April 23
- Pinned a regression suite of our own tasks. Twenty repeatable Claude Code prompts whose outputs we sample weekly and diff against last quarter's baseline. The reproducer doesn't need to be impressive — it needs to be ours, and we need to look at it.
- Routed cost-sensitive paths through smaller models. Sub-tasks that don't need full Claude reasoning (formatting, classification, schema validation) now run on cheaper Anthropic tiers or alternate vendors. When the flagship model regresses, only the workload that actually needed it is affected.
- Captured every output we ship to a client. Versioned, dated, replayable. If a client reports the system "got worse" three weeks later, we have receipts.
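The first item above — a pinned suite diffed against a baseline — can be sketched in a few lines. The file layout, the similarity threshold, and storing each prompt's output as a text file are our assumptions for illustration, not a published tool.

```python
import difflib
from pathlib import Path

def similarity(baseline: str, current: str) -> float:
    """Similarity from 0.0 to 1.0 between a baseline output and today's."""
    return difflib.SequenceMatcher(None, baseline, current).ratio()

def check_suite(baseline_dir: Path, current_dir: Path, threshold: float = 0.7):
    """Return the prompts whose current output drifted past the threshold.

    Assumes one .txt file per pinned prompt, with matching filenames in
    the baseline and current directories.
    """
    regressions = []
    for base_file in sorted(baseline_dir.glob("*.txt")):
        cur_file = current_dir / base_file.name
        score = similarity(base_file.read_text(), cur_file.read_text())
        if score < threshold:
            regressions.append((base_file.name, round(score, 2)))
    return regressions
```

Run weekly, the interesting output isn't any single score; it's the week the list of regressions stops being empty.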
The bigger pattern
The tools you ship to clients should not depend on a single vendor's promise that their model is unchanged. The model will change. The vendor will ship operational tweaks above the model layer that they consider safe and that you experience as quality drift. The status page will stay green.
This is not a takedown of Anthropic specifically. They published a post-mortem, fixed the issue, refunded usage. That is the responsible-vendor pattern. The lesson is that even responsible vendors produce drift, and the only durable defense is your own measurement.
If your client can tell, your provider's benchmark didn't.
— internal note · written during the regression, April 14, 2026
What to do this week
- Run a regression diff. Take ten of your last quarter's Claude Code outputs. Re-run the same prompts on the current model. If today's outputs are visibly thinner, you have your own evidence.
- Cap usage on the cost-sensitive paths. Even a soft per-call budget catches runaway reasoning loops before billing day.
- Save outputs. The cheapest tier on Anthropic still costs more than disk. Log it.
- Have one parallel vendor. Not for fail-over — for sanity-checking. When Claude shifts, you want to know whether GPT or Gemini noticed too. If your tasks regress on Claude but not on the others, that is signal.
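The soft per-call budget from the second item can be as simple as a small guard object that estimates each call's cost from token counts and refuses once a session budget is spent. The prices, the token-count inputs, and the budget figure below are placeholders, not Anthropic's actual rates.

```python
PRICE_PER_1K_INPUT = 0.003   # assumed $/1k input tokens (placeholder)
PRICE_PER_1K_OUTPUT = 0.015  # assumed $/1k output tokens (placeholder)

class BudgetExceeded(RuntimeError):
    pass

class CostGuard:
    def __init__(self, session_budget_usd: float):
        self.budget = session_budget_usd
        self.spent = 0.0

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        """Record one call's estimated cost; raise if the soft cap is blown."""
        cost = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
             + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
        if self.spent + cost > self.budget:
            raise BudgetExceeded(
                f"call would push spend to ${self.spent + cost:.2f}, "
                f"budget is ${self.budget:.2f}"
            )
        self.spent += cost
        return cost
```

Call `charge()` before dispatching each request; a runaway reasoning loop then fails loudly mid-session instead of silently on billing day.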
Cap your Claude bill before the next time something quietly breaks.
Septim Rescue is a $299 emergency intervention for Claude bills that already spiked. Septim Vault is the $89 lifetime kit we use to harden our own MCP and agent configs against runaway costs. Both are one-time purchases. Both are yours forever.