// PUBLISHED April 25, 2026 // READING 7 min // FILED Claude Code

Is Claude getting worse? What we learned when Anthropic admitted it.

// FILED Cost & Ops // SOURCE Septim Labs // PERMALINK /blog/is-claude-getting-worse-april-2026.html

For six weeks, developers said Claude got dumber. Anthropic said it didn't. On April 23, Anthropic published an internal post-mortem confirming three engineering changes degraded coding output between March 4 and April 16, then quietly pushed fixes and reset everyone's usage limits as a concession. Here is what changed, who got hit, and what to do when your business runs on one AI vendor.

// FIGURE · CLAUDE CODE QUALITY, 2026 Q1–Q2 · y-axis −1 to +1 (relative) · x-axis Mar 1–Apr 23 · annotations: Mar 4 reasoning effort cut · Apr 16 25-word cap removed · Apr 23 post-mortem

What developers were saying

Starting in early March, threads on Hacker News and r/ClaudeAI began compiling the same observation: Claude Code was producing shorter answers, missing obvious steps, and abandoning context mid-task. Some users blamed their own prompt drift. Others insisted the model itself had quietly changed.

Anthropic's public stance for the next six weeks was that no model swap had occurred. The status page showed green. Internal benchmarks reportedly came back flat. The community kept producing receipts — side-by-side outputs, identical prompts, demonstrably worse responses — and Anthropic kept saying nothing was wrong.

The status page is a function of what you measure. If your benchmark is API uptime, and the regression is in the shape of the output, you can be honestly green and quietly broken at the same time.
— Septim Labs · April 25, 2026

What actually changed

The April 23 post-mortem identified three independent changes that compounded over six weeks. None was a model retrain. All three were operational adjustments that lived above the model layer.

// 01 Shipped Mar 4
Reasoning effort cut

Default reasoning budget for Claude Code dropped from high to medium as a cost-control measure. On simple tasks, indistinguishable. On multi-step coding work, the model started skipping intermediate planning steps and going straight to output.
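One defense if you consume Claude through the API rather than through Claude Code: pin the thinking budget per request instead of inheriting the vendor default. A minimal sketch using the Anthropic Python SDK's extended-thinking parameter; the model string and the budget numbers are illustrative choices of ours, not the values Anthropic changed.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# An explicit budget means a vendor-side default change (high -> medium)
# cannot silently shrink how much the model plans before answering.
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # pin a version you have validated
    max_tokens=16000,                  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Plan first, then refactor this module."}],
)
print(response.content[-1].text)  # final text block; earlier blocks hold the thinking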

// 02 Shipped Mar 11
Mid-session thinking wiped

A bug in the session-state handler cleared mid-session thinking history at certain context-window thresholds. The model would lose its own work-in-progress reasoning halfway through a task and have to reconstruct it from output history alone — visibly worse on long PRs.

// 03 Shipped Mar 22
25-word response cap

A latency-improvement feature added an aggressive truncation rule — a cap of roughly 25 words on a category of intermediate responses Claude uses internally to plan. Output quality on synthesis tasks collapsed because the model ran out of room to think before answering.

Each one was a reasonable engineering tradeoff. Together they produced a six-week regression that Anthropic could not detect on its own benchmarks because each change individually fell within the noise floor.

What Anthropic did next

The fix landed in three rolling deploys between April 16 and April 23. Reasoning effort returned to high by default. The session-state bug was patched. The 25-word cap was removed. As a goodwill move, Anthropic reset all subscriber usage limits for the billing month — every Pro and Max user got their May allotments back, plus April's. No one asked for this; it was a concession.

// What the post-mortem actually said

"Three changes shipped between Marzo 4 and Marzo 22 collectively degraded Claude Code output quality on multi-step tasks. We did not catch it because each change passed our individual rollout gates. Our aggregate quality benchmark did not flag the trend until a community-published reproducer surfaced on Abril 16."

The exact text is mirrored in the April 24 Fortune piece and the April 22 Register write-up, both linked in sources.

Why this matters if Claude is your only AI vendor

If your stack runs entirely through Anthropic — Claude Code in your editor, Claude API in your product, the Anthropic dashboard for billing — six weeks of degradation is six weeks of degradation. Every PR your reviewer agent looked at. Every customer-facing reply your assistant generated. Every failing test your debug agent diagnosed.

The community caught it. The reproducer that broke the silence was a community artifact, not Anthropic's internal QA. That is the structural lesson here — and it generalizes to every vendor, not just Anthropic. When your business depends on one model behaving consistently, you need the same posture you would have toward any infrastructure dependency: independent verification, version-pinned reproducibility, an exit ramp.

Three concrete things we changed at Septim Labs after April 23

  1. Pinned a regression suite of our own tasks. Twenty repeatable Claude Code prompts whose outputs we sample weekly and diff against last quarter's baseline. The reproducer doesn't need to be impressive — it needs to be ours, and we need to look at it. (A minimal sketch of the runner follows this list.)
  2. Routed cost-sensitive paths through smaller models. Sub-tasks that don't need full Claude reasoning (formatting, classification, schema validation) now run on cheaper Anthropic tiers or alternate vendors. When the flagship model regresses, only the workload that actually needed it is affected. (A routing sketch follows the runner.)
  3. Captured every output we ship to a customer. Versioned, dated, replayable. If a customer reports the system "got worse" three weeks later, we have receipts.
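The runner behind point 1 is deliberately small. A minimal sketch, assuming one prompt per file and a baseline directory holding last quarter's accepted outputs; the directory layout and the claude -p invocation are assumptions for illustration, not our actual harness.

"""Weekly regression sampler: run pinned prompts, diff outputs against baseline.
Directory layout and the `claude -p` call are assumptions, not Septim's harness."""
import datetime
import difflib
import pathlib
import subprocess

PROMPTS = pathlib.Path("prompts")    # one .txt file per pinned prompt
BASELINE = pathlib.Path("baseline")  # last quarter's accepted outputs, same filenames
SAMPLES = pathlib.Path("samples")    # dated snapshots, one directory per run

def run_model(prompt: str) -> str:
    # Shell out to Claude Code in non-interactive print mode; assumes the
    # `claude` binary is on PATH and already authenticated.
    result = subprocess.run(["claude", "-p", prompt],
                            capture_output=True, text=True, check=True)
    return result.stdout

def main() -> None:
    stamp = datetime.date.today().isoformat()
    out_dir = SAMPLES / stamp
    out_dir.mkdir(parents=True, exist_ok=True)
    for prompt_file in sorted(PROMPTS.glob("*.txt")):
        output = run_model(prompt_file.read_text())
        (out_dir / prompt_file.name).write_text(output)  # keep the dated receipt
        base = BASELINE / prompt_file.name
        if not base.exists():
            continue
        diff = list(difflib.unified_diff(
            base.read_text().splitlines(), output.splitlines(),
            fromfile=f"baseline/{prompt_file.name}",
            tofile=f"{stamp}/{prompt_file.name}", lineterm=""))
        if diff:
            print("\n".join(diff))  # a human reads this every week

if __name__ == "__main__":
    main()

Model outputs are nondeterministic, so the diffs are rarely clean; the point is that a human reads the same twenty diffs every week, which is exactly the kind of looking that caught this regression before Anthropic's benchmarks did.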
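Point 2, in practice, is little more than a lookup table with a safe default. A sketch with illustrative task categories and model names; substitute whatever tiers your own regression suite has actually cleared.

# Route each task category to the cheapest model tier that passed our
# regression suite for that category. All names here are illustrative.
FLAGSHIP = "claude-sonnet-4-20250514"

MODEL_FOR_TASK = {
    "formatting": "claude-3-5-haiku-latest",
    "classification": "claude-3-5-haiku-latest",
    "schema_validation": "claude-3-5-haiku-latest",
    "code_review": FLAGSHIP,  # flagship only where it earns its cost
}

def route(task_category: str) -> str:
    # Unknown categories fail safe on quality, not on cost.
    return MODEL_FOR_TASK.get(task_category, FLAGSHIP)

When the flagship regresses, the blast radius shrinks to the paths that genuinely needed it.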

The bigger pattern

The tools you ship to customers should not depend on a single vendor's promise that their model is unchanged. The model will change. The vendor will ship operational tweaks above the model layer that they consider safe and that you experience as quality drift. The status page will stay green.

This is not a takedown of Anthropic specifically. They published a post-mortem, fixed the issue, reset usage limits. That is the responsible-vendor pattern. The lesson is that even responsible vendors produce drift, and the only durable defense is your own measurement.

If your customer can tell, your provider's benchmark didn't.
— internal note · written during the regression, April 14, 2026

What to do this week

// FROM SEPTIM LABS

Cap your Claude bill before the next time something quietly breaks.

Septim Rescue is a $299 emergency intervention for Claude bills that already spiked. Septim Vault is the $89 lifetime kit we use to harden our own MCP and agent setups against runaway costs. Both are pay-once. Both are yours forever.

// SOURCES