
Claude Code memory & context: staying under budget

// figure Token budget gauge: three zones across a 200K context window (claude-sonnet-4-5). SAFE 0–55% (0–110K tokens) · WATCH 55–80% (110K–160K) · COMPRESS 80–100% (160K–200K).
By the Septim Labs team
Published April 28, 2026
TL;DR
  • Claude Code has no persistent memory across sessions. Every session starts blank. Everything Claude knows about your project must come either from the files it reads or from your CLAUDE.md.
  • The context window fills fastest when Claude reads large files, when sessions get long, and when sub-agents spawn and carry their own context copies.
  • Five techniques can cut context burn by roughly 40–60% in practice: targeted file references, session hygiene, /compact, task handoff notes, and explicit sub-agent scoping. A periodic CLAUDE.md compression pass helps on top of that.

What "memory" means for Claude Code

Claude Code has no persistent memory in the traditional sense. It does not remember what you did in yesterday's session. It does not accumulate knowledge about your codebase over time. Each time you run claude in a project directory, the model starts with an empty context and fills it by reading files, running tools, and processing your conversation.

What persists across sessions is the filesystem — and specifically, anything you have written into files Claude Code can read. Your CLAUDE.md is not a memory system; it is a pre-loaded document that substitutes for the exploration Claude would otherwise do by reading every file in the repo. The Memory MCP server (from Anthropic's reference implementations) provides a separate persistent graph, but it requires explicit setup and is not the default behavior.

  • 200K tokens: claude-sonnet-4-5 context window
  • ~750 words per 1K tokens (rough estimate)
  • Sessions: each starts fresh, with no context rollover from one to the next
  • /compact: command that summarizes and compresses the current session
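The ~750 words per 1K tokens figure is a rule of thumb, not a tokenizer, but it is good enough for quick planning. A minimal sketch, assuming that ratio and the 200K window above; the helper names are hypothetical:

```python
WORDS_PER_1K_TOKENS = 750   # rough estimate, not an exact tokenizer
CONTEXT_WINDOW = 200_000    # claude-sonnet-4-5 context window, in tokens


def words_to_tokens(words: int) -> int:
    """Approximate token count for a given number of English words."""
    return round(words * 1000 / WORDS_PER_1K_TOKENS)


def window_share(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the context window that `tokens` would occupy."""
    return tokens / window


if __name__ == "__main__":
    doc_words = 30_000  # e.g. a long design document pasted into the session
    tokens = words_to_tokens(doc_words)
    print(f"{doc_words} words ≈ {tokens} tokens ≈ {window_share(tokens):.0%} of the window")
```

A 30,000-word document works out to about 40,000 tokens, or a fifth of the window, before any code has been read.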

How context fills up

Understanding where tokens go is the prerequisite for controlling cost. In a typical Claude Code session, the context window fills from four sources in roughly this order:

  1. System prompt and CLAUDE.md. These are loaded at session start and persist for the duration. A 500-line CLAUDE.md costs roughly 2,500 tokens up front. That is a good trade for the exploration it replaces, but it is not free.
  2. File reads. Every time Claude Code reads a file to answer a question or understand context, the file contents enter the context. A 400-line TypeScript module is approximately 3,000–4,000 tokens. Reading 10 files costs 30,000–40,000 tokens before the model has written a single line of code.
  3. Tool outputs. When Claude runs bash commands, the output is appended to context. An npm test run that prints 200 lines of output consumes as much context as a medium-sized file.
  4. Conversation history. Every message — yours and Claude's — stays in the context for the life of the session. A long back-and-forth about a complex problem can consume as much context as all the file reads combined.
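To see how the four sources add up, here is a rough tally built from the estimates above: about 5 tokens per CLAUDE.md line (500 lines ≈ 2,500 tokens) and 7.5–10 tokens per line of TypeScript (400 lines ≈ 3,000–4,000 tokens). The per-line ratios, the tool-output and conversation figures, and the class name are illustrative assumptions, not measurements:

```python
from dataclasses import dataclass

CONTEXT_WINDOW = 200_000

# Per-line ratios implied by the estimates above (assumptions, not tokenizer output).
CLAUDE_MD_TOKENS_PER_LINE = 5     # 500 lines ~ 2,500 tokens
CODE_TOKENS_PER_LINE = 8.75       # midpoint of 3,000-4,000 tokens per 400 lines


@dataclass
class SessionEstimate:
    claude_md_lines: int
    file_lines_read: list[int]    # line count of each file Claude reads
    tool_output_tokens: int
    conversation_tokens: int

    def breakdown(self) -> dict[str, int]:
        """Approximate tokens contributed by each of the four sources."""
        return {
            "claude_md": round(self.claude_md_lines * CLAUDE_MD_TOKENS_PER_LINE),
            "file_reads": round(sum(self.file_lines_read) * CODE_TOKENS_PER_LINE),
            "tool_outputs": self.tool_output_tokens,
            "conversation": self.conversation_tokens,
        }

    def total(self) -> int:
        return sum(self.breakdown().values())


if __name__ == "__main__":
    est = SessionEstimate(
        claude_md_lines=500,
        file_lines_read=[400] * 10,   # ten 400-line modules
        tool_output_tokens=3_000,     # hypothetical: a few test runs
        conversation_tokens=20_000,   # hypothetical back-and-forth
    )
    for source, tokens in est.breakdown().items():
        print(f"{source:>13}: {tokens:>7,}")
    print(f"{'total':>13}: {est.total():>7,} ({est.total() / CONTEXT_WINDOW:.0%} of window)")
```

With these numbers the file reads (about 35,000 tokens) dominate, which is why the first technique below targets them.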

The practical consequence: the first 20 minutes of a complex session are cheap. The last 20 minutes of a session that has been running for 2 hours are expensive per turn, because each new message carries the full history as context.
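The reason late turns cost more can be modeled simply: if every turn resends the whole history, input tokens for turn n grow linearly with n, so cumulative input grows roughly quadratically. A sketch, assuming a fixed starting context and a constant number of tokens added per turn (made-up numbers), and ignoring prompt caching, which can lower the price of resent context:

```python
def per_turn_input(base_tokens: int, tokens_per_turn: int, turns: int) -> list[int]:
    """Input tokens sent on each turn when the full history is resent every time."""
    return [base_tokens + tokens_per_turn * n for n in range(turns)]


if __name__ == "__main__":
    turns = per_turn_input(base_tokens=10_000, tokens_per_turn=2_000, turns=60)
    print("turn 1 input:", turns[0])
    print("turn 60 input:", turns[-1])
    print("first 10 turns total:", sum(turns[:10]))
    print("last 10 turns total:", sum(turns[-10:]))
```

Under these assumptions the last ten turns send about six times as many input tokens as the first ten (1,190,000 vs 190,000), even though each turn adds the same amount of new work.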

The five techniques

1. Reference specific files, not the whole codebase

Claude Code will explore broadly if you ask broad questions. "What's wrong with my authentication system?" triggers a filesystem scan. "Is there a race condition in the refreshToken function in src/lib/auth.ts?" reads one file. Both cost tokens; the second produces a more useful answer at a fraction of the price.

When you need Claude Code to understand a broad area of the codebase, use your CLAUDE.md to pre-describe the architecture rather than asking Claude to explore it. "The auth module uses JWT, implemented in src/lib/auth.ts, with middleware in src/middleware/auth.ts" costs 30 tokens of CLAUDE.md and prevents 30,000 tokens of exploration.
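To see the gap between a targeted read and a broad scan in your own repo, you can estimate both. This sketch uses the common rough heuristic of about 4 characters per token (an assumption; real tokenization varies by language and content), and the suffix list and paths are illustrative:

```python
from pathlib import Path

CHARS_PER_TOKEN = 4                       # rough heuristic, not an exact tokenizer
SOURCE_SUFFIXES = {".ts", ".tsx", ".js", ".py"}


def estimate_tokens(path: Path) -> int:
    """Approximate tokens spent reading one file into context."""
    text = path.read_text(encoding="utf-8", errors="ignore")
    return len(text) // CHARS_PER_TOKEN


def estimate_scan(root: Path) -> int:
    """Approximate tokens to read every source file under `root` (skips node_modules)."""
    return sum(
        estimate_tokens(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix in SOURCE_SUFFIXES and "node_modules" not in p.parts
    )


if __name__ == "__main__":
    src = Path("src")
    target = src / "lib" / "auth.ts"      # the one file a targeted question needs
    if target.is_file():
        print("targeted read: ~", estimate_tokens(target), "tokens")
    if src.is_dir():
        print("scan of src/:  ~", estimate_scan(src), "tokens")
```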

2. Session hygiene — start fresh for unrelated tasks

One long session is usually more expensive than two short sessions for the same total work, because in the long session every message about task B still carries the context of task A, even though task A is finished and irrelevant. Starting a new session for an unrelated task costs the overhead of loading CLAUDE.md again (~2,500 tokens) but drops the accumulated history of the previous task.

A practical rule: when you move from one task to a qualitatively different one — from "write tests for the user module" to "refactor the payment module" — start a new session. Use the same session only when tasks share context that would need re-establishing.
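The trade-off can be sketched numerically. A fresh session loads CLAUDE.md again (~2,500 tokens), but task A's history no longer rides along on every task B turn. This simplified model assumes the full history is resent each turn, no prompt caching, and made-up per-turn sizes; the function is hypothetical:

```python
CLAUDE_MD_TOKENS = 2_500  # per-session load cost from the estimate above


def session_input_tokens(turns: int, tokens_per_turn: int, carried: int = 0) -> int:
    """Total input tokens for a session where every turn resends the full history.

    `carried` is history already in context (beyond CLAUDE.md) when the turns start.
    """
    total = 0
    history = CLAUDE_MD_TOKENS + carried
    for _ in range(turns):
        history += tokens_per_turn   # this turn's new messages and tool output
        total += history             # the whole history is sent as input
    return total


if __name__ == "__main__":
    turns_a, turns_b, per_turn = 20, 20, 2_000
    # One long session: task B starts with all of task A's history in context.
    a_history = turns_a * per_turn
    combined = session_input_tokens(turns_a, per_turn) + session_input_tokens(
        turns_b, per_turn, carried=a_history
    )
    # Two fresh sessions: each has its own CLAUDE.md, nothing carried over.
    split = session_input_tokens(turns_a, per_turn) + session_input_tokens(turns_b, per_turn)
    print(f"one session: {combined:,}  two sessions: {split:,}")
```

Here the single session sends 1,740,000 input tokens against 940,000 for two sessions. The exact ratio depends on the assumed sizes, but the direction matches the rule above.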

3. Use /compact before the session degrades

Claude Code has a built-in /compact command that asks the model to summarize the session so far and replace the full conversation history with the summary. This can reduce context by 60–80% while preserving the decisions made and the state reached.

Run /compact when you notice response quality degrading, when the session has been running for over 90 minutes, or when the token indicator (if your terminal shows it) crosses the 55% threshold shown in the gauge above. After compaction, the session continues with a summary-based context rather than the full transcript.
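The rule of thumb above can be written as a small check. This sketch assumes you can obtain the current token count and session start time yourself; the function is hypothetical and not something Claude Code exposes. The 55% and 80% thresholds come from the gauge and the 90-minute limit from the text. "Response quality degrading" remains a judgment call the code does not capture.

```python
from datetime import datetime, timedelta

CONTEXT_WINDOW = 200_000
WATCH_AT = 0.55      # start of the WATCH zone in the gauge
COMPRESS_AT = 0.80   # start of the COMPRESS zone
MAX_SESSION = timedelta(minutes=90)


def zone(tokens_used: int, window: int = CONTEXT_WINDOW) -> str:
    """Classify context usage into the gauge's three zones."""
    share = tokens_used / window
    if share >= COMPRESS_AT:
        return "COMPRESS"
    if share >= WATCH_AT:
        return "WATCH"
    return "SAFE"


def should_compact(tokens_used: int, started: datetime, now: datetime) -> bool:
    """True once usage leaves the SAFE zone or the session runs past 90 minutes."""
    return zone(tokens_used) != "SAFE" or now - started > MAX_SESSION


if __name__ == "__main__":
    start = datetime(2026, 4, 28, 9, 0)
    print(zone(100_000), zone(120_000), zone(170_000))
    print(should_compact(90_000, start, start + timedelta(minutes=30)))
    print(should_compact(90_000, start, start + timedelta(minutes=95)))
```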

4. Write task handoff notes to files

If you are working on a multi-session task — something that will take several days — write a handoff note at the end of each session. This is a plain text file in the project directory that describes: what was completed, what was decided, what comes next, and what the current state of any in-progress work is.

At the start of the next session, point Claude Code at the handoff note explicitly. This replaces the exploration Claude would otherwise do to re-establish context, and it is more accurate than asking Claude to infer the current state from the git log.

# Task handoff — 2026-04-28 — Auth refactor

## Completed
- Replaced bcrypt with argon2 in src/lib/auth.ts
- Updated all tests — 47 passing

## Decided
- Not migrating existing password hashes on first release
- Will add migration UI in v1.1

## Next session
- Implement the session invalidation endpoint (POST /api/auth/invalidate)
- Wire it to the admin panel logout button in src/components/AdminNav.tsx

## State
- src/lib/auth.ts is clean, committed
- src/middleware/auth.ts has an uncommitted WIP for the invalidate handler
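If you write these notes often, a tiny script can stamp out the skeleton with the same four sections. A sketch; the HANDOFF.md file name and the functions are hypothetical conventions, not something Claude Code looks for automatically:

```python
from datetime import date
from pathlib import Path

SECTIONS = ("Completed", "Decided", "Next session", "State")


def render_handoff(title: str, items: dict[str, list[str]], day: date) -> str:
    """Render a handoff note with the four standard sections; empty ones get '(none)'."""
    lines = [f"# Task handoff - {day.isoformat()} - {title}"]
    for section in SECTIONS:
        lines += ["", f"## {section}"]
        entries = items.get(section) or ["(none)"]
        lines += [f"- {entry}" for entry in entries]
    return "\n".join(lines) + "\n"


def write_handoff(path: Path, title: str, items: dict[str, list[str]]) -> Path:
    """Write today's note to `path`, e.g. Path("HANDOFF.md")."""
    path.write_text(render_handoff(title, items, date.today()), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(render_handoff(
        "Auth refactor",
        {
            "Completed": ["Replaced bcrypt with argon2 in src/lib/auth.ts"],
            "Next session": ["Implement POST /api/auth/invalidate"],
        },
        date(2026, 4, 28),
    ))
```

At the start of the next session you would point Claude Code at the file explicitly, for example: "Read HANDOFF.md and continue with the Next session items."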

5. Control sub-agent context budgets explicitly

Claude Code can spawn sub-agents for parallel work. Each sub-agent carries a full copy of the context it needs — which means a five-agent parallel task consumes five times the context of a sequential task. This is the pattern behind the "Tokenocalypse" cost spikes documented in the sub-agent cost guide.

When you instruct Claude Code to use parallel agents, be explicit about the scope each agent should have. "Run three agents in parallel, each reading only its assigned module" produces much lower cost than "run three agents on the full codebase." The sub-agent task specification is where you apply the same file-scoping principles as individual sessions.
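The arithmetic behind that advice is simple. When each agent has its own context, total spend scales with the number of agents times what each one loads and reads. A rough sketch comparing scoped and unscoped fan-out; the setup and module sizes are made-up numbers:

```python
def fan_out_tokens(agents: int, setup_tokens: int, tokens_read_per_agent: int) -> int:
    """Total context consumed when each agent loads the setup plus its own reads."""
    return agents * (setup_tokens + tokens_read_per_agent)


if __name__ == "__main__":
    setup = 2_500             # hypothetical: CLAUDE.md plus task instructions per agent
    module = 15_000           # hypothetical: one assigned module
    full_codebase = 120_000   # hypothetical: everything under src/
    scoped = fan_out_tokens(3, setup, module)
    unscoped = fan_out_tokens(3, setup, full_codebase)
    print(f"scoped: {scoped:,}  unscoped: {unscoped:,}  ratio: {unscoped / scoped:.1f}x")
```

With these figures, scoping each of three agents to its own module uses 52,500 tokens instead of 367,500, a 7x difference.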

Septim Drills: $29, one-time purchase

Structured exercises for Claude Code power users. Includes a context-management drill set — 12 exercises covering session hygiene, CLAUDE.md compression, sub-agent budgets, and the /compact workflow. Verified against real cost data from the Anthropic console.

Get Septim Drills — $29 →

The CLAUDE.md compression pass

Most CLAUDE.md files grow over time. A developer adds a new rule here, a new convention there, a reminder about the legacy module, a note about the production database credentials format. After three months, the CLAUDE.md is 800 lines and costs 4,000 tokens per session just to load — tokens that are spent before Claude has read any code.
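To check where your own file stands, a small audit script can flag a CLAUDE.md that has crept past a line budget. This sketch reuses the ratio implied above of about 5 tokens per line (500 lines ≈ 2,500 tokens, 800 lines ≈ 4,000). The 300-line budget is an arbitrary example, not a recommendation from Anthropic:

```python
import sys
from pathlib import Path

TOKENS_PER_LINE = 5   # ratio implied by 500 lines ~ 2,500 tokens
LINE_BUDGET = 300     # arbitrary example budget; tune it for your project


def audit(path: Path) -> tuple[int, int, bool]:
    """Return (lines, estimated tokens per session load, over_budget) for a CLAUDE.md."""
    count = len(path.read_text(encoding="utf-8").splitlines())
    return count, count * TOKENS_PER_LINE, count > LINE_BUDGET


if __name__ == "__main__":
    target = Path(sys.argv[1] if len(sys.argv) > 1 else "CLAUDE.md")
    if target.is_file():
        lines, tokens, over = audit(target)
        status = "time for a compression pass" if over else "within budget"
        print(f"{target}: {lines} lines, ~{tokens:,} tokens per session load ({status})")
    else:
        print(f"{target} not found")
```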

A compression pass on your CLAUDE.md should happen every 30 days on an active project. The goal is to reduce the file to the information that Claude could not infer from reading the code itself. Things to cut:

The Septim Drills ($29) pack includes a CLAUDE.md audit exercise that walks through this compression pass in a structured way — with before/after token counts and quality comparisons showing that shorter, denser CLAUDE.md files produce better Claude Code behavior than verbose ones.

When you hit the context limit mid-task

If a task genuinely requires more context than the window allows, the options are:

  1. Decompose the task. Break it into stages, each in its own session. Use handoff notes between stages.
  2. Use /compact and continue. Run /compact to replace history with a summary, then continue. Quality degrades somewhat after compaction, but it is usually adequate for completing a task in progress.
  3. Use a model with a larger context window. Some Anthropic models offer extended context options (check Anthropic's current model documentation for which ones). This costs more per token but solves the window constraint for tasks that cannot be decomposed.
  4. Use file-based persistence. Have Claude Code write its intermediate findings to files — summaries, outlines, decision logs — and read those files instead of the full conversation history in subsequent turns.

The fourth approach is the most token-efficient for long-running multi-session work. It is also the one that most developers do not think to set up until they have already hit the wall twice. Building file-based persistence into your task structure from the start is a better pattern than retrofitting it under pressure.
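A minimal version of file-based persistence is an append-only findings log that gets written as work progresses, plus a short digest to read back in place of the full history. A sketch; the .claude-notes/findings.jsonl path, the record fields, and the functions are hypothetical:

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

LOG = Path(".claude-notes") / "findings.jsonl"  # hypothetical location


def record(kind: str, text: str, log: Path = LOG) -> None:
    """Append one finding (e.g. 'decision', 'next', 'summary') as a JSON line."""
    log.parent.mkdir(parents=True, exist_ok=True)
    entry = {"at": datetime.now(timezone.utc).isoformat(), "kind": kind, "text": text}
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def digest(log: Path = LOG, last: int = 20) -> str:
    """Render the most recent findings as a compact bullet list to read next turn."""
    if not log.exists():
        return "(no findings yet)"
    lines = log.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line]
    return "\n".join(f"- [{e['kind']}] {e['text']}" for e in entries[-last:])


if __name__ == "__main__":
    demo_log = Path(tempfile.mkdtemp()) / "findings.jsonl"  # demo only; real use keeps LOG
    record("decision", "Not migrating existing password hashes on first release", demo_log)
    record("next", "Wire POST /api/auth/invalidate to the AdminNav logout button", demo_log)
    print(digest(demo_log))
```

The digest stays small no matter how long the work runs, which is the point. Reading 20 bullets is cheaper than replaying the conversation that produced them.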

Using the Memory MCP server for persistent context

For teams that want true persistence across sessions, meaning a queryable knowledge graph rather than file-based handoff notes, the Memory MCP server (Anthropic's official reference implementation) provides entity-relation storage that survives across Claude Code sessions.

The setup requires registering the server with Claude Code, for example with the claude mcp add command or a project-level .mcp.json file. Once wired up, Claude Code can read and write observations about your codebase (which modules are stable, which are in active development, which APIs are deprecated) and retrieve them in future sessions without re-reading all the source files.
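As a rough illustration, here is a sketch that builds a project-scoped server entry and merges it into a JSON config file without dropping other servers. It assumes the server is launched with npx from the @modelcontextprotocol/server-memory package and that the optional MEMORY_FILE_PATH environment variable sets where the graph is stored. Which file your Claude Code version reads (recent versions use .mcp.json at the project root) is also an assumption to confirm against the docs:

```python
import json
from pathlib import Path
from typing import Optional


def memory_server_config(memory_file: Optional[str] = None) -> dict:
    """Build an MCP config entry for the reference memory server (launched via npx)."""
    server = {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-memory"]}
    if memory_file:
        # Assumption: the server reads MEMORY_FILE_PATH to choose its storage file.
        server["env"] = {"MEMORY_FILE_PATH": memory_file}
    return {"mcpServers": {"memory": server}}


def merge_into(path: Path, config: dict) -> dict:
    """Merge the server entries into an existing JSON config, keeping other servers."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    existing.setdefault("mcpServers", {}).update(config["mcpServers"])
    path.write_text(json.dumps(existing, indent=2) + "\n", encoding="utf-8")
    return existing


if __name__ == "__main__":
    print(json.dumps(memory_server_config(), indent=2))
```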

See the MCP server tutorial for the exact configuration. The Septim Vault ($29) includes a pre-built MCP configuration for the memory server alongside seven other commonly needed servers, saving the per-server setup overhead.

Septim Drills — $29 · context mastery exercises

Twelve structured context-management drills, a CLAUDE.md audit checklist, sub-agent budgeting exercises, and a cost-projection worksheet. One-time purchase, with a GitHub repo invite on purchase.

Buy Septim Drills ($29) → Burned through your API budget? Septim Rescue ($299) →