Claude Code memory & context: staying under budget

// figure Token budget gauge — three zones across a 200K context window

// FILED Claude Code · Context // SOURCE Septim Labs // PERMALINK /blog/claude-code-memory-context-management.html

Published April 28, 2026 · by Septim Labs · 11 min read

TL;DR

Claude Code has no persistent memory across sessions. Every session starts blank. Everything Claude knows about your project must come either from the filesystem it reads or from your CLAUDE.md.
The context window fills fastest when Claude reads large files, when sessions get long, and when sub-agents spawn and carry their own context copies.
Five techniques reduce context burn by 40–60% in practice: targeted file references, session hygiene, CLAUDE.md compression, compact mode, and explicit task scoping.

What "memory" means for Claude Code

Claude Code has no persistent memory in the traditional sense. It does not remember what you did in yesterday's session. It does not accumulate knowledge about your codebase over time. Each time you run claude in a project directory, the model starts with an empty context and fills it by reading files, running tools, and processing your conversation.

What persists across sessions is the filesystem, and specifically anything you have written into files Claude Code can read. Your CLAUDE.md is not a memory system; it is a pre-loaded document that substitutes for the exploration Claude would otherwise do by reading every file in the repo. The Memory MCP server (from Anthropic's reference implementations) provides a separate persistent graph, but it requires explicit setup and is not the default behavior.

200K tokens: claude-sonnet-4-5 context window

~750 words per 1K tokens (rough estimate)

∞ sessions you can run, with no rollover from one to the next

/compact: command that summarizes and compresses the current session

How context fills up

Understanding where tokens go is the prerequisite for controlling cost. In a typical Claude Code session, the context window fills from four sources in roughly this order:

System prompt and CLAUDE.md. These are loaded at session start and persist for the duration. A 500-line CLAUDE.md costs roughly 2,500 tokens up front, which is well worth it for the exploration it replaces, but worth knowing.
File reads. Every time Claude Code reads a file to answer a question or understand context, the file contents enter the context. A 400-line TypeScript module is approximately 3,000–4,000 tokens. Reading 10 files costs 30,000–40,000 tokens before the model has written a single line of code.
Tool outputs. When Claude runs bash commands, the output is appended to context. An npm test run producing 200 lines of output consumes as much context as a medium-sized file.
Conversation history. Every message, yours and Claude's, stays in the context for the life of the session. A long back-and-forth about a complex problem can consume as much context as all the file reads combined.

The practical consequence: the first 20 minutes of a complex session are cheap. The last 20 minutes of a session that has been running for 2 hours are expensive per turn, because every new message carries the full history as context.
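To make the four sources concrete, here is a minimal sketch that tallies them against the 200K window. The CLAUDE.md and file-read figures come from the estimates above; the tool-output and conversation figures are assumptions chosen for illustration, and the `ContextTally` class is hypothetical, not part of Claude Code.

```python
from dataclasses import dataclass

CONTEXT_WINDOW = 200_000  # tokens, the window size quoted in this article


@dataclass
class ContextTally:
    """Estimated tokens held in context, split by the four sources."""

    system_and_claude_md: int = 0
    file_reads: int = 0
    tool_outputs: int = 0
    conversation: int = 0

    def total(self) -> int:
        """Sum of all four sources."""
        return (
            self.system_and_claude_md
            + self.file_reads
            + self.tool_outputs
            + self.conversation
        )

    def fraction_used(self, window: int = CONTEXT_WINDOW) -> float:
        """Share of the context window consumed, between 0.0 and 1.0."""
        return self.total() / window


if __name__ == "__main__":
    tally = ContextTally(
        system_and_claude_md=2_500,  # 500-line CLAUDE.md (article estimate)
        file_reads=10 * 3_500,       # ten ~400-line modules (article estimate)
        tool_outputs=3_000,          # assumption: one 200-line test run
        conversation=20_000,         # assumption: a long back-and-forth
    )
    print(f"{tally.total():,} tokens, {tally.fraction_used():.0%} of the window")
```

With these numbers, file reads account for more than half of the total, which is why the first technique below targets them.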

The five techniques

1. Reference specific files, not the whole codebase

Claude Code will explore broadly if you ask broad questions. "What's wrong with my authentication system?" triggers a filesystem scan. "Is there a race condition in the refreshToken function in src/lib/auth.ts?" reads one file. Both cost tokens; the second produces a more useful answer at a fraction of the price.

When you need Claude Code to understand a broad area of the codebase, use your CLAUDE.md to pre-describe the architecture rather than asking Claude to explore it. "The auth module uses JWT, implemented in src/lib/auth.ts, with middleware in src/middleware/auth.ts" costs about 30 tokens of CLAUDE.md and can prevent 30,000 tokens of exploration.
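Before pointing Claude Code at a file, it can help to know roughly what that file will cost. The sketch below ranks files by estimated token count. It assumes about four characters per token, which is a common rule of thumb and not the output of a real tokenizer, and the `src` directory in the demo is a hypothetical layout.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def rank_files_by_cost(paths: list[Path]) -> list[tuple[Path, int]]:
    """Return (path, estimated_tokens) pairs, most expensive first."""
    costs = [
        (path, estimate_tokens(path.read_text(encoding="utf-8", errors="ignore")))
        for path in paths
    ]
    return sorted(costs, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    root = Path("src")  # hypothetical source directory
    files = sorted(root.rglob("*.ts")) if root.is_dir() else []
    for path, tokens in rank_files_by_cost(files)[:10]:
        print(f"{tokens:>8}  {path}")
```

Files near the top of the list are the ones worth summarizing in CLAUDE.md or referencing by function name instead of asking Claude to read them whole.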

2. Session hygiene — start fresh for unrelated tasks

One long session is usually more expensive than two short sessions for the same total work. The reason is that every message for task B carries the context of task A, even though task A is finished and irrelevant. Starting a new session for an unrelated task costs the setup overhead of a new CLAUDE.md load (~2,500 tokens) but saves the accumulated history of the previous task.

A practical rule: when you move from one task to a qualitatively different one (from "write tests for the user module" to "refactor the payment module"), start a new session. Use the same session only when tasks share context that would need re-establishing.
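The arithmetic behind this rule can be sketched with a deliberately simple model: every turn re-reads the setup plus all history so far, and each turn adds a fixed number of tokens. Both are simplifying assumptions. The model ignores prompt caching and compaction, and the per-turn figure of 3,000 tokens is illustrative, not measured.

```python
def tokens_processed(setup: int, added_per_turn: int, turns: int) -> int:
    """Total input tokens over a session in which each turn carries the
    setup plus everything added by the turns before it."""
    return sum(setup + added_per_turn * turn for turn in range(1, turns + 1))


def compare_sessions(setup: int, added_per_turn: int, turns_per_task: int) -> tuple[int, int]:
    """Return (one_long_session, two_short_sessions) totals for two tasks
    of equal length."""
    one_long = tokens_processed(setup, added_per_turn, 2 * turns_per_task)
    two_short = 2 * tokens_processed(setup, added_per_turn, turns_per_task)
    return one_long, two_short


if __name__ == "__main__":
    # Assumed figures: 2,500 setup tokens, 3,000 tokens added per turn,
    # 15 turns per task.
    one_long, two_short = compare_sessions(2_500, 3_000, 15)
    print(f"one long session:   {one_long:,} tokens")
    print(f"two short sessions: {two_short:,} tokens")
```

Under these assumptions the single long session processes 1,470,000 input tokens against 795,000 for two short ones. The second CLAUDE.md load is small next to the history it avoids carrying.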

3. Use /compact before the session degrades

Claude Code has a built-in /compact command that asks the model to summarize the session so far and replace the full conversation history with the summary. This can reduce context by 60–80% while preserving the decisions made and the state reached.

Run /compact when you notice response quality degrading, when the session has been running for over 90 minutes, or when the token indicator (if your terminal shows it) crosses the 55% threshold shown in the gauge above. After compaction, the session continues with a summary-based context rather than the full transcript.
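The two measurable triggers above can be written as a small rule of thumb. This is a hypothetical helper for reasoning about the thresholds, not something Claude Code exposes: you would supply the token count yourself from whatever indicator you have, and the 55% and 90-minute defaults are taken from this article.

```python
def should_compact(
    used_tokens: int,
    window: int = 200_000,
    threshold: float = 0.55,
    minutes_elapsed: float = 0.0,
    max_minutes: float = 90.0,
) -> bool:
    """Return True when context usage reaches the threshold or the
    session has run longer than max_minutes."""
    over_budget = used_tokens / window >= threshold
    too_long = minutes_elapsed > max_minutes
    return over_budget or too_long


if __name__ == "__main__":
    print(should_compact(120_000))                     # 60% of the window
    print(should_compact(40_000, minutes_elapsed=30))  # small and short
```

Response quality, the third trigger, is a judgment call and is left out of the sketch on purpose.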

4. Write task handoff notes to files

If you are working on a multi-session task, something that will take several days, write a handoff note at the end of every session. This is a plain text file in the project directory that describes what was completed, what was decided, what comes next, and the current state of any in-progress work.

At the start of the next session, point Claude Code at the handoff note explicitly. This replaces the exploration Claude would otherwise do to re-establish context, and it is more accurate than asking Claude to infer the current state from the git log.

# Task handoff — 2026-04-28 — Auth refactor

## Completed
- Replaced bcrypt with argon2 in src/lib/auth.ts
- Updated all tests — 47 passing

## Decided
- Not migrating existing password hashes on first release
- Will add migration UI in v1.1

## Next session
- Implement the session invalidation endpoint (POST /api/auth/invalidate)
- Wire it to the admin panel logout button in src/components/AdminNav.tsx

## State
- src/lib/auth.ts is clean, committed
- src/middleware/auth.ts has an uncommitted WIP for the invalidate handler
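If you want every handoff note to follow the same four-section layout, a small generator keeps them consistent. The sketch below is one possible way to do it; the function names are hypothetical and the section headings simply mirror the example note above.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def render_handoff(
    title: str,
    completed: list[str],
    decided: list[str],
    next_steps: list[str],
    state: list[str],
    day: Optional[date] = None,
) -> str:
    """Build the text of a handoff note with the four standard sections."""
    day = day or date.today()
    sections = {
        "Completed": completed,
        "Decided": decided,
        "Next session": next_steps,
        "State": state,
    }
    lines = [f"# Task handoff - {day.isoformat()} - {title}", ""]
    for heading, items in sections.items():
        lines.append(f"## {heading}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)


def write_handoff(path: Path, text: str) -> None:
    """Save the note so the next session can be pointed at it."""
    path.write_text(text, encoding="utf-8")


if __name__ == "__main__":
    note = render_handoff(
        "Auth refactor",
        completed=["Replaced bcrypt with argon2 in src/lib/auth.ts"],
        decided=["Not migrating existing password hashes on first release"],
        next_steps=["Implement the session invalidation endpoint"],
        state=["src/lib/auth.ts is clean, committed"],
    )
    print(note)
```

The demo only prints the note; call `write_handoff` with a path of your choice to save it in the project directory.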

5. Control sub-agent context budgets explicitly

Claude Code can spawn sub-agents for parallel work. Each sub-agent carries a full copy of the context it needs — which means a five-agent parallel task consumes five times the context of a sequential task. This is the pattern behind the "Tokenocalypse" cost spikes documented in the sub-agent cost guide.

When you instruct Claude Code to use parallel agents, be explicit about the scope each agent should have. "Run three agents in parallel, each reading only its assigned module" produces much lower cost than "run three agents on the full codebase." The sub-agent task specification is where you apply the same file-scoping principles as individual sessions.
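A back-of-the-envelope comparison shows how much scoping matters. The sketch assumes each agent loads the shared setup plus whatever files it is told to read. The figures of 40,000 tokens for the full codebase and 8,000 for a single module are illustrative assumptions.

```python
def parallel_context_cost(agents: int, shared_tokens: int, scoped_tokens_per_agent: int) -> int:
    """Total context across agents when each one carries the shared setup
    plus its own set of files."""
    return agents * (shared_tokens + scoped_tokens_per_agent)


if __name__ == "__main__":
    setup = 2_500  # CLAUDE.md load, per the estimate earlier in the article
    unscoped = parallel_context_cost(3, setup, 40_000)  # each agent reads everything
    scoped = parallel_context_cost(3, setup, 8_000)     # each agent reads one module
    print(f"three agents, full codebase: {unscoped:,} tokens")
    print(f"three agents, one module each: {scoped:,} tokens")
```

With these numbers the unscoped run uses 127,500 tokens and the scoped run 31,500, roughly a fourfold difference for the same number of agents.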

Septim Drills — $29 pay-once

Structured exercises for Claude Code power users. Includes a context-management drill set — 12 exercises covering session hygiene, CLAUDE.md compression, sub-agent budgets, and the /compact workflow. Verified against real cost data from the Anthropic console.

The CLAUDE.md compression pass

Most CLAUDE.md files grow over time. A developer adds a new rule here, a new convention there, a reminder about the legacy module, a note about the production database credentials format. After three months, the CLAUDE.md is 800 lines and costs 4,000 tokens per session just to load — tokens that are spent before Claude has read any code.

A compression pass on your CLAUDE.md should happen every 30 days on an active project. The goal is to reduce the file to the information that Claude could not infer from reading the code itself. Things to cut:

Rules that are enforced by the linter or CI — Claude will see the failure anyway
Descriptions of file structure that are obvious from the directory listing
Decisions that have been superseded but never removed
Examples that repeat what the code itself demonstrates
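Part of this pass can be mechanized. The sketch below reports the size of a CLAUDE.md and flags exact duplicate lines. It cannot judge whether a rule is superseded or already enforced by CI, so treat it as a starting point for the manual review. The four-characters-per-token figure is an assumption, not a tokenizer result.

```python
from collections import Counter
from pathlib import Path


def audit_claude_md(text: str, chars_per_token: int = 4) -> dict:
    """Summarize a CLAUDE.md: non-blank line count, estimated tokens,
    and any lines that appear more than once."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    counts = Counter(lines)
    duplicates = sorted(line for line, seen in counts.items() if seen > 1)
    return {
        "lines": len(lines),
        "estimated_tokens": len(text) // chars_per_token,
        "duplicate_lines": duplicates,
    }


if __name__ == "__main__":
    path = Path("CLAUDE.md")
    if path.is_file():
        report = audit_claude_md(path.read_text(encoding="utf-8"))
        print(f"{report['lines']} lines, ~{report['estimated_tokens']} tokens")
        for line in report["duplicate_lines"]:
            print(f"duplicate: {line}")
    else:
        print("No CLAUDE.md found in the current directory.")
```

Running it before and after a compression pass gives you a rough before/after token count to track over time.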

The Septim Drills ($29) pack includes a CLAUDE.md audit exercise that walks through this compression pass in a structured way — with before/after token counts and quality comparisons showing that shorter, denser CLAUDE.md files produce better Claude Code behavior than verbose ones.

When you hit the context limit mid-task

If a task genuinely requires more context than the window allows, the options are:

Decompose the task. Break it into stages, each in its own session. Use handoff notes between stages.
Use /compact and continue. Run /compact to replace history with a summary, then continue. Quality degrades somewhat after compaction, but it is usually adequate for completing a task in progress.
Use a larger context model. Some Claude models offer extended context options; check Anthropic's current model documentation for which models qualify. This typically costs more per token but solves the window constraint for tasks that cannot be decomposed.
Use file-based persistence. Have Claude Code write its intermediate findings to files — summaries, outlines, decision logs — and read those files instead of the full conversation history in subsequent turns.

The fourth approach is the most token-efficient for long-running multi-session work. It is also the one that most developers do not think to set up until they have already hit the wall twice. Building file-based persistence into your task structure from the start is a better pattern than retrofitting it under pressure.
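One way to build this in from the start is an append-only findings log that later turns and sessions read instead of the transcript. The sketch below uses JSON Lines. The file location and the `kind` labels are hypothetical choices, and in practice you might simply ask Claude Code to maintain a Markdown file instead.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def append_finding(log_path: Path, kind: str, text: str) -> None:
    """Append one finding as a JSON line so later sessions can read it back."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps({"kind": kind, "text": text}) + "\n")


def load_findings(log_path: Path, kind: Optional[str] = None) -> list[dict]:
    """Read findings back, optionally keeping only one kind."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as handle:
        entries = [json.loads(line) for line in handle if line.strip()]
    return [entry for entry in entries if kind is None or entry["kind"] == kind]


if __name__ == "__main__":
    # The demo writes to a temporary directory so it leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "notes" / "findings.jsonl"  # hypothetical location
        append_finding(log, "decision", "Not migrating existing password hashes on first release")
        append_finding(log, "state", "src/middleware/auth.ts has uncommitted WIP")
        for entry in load_findings(log, kind="decision"):
            print(entry["text"])
```

Filtering by kind lets a new session load only the decisions, which is usually a few hundred tokens, and skip the rest of the log.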

Using the Memory MCP server for persistent context

For teams that want true persistence across sessions — not just file-based handoff notes but a queryable knowledge graph — the Memory MCP server (Anthropic's official reference implementation) provides entity-relationship storage that survives across Claude Code sessions.

The setup requires running the MCP server locally and registering it with Claude Code, for example through the claude mcp add command or a project-level .mcp.json file. Once wired up, Claude Code can read and write observations about your codebase (which modules are stable, which are in active development, which APIs are deprecated) and retrieve them in future sessions without re-reading all the source files.

See the MCP server tutorial for the exact configuration. The Septim Vault ($29) includes a pre-built MCP configuration for the memory server alongside seven other commonly-needed servers, saving the per-server setup overhead.
