· production audit ·

Claude Code in production: the 30-point audit no one publishes

// FILED Security & Ops // SOURCE Septim Labs // PERMALINK /blog/claude-code-production-audit-checklist-2026.html

Published April 29, 2026 · by Septim Labs · 11 min read

There is a standard security checklist for deploying a web app. There is no equivalent for deploying Claude Code. That gap has a cost: over the last 90 days, five teams that came to us for post-incident work had Claude Code running in their CI pipelines or on developer machines with configurations that exposed credentials, removed budget constraints, or created exploitable tool chains. None of them had done anything unusual. They had followed the official docs.

This checklist covers the 30 items we walk through on every Septim Audit engagement. The list is free. The audit is $99 and includes a written report, a prioritized fix list, and a 30-minute walkthrough call. If you can work through all 30 items yourself and score clean, you don't need the audit. Most teams find at least 6 problems they didn't know existed.

How to use this checklist

Work through each section in order. Mark each item as Pass, Fail, or N/A. Any item marked Fail in sections 1, 2, or 3 is a blocking issue: stop and fix it before continuing. Items in sections 4 and 5 are important but not immediately dangerous.

The checklist is organized into five areas: Credential handling, Cost gates, Context hygiene, Hook configuration, and MCP exposure. Each area has a score breakdown at the end.

// Audit area breakdown — typical team failure rates

Credentials: 71% fail
Cost gates: 84% fail
Context hygiene: 62% fail
Hook config: 78% fail
MCP exposure: 55% fail

Section 1: Credential handling

Claude Code's default behavior is to read from the environment, which means it inherits whatever secrets are loaded in your shell at startup. On a developer machine, that can include AWS credentials, GitHub tokens, database connection strings, and Stripe API keys — all in the same environment that a BashTool call has access to. The pattern is not a bug in Claude Code; it is standard Unix process inheritance. The audit item is whether you have made that inheritance intentional rather than accidental.
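One way to make that inheritance intentional is to start the agent from a small wrapper that passes along only an allowlist of variables. The following is a minimal sketch, not a hardened launcher; the contents of `ALLOWED_VARS` and both function names are assumptions chosen for illustration.

```python
import os
import subprocess

# Assumption: the only variables this job genuinely needs. Adjust per project.
ALLOWED_VARS = {"PATH", "HOME", "LANG", "TERM", "ANTHROPIC_API_KEY"}


def scrubbed_env(environ, allowed=ALLOWED_VARS):
    """Return a copy of environ containing only allowlisted variable names."""
    return {name: value for name, value in environ.items() if name in allowed}


def run_with_scrubbed_env(command, environ=None):
    """Run command with an allowlisted environment and return its exit code."""
    env = scrubbed_env(os.environ if environ is None else environ)
    return subprocess.run(command, env=env, check=False).returncode
```

Calling `run_with_scrubbed_env(["claude"])` would start the CLI without the AWS, Stripe, or database variables that happen to be exported in the parent shell. An allowlist is used rather than a denylist so that a newly added secret is excluded by default.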

Items 1 through 7

// Section 1 — Credential handling — 7 items

API key isolation. ANTHROPIC_API_KEY is set to a key scoped to a dedicated workspace, not your personal or org-root key. If the key leaks through a tool call, blast radius is bounded.

No production credentials in shell startup. ~/.zshrc, ~/.bashrc, and ~/.profile do not export database URIs, Stripe live keys, or AWS credentials that have write permissions to production resources.

CLAUDE.md does not contain secrets. CLAUDE.md files are often committed to version control. Confirm no API key, token, or password appears in any CLAUDE.md in the repo or home directory.

Tool execution sandbox. If Claude Code runs BashTool calls, those calls execute as your current user. Confirm the user running Claude Code does not have write access to production infrastructure.

No AWS root credentials in environment. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, if present, are scoped to read-only or dev-only IAM policies.

Git history clean of keys. Run git log --all -S "ANTHROPIC_API_KEY" -- CLAUDE.md and variants for your other secrets. Keys committed and then removed from HEAD are still exposed in history.

CI/CD pipeline scoping. If Claude Code runs in CI (GitHub Actions, CircleCI, etc.), the workflow's environment variables are limited to what that job actually needs. No kitchen-sink secret bundles injected via secrets: inherit.
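Items 3 and 6 both come down to searching text for key-shaped strings. A rough sketch of such a scan follows, assuming a handful of widely used key prefixes; the exact lengths in each pattern are approximations and the pattern names are made up for this example.

```python
import re

# Assumption: common key prefixes with approximate lengths. Extend for your providers.
SECRET_PATTERNS = {
    "anthropic_api_key": re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "stripe_live_key": re.compile(r"\bsk_live_[A-Za-z0-9]{16,}"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secrets(text):
    """Return sorted (line_number, pattern_name) pairs for key-like strings."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, name))
    return sorted(hits)
```

Run it over the contents of each CLAUDE.md for item 3, or over the output of `git log --all -p` for item 6. A match is a prompt to look, not proof of a leak, and a clean result does not rule out secrets in formats the patterns do not cover.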

Section 2: Cost gates

The April 2026 Tokenocalypse incidents — documented in GitHub issue anthropics/claude-code#41930 — had one structural cause: Claude Code subagents running unattended with no per-session or per-run spending ceiling. The existing tools (ccusage, workspace monthly caps, billing email alerts) operate at the observation layer; they report what happened. A cost gate operates at the enforcement layer; it halts what is about to happen. These 8 items check whether your setup has enforcement, not just observation.

Items 8 through 15

// Section 2 — Cost gates — 8 items

Workspace monthly limit set. In the Anthropic console under Settings → Limits, a monthly spend ceiling exists for this workspace. The default is no ceiling.

Per-session budget configured. A PreToolUse hook reads cumulative session cost from ~/.claude/projects/**/*.jsonl and exits the agent when the session exceeds a configured dollar ceiling. If this hook is absent, a single runaway session has no hard stop.

Daily ceiling enforced. Distinct from the per-session ceiling: a daily total ceiling that aggregates all sessions run in the last 24 hours. Relevant for unattended overnight runs where each session is individually modest but cumulative cost is not.

Model selection intentional. The model used by each agent is explicitly set, not left to default. Claude Opus 4.5 at $15/M input tokens costs 18.75x Claude Haiku 3.5 at $0.80/M. An unintentional model upgrade silently multiplies cost.

Prompt caching enabled where applicable. For agents that re-read the same large file or system prompt on every turn, cache_control breakpoints reduce that cost by approximately 90%. Verify caching is active for any context over 2,048 tokens that repeats across turns.

Subagent depth bounded. If the setup uses multi-agent patterns, the orchestrator's instructions explicitly cap subagent spawning depth. An orchestrator that can spawn subagents that can spawn subagents without a depth limit is the exact pattern that produced the $47K incident.

Cost alert email configured. At minimum, a billing threshold email is set in the Anthropic console. This is not a hard stop, but it is a safety net for cases where the PreToolUse hook fails silently.

Batch API used for offline workloads. Analysis or summarization tasks that do not require real-time responses use the Anthropic Batch API, which costs 50% less per token than the standard API and includes no rate-limit risk during off-peak hours.
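The per-session gate in item 9 can be sketched as a PreToolUse hook script. This is an illustration under several assumptions, not a drop-in implementation: it assumes each transcript line is a JSON object that may carry `message.usage` with `input_tokens` and `output_tokens`, that the hook payload on stdin includes `transcript_path`, and that exit code 2 is the blocking status. The prices and the ceiling are placeholders, and cache token fields are ignored.

```python
import json
import sys

# Assumption: placeholder prices in dollars per million tokens.
PRICE_PER_MTOK = {"input_tokens": 3.00, "output_tokens": 15.00}
SESSION_CEILING_USD = 5.00  # hypothetical ceiling


def session_cost(jsonl_lines, prices=PRICE_PER_MTOK):
    """Sum the dollar cost of every usage record found in transcript lines."""
    total = 0.0
    for line in jsonl_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        message = record.get("message") if isinstance(record, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        for field, price in prices.items():
            total += usage.get(field, 0) * price / 1_000_000
    return total


def gate(jsonl_lines, ceiling=SESSION_CEILING_USD):
    """Return (exit_code, reason): 2 blocks the tool call, 0 lets it proceed."""
    cost = session_cost(jsonl_lines)
    if cost >= ceiling:
        return 2, f"Session cost ${cost:.2f} reached the ${ceiling:.2f} ceiling"
    return 0, ""


def main():
    """Hook entry point: read the payload on stdin, then gate on the transcript."""
    try:
        payload = json.load(sys.stdin)
        with open(payload["transcript_path"], encoding="utf-8") as handle:
            code, reason = gate(handle)
    except Exception as error:  # fail closed: an unreadable transcript blocks
        code, reason = 2, f"Cost gate failed: {error}"
    if reason:
        print(reason, file=sys.stderr)
    return code
```

To wire it up as a hook, end the script with `sys.exit(main())`. The daily ceiling in item 10 follows the same shape, with `session_cost` summed over every transcript modified in the last 24 hours.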

Section 3: Context hygiene

Context size is the single largest cost lever for day-to-day Claude Code use. A session that loads a 200KB codebase into every tool turn burns approximately 12 cents per turn at Sonnet 4.5 pricing. Over a 40-turn session, that is $4.80 in context cost alone, before any actual output. These items check whether what gets loaded into context on each turn is intentional.
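The per-turn figure can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes roughly 5 bytes per token for source code and an input price of $3 per million tokens; both numbers are assumptions used to illustrate the calculation, and real tokenization varies by language and file type.

```python
def context_cost_per_turn(context_bytes, bytes_per_token=5.0, usd_per_mtok=3.00):
    """Estimate the input cost in dollars of resending a context on one turn."""
    tokens = context_bytes / bytes_per_token
    return tokens * usd_per_mtok / 1_000_000


def session_context_cost(context_bytes, turns, **kwargs):
    """Estimate the cumulative input cost across a session with no caching."""
    return context_cost_per_turn(context_bytes, **kwargs) * turns
```

Under those assumptions, 200,000 bytes is about 40,000 tokens, or roughly $0.12 per turn, and 40 turns comes to about $4.80.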

Items 16 through 21

// Section 3 — Context hygiene — 6 items

.claudeignore file present. The equivalent of .gitignore for what Claude Code loads. Without it, ReadFileTool will happily pull in node_modules, build artifacts, and binary assets into context. Minimum entries: node_modules/, .next/, dist/, *.lock, *.png, *.jpg, *.pdf.

CLAUDE.md is scoped, not exhaustive. A CLAUDE.md that runs 3,000 lines loads 3,000 lines on every session start. The 3,000-line CLAUDE.md problem is well-documented: it creates a context tax that compounds across every agent turn. Keep project-level CLAUDE.md under 500 lines; use sub-agent-specific files for domain detail.

Conversation compaction scheduled. Long sessions accumulate context. Either /compact is run periodically by the operator, or the agent's system prompt instructs it to self-compact when the session approaches a token threshold. Sessions that run to 33MB+ become unstable and crash.

No full-repo glob reads on every turn. Patterns like Read("**/*.ts") in an agent's tool chain load the entire TypeScript surface of the repo on every invocation. These should be replaced with targeted reads of specific files identified by a prior search step.

System prompt stored externally, not inline. Long system prompts committed inline to agent code prevent prompt caching from working correctly. Store the prompt in a separate file loaded once at session start, with a cache breakpoint at the end of the system prompt block.

Memory files pruned regularly. ~/.claude/ memory files grow indefinitely if not managed. Stale project memory loads on every relevant session. Run a monthly audit of project memory files and remove anything that refers to closed work.
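For the externally stored system prompt, a minimal sketch of the loading step might look like the following. It builds the `system` blocks for an Anthropic Messages API request with a `cache_control` breakpoint on the last block; the function names and any file path you pass in are hypothetical.

```python
from pathlib import Path


def cached_system_blocks(prompt_text):
    """Wrap the system prompt in one text block that ends at a cache breakpoint."""
    return [
        {
            "type": "text",
            "text": prompt_text,
            # Content up to and including this block is eligible for caching.
            "cache_control": {"type": "ephemeral"},
        }
    ]


def load_system_blocks(prompt_path):
    """Read the prompt file once and return cache-ready system blocks."""
    return cached_system_blocks(Path(prompt_path).read_text(encoding="utf-8"))
```

The returned list would be passed as the `system` argument of the request. Keeping the prompt text byte-for-byte stable between turns matters here, since any change to the cached prefix produces a cache miss.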

Section 4: Hook configuration

Claude Code's hook system (PreToolUse, PostToolUse, Stop, Notification) is the main surface for enforcing organizational policy without modifying the model. A PreToolUse hook that exits with code 2 and writes a reason to stderr blocks the tool call before it executes; other non-zero exit codes are reported as errors but do not block. Most teams either have no hooks configured, or have hooks copied from tutorials without understanding what they do. These 5 items check whether your hooks are doing what you think they are.

Items 22 through 26

// Section 4 — Hook configuration — 5 items

Hooks are tested for silent failure. A hook that crashes (uncaught exception, missing dependency) typically exits with a non-zero code other than 2, which Claude Code treats as a non-blocking error: the tool call proceeds. Confirm each hook handles errors explicitly and fails closed, either by returning {"decision": "block", "reason": "..."} or by exiting with code 2 and a reason on stderr, rather than crashing silently.

Dangerous bash patterns are blocked at the hook layer. A PreToolUse hook on BashTool checks for patterns like rm -rf, curl | sh, git push --force, and DROP TABLE in the command string before execution. This is a belt-and-suspenders check, not a substitute for proper permissions.

Hook scripts are version-controlled. Hooks defined in .claude/settings.json reference scripts that live in the repo (or a dedicated dotfiles repo). Hooks that only exist on one developer's machine are not auditable and will not propagate to CI.

Stop hook logs session summary. The Stop hook writes a line to a centralized log: timestamp, session ID, total cost, turn count, exit reason. Without this, reconstructing what an unattended agent did requires manual JSON parsing of the session files.

Notification hook is connected to a real channel. If agents run unattended, the Notification hook posts to a Slack channel or sends a webhook. An agent that finishes at 3am and nobody sees the result until morning has a 6-hour blast radius on any error it introduced.
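Items 22 and 23 can be combined in one PreToolUse script that checks the command and fails closed on any parsing error. This is a sketch under stated assumptions: the hook payload is taken to be JSON on stdin with the shell command at `tool_input.command`, exit code 2 is taken to be the blocking status, and the regular expressions are deliberately simple illustrations that a determined command string can evade.

```python
import json
import re
import sys

# Patterns named in the checklist; simplistic by design, extend for your environment.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-\w*(rf|fr)", re.IGNORECASE),
    re.compile(r"\bcurl\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b"),
    re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
]


def check_command(command):
    """Return the first dangerous pattern matched by command, or None."""
    for pattern in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return pattern.pattern
    return None


def decide(raw_payload):
    """Return (exit_code, reason); any parsing error blocks instead of passing."""
    try:
        payload = json.loads(raw_payload)
        command = payload["tool_input"]["command"]
        if not isinstance(command, str):
            raise TypeError("command is not a string")
    except (json.JSONDecodeError, KeyError, TypeError) as error:
        return 2, f"Hook could not read the command, blocking: {error!r}"
    matched = check_command(command)
    if matched:
        return 2, f"Blocked: command matches dangerous pattern {matched}"
    return 0, ""


def main():
    """Hook entry point: decide on the stdin payload and report on stderr."""
    code, reason = decide(sys.stdin.read())
    if reason:
        print(reason, file=sys.stderr)
    return code
```

End the script with `sys.exit(main())` when registering it. Keeping `decide` free of I/O makes the failure paths easy to test: feed it malformed JSON or a payload with no command and confirm the result is a block, not a pass.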

Section 5: MCP exposure

The Model Context Protocol expanded Claude Code's reach significantly. An MCP server that exposes a database connection, a file system, or an HTTP client to the model creates a new attack surface: prompt injection through any text content the model reads during a session. A crafted commit message, a malicious README in a cloned repo, or a database row containing injected instructions can redirect an in-progress agent. These items check the MCP surface specifically.

Items 27 through 30

// Section 5 — MCP exposure — 4 items

MCP servers run with least-privilege access. An MCP server that exposes a database connection uses a read-only database user for any tool that only reads. Write access is explicitly granted only to tools that require it, not inherited from a general connection string.

External MCP servers are pinned to a specific version. MCP servers installed via npx without a version pin (npx @modelcontextprotocol/server-filesystem) will pull the latest version on each run. A supply chain compromise in an upstream package propagates silently. Pin with @x.y.z.

Prompt injection surface documented. For each MCP tool that reads external content (files, database rows, HTTP responses, git history), there is a documented assessment of whether that content could contain adversarial instructions. This does not require mitigating all risk — it requires knowing where the risk is.

MCP server list reviewed quarterly. .claude/settings.json is reviewed on a schedule. MCP servers added for a specific project and never removed accumulate over time. Each additional server is additional attack surface and additional context overhead on session start.
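The version-pin check in item 28 lends itself to automation during the quarterly review. The sketch below assumes the MCP configuration has been parsed into a dictionary with a top-level `mcpServers` key whose entries carry `command` and `args`; that layout and the function names are assumptions for illustration.

```python
import re

EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def npx_package_spec(args):
    """Return the first non-flag argument to npx, which names the package."""
    for arg in args:
        if not arg.startswith("-"):
            return arg
    return None


def is_pinned(spec):
    """True if an npm package spec ends in an exact x.y.z version."""
    # Scoped names start with "@", so the version "@" must come after a name.
    name, separator, version = spec.rpartition("@")
    return bool(name) and bool(separator) and bool(EXACT_VERSION.match(version))


def unpinned_servers(config):
    """Return sorted names of npx-launched MCP servers without an exact pin."""
    flagged = []
    for name, server in config.get("mcpServers", {}).items():
        if server.get("command") != "npx":
            continue
        spec = npx_package_spec(server.get("args", []))
        if spec is None or not is_pinned(spec):
            flagged.append(name)
    return sorted(flagged)
```

Tags such as `@latest` and version ranges are treated as unpinned, since they can resolve to a different release on the next run.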

Interpreting your score

If you worked through all 30 items honestly, you have one of three situations.

0 failures in sections 1–3: Your setup is in reasonable shape. Work through sections 4 and 5 at your own pace. The remaining items are important but unlikely to produce an incident this week.

Any failure in items 02, 04, 07, 09, 13, 22, 23, 27, or 28: These are the items with documented incident histories. Prioritize them before anything else. A failure on item 09 (no per-session cost gate) combined with item 13 (no subagent depth limit) is the exact configuration that produced the April 2026 Tokenocalypse incidents.

6 or more failures total: The configuration needs a structured review, not a line-by-line fix. Trying to patch individual items without understanding how they interact often introduces new gaps while closing old ones. This is the situation where the productized audit makes sense: a single 90-minute session that produces a written report, a prioritized fix order, and a confirmation call.
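The three situations are not mutually exclusive, so it can help to evaluate them mechanically. A small sketch, assuming failures are recorded as a set of item numbers from 1 to 30; the function name and the wording of the returned strings are made up for this example.

```python
PRIORITY_ITEMS = {2, 4, 7, 9, 13, 22, 23, 27, 28}
BLOCKING_ITEMS = set(range(1, 22))  # sections 1-3 cover items 1 through 21


def interpret(failed_items):
    """Map a set of failed item numbers to the applicable guidance lines."""
    failed = set(failed_items)
    findings = []
    if not failed & BLOCKING_ITEMS:
        findings.append("sections 1-3 clean")
    priority = sorted(failed & PRIORITY_ITEMS)
    if priority:
        findings.append(f"fix first: {priority}")
    if len(failed) >= 6:
        findings.append("structured review recommended")
    return findings
```

For example, `interpret({9, 13})` flags both items as first priority, which is the combination described above.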

The Septim Agents Pack ($49) includes pre-built hook scripts for items 09, 10, 23, 25, and 26 — tested, version-controlled, and ready to drop into .claude/settings.json. If you found failures in those items specifically, the pack covers them without requiring you to build the hooks from scratch.

Want the audit done for you?

The Septim Audit is the productized version of this checklist. One flat fee, no subscription. You get a written findings report covering all 30 items, a prioritized fix list ordered by incident risk, and a 30-minute call to walk through the results. Teams typically find 6–12 failures they did not know existed.

Septim Audit — $99, pay once →

What's next

If you are starting from scratch with Claude Code production hardening, the sequence that produces the most coverage per hour of work is: items 02 and 04 first (credential scoping), then item 09 (per-session cost gate), then item 16 (.claudeignore), then item 22 (hook failure testing). Those four steps close the four highest-frequency incident patterns we have seen across the last 90 days of audit work.

The full checklist, the hook scripts, and the CLAUDE.md templates are all included in the Septim Audit report. If you want the hooks without the audit, they are in the Agents Pack. Both are pay-once.