The core insight is this: Claude is stateless, so every time you send a message, the entire conversation history is re-transmitted to the model. Cumulative token cost therefore grows near-quadratically over a long conversation, while the human user contributes only ~1–2% of tokens; the rest is system prompts and history. Here’s how to fight that systematically.
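To make the near-quadratic growth concrete, here is a rough sketch of the arithmetic. The per-turn sizes (`system_tokens`, `user_tokens`, `assistant_tokens`) are illustrative assumptions, not measurements from Claude Code, and the model ignores caching and compaction.

```python
def cumulative_input_tokens(turns, system_tokens=5000, user_tokens=50,
                            assistant_tokens=2000):
    """Total input tokens sent over `turns` requests when every request
    resends the system prompt plus all earlier turns."""
    total = 0
    history = 0  # tokens of all earlier user + assistant messages
    for _ in range(turns):
        # Each request carries the system prompt, the full history,
        # and the new user message.
        total += system_tokens + history + user_tokens
        history += user_tokens + assistant_tokens
    return total


for turns in (50, 100):
    print(turns, cumulative_input_tokens(turns))
# 50 2763750
# 100 10652500
```

Under these assumed sizes, doubling the number of turns multiplies the total input tokens by about 3.9, which is what "near-quadratic" looks like in practice.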
Layer 1 — CLAUDE.md Optimization (Biggest Bang for Buck)
Keep your CLAUDE.md under 200 lines. The file is loaded into context every session, so every line in it costs tokens each time. Store only crucial facts there.
Use path-scoped rules instead of global ones:
markdown
<!-- .claude/rules/api-rules.md -->
---
paths:
- "src/api/**/*.ts"
---
# API rules
- Validate all request inputs
- Use the standard error response shape
- Add tests for authorization failures
Path-scoped rules load only when Claude works with files that match the pattern, so irrelevant instructions stay out of context and cost no tokens.
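If the glob semantics are unclear, the following sketch approximates how a `paths` pattern selects rule files for a given file. It is an illustration of ordinary glob matching, not Claude Code's actual matcher; `glob_to_regex`, `rules_for`, and the `ui-rules.md` entry are hypothetical names introduced for the example.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob such as "src/api/**/*.ts" into an anchored regex.

    Supported here: "**/" (zero or more directories), "**" (anything),
    "*" (anything except "/") and "?" (one character except "/").
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def rules_for(file_path, rules):
    """Return the names of rule files whose globs match file_path."""
    return [
        name
        for name, globs in rules.items()
        if any(glob_to_regex(g).match(file_path) for g in globs)
    ]


RULES = {
    "api-rules.md": ["src/api/**/*.ts"],
    "ui-rules.md": ["src/components/**/*.tsx"],  # hypothetical second rule file
}

print(rules_for("src/api/users/list.ts", RULES))      # ['api-rules.md']
print(rules_for("src/api/index.ts", RULES))           # ['api-rules.md']
print(rules_for("src/components/Button.tsx", RULES))  # ['ui-rules.md']
print(rules_for("README.md", RULES))                  # []
```

The last call is the point: a file outside every pattern pulls in no scoped rules at all.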
Layer 2 — .claudeignore File
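Create a .claudeignore to block Claude from reading unnecessary files. Check that your Claude Code version actually honors this file; the documented way to block reads is a permissions.deny list of Read(...) rules in .claude/settings.json.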
# .claudeignore
node_modules/
dist/
build/
*.log
*.lock
coverage/
.git/
*.min.js
*.map
__pycache__/
*.pyc
This prevents Claude from reading large auto-generated or dependency files that balloon the context.
Layer 3 — Prompt Caching (Core Speed Mechanism)
Claude Code uses Anthropic’s prompt cache to reuse large, stable chunks of context (system prompt, tool definitions, CLAUDE.md, prior turns) across requests instead of resending and re-billing them at full price every turn. Cache reads are billed at a fraction of the normal input-token price.
Rules to protect your cache:
Changing the tool set mid-conversation is one of the most common ways to break prompt caching. Adding or removing a tool invalidates the cache for the entire conversation, because tool definitions are part of the cached prefix.
bash
# DO: Keep tool set stable for the whole session
# DON'T: Add/remove MCP tools mid-session
Subagents start their own conversation with their own system prompt and tool set, separate from the parent’s. They build their own cache from scratch. A fork, by contrast, inherits the parent’s system prompt, tools, and conversation history, so its first request reads the parent’s cache.
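The reason a tool change is so costly follows from how prefix caching works: reuse stops at the first block that differs. A minimal sketch of that idea, assuming a request is an ordered list of blocks (tools, then system prompt, then turns); the block contents and token counts are made up, and the real cache operates on exact token prefixes rather than labeled blocks.

```python
def reusable_prefix_tokens(previous, current):
    """Tokens at the start of `current` that are identical to `previous`.

    Each request is a list of (content, token_count) blocks in the order
    they are sent. Reuse stops at the first block that differs.
    """
    reused = 0
    for (old_content, _), (new_content, new_tokens) in zip(previous, current):
        if old_content != new_content:
            break
        reused += new_tokens
    return reused


base = [
    ("tools: read, edit, bash", 3000),
    ("system prompt + CLAUDE.md", 4000),
    ("turn 1", 1500),
    ("turn 2", 1500),
]

# Same prefix with one new turn appended: everything before it is reusable.
next_turn = base + [("turn 3", 1500)]
print(reusable_prefix_tokens(base, next_turn))  # 10000

# A tool added mid-session changes the very first block, so nothing matches.
new_tool = [("tools: read, edit, bash, jira", 3200)] + base[1:] + [("turn 3", 1500)]
print(reusable_prefix_tokens(base, new_tool))  # 0
```

The same model explains the subagent versus fork distinction: a subagent starts with a different first block, while a fork starts with the parent's blocks unchanged.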
Layer 4 — Session Management Commands
Run /compact when the conversation grows: it summarizes the conversation so far and sharply reduces context size while keeping the key points in summary form. A reasonable rule of thumb is to run it once after ~50 turns. When you switch to a different task entirely, use /clear instead; it wipes the history completely.
Workflow rule:
Same task, long session → /compact
New task entirely → /clear
Check cost anytime → /cost
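To get a feel for what compaction buys, here is a back-of-the-envelope simulation. All sizes (`system_tokens`, `turn_tokens`, `summary_tokens`) are assumptions chosen for illustration, and the model ignores both caching and the cost of the compaction request itself.

```python
def total_input_tokens(turns, compact_every=None, system_tokens=5000,
                       turn_tokens=2000, summary_tokens=1500):
    """Cumulative input tokens when each request resends system prompt + history.

    If compact_every is set, the history is replaced by a fixed-size summary
    after every `compact_every` turns, roughly what /compact does.
    """
    total = 0
    history = 0
    for turn in range(1, turns + 1):
        total += system_tokens + history
        history += turn_tokens
        if compact_every and turn % compact_every == 0:
            history = summary_tokens  # history collapses to a summary
    return total


print(total_input_tokens(100))                    # 10400000
print(total_input_tokens(100, compact_every=50))  # 5475000
```

With these numbers a single compaction at turn 50 cuts total input tokens for a 100-turn session by roughly 47%. A /clear corresponds to resetting the history to zero instead of to a summary.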
Layer 5 — Subagent Pattern for Heavy Tasks
For any workflow that needs large amounts of context or multi-step analysis, delegate the work to a subagent. Keep your main session for direction and review. Let the subagent accumulate the heavy context where it doesn’t affect your ongoing work.
bash
# In your CLAUDE.md or prompt:
"Spawn a subagent to read and analyze src/legacy/*.ts
Return only a summary of findings, not raw file contents."
Only the subagent’s summary returns to the main context, so the raw file contents never count against it.
Layer 6 — Cap Command Output
Long test logs drain tokens fast. Cap bash output at 20,000 characters and filter log output before Claude sees it.
Set the BASH_MAX_OUTPUT_LENGTH environment variable, either in your shell config or under env in .claude/settings.json:
json
{
  "env": {
    "BASH_MAX_OUTPUT_LENGTH": "20000"
  }
}
Or pipe output through a filter:
bash
# Instead of: npm test
# Use:
npm test 2>&1 | tail -n 100
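A plain tail can cut off the one failure that matters if it appears early in the log. As a slightly smarter alternative, this sketch keeps failure-looking lines plus the end of the log. The function name `condense_log`, the failure pattern, and the sample log are all hypothetical; adjust the pattern to your test runner's output.

```python
import re

# Assumed markers for failing lines; real test runners differ.
FAILURE_PATTERN = re.compile(r"\b(FAIL|FAILED|ERROR|Error)\b")


def condense_log(text, tail_lines=20, max_chars=20000):
    """Keep lines that look like failures plus the last `tail_lines` lines.

    Lines keep their original order and are not duplicated. If the result is
    still longer than max_chars, only the final max_chars characters are kept.
    """
    lines = text.splitlines()
    tail_start = max(len(lines) - tail_lines, 0)
    kept = [
        line
        for index, line in enumerate(lines)
        if index >= tail_start or FAILURE_PATTERN.search(line)
    ]
    result = "\n".join(kept)
    if len(result) > max_chars:
        result = result[len(result) - max_chars:]
    return result


log = "\n".join(
    ["PASS test_%d" % i for i in range(500)]
    + ["FAIL test_auth: expected 401, got 200"]
    + ["PASS test_%d" % i for i in range(500, 1000)]
    + ["Tests: 1 failed, 1000 passed"]
)
short = condense_log(log, tail_lines=5)
print(len(log.splitlines()), "->", len(short.splitlines()))  # 1002 -> 6
```

Here the failure sits in the middle of the log, where `tail -n 100` would have dropped it, yet the condensed output is only six lines.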
Layer 7 — Model Selection per Task
Use subagents when you need to switch models. For example, delegate exploration to a subagent that runs on Haiku while the main agent uses the full model. Claude Code’s built-in Explore agents use Haiku for this reason.
bash
# For exploration/search tasks → Haiku (cheap, fast)
# For implementation/complex reasoning → Sonnet/Opus
Reported Savings
Reported savings vary: 40%–70% on focused tasks, with higher reductions when context-mode handles heavy MCP use. The more tools you’re running, the larger the impact.
Quick-Start Checklist
| Action | Impact |
|---|---|
| Keep CLAUDE.md < 200 lines | High |
| Add .claudeignore | High |
| Use /compact every ~50 turns | High |
| Never change tools mid-session | High (cache protection) |
| Use subagents for heavy reads | Medium-High |
| Cap bash output to 20k chars | Medium |
| Use Haiku for explore/search | Medium |
| Start new chat for new tasks | High |