Claude Code: Token Saving & Speed Optimization — Implementation Guide

Posted on May 26, 2026 May 27, 2026 • by Ruman Ahmed • 4 min read

The core insight is this: Claude is stateless — every time you send a message, the entire conversation history gets re-transmitted to the model. And token costs increase near-quadratically in long conversations, while the human user contributes only ~1–2% of tokens — the rest is system prompts and history. Here’s how to fight that systematically. Claude Code token saving and speed optimization

Layer 1 — CLAUDE.md Optimization (Biggest Bang for Buck)

Keep your CLAUDE.md under 200 lines. Large files cost tokens every session. Store only crucial facts there.

Use path-scoped rules instead of global ones:

markdown

<!-- .claude/commands/api-rules.md -->
---
paths:
  - "src/api/**/*.ts"
---
# API rules
- Validate all request inputs
- Use the standard error response shape
- Add tests for authorization failures

Path-scoped rules load only when Claude edits matching files, hiding irrelevant instructions and reducing cost.

Layer 2 — `.claudeignore` File

Create a .claudeignore to block Claude from reading unnecessary files:

# .claudeignore
node_modules/
dist/
build/
*.log
*.lock
coverage/
.git/
*.min.js
*.map
__pycache__/
*.pyc

This prevents Claude from reading large auto-generated or dependency files that balloon the context.

Layer 3 — Prompt Caching (Core Speed Mechanism)

Claude Code uses Anthropic’s prompt cache to reuse large, stable chunks of context — system prompt, tool definitions, CLAUDE.md, prior turns — across requests instead of resending and re-billing them every turn. Cache reads are cheaper than uncached tokens.

Rules to protect your cache:

Changing the tool set mid-conversation is one of the most common ways to break prompt caching. Adding or removing a tool invalidates the cache for the entire conversation — because tools are part of the cached prefix. Claude

bash

# DO: Keep tool set stable for the whole session
# DON'T: Add/remove MCP tools mid-session

Subagents start their own conversation with their own system prompt and tool set, separate from the parent’s. They build their own cache from scratch. A fork, by contrast, inherits the parent’s system prompt, tools, and conversation history, so its first request reads the parent’s cache. Claud

Layer 4 — Session Management Commands

Run /compact when the conversation grows, it summarizes past conversation and dramatically reduces context size while preserving it in summary form. Run /compact once after ~50 turns. When a task fully switches, use /clear instead — it wipes history completely. ClaudeCodeLab

Workflow rule:

Same task, long session  → /compact
New task entirely        → /clear
Check cost anytime       → /cost

Layer 5 — Subagent Pattern for Heavy Tasks

For any workflow requiring large amounts of context or multi-step analyses, move it to a subagent pattern. Keep your main session for direction and review. Let the subagent accumulate the heavy context where it doesn’t affect your ongoing work.

bash

# In your CLAUDE.md or prompt:
"Spawn a subagent to read and analyze src/legacy/*.ts
 Return only a summary of findings, not raw file contents."

Each subagent returns a summary — only summaries return to main context, saving significant tokens.

Layer 6 — Cap Command Output

Long test logs drain tokens fast. Set the bash output length to 20,000 characters and filter log outputs before Claude sees them,

Add to your settings.json or shell config:

json

// .claude/settings.json
{
  "bash": {
    "maxOutputLength": 20000
  }
}

Or pipe output through a filter:

bash

# Instead of: npm test
# Use:
npm test 2>&1 | tail -n 100

Layer 7 — Model Selection per Task

Use subagents if you need to switch models — for example, deploy a subagent that uses Haiku for exploration tasks, while the main agent uses the full model. Claude Code’s built-in Explore agents use Haiku for this reason. Claude

bash

# For exploration/search tasks → Haiku (cheap, fast)
# For implementation/complex reasoning → Sonnet/Opus

Reported Savings

Reported savings vary: 40%–70% on focused tasks, with higher reductions when context-mode handles heavy MCP use. The more tools you’re running, the larger the impact.

Quick-Start Checklist

Action	Impact
Keep `CLAUDE.md` < 200 lines	High
Add `.claudeignore`	High
Use `/compact` every ~50 turns	High
Never change tools mid-session	High (cache protection)
Use subagents for heavy reads	Medium-High
Cap bash output to 20k chars	Medium
Use Haiku for explore/search	Medium
Start new chat for new tasks	High

Claude Code: Token Saving & Speed Optimization — Implementation Guide

Layer 1 — CLAUDE.md Optimization (Biggest Bang for Buck)

Layer 2 — `.claudeignore` File

Layer 3 — Prompt Caching (Core Speed Mechanism)

Layer 4 — Session Management Commands

Layer 5 — Subagent Pattern for Heavy Tasks

Layer 6 — Cap Command Output

Layer 7 — Model Selection per Task

Reported Savings

Quick-Start Checklist

Subscribed to get insights

CODE Makes Dream True..

Engineering Meets Creativity..

Layer 1 — CLAUDE.md Optimization (Biggest Bang for Buck)

Layer 2 — .claudeignore File

Layer 3 — Prompt Caching (Core Speed Mechanism)

Layer 4 — Session Management Commands

Layer 5 — Subagent Pattern for Heavy Tasks

Layer 6 — Cap Command Output

Layer 7 — Model Selection per Task

Reported Savings

Quick-Start Checklist

Layer 2 — `.claudeignore` File