12 — lean-ctx Layer
Every tool Claude runs costs tokens — reads, greps, directory listings. A context engineering layer compresses all of that. Same operations, fraction of the context.
Why it matters
In a long session, Claude reads the same files repeatedly. Each read costs tokens. Without caching, reading a 200-line config file ten times costs ten times the tokens. With a caching layer, reads after the first cost almost nothing — a short reference stub replaces the full content.
The same applies to shell output. npm install produces 80 lines. Most of it is noise. A compression layer strips it to what matters before it enters context.
At scale — complex projects, long sessions, multiple agents — this is the highest-leverage infrastructure change you can make.
The four substitutions
A context engineering layer replaces four common operations:
| Instead of | Use | Why |
|---|---|---|
| Read / cat | ctx_read | Session-cached; re-reads are near-free |
| Shell / bash | ctx_shell | Pattern compression strips tool output noise |
| Grep / rg | ctx_search | Compact, token-efficient results |
| ls / find | ctx_tree | Compact directory maps |
Write, Edit, and Delete stay native — the layer only intercepts reads.
Two patterns worth knowing
Cached reads
Read a file once, get a full result. Read it again — same session — get a cheap reference stub. The layer holds the content and returns a pointer.
This matters most for files you reference constantly: CLAUDE.md, config files, shared types. Without caching, every agent and every tool call that touches those files pays full price.
Shell output compression
Tools like git, npm, docker, and tsc produce verbose output. Most of it is formatting, progress bars, and boilerplate. Domain-specific compression patterns strip that before it hits context — you get the signal, not the noise.
A git diff with 300 lines of unified output might compress to 40 lines of actual changes. That's 260 fewer tokens per call, every call.
lean-ctx specifically
lean-ctx is one implementation of this pattern — an MCP server that wraps the four operations above with caching and compression. Search for it on GitHub if you want to try it, or treat it as a spec for a layer you build yourself.
You expose it to Claude via CLAUDE.md with a directive to prefer ctx_read over native Read, and so on. Claude follows it; the layer handles the rest.
You can also build your own version of this pattern using any MCP-compatible server. The concept is tool interception, not the specific implementation.
Is it worth the setup?
For a single short session: probably not.
For any of these: yes.
- Sessions longer than 30-40 exchanges
- Projects where Claude reads the same files repeatedly
- Multi-agent setups where agents share context
- Anywhere you're hitting context limits mid-task
The setup cost is low — one MCP server, a few lines in CLAUDE.md. The ongoing benefit compounds every session.