Every tool call Claude makes costs tokens: file reads, greps, directory listings. A context engineering layer compresses that output before it enters context. Same operations, a fraction of the context.

Why it matters

In a long session, Claude reads the same files repeatedly. Each read costs tokens. Without caching, reading a 200-line config file ten times costs ten times the tokens. With a caching layer, reads after the first cost almost nothing — a short reference stub replaces the full content.
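To make the arithmetic concrete, here is a rough cost model. The figures of 12 tokens per line and 25 tokens per reference stub are illustrative assumptions, not measurements; real counts depend on the tokenizer and the file.

```python
# Rough cost model for repeated reads of one file in a session.
# TOKENS_PER_LINE and STUB_TOKENS are illustrative assumptions, not measured values.
TOKENS_PER_LINE = 12
STUB_TOKENS = 25


def uncached_cost(lines: int, reads: int) -> int:
    """Every read pays for the full file."""
    return lines * TOKENS_PER_LINE * reads


def cached_cost(lines: int, reads: int) -> int:
    """The first read pays in full; each later read pays only for a stub."""
    if reads == 0:
        return 0
    return lines * TOKENS_PER_LINE + (reads - 1) * STUB_TOKENS


full = uncached_cost(200, 10)   # 200 * 12 * 10 = 24000
cached = cached_cost(200, 10)   # 2400 + 9 * 25 = 2625
print(f"uncached: {full} tokens, cached: {cached} tokens")
```

Under these assumptions the ten reads cost roughly a ninth of the uncached total, and the gap widens with every further read.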

The same applies to shell output. A single npm install can print 80 lines, most of it noise. A compression layer strips the output down to what matters before it enters context.

At scale (complex projects, long sessions, multiple agents), this is one of the highest-leverage infrastructure changes you can make.

The four substitutions

A context engineering layer replaces four common operations:

Instead of	Use	Why
Read / cat	`ctx_read`	Session-cached; re-reads are near-free
Shell / bash	`ctx_shell`	Pattern compression strips tool output noise
Grep / rg	`ctx_search`	Compact, token-efficient results
ls / find	`ctx_tree`	Compact directory maps

Write, Edit, and Delete stay native. The layer only intercepts operations that pull content into context: reads, shell output, searches, and directory listings.
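The table can be read as a simple routing rule. The sketch below is a hypothetical illustration of that rule, not how any particular layer is implemented; the `route` function and the dictionary are made-up names, and only the tool names come from the table.

```python
# Native operation -> context-layer replacement, taken from the table above.
SUBSTITUTIONS = {
    "Read": "ctx_read",
    "cat": "ctx_read",
    "Shell": "ctx_shell",
    "bash": "ctx_shell",
    "Grep": "ctx_search",
    "rg": "ctx_search",
    "ls": "ctx_tree",
    "find": "ctx_tree",
}


def route(tool_name: str) -> str:
    """Return the tool that should handle a call; anything not listed stays native."""
    return SUBSTITUTIONS.get(tool_name, tool_name)


print(route("Read"))   # ctx_read
print(route("Write"))  # Write (not in the table, so it passes through)
```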

Two patterns worth knowing

Cached reads

Read a file once, get a full result. Read it again — same session — get a cheap reference stub. The layer holds the content and returns a pointer.

This matters most for files you reference constantly: CLAUDE.md, config files, shared types. Without caching, every agent and every tool call that touches those files pays full price.
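A minimal sketch of the idea, assuming a per-session, in-memory cache keyed by file path. The class name, the stub format, and the choice to re-send the full content when the file changes are all assumptions made for illustration; they are not taken from any specific implementation.

```python
import hashlib
from pathlib import Path


class SessionReadCache:
    """Hypothetical session cache: full content on first read, a stub afterwards."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # resolved path -> short content hash

    def read(self, path: str) -> str:
        key = str(Path(path).resolve())
        content = Path(key).read_text(encoding="utf-8")
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        if self._seen.get(key) == digest:
            # Unchanged since the last read in this session: return a pointer only.
            line_count = len(content.splitlines())
            return f"[cached {path} #{digest}: {line_count} lines, unchanged]"
        # First read, or the file changed: return everything and remember the hash.
        self._seen[key] = digest
        return content
```

Note that this sketch still reads the file from disk on every call so it can detect changes. The saving is in the tokens returned to the model, not in I/O.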

Shell output compression

Tools like git, npm, docker, and tsc produce verbose output. Most of it is formatting, progress bars, and boilerplate. Domain-specific compression patterns strip that before it hits context — you get the signal, not the noise.

A git diff with 300 lines of unified output might compress to 40 lines of actual changes. That's 260 fewer lines entering context per call, every call.
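As one example of a domain-specific pattern, the sketch below shrinks a unified diff by keeping only file headers, hunk headers, and changed lines. It is an assumed, simplified rule for illustration, and the sample diff is hand-written rather than captured from a real repository.

```python
# Prefixes worth keeping in a unified diff: "+"/"-" cover changed lines and the
# "+++"/"---" file headers; "@@" marks hunk headers that locate each change.
KEEP_PREFIXES = ("+", "-", "@@")


def compress_unified_diff(diff_text: str) -> str:
    """Drop context lines and git metadata; keep headers and actual changes."""
    kept = [line for line in diff_text.splitlines() if line.startswith(KEEP_PREFIXES)]
    return "\n".join(kept)


# Hand-written sample in unified diff format.
sample = "\n".join([
    "diff --git a/app.py b/app.py",
    "index 1111111..2222222 100644",
    "--- a/app.py",
    "+++ b/app.py",
    "@@ -1,3 +1,3 @@",
    " import os",
    "-DEBUG = True",
    "+DEBUG = False",
    " PORT = 8080",
])

compressed = compress_unified_diff(sample)
print(compressed)
print(f"{len(sample.splitlines())} lines -> {len(compressed.splitlines())} lines")
```

Dropping context lines is a trade-off: the result is smaller, but the model no longer sees the unchanged code around each change, so a real layer might keep a line or two of context.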

lean-ctx specifically

lean-ctx is one implementation of this pattern — an MCP server that wraps the four operations above with caching and compression. Search for it on GitHub if you want to try it, or treat it as a spec for a layer you build yourself.

You expose it to Claude by registering the MCP server, then adding a directive to CLAUDE.md to prefer ctx_read over native Read, and so on. Claude generally follows it; the layer handles the rest.
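A directive along these lines is one way to phrase it. The wording and the section heading are a hypothetical sketch, not taken from lean-ctx's documentation; only the four tool names come from the substitution table.

```markdown
## Tool preferences

- Prefer `ctx_read` over the native Read tool for reading files.
- Prefer `ctx_shell` over the native shell tool for running commands.
- Prefer `ctx_search` over Grep for searching file contents.
- Prefer `ctx_tree` over ls and find for listing directories.
- Keep using the native Write, Edit, and Delete tools for all changes.
```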

You can also build your own version of this pattern using any MCP-compatible server. The concept is tool interception, not the specific implementation.

Is it worth the setup?

For a single short session: probably not.

For any of these: yes.

Sessions longer than 30-40 exchanges
Projects where Claude reads the same files repeatedly
Multi-agent setups where agents share context
Anywhere you're hitting context limits mid-task

The setup cost is low — one MCP server, a few lines in CLAUDE.md. The ongoing benefit compounds every session.
