The convenience trap
When you can paste thirty pages into a single request, it feels like the model "remembers" your world. In reality it remembers one transcript-shaped snapshot, with no first-class identity for facts, no versioning, and no guarantee the same information will surface on the next turn.
Production features need durable memory: what changed, when, and why. That is a storage and indexing problem, not a token-budget problem.
What actually belongs in context
Use the window for task-local coherence: the current goal, a tight set of constraints, and retrieved evidence that is small enough to verify. Push everything else to explicit stores—vector DB, OLTP, object storage, feature flags—and design APIs so the agent fetches just enough to decide the next action.
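One way to sketch "fetch just enough to decide the next action": assemble the window from the goal, the constraints, and retrieved evidence, stopping at an explicit budget. `build_context` and the `retrieve` callable are illustrative stand-ins, assuming some retrieval backend such as a vector DB.

```python
from typing import Callable

def build_context(goal: str,
                  constraints: list[str],
                  retrieve: Callable[[str, int], list[str]],
                  budget_chars: int = 4000,
                  k: int = 5) -> str:
    """Assemble task-local context: goal, constraints, just enough evidence.

    `retrieve(query, k)` is assumed to return the top-k evidence snippets
    from an external store; everything over budget is simply not fetched in.
    """
    parts = [f"Goal: {goal}", "Constraints:"]
    parts += [f"- {c}" for c in constraints]
    used = sum(len(p) + 1 for p in parts)
    for doc in retrieve(goal, k):
        if used + len(doc) > budget_chars:
            break  # stop adding evidence once the budget is spent
        parts.append(doc)
        used += len(doc) + 1
    return "\n".join(parts)
```

The inversion matters: the budget is enforced at assembly time against small retrieved pieces, instead of summarizing the whole product state and hoping it fits.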
Red flags in design reviews
- Summarizing the entire product state into a "mega prompt" each call
- No schema for user-visible facts versus model-internal scratchpad
- Evaluations that only run on canned chats instead of multi-session workflows
If those show up, you do not have architecture—you have a demo with amnesia wearing an expensive disguise.
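The missing schema in the second red flag can be as small as two types with different lifetimes. This is a hypothetical sketch, not a prescribed data model: durable, user-visible facts carry provenance and get persisted; the scratchpad is working memory that dies with the task.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class UserFact:
    """User-visible: persisted, versioned, auditable."""
    key: str
    value: str
    source: str  # provenance, e.g. "billing-api" (illustrative)

@dataclass
class Scratchpad:
    """Model-internal working notes: discarded when the task ends."""
    task_id: str
    notes: list[str] = field(default_factory=list)

    def jot(self, note: str) -> None:
        self.notes.append(note)
```

Once the two are separate types, "which of these survives the session?" becomes a type question your review can actually answer.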
Takeaway
Context windows are a transport layer for evidence and instructions. Treat them that way, and you stop paying for redundant tokens while your real memory system does the job.
