Agent Memory
Most AI interactions are stateless — the model forgets everything the moment the conversation ends. Agent memory changes that. It gives AI systems the ability to remember user preferences, past interactions, and learned facts across sessions. This is what makes an AI assistant feel like it actually knows you.
Types of Memory
Memory is not one thing. Different types of memory serve different purposes, operate on different timescales, and require different infrastructure.
Conversation history
The simplest form of memory: what was said earlier in this conversation. Every chatbot does this within a session, but the challenge is managing it as conversations grow long. Raw history fills the context window fast. Summarisation, selective retention, and sliding-window approaches keep history useful without overwhelming the model.
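A sliding-window approach can be sketched in a few lines. This is a minimal illustration, not a production implementation: the word count is a crude stand-in for a real tokenizer, and the message format (dicts with `role` and `content` keys) is an assumption.

```python
def estimate_tokens(message: dict) -> int:
    # Crude word count as a proxy for tokens; a real system would
    # use the model's own tokenizer.
    return len(message["content"].split())

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages that fit within a token budget,
    always preserving the first (system) message."""
    system, rest = messages[0], messages[1:]
    kept: list[dict] = []
    used = estimate_tokens(system)
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```

The same loop is a natural place to hook in summarisation: instead of dropping the oldest messages outright, they can be compressed into a summary message that stays in the window.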
User preferences
What the user likes, how they want responses formatted, what topics they care about, what language they prefer. These are facts about the user that persist across conversations. A system that remembers you prefer concise answers, or that you always want code examples in Python, provides a fundamentally better experience.
Learned facts
Information the agent picks up through interaction. A user mentions they work at a specific company, or that they are preparing for a board presentation next week. These facts get stored and surfaced when relevant — without the user having to repeat themselves. This is what makes AI feel intelligent rather than amnesiac.
Episodic memory
Records of past interactions — not the raw transcript, but structured summaries of what happened, what was decided, and what the outcome was. Episodic memory lets an agent reference previous encounters meaningfully. "Last time we discussed your migration plan, you decided to start with the customer database. How is that going?"
Short-Term vs Long-Term Memory
The distinction between short-term and long-term memory maps directly to how information is stored and retrieved in practice.
Short-term memory
This is what lives in the context window during a single interaction. The current conversation, recently retrieved documents, the active system prompt. Short-term memory is fast and immediate but disappears when the session ends. The main challenge is managing its size — context windows are large but not infinite, and stuffing them with too much history degrades output quality.
Long-term memory
This is what persists across sessions. User preferences, learned facts, episodic summaries — all stored externally in a database, vector store, or structured knowledge graph. At the start of each interaction, the system retrieves relevant long-term memories and loads them into the context. The challenge here is deciding what to remember, what to forget, and what to surface when.
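The retrieve-then-load step can be sketched as follows. The keyword-overlap scoring here is a deliberately naive stand-in for embedding similarity, and the store shape (a dict of user IDs to memory strings) is an assumption for illustration.

```python
def relevance(memory: str, query: str) -> int:
    # Naive keyword overlap as a stand-in for embedding similarity.
    return len(set(memory.lower().split()) & set(query.lower().split()))

def load_memories(store: dict[str, list[str]], user_id: str,
                  query: str, top_k: int = 3) -> str:
    """Pick the most relevant long-term memories for this query and
    format them as a context preamble. Returns "" when nothing is
    relevant, so the prompt stays clean."""
    candidates = store.get(user_id, [])
    ranked = sorted(candidates, key=lambda m: relevance(m, query),
                    reverse=True)
    chosen = [m for m in ranked[:top_k] if relevance(m, query) > 0]
    if not chosen:
        return ""
    return "Relevant user memories:\n" + "\n".join(f"- {m}" for m in chosen)
```

Note the empty-string fallback: a memory system should stay quiet when it has nothing relevant, rather than padding the context with noise.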
In practice, I design memory systems with explicit policies for each type. Not everything should be remembered. Some information is ephemeral — relevant only to the current task. Some is sensitive — user data that should be handled carefully. And some is high-value — preferences and facts that compound in usefulness over time. A good memory system distinguishes between these categories and handles each appropriately.
How Memory Differs from RAG
Memory and RAG both inject context into the prompt, but they serve fundamentally different purposes. Confusing them leads to systems that do neither well.
RAG is institutional knowledge
RAG retrieves from a shared knowledge base — documents, wikis, policies, product information. This knowledge exists independently of any individual user. It is the same for everyone. RAG answers the question "what does the organisation know about this topic?"
Memory is personal knowledge
Memory stores what the agent has learned about a specific user through interaction. It is different for every user. Memory answers the question "what do I know about this person that makes my response more relevant?" A RAG system and a memory system can return completely different context for the same query — and both are useful.
Different update patterns
RAG knowledge changes when documents are updated — a deliberate, batch process. Memory updates continuously through interaction — the agent learns something new every time a user corrects it, states a preference, or provides information. Memory is a living system. RAG is a curated collection.
Different privacy implications
RAG content is typically shared across users. Memory is inherently personal. This has direct privacy implications — what the agent remembers about one user must not leak to another. Memory systems need user isolation, data retention policies, and the ability for users to inspect and delete what the system knows about them.
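User isolation is easiest to enforce when it is structural rather than conventional. A minimal sketch, using an in-memory dict as a stand-in for a real database: every read and write is keyed by user ID, so cross-user leakage is impossible by construction.

```python
class MemoryStore:
    """Per-user memory with hard isolation: every operation is keyed
    by user_id, so one user's memories cannot leak to another."""

    def __init__(self):
        self._data: dict[str, list[str]] = {}

    def add(self, user_id: str, memory: str) -> None:
        self._data.setdefault(user_id, []).append(memory)

    def get(self, user_id: str) -> list[str]:
        # Return a copy so callers cannot mutate shared state.
        return list(self._data.get(user_id, []))

    def delete_all(self, user_id: str) -> None:
        # Right-to-erasure: drop everything known about this user.
        self._data.pop(user_id, None)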
Design Considerations
Building a memory system that works in production means answering questions that go beyond the technical implementation.
What to remember
Not everything should be stored. A good memory system has explicit criteria for what gets promoted from short-term to long-term memory. Repeated preferences, corrected facts, and explicitly stated information are high-value. Casual conversation and one-off context are typically not worth persisting.
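Those promotion criteria can be written down as an explicit gate. A sketch, assuming candidate memories arrive as dicts with `kind`, `explicit`, and `times_seen` fields (those field names are hypothetical):

```python
def should_promote(memory: dict) -> bool:
    """Decide whether a candidate memory earns long-term storage.
    Mirrors the criteria in the text: explicit statements,
    corrections, and repeated preferences are kept; one-off
    chatter is not."""
    if memory.get("explicit"):              # user said "remember this"
        return True
    if memory.get("kind") == "correction":  # user corrected a fact
        return True
    if memory.get("kind") == "preference" and memory.get("times_seen", 0) >= 2:
        return True                         # repeated preference
    return False
```

The threshold of two sightings for a preference is an arbitrary illustration; the point is that the criteria live in one reviewable function rather than being implicit in prompt wording.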
When to forget
Memory decay is a feature, not a bug. Information that was relevant six months ago may be misleading today. Memory systems need expiration policies, relevance scoring, and the ability to deprecate outdated memories. A system that remembers everything is not smarter — it is noisier.
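Decay can be modelled directly in the relevance score. A sketch using exponential decay, where a memory's score halves after a fixed interval; the 90-day half-life and 0.1 pruning threshold are illustrative defaults, not recommendations.

```python
def decayed_score(base_relevance: float, age_days: float,
                  half_life_days: float = 90.0) -> float:
    """Exponential decay: the score halves every half_life_days."""
    return base_relevance * 0.5 ** (age_days / half_life_days)

def prune(memories: list[dict], threshold: float = 0.1) -> list[dict]:
    # Deprecate memories whose decayed score falls below the threshold.
    return [m for m in memories
            if decayed_score(m["relevance"], m["age_days"]) >= threshold]
```

Decay does not have to mean deletion: a low-scoring memory can be archived or down-ranked instead, so it can still be found on an explicit lookup without crowding out fresher context.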
How to surface
Having memories is useless if they are not surfaced at the right time. The retrieval mechanism for memory needs to be contextually aware — pulling up preferences when the user makes a request, surfacing episodic summaries when a past topic comes up, and staying quiet when a memory is not relevant to the current task.
User control
Users should be able to see what the agent remembers about them, correct inaccuracies, and delete memories. This is both a trust issue and, increasingly, a regulatory requirement. Memory transparency builds user confidence and prevents the uncomfortable feeling of being surveilled by a system meant to help.
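The three user controls named above map onto three small operations: inspect, correct, delete. A minimal sketch, with an in-memory dict standing in for the real persistence layer and the function names being illustrative:

```python
memories: dict[str, dict[int, str]] = {}
_next_id = 0

def remember(user_id: str, text: str) -> int:
    global _next_id
    _next_id += 1
    memories.setdefault(user_id, {})[_next_id] = text
    return _next_id

def inspect(user_id: str) -> dict[int, str]:
    # Show the user exactly what the system knows about them.
    return dict(memories.get(user_id, {}))

def correct(user_id: str, memory_id: int, text: str) -> None:
    # Let the user fix an inaccurate memory in place.
    memories[user_id][memory_id] = text

def forget(user_id: str, memory_id: int) -> None:
    # Let the user delete a memory entirely.
    del memories[user_id][memory_id]
```

Exposing stable memory IDs matters here: "delete memory 42" is an auditable operation in a way that "forget what I said about my employer" is not.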
Where It Fits
Memory is the context layer that makes AI systems personal. It connects to nearly every other building block in the stack.
RAG
Memory and RAG work in parallel. RAG provides institutional knowledge. Memory provides personal knowledge. The best systems merge both at query time — retrieving relevant documents and relevant memories, then letting the model synthesise them into a response that is both accurate and personalised.
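The query-time merge can be sketched as a prompt assembler that takes both retrieval results and labels each source. The section headings and overall prompt shape here are assumptions for illustration.

```python
def merged_prompt(query: str, documents: list[str],
                  user_memories: list[str]) -> str:
    """Combine institutional (RAG) and personal (memory) context into
    one prompt. Explicit section labels help the model keep the two
    sources of context apart."""
    parts = []
    if documents:
        parts.append("Organisation knowledge:\n" +
                     "\n".join(f"- {d}" for d in documents))
    if user_memories:
        parts.append("About this user:\n" +
                     "\n".join(f"- {m}" for m in user_memories))
    parts.append(f"Question: {query}")
    return "\n\n".join(parts)
```

Both sections are optional: when either retrieval comes back empty, its section is simply omitted rather than padded with irrelevant filler.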
Storage
Memory needs a persistence layer. Short-term memory lives in the context window. Long-term memory lives in a database — often a combination of structured storage (for user preferences and facts) and vector storage (for semantic retrieval of episodic memories). The storage architecture determines how fast and how accurately memories can be retrieved.
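The two-tier split can be sketched as one store with an exact-lookup side and a similarity-search side. The bag-of-words cosine below is a toy stand-in for an embedding model, and the whole class stands in for what would really be a database plus a vector index.

```python
from collections import Counter
import math

class TwoTierMemory:
    def __init__(self):
        self.facts: dict[str, str] = {}  # structured: exact lookup
        self.episodes: list[str] = []    # semantic: similarity search

    def set_fact(self, key: str, value: str) -> None:
        self.facts[key] = value

    def add_episode(self, summary: str) -> None:
        self.episodes.append(summary)

    @staticmethod
    def _sim(a: str, b: str) -> float:
        # Bag-of-words cosine similarity; a toy proxy for embeddings.
        va, vb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(va[w] * vb[w] for w in va)
        norm = (math.sqrt(sum(v * v for v in va.values()))
                * math.sqrt(sum(v * v for v in vb.values())))
        return dot / norm if norm else 0.0

    def search_episodes(self, query: str, top_k: int = 1) -> list[str]:
        ranked = sorted(self.episodes,
                        key=lambda e: self._sim(e, query), reverse=True)
        return ranked[:top_k]
```

The division of labour is the point: preferences and facts want exact, keyed lookup, while episodic summaries want fuzzy retrieval by meaning, and forcing either into the other's storage model degrades both.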
Agents
Agents are the primary beneficiaries of memory. An agent with memory can maintain continuity across interactions, avoid asking the same questions twice, and adapt its behaviour based on what it has learned about the user. Memory is what turns a stateless tool into a persistent assistant.
Safety and privacy
Memory creates new privacy responsibilities. What is stored, how long it is kept, who can access it, and how it can be deleted — these are safety questions as much as engineering ones. I design memory systems with privacy built in from the start, not bolted on after deployment.
Want AI that remembers?
I design memory systems that make AI assistants genuinely useful over time — with the right retention policies, retrieval mechanisms, and privacy controls. If your AI feels like it has amnesia, memory is the fix.