4 Types of Memory Every Agent Needs
How agents actually remember, from first principles
Before this one, you may want to read our earlier piece, Memory in AI. This article goes one level deeper, into how AI agent memory actually works.
Everyone is building AI agents now.
Almost no one is building their memory.
That is the real gap. Not reasoning. Not tools. Memory.
And memory is not one thing. It is four. Most teams build one of them, call it done, and ship an agent that forgets in three other ways.
This article breaks AI agent memory down from first principles.
We will build one real agent, derive a simple equation for what makes it work, and visualise the architecture so you can actually see it.
By the end you will be able to look at any forgetful agent and name exactly which memory is broken.
Let us start where the pain starts.
You Are a PM at Booking.com
Imagine you are a product manager at Booking.com.
Your job this quarter is to ship an AI travel agent. Not a chatbot that answers FAQs. A real agent.
The user types one line, “plan me a 4-day trip to Goa under 40 thousand rupees,” and the agent searches hotels, checks availability, builds an itinerary, holds a room, and completes the booking.
You demo it to leadership. It works. Everyone claps.
Then real users touch it.
A user says, “Plan me a Goa trip under 40 thousand.”
Five messages later, the agent suggests a hotel for 55 thousand. It forgot the budget.
A returning user who has booked with you eleven times gets treated like a stranger. No memory of the beach resort she loved or the airport hotel she hated.
The agent quotes a cancellation policy that changed last month. Confidently. To a paying customer.
And once, it booked the same room twice because it lost track of which step it was on.
The model is fine. It is one of the best models in the world.
The problem is not intelligence. The problem is memory.
Why You, The PM, Have To Understand This
You might think this is an engineering problem. Smart agent, smart engineers, they will sort it out. Can the agent not just figure it out on its own?
No. And here is why.
Your engineers will build exactly what you spec.
If you write “the agent should remember the user,” they will ask, “Remember what, for how long, retrieved how?” And you will not have an answer.
So they pick one kind of memory, ship it, and you get an agent that forgets in three other ways.
Memory is not one feature you can hand off. It is a set of product decisions only you can make.
Which things the agent holds in the moment.
Which facts it looks up.
Which workflows it follows.
Which past events it carries forward.
If you do not understand the four memories, you cannot spec them. And an agent you cannot spec is an agent that ships broken.
So let us break it down properly. From first principles.
The Agent Equation
Start by asking what an agent even is, mathematically.
An agent does three things. It thinks. It remembers. It acts.
Strip those down, and you get a simple equation.
Agent = Reasoning × Memory × Action
-> Reasoning is the model. The planning and the thinking.
-> Action is the tools. The ability to do things in the world.
-> Memory is everything the agent can recall while it reasons and acts.
Now look closely at why these are multiplied, not added.
Multiplication means dependency. If any term drops near zero, the whole agent drops near zero. You cannot buy back a missing factor by scaling another one.
This is why throwing a bigger model at a forgetful agent does nothing. You are scaling Reasoning when the bottleneck is Memory.
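Putting numbers on the equation makes the point concrete. The sketch below assumes each factor is scored between 0 and 1. The scores and function names are made up for illustration, not measurements of any real agent.

```python
def agent_multiplicative(reasoning: float, memory: float, action: float) -> float:
    """Agent = Reasoning x Memory x Action, each factor scored in [0, 1]."""
    return reasoning * memory * action


def agent_additive(reasoning: float, memory: float, action: float) -> float:
    """Counterfactual for comparison: the score if the factors were simply averaged."""
    return (reasoning + memory + action) / 3


# Hypothetical scores: a strong model and good tools, but almost no memory.
forgetful = dict(reasoning=0.9, memory=0.1, action=0.9)
# The same agent after swapping in an even stronger model.
bigger_model = dict(reasoning=1.0, memory=0.1, action=0.9)
# The same agent after fixing memory instead.
fixed_memory = dict(reasoning=0.9, memory=0.8, action=0.9)

for name, scores in [
    ("forgetful", forgetful),
    ("bigger model", bigger_model),
    ("fixed memory", fixed_memory),
]:
    product = agent_multiplicative(**scores)
    average = agent_additive(**scores)
    print(f"{name:13s} product={product:.3f} average={average:.3f}")
```

With these assumed scores, the bigger model moves the product from 0.081 to 0.090. Fixing memory moves it to 0.648. The averaged version hides this, which is why the additive picture is misleading.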
Now zoom into the Memory term, because that is where the four types live.
Memory = Working × Semantic × Procedural × Episodic
Same logic. These four are multiplied, not added, because each one covers a failure the others cannot fix. More semantic memory does not fix a missing episodic memory. A bigger context window does not fix a broken workflow. The weakest of the four caps the whole agent.
That is the entire thesis in one line. Now let us derive the four from first principles, using the one system you already trust. Your own brain.
Start With Your Own Brain
Before we touch the agent, look at your own head.
You are holding a phone number right now because someone said it ten seconds ago. That is one kind of memory.
You know Goa is in India. You did not look it up. That is a different kind.
You can drive a car, but you cannot explain how you balance the clutch. Different again.
You remember your last trip. The hotel. The bad breakfast. The view. Different again.
Knowing a fact is not the same as remembering an event. Knowing that something is true is not the same as knowing how to do it.
Your brain runs four memories at once. Your agent needs the same four. Working memory. Semantic memory. Procedural memory. Episodic memory.
Every failure in your Booking.com demo was one of these four breaking. Here is the full stack, then we take them one at a time.
1. Working Memory. What the agent is holding right now.
This is the phone number in your head.
For the agent, it is the context window. Every token in the current conversation. The user’s request, the last few messages, the search results it just pulled, the budget you set.
This is the memory that failed when the agent forgot the 40 thousand budget.
The budget was mentioned in message one. By message six, the conversation had filled with hotel options, dates, and itinerary details. The budget scrolled out of the agent’s working view. It cannot act on what it can no longer see.
There is a sharper failure hiding here. Even when the budget is technically still in the window, models recall the start and the end of a long context better than the middle. Bury the budget in the middle of a long chat and the agent quietly loses it. The information is there. The attention is not.
So when a vendor brags about a one-million-token context window, ask the real question. Not how much can it hold. How much can it actually use.
Use case where it is required. Any multi-turn task. Trip planning, negotiation, troubleshooting. Anywhere the request evolves over several messages and the agent must hold the running state.
When working memory breaks, the agent contradicts itself and forgets what you told it two minutes ago.
The fix is not a database. It is managing what goes into the window, and pinning the constraints where the model will actually see them.
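One way to pin constraints can be sketched in a few lines. This is a minimal illustration, not a production context manager. It assumes a crude word-count token estimate, and the names Message, rough_tokens, and build_context are hypothetical. The constraints go at the start and the end of the window, and the middle is filled with the newest turns that still fit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def rough_tokens(text: str) -> int:
    """Crude token estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def build_context(constraints: list[str], history: list[Message], budget: int) -> list[Message]:
    """Pin constraints at both ends, then keep the newest turns that fit the budget."""
    pinned = "Hard constraints: " + "; ".join(constraints)
    head = Message("system", pinned)
    tail = Message("system", "Reminder. " + pinned)
    remaining = budget - rough_tokens(head.text) - rough_tokens(tail.text)

    kept: list[Message] = []
    # Walk backwards so the most recent turns survive when space runs out.
    for msg in reversed(history):
        cost = rough_tokens(msg.text)
        if cost > remaining:
            break
        kept.append(msg)
        remaining -= cost
    kept.reverse()
    return [head, *kept, tail]


history = [Message("user", "Plan me a 4-day trip to Goa under 40 thousand rupees")]
history += [Message("assistant", f"Option {i}: hotel details and dates " * 5) for i in range(1, 6)]
context = build_context(["total budget <= 40000 INR", "destination: Goa"], history, budget=120)
```

In this run the user's first message no longer fits, so it is dropped. The budget still sits at the top and the bottom of the window, because it was pinned rather than left in the chat history.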
2. Semantic Memory. What the agent knows.
This is knowing Goa is in India.
Stable facts. For your agent, this is everything it needs to know about your world. Hotel details. Room types. Cancellation policies. Visa rules. The user’s loyalty tier. City guides.
None of this lives inside the model. None of it should sit in the context window all the time. It lives in a separate store, and the agent pulls in the right facts only when it needs them.
This is the memory that failed when the agent quoted the wrong cancellation policy.
In practice, this is RAG. A knowledge base, usually a vector database, that the agent searches before it answers. The policy changed last month. The knowledge base still had the old version. The agent retrieved a fact that used to be true and stated it with full confidence.
Use case where it is required. Any agent that must be correct about your specific business. Support, sales, compliance, booking. The moment the agent needs to know something the base model was never trained on, you need semantic memory.
When semantic memory breaks, the agent is fluent and wrong. The model is not the problem. Your knowledge base is.
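The stale-policy failure suggests one guard: store every fact with the date it takes effect, and only ever retrieve the version in force today. The sketch below is a toy under stated assumptions. Lookup is by exact topic rather than vector search, and the Fact and KnowledgeBase names and the policy wording are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    topic: str
    text: str
    effective_from: date


class KnowledgeBase:
    """Toy semantic store. A real system would likely search by embedding, not exact topic."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def add(self, fact: Fact) -> None:
        self._facts.append(fact)

    def current(self, topic: str, today: date) -> Fact | None:
        """Return the newest version of a topic that is already in effect on `today`."""
        candidates = [
            f for f in self._facts
            if f.topic == topic and f.effective_from <= today
        ]
        return max(candidates, key=lambda f: f.effective_from, default=None)


kb = KnowledgeBase()
# Hypothetical policy texts and dates, for illustration only.
kb.add(Fact("cancellation_policy", "Free cancellation up to 48 hours before check-in.", date(2024, 1, 1)))
kb.add(Fact("cancellation_policy", "Free cancellation up to 72 hours before check-in.", date(2025, 5, 1)))

policy = kb.current("cancellation_policy", today=date(2025, 6, 1))
```

This only helps if someone actually writes the new version into the store when the policy changes. The code makes the old fact lose. It does not make the new fact appear.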
3. Procedural Memory. How the agent acts.
This is knowing how to drive without relearning it every time.
For the agent, procedural memory is the learned workflow. The exact steps to complete a booking. Search, check live availability, place a temporary hold, confirm with the user, take payment, send confirmation, release the hold if it fails.
This is not a fact to look up. It is a sequence to follow, the same way, every time.
This is the memory that failed when the agent double-booked a room. It lost track of the step it was on. It confirmed before it held. The workflow broke.
In practice, you encode procedural memory in system prompts, rules files, and skill definitions. The booking sequence is written down so the agent does not improvise it on every run.
When procedural memory is missing, the agent is unpredictable. Same task, different path every time. In a booking flow, that is not a quirk. That is a refund and an angry customer.
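Prompts and rules files are one way to write the sequence down. Another option, shown here as a hedged sketch, is to enforce the order in code so the agent cannot skip a step. The step names follow the booking sequence described above. The BookingWorkflow class and its methods are hypothetical.

```python
from __future__ import annotations

from enum import Enum, auto


class Step(Enum):
    SEARCH = auto()
    CHECK_AVAILABILITY = auto()
    HOLD = auto()
    CONFIRM_WITH_USER = auto()
    TAKE_PAYMENT = auto()
    SEND_CONFIRMATION = auto()


ORDER = list(Step)  # Enum iteration follows definition order.


class OutOfOrderError(RuntimeError):
    """Raised when the agent tries a step other than the next one in the sequence."""


class BookingWorkflow:
    def __init__(self) -> None:
        self._position = 0
        self.hold_active = False

    @property
    def expected(self) -> Step | None:
        """The only step allowed next, or None once the run has finished or failed."""
        return ORDER[self._position] if self._position < len(ORDER) else None

    def complete(self, step: Step) -> None:
        if step is not self.expected:
            raise OutOfOrderError(f"expected {self.expected}, got {step}")
        if step is Step.HOLD:
            self.hold_active = True
        self._position += 1

    def fail(self) -> None:
        """Abort the run and release the temporary hold, if one was placed."""
        self.hold_active = False
        self._position = len(ORDER)


wf = BookingWorkflow()
wf.complete(Step.SEARCH)
wf.complete(Step.CHECK_AVAILABILITY)
try:
    wf.complete(Step.CONFIRM_WITH_USER)  # confirming before holding
except OutOfOrderError as err:
    print("blocked:", err)
wf.complete(Step.HOLD)
```

Confirming before holding is rejected. So is taking payment a second time after the run has finished, which is the double-booking case.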
4. Episodic Memory. What the agent has lived through.
This is remembering your last trip. A specific event, tied to a time, that happened. This is the hardest memory to build, and the one almost every product is missing.
For the agent, episodic memory is recalling specific past interactions with this user. Not a general fact. The actual event.
Last June this user booked a beach resort and complained it was 40 minutes from the station. In March, she cancelled because the hotel had no airport pickup. She always books twin beds.
This is the memory that failed when your loyal eleven-time customer got treated like a stranger.
This is the difference between an assistant that helps you and an assistant that knows you. It is also what lets the agent get better over time instead of starting from zero on every trip.
Be honest about where the industry is. The memory features you have seen, an assistant that remembers you prefer beach hotels, mostly store facts about you. That is semantic memory wearing an episodic coat. True episodic memory, recalling the specific arc of past trips and reasoning from it, is still rare and still hard.
Under the hood it usually works like this. Every interaction is written to a store as an event, with a timestamp and an embedding. When a new request comes in, the agent retrieves the most relevant and most recent past events and pulls them into the context window. Which means episodic memory only works if semantic retrieval already works, and only delivers value if working memory has room to hold what it retrieves. The four are not independent. They stack.
When episodic memory is missing, the agent never learns from the user and makes the same mistake every visit.
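The write-then-retrieve loop described above can be sketched as follows. Treat it as an illustration under loud assumptions. The embedding is a bag-of-words stand-in for a learned model, the score multiplies relevance by a recency decay, and the Event and EpisodicStore names, the half-life, and the dates are all hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


def embed(text: str) -> Counter:
    """Stand-in embedding (assumption): word counts instead of a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Event:
    user_id: str
    when: datetime
    text: str
    vector: Counter = field(init=False)

    def __post_init__(self) -> None:
        self.vector = embed(self.text)


class EpisodicStore:
    def __init__(self, half_life_days: float = 180.0) -> None:
        self.events: list[Event] = []
        self.half_life_days = half_life_days

    def write(self, event: Event) -> None:
        self.events.append(event)

    def recall(self, user_id: str, query: str, now: datetime, k: int = 2) -> list[Event]:
        """Rank this user's events by relevance times recency and return the top k."""
        q = embed(query)

        def score(e: Event) -> float:
            age_days = (now - e.when).total_seconds() / 86400
            recency = 0.5 ** (age_days / self.half_life_days)
            return cosine(q, e.vector) * recency

        mine = [e for e in self.events if e.user_id == user_id]
        return sorted(mine, key=score, reverse=True)[:k]


store = EpisodicStore()
store.write(Event("u1", datetime(2024, 6, 10), "booked beach resort complained it was 40 minutes from the station"))
store.write(Event("u1", datetime(2025, 3, 5), "cancelled booking because hotel had no airport pickup"))
store.write(Event("u2", datetime(2025, 4, 1), "booked beach resort with airport pickup"))

recalled = store.recall("u1", "beach resort near the station", now=datetime(2025, 6, 1), k=1)
```

Whatever recall returns still has to fit in the context window, which is why k is kept small here.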
How The Four Stack In One Real Request
Here is the part most people get wrong. They think these four are separate features you can pick from a menu.
They are not separate. They stack.
Here is what happens when your returning user types “book me the usual.”
Episodic memory recalls that the usual means a twin room near the beach. Semantic memory pulls the current price and the live cancellation policy. Procedural memory runs the booking steps in the right order. Working memory holds the whole thing together while it happens.
Pull out any one, and the request fails. A real agent needs all four, wired together.
Multi-Agent Memory. The Part Teams Get Wrong At Scale.
One agent is the easy case.
The moment your Booking.com system grows up, it becomes several agents. A supervisor that plans the trip. A search agent. A booking agent. An itinerary agent. This is a multi-agent architecture, and it breaks memory in a brand new way.
The question is no longer what memory does the agent need. It is which memory is private, and which memory is shared.
Working memory is private. Each agent has its own context window for the task at hand. The search agent should not be cluttered with the itinerary agent’s running state.
Procedural memory is private. Each agent has its own skills. The booking agent knows the payment sequence. The search agent does not need it.
Semantic memory must be shared. If the search agent and the booking agent read different price lists, they will quote different numbers and the booking will fail.
Episodic memory must be shared. This is the one teams miss. If the search agent remembers the user hates airport hotels but the itinerary agent does not, the system contradicts itself and the user notices instantly.
The rule is simple. Private memory makes each agent focused. Shared memory makes the system coherent. Get the split wrong, and your agents argue with each other in front of the customer.
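The private and shared split can be expressed directly in the data model. The sketch below is one possible shape, not a reference architecture. SharedMemory and Agent are hypothetical names, plain Python containers stand in for real stores, and the price is invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    """One instance for the whole system: facts and past events all agents must agree on."""
    semantic: dict[str, str] = field(default_factory=dict)
    episodic: list[str] = field(default_factory=list)


@dataclass
class Agent:
    name: str
    shared: SharedMemory   # the same object is handed to every agent
    procedure: list[str]   # private: this agent's own skill steps
    working: list[str] = field(default_factory=list)  # private: this agent's context

    def observe(self, note: str) -> None:
        """Task-local state stays in this agent's private working memory."""
        self.working.append(note)

    def remember_event(self, event: str) -> None:
        """User history goes to the shared store so every other agent sees it too."""
        self.shared.episodic.append(event)


# Hypothetical price entry, for illustration only.
shared = SharedMemory(semantic={"goa_beach_twin_price_inr": "7200"})
search = Agent("search", shared, procedure=["query hotels", "check live availability"])
itinerary = Agent("itinerary", shared, procedure=["order activities by day"])

search.observe("candidate hotels: 12 results")
search.remember_event("user dislikes airport hotels")
```

The itinerary agent now sees the airport-hotel preference without ever seeing the search agent's candidate list. Both agents also read the same price, because there is only one semantic store to read from.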
AI PM Interview Questions On Agent Memory
If you are interviewing for an AI PM role, agent memory is now a core system design topic. Here are the questions you should be ready for. The sections above cover what a strong answer needs.
You are building a travel booking agent. Users complain it forgets their budget mid-conversation. Which memory system is failing and how do you fix it?
Your support agent confidently quotes a refund policy that changed last month. Diagnose it.
Your team wants to fix a forgetful agent by moving to a one million token context window. What is wrong with that instinct?
In a multi-agent system, which memory should be shared across agents and which should stay private?
About Author
Shailesh Sharma. I help PMs and business leaders excel in Product, Strategy, and AI using First Principles Thinking.