Engineering Notes

Thoughts and Ideas on AI by Muthukrishnan

Architecting Short-Term and Long-Term Memory in AI Agents

05 Oct 2025

Concept Introduction

AI agents require structured memory to operate effectively across turns and sessions. Without it, every interaction starts from scratch, and the agent cannot learn from experience or maintain coherent context.

Agent memory falls into two broad categories:

  1. Short-Term (or Working) Memory: This is the agent’s “scratchpad.” It’s the context of the current conversation, the results of recent tool calls, and the immediate plan of action. It’s volatile and limited.
  2. Long-Term Memory: This is the agent’s persistent knowledge store. It’s where the agent saves key facts, past conversations, user preferences, and learned insights. It’s vast and durable.

Architecting the flow of information between these memory systems is one of the most critical aspects of designing an intelligent agent.

Algorithms & Mechanics: The Memory Pipeline

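Information flows from the environment through the agent's memory systems in stages. Input from the user or a tool passes through a brief sensory buffer into short-term memory, which is exchanged with the LLM through its context window. Consolidation writes selected items to long-term memory, and retrieval brings them back into short-term memory when they are needed.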

graph TD
    A[Input from User/Tool] --> B(Sensory Memory);
    B --> C{Short-Term Memory};
    C -- To LLM --> D[LLM Context Window];
    D -- From LLM --> C;
    C -- Consolidation --> E{Long-Term Memory};
    E -- Retrieval --> C;
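
The consolidation and retrieval edges in the diagram can be sketched in a few lines. This is a minimal illustration, not a reference design: the class name `MemoryPipeline`, its method names, the rule "consolidate whatever is about to be evicted", and the keyword match used for retrieval are all assumptions made for the example.

```python
from collections import deque


class MemoryPipeline:
    """Toy version of the diagram: input -> short-term -> long-term and back.

    All names here are hypothetical and exist only for illustration.
    """

    def __init__(self, short_term_size=3):
        self.short_term = deque(maxlen=short_term_size)  # volatile working set
        self.long_term = []  # durable store (a plain list in this sketch)

    def observe(self, item):
        """Admit new input, consolidating the item that is about to be evicted."""
        if len(self.short_term) == self.short_term.maxlen:
            oldest = self.short_term[0]
            if oldest not in self.long_term:
                self.long_term.append(oldest)  # consolidation
        self.short_term.append(item)

    def recall(self, keyword):
        """Retrieval: bring matching long-term items back into short-term memory."""
        hits = [m for m in self.long_term if keyword.lower() in m.lower()]
        for hit in hits:
            if hit not in self.short_term:
                self.observe(hit)
        return hits

    def llm_context(self):
        """Return what would be placed in the LLM context window."""
        return list(self.short_term)
```

With a short-term size of 2, observing three items pushes the first into long-term memory, and a later `recall` on a matching keyword brings it back into the context. A real agent would likely choose what to consolidate more selectively, for example by asking the LLM to summarize or score items.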
  

Design Patterns & Architectures

Practical Application

Here is a simplified Python class that sketches out a combined memory system.

from collections import deque
# Assume we have these functions from a vector DB library
# from vector_db import get_embedding, store_vector, search_vectors

class AgentMemory:
    def __init__(self, short_term_k=10):
        # Short-term memory: keep the last k interactions
        self.short_term_buffer = deque(maxlen=short_term_k)

    def add_interaction(self, user_input, agent_response):
        """Adds a user/agent turn to short-term memory."""
        self.short_term_buffer.append({"user": user_input, "agent": agent_response})

    def get_context(self):
        """Formats the short-term memory for the LLM prompt."""
        return list(self.short_term_buffer)

    def store_fact(self, text):
        """Embeds a fact and stores it in long-term memory."""
        print(f"MEMORY: Storing fact -> '{text}'")
        # Simulated: a real implementation would embed and persist the text.
        # vector = get_embedding(text)
        # store_vector(vector, text)

    def retrieve_relevant_facts(self, query, top_n=3):
        """Searches long-term memory for facts relevant to a query."""
        print(f"MEMORY: Searching for facts related to -> '{query}'")
        # query_vector = get_embedding(query)
        # results = search_vectors(query_vector, top_n)
        # return results
        # Simulate finding a relevant memory
        return ["Fact: The user's favorite color is blue."]

# --- Usage ---
memory = AgentMemory()
memory.add_interaction("Hi there!", "Hello! How can I help you?")
memory.add_interaction("What's my favorite color?", "You told me earlier it's blue.")

# A "reflective" moment
memory.store_fact("The user's favorite color is blue.")

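# Later, in a new session...
# A real new session would start with an empty buffer; this script reuses the
# same object, so get_context() still returns the two turns added above.
context = memory.get_context()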
relevant_memories = memory.retrieve_relevant_facts("Do you remember my preferences?")
# The agent would then add relevant_memories to its prompt context.
print(relevant_memories)
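
The class above only simulates long-term storage. To make the store-and-search step concrete without a vector database, here is a minimal sketch that swaps in a bag-of-words vector and cosine similarity. `toy_embedding`, `cosine_similarity`, and `InMemoryLongTermStore` are hypothetical names for this example, and word counts are a crude stand-in for a learned embedding model.

```python
import math
import re
from collections import Counter


def toy_embedding(text):
    """Hypothetical stand-in for get_embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryLongTermStore:
    """Keeps (vector, text) pairs in a list; nothing is persisted to disk."""

    def __init__(self):
        self._entries = []

    def store_fact(self, text):
        self._entries.append((toy_embedding(text), text))

    def retrieve_relevant_facts(self, query, top_n=3):
        query_vector = toy_embedding(query)
        scored = [
            (cosine_similarity(query_vector, vector), text)
            for vector, text in self._entries
        ]
        # Drop facts that share no words with the query, then rank the rest.
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_n]]
```

Because this sketch matches on shared words only, a query such as "What is my favorite color?" finds the stored color fact, while "Do you remember my preferences?" finds nothing. Closing that gap is the reason real systems use semantic embeddings.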

Frameworks such as LangChain and LlamaIndex provide pre-built memory modules that implement several buffering strategies and integrate with vector stores, so much of this plumbing does not have to be written by hand.

Latest Developments & Research

Cross-Disciplinary Insight

The architecture of agent memory systems deliberately parallels the Atkinson–Shiffrin memory model (1968), a foundational theory in cognitive psychology. This model proposes that human memory is split into three components:

  1. Sensory Memory: A brief buffer for sensory inputs (sights, sounds).
  2. Short-Term Memory: A limited-capacity store for active information.
  3. Long-Term Memory: A vast, durable store for knowledge and experiences.

Information flows from sensory to short-term memory. Through processes like rehearsal and elaboration, it can be encoded into long-term memory. When needed, it is retrieved from long-term memory back into short-term memory to be used. Agent memory mirrors each stage: raw user and tool input plays the role of sensory memory, the conversation buffer acts as short-term memory, and the persistent store serves as long-term memory.
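
As a loose computational analogy for rehearsal, an agent could promote an item to long-term memory only after it has come up several times. The sketch below assumes a simple count threshold; `RehearsalMemory`, `attend`, and `rehearsal_threshold` are hypothetical names, and the rule is an illustration rather than a claim about the psychological model or any framework.

```python
from collections import Counter


class RehearsalMemory:
    """Promotes an item to long-term memory once it has been seen often enough."""

    def __init__(self, rehearsal_threshold=2):
        self.rehearsal_threshold = rehearsal_threshold
        self.rehearsals = Counter()  # short-term: how often each item was seen
        self.long_term = set()       # items that have been encoded

    def attend(self, item):
        """Register one exposure; return True once the item is in long-term memory."""
        self.rehearsals[item] += 1
        if self.rehearsals[item] >= self.rehearsal_threshold:
            self.long_term.add(item)
        return item in self.long_term
```

With the default threshold of 2, the first exposure to an item returns `False` and the second returns `True`.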