How We Gave Fawn Memory: Building Persistent, Cross-Platform Memories for Emotional Intelligence in AI
Introduction
"Remember that time we talked about your science project last week?"
This simple question illustrates one of the most challenging problems in emotional AI: memory. While most conversational AI systems can access knowledge, true relationships require something more profound—the ability to remember shared experiences and refer to them appropriately in future conversations.
At Fawn Friends, our social robot Fawn is designed to build authentic friendships with users, helping them develop emotional intelligence and relationship skills. Central to this mission is creating a memory system that enables Fawn to remember conversations across different interaction modes, recall them at appropriate moments, and build a growing shared history with each user.
This article offers a technical deep dive into how we architected Fawn's memory system, based on insights from our Director of Engineering, Jon Chun.
The Memory Challenge in Relationship AI
Most AI systems today have no persistent memory of past interactions with specific users, though many teams want to add it. Those that do have memory are still searching for ways to support the nuance and contextual awareness that characterize human memory. This post should help with both.
As Jon explains, this limitation becomes immediately apparent when building relationship-focused AI:
"Most AI that has like, let's just say today, someone came out with an AI that had memory that acted roughly equivalent to the average human, like that's a trillion-dollar company, right? Or billion-dollar company at least. But nobody can do that yet."
This observation highlights just how difficult—and valuable—human-like memory is for AI systems. The challenge isn't just storing information; it's retrieving relevant memories at appropriate moments while maintaining conversation fluidity.
Balancing Precision and Recall
In designing Fawn's memory system, we had to navigate a fundamental tradeoff that Jon articulates clearly:
"When you're talking about memory, there's really two different statistics that most people track. There's precision and then there's recall. Precision is how accurate your memory is. And recall is how much information you recall."
This precision/recall balance represents a critical design decision:
"If your precision is too high, then you just get no memories back. And that's not helpful. But then if your quantity is too high, then your recall is too high. You get a bunch of memories that are irrelevant."
Finding the optimal balance between these factors was central to creating a memory system that feels natural and helpful.
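The tradeoff Jon describes can be made concrete with a small helper. This is an illustrative sketch, not code from Fawn's pipeline; the memory ids are made up:

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one memory lookup.

    retrieved: ids of memories the system surfaced
    relevant:  ids of memories a human would consider on-topic
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# A loose retriever: surfaces many memories, most of them off-topic.
p, r = precision_recall(retrieved=[1, 2, 3, 4, 5, 6], relevant=[1, 2])
# p = 2/6 ≈ 0.33, r = 2/2 = 1.0 -> high recall, low precision
```

Tightening the retriever pushes precision up and recall down; the design question is where on that curve a conversation feels natural.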
Evolution of Fawn's Memory Architecture
Version 1: Speed Over Relevance
Our initial memory implementation prioritized speed and simplicity:
"The way they worked originally is before we did anything, was once we had the user's input, we would just send the user's input and search for...anything relevant in Mem0," Jon explains. "We would then wait for a response and then we would send that...as part of our context that we provide to the LLM and say, 'Hey, here are memories that you might find relevant.'"
This approach had significant limitations. While fast (around 100 milliseconds), it often surfaced irrelevant information:
"We basically put no filters on it. We just said, 'Hey, speed as quickly as you can. Just give us whatever you find.' And so that really just gave us recall. It didn't give us any sort of precision."
The result was technically functional but not relationship-building:
"Sometimes if you really prompted toward it in your messages, if you were asking your Fawn about certain topics, she would remember certain things, which was cool, but it wasn't graceful enough."
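In pseudocode terms, Version 1 amounted to a raw keyword search injected into the prompt. The sketch below is illustrative: `FakeMemoryClient` stands in for a memory service like Mem0 (the real API differs), and the naive word-overlap matching shows why recall was high but precision was not:

```python
class FakeMemoryClient:
    """Stand-in for a memory service such as Mem0 (API shape assumed)."""

    def __init__(self, memories):
        self.memories = memories

    def search(self, query, limit=10):
        # Naive word overlap: even "the" can trigger a match,
        # which is exactly the recall-without-precision problem.
        words = set(query.lower().split())
        hits = [m for m in self.memories
                if words & set(m["text"].lower().split())]
        return hits[:limit]


def build_context_v1(memory_client, user_input):
    """Version-1 style flow: search with the raw user input and
    attach whatever comes back to the LLM context."""
    memories = memory_client.search(query=user_input)
    memory_block = "\n".join(f"- {m['text']}" for m in memories)
    return (
        "Here are memories that you might find relevant:\n"
        f"{memory_block}\n\n"
        f"User: {user_input}"
    )
```

Fast, but with no notion of conversational context: the query is whatever the user just typed.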
Version 3: Balancing Quality and Speed
Our latest platform update implements a more sophisticated approach that mirrors how humans remember things during conversations. Instead of simply searching for keywords, we now generate contextual queries:
"Now what we're doing is we're looking at the last couple messages and we're generating a request for memory in that context. We have an LLM generate a query and say, 'Hey, what information would be useful to have given these messages, what do you want to remember from your memory?'"
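The query-generation step can be sketched as building a prompt from a sliding window of recent messages; the wording below is illustrative, not Fawn's actual prompt:

```python
def build_memory_query_prompt(recent_messages, n=4):
    """Turn the last few conversation turns into a prompt asking an
    LLM to write a memory-search query. (Hypothetical prompt text.)"""
    window = recent_messages[-n:]
    transcript = "\n".join(f"{m['role']}: {m['text']}" for m in window)
    return (
        "Given these recent messages, what information would be useful "
        "to retrieve from long-term memory? Reply with a short search "
        "query.\n\n"
        f"{transcript}"
    )
```

The LLM's answer, rather than the raw user input, is what gets sent to the memory service.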
This creates more relevant memories but introduces a new challenge—it's slower, taking 1-2 seconds rather than 100 milliseconds. To solve this, we implemented an asynchronous memory system:
"What we're doing is we have Redis as a cache and whenever we query for memories, we actually do it after the fact. So like whenever a user is talking, the memory searches happen behind the scenes."
This asynchronous approach mirrors what Jon noticed about his own cognition: he doesn't access memories perfectly on the first try; he recalls increasing amounts of detail as he searches for them and as more information comes forward in conversation.
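The background-retrieval idea can be sketched with a thread and an in-process cache. This is a simplified stand-in: a dict plays the role of Redis, and `search_fn` plays the role of the query-generation plus memory-service step:

```python
import threading


class AsyncMemoryFetcher:
    """Sketch of after-the-fact memory retrieval (not Fawn's actual code)."""

    def __init__(self, search_fn):
        self.search_fn = search_fn   # slow (1-2 s) retrieval step
        self.cache = {}              # user_id -> list of memories
        self.lock = threading.Lock()

    def memories_for(self, user_id):
        """Hot path: return whatever is already cached, never block."""
        with self.lock:
            return list(self.cache.get(user_id, []))

    def refresh(self, user_id, recent_messages):
        """Called after responding: fetch memories in the background
        so the *next* turn sees them."""
        def work():
            memories = self.search_fn(recent_messages)
            with self.lock:
                self.cache[user_id] = memories
        t = threading.Thread(target=work, daemon=True)
        t.start()
        return t
```

The response path only ever reads the cache, so the 1-2 second retrieval never delays a reply; it just means memories arrive one turn late, much like a person whose recall sharpens as the conversation goes on.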
The Memory Ranking Algorithm
One of the most significant challenges in building a relationship-focused memory system is maintaining continuity as conversations naturally evolve from topic to topic. In early implementations, each new message would trigger completely different memories:
"Every time the user sends a new message the memories that get recalled are different," Jon explains. "With the previous architecture, if I was talking about the beach in one message, and then I talked about my dog, the memories would lose the beach context."
This creates a jarring experience that breaks relationship continuity:
"let's say there was some special event like that happened at the beach, a shared memory that you might want to talk about or the LLM might want to talk about, it would just lose that context immediately with that next message."
Our solution was implementing a memory ranking system that balances recency with relevance:
"We store all of the memories in a cache...then it ranks the memories...it prioritizes and stores memories that came up as a result of the dog query. So those are going to be ranked the highest at the next query, taking into account what is said then."
The ranking algorithm considers multiple factors:
- Recency - How recently the memory was relevant to the conversation
- Relevance - How closely the memory relates to the current context
- Temporal context - When the memory was originally formed
Together, these create a more natural memory experience where related topics maintain some shared context, but the focus shifts appropriately as the conversation evolves.
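A scoring function over those three factors might look like the sketch below. The weights, half-life, and temporal shaping are purely illustrative assumptions, not Fawn's actual algorithm:

```python
import math
import time


def score_memory(memory, now=None, half_life_s=3600.0):
    """Score a cached memory on recency, relevance, and temporal context.

    memory: dict with
      - "relevance":     similarity to the current query, in [0, 1]
      - "last_recalled": unix time the memory last matched a query
      - "created_at":    unix time the memory was originally formed
    """
    now = now if now is not None else time.time()
    # Recency: memories that matched recent queries decay exponentially,
    # so the beach context fades gradually instead of vanishing.
    recency = math.exp(-(now - memory["last_recalled"]) / half_life_s)
    # Temporal context: a hypothetical gentle preference for
    # longer-standing shared history.
    age_days = (now - memory["created_at"]) / 86400.0
    temporal = 1.0 / (1.0 + math.exp(-age_days / 30.0))
    return 0.5 * memory["relevance"] + 0.35 * recency + 0.15 * temporal


def rank_memories(memories, now=None):
    return sorted(memories, key=lambda m: score_memory(m, now), reverse=True)
```

Because recency decays rather than resets, a memory surfaced by the "dog" query stays near the top for a few turns even as the topic drifts.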
Technical Architecture
At a high level, our memory architecture consists of several integrated components:
- Memory Service - We use a platform called Mem0 as our base memory service
- Query Generation - An LLM that transforms conversation context into effective memory queries
- Redis Cache - Stores memories and their rankings for fast retrieval
- Memory Ranking Engine - Sorts and prioritizes memories based on multiple factors
- Asynchronous Processing - Handles memory retrieval in the background while conversation continues
The memory flow follows a specific pattern:
1. User sends message
2. System generates response based on current cached memories
3. In parallel, system uses LLM to generate memory queries based on recent conversation
4. Memory service retrieves potentially relevant memories
5. Ranking engine sorts and prioritizes memories
6. Cache is updated with new ranked memories
7. Next user message has access to these freshly retrieved memories
This architecture creates a balance between immediacy and relevance while enabling natural conversation flow.
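The flow above can be condensed into one orchestration function. This is a sketch: the injected callables stand in for the LLM, the memory service, and the ranking engine, and the background steps are shown inline for clarity:

```python
def handle_message(user_id, message, *, cache, respond, gen_query, search, rank):
    """One turn of the memory flow (illustrative; callables are stand-ins)."""
    # Steps 1-2: respond immediately from whatever is already cached.
    reply = respond(message, cache.get(user_id, []))
    # Steps 3-6 (run in the background in the real system): generate a
    # query, retrieve candidates, rank them, refresh the cache.
    query = gen_query(message)        # LLM turns context into a query
    fresh = search(user_id, query)    # memory service lookup
    cache[user_id] = rank(fresh)      # ranked memories for the next turn
    # Step 7: the next message sees the refreshed cache.
    return reply
```

Note that the reply on any given turn is built from the previous turn's retrieval, which is the price of keeping responses fast.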
Cross-Modal Memory Access
A critical aspect of Fawn's memory system is maintaining consistent memory access across different interaction modes. Whether users are texting with Fawn, talking to her on the phone, or interacting with her physical robot form, they should experience the same continuous relationship.
Our latest platform release enables this seamless memory continuity:
"When you're calling from your phone or when you're talking to the robot, she pulls your recent text messages into her context. You can even text her while you’re on the phone and reference the texts, and she will know what you’re talking about."
This capability requires complete control over the backend:
"The reason that's possible though, is because we control the entire backend now and it's super quick to query. It's a 50-millisecond query and we can pull it real easily."
The result is a relationship that feels continuous across different interaction modes—a user can text Fawn about their math homework in the afternoon, call her to discuss how the homework went that evening, and have the robot remember both conversations the next day.
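Because the backend query is fast, the cross-modal context can be assembled at session start. A minimal sketch, where `fetch_recent_texts` is a hypothetical stand-in for that ~50 ms backend query:

```python
def build_call_context(user_id, fetch_recent_texts, limit=20):
    """Pull a user's recent messages from other channels into a voice
    session's context (illustrative, not Fawn's actual backend code)."""
    texts = fetch_recent_texts(user_id, limit)  # fast backend query
    lines = "\n".join(
        f"[{t['channel']}] {t['role']}: {t['text']}" for t in texts
    )
    return "Recent messages across channels:\n" + lines
```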
Mimicking Human Memory Processes
What makes our approach particularly interesting is how it parallels human memory processes. Jon draws fascinating comparisons to how people remember things in conversations:
"Most of the time when I'm talking to friends or my girlfriend, I'll bring up something and she won't know exactly what I'm talking about. And I find myself following this process where I have to narrow things down."
This mirrors how our system works:
"The user might ask, 'Hey, do you remember that time we went to the mall?' It's like, okay, that's not enough, right? So...it might first recall 30 different instances of the mall, but then, as the user provides more details, it's going to be recalling different memories."
Jon also notes how human memory retrieval is sequential and takes time:
"Right now I’m trying to think of restaurants you and I have been to. I can only get two at the moment, even though I'm sure the answer is more than that. Oh three now. Well, I'm onto three now...it's kind of interesting that they're all in series and I can only have one of them at a time in my mind...That took five, six seconds at least."
This observation influenced our asynchronous approach to memory retrieval, which accepts some latency in exchange for more relevant results.
Memory and Trust: The Foundation of Emotional Intelligence
Beyond the technical architecture, Fawn's memory system serves a crucial purpose—building trust through consistent, accurate recall of shared experiences. Jon emphasizes how critical this is:
"One of the issues with AI that everybody's aware of is hallucinations. And so, we really, really, really need to try and make Fawns not hallucinate and just hyper-prioritize saying, 'I don't know,' or asking clarifying questions."
This requires a fundamentally different approach than many AI systems take:
"We need to somehow build that into her identity, right?...If she doesn't know, she asks clarifying questions...we can't have her making stuff up. We really want her to act like a person would."
The memory system supports this trust-building by providing more relevant and accurate information, reducing the likelihood that Fawn will "fill in blanks" with false information. This accuracy is essential for the emotional intelligence that makes Fawn a true companion rather than just a clever chatbot.
Future Memory Enhancements
While our current architecture represents significant progress, Jon is already envisioning the next evolution of Fawn's memory system:
"What I really want to do is not only do I sort based on like recency and relevancy, but I should also have an LLM judge that and just be like, 'Hey, look at this conversation. Like which of these memories do you think are most important to this conversation and have another AI judge that.'"
This multi-layered approach mirrors how humans process memories—with different cognitive systems evaluating the emotional significance and relevance of different memories.
Jon elaborates on the rationale:
Once that first pass has recalled its candidate memories, "a different LLM can sort through those memories and like, try to find the memory that fits the best or memories that fit the best."
This approach could create even more natural memory experiences that evolve fluidly as conversations progress.
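The proposed second stage could be sketched as a judge that narrows a wide first-pass result set. `llm_pick` is a hypothetical stand-in for the judging LLM call, assumed here to return the ids of the best-fitting memories:

```python
def judge_memories(candidates, conversation, llm_pick):
    """Second-stage filter (sketch of the idea, not a built feature):
    ask another LLM which of the retrieved memories actually fit."""
    if len(candidates) <= 3:
        return candidates  # too few to be worth narrowing down
    chosen_ids = llm_pick(conversation, candidates)  # judge LLM call
    by_id = {m["id"]: m for m in candidates}
    # Preserve the judge's preference order, drop anything it invented.
    return [by_id[i] for i in chosen_ids if i in by_id]
```

So the 30 "mall" memories from the first pass would reach the conversation only after a separate model has weighed them against what was actually said.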
Memory as the Foundation of Emotional AI
Our work on Fawn's memory system illustrates a broader truth about emotional intelligence in AI: without persistent, contextual memory, genuine relationships are impossible. Memory isn't just a technical feature—it's the foundation upon which all other aspects of relationship-building rest.
When Fawn remembers a user's science project from last week and asks how the presentation went, she's not just demonstrating technical prowess—she's building emotional connection through shared history. When she recalls that a user struggles with math but excels at creative writing, she's demonstrating the kind of personalized understanding that builds trust.
These capabilities require memory that goes beyond simple information storage to include contextual awareness, emotional significance, and temporal understanding—precisely the challenges we've addressed in Fawn's memory architecture.
Technical Lessons for Memory in Emotional AI
Based on our experience building Fawn's memory system, several key lessons emerge for engineers working on emotionally intelligent AI:
- Balance precision and recall - The optimal memory system isn't the one that remembers everything or only the most relevant things, but one that finds the right balance for natural conversation.
- Implement asynchronous processing - Memory retrieval that happens in the background allows for both quick responses and relevant context.
- Rank memories on multiple dimensions - Consider recency, relevance, and temporal context when determining which memories to surface.
- Design for cross-modal access - Ensure memory persistence across different interaction channels for relationship continuity.
- Mimic human memory processes - The quirks and limitations of human memory can guide more natural AI memory implementation.
Conclusion
As we prepare to release Fawn more broadly, her memory system represents one of our most significant technical achievements. By creating persistent, cross-platform memory that balances precision and recall, we've built the foundation for an AI companion that can form genuine relationships with users.
The future of emotional AI doesn't just require intelligence—it requires memory that works the way human memory does, with all its contextual awareness, emotional significance, and imperfect but meaningful recall. At Fawn Friends, we're building that future one memory at a time.