When developers first add memory to an AI agent, they almost always reach for a vector database. The pattern is intuitive: embed the user's message, search for similar past messages, inject the results into the prompt. It works — until it doesn't.
The failure modes are predictable. A user asks 'What was my API rate limit again?' — a keyword-heavy query. Vector search returns semantically similar passages about 'API configuration' but misses the exact number. A user asks 'Did we decide to postpone the launch?' — a temporal query. Vector search returns the original launch discussion from three weeks ago, not the postponement decision made yesterday. A user asks 'Who introduced Alice to the team?' — a relational query. Vector search can't follow entity relationships.
Each of these failure modes has a known solution. Keyword queries → BM25. Temporal queries → recency-weighted scoring. Relational queries → graph traversal. The problem is that you don't know which type of query you'll receive at runtime. The solution is to run all of them.
Hypermemory's retrieval pipeline executes five strategies in parallel for every query: Semantic Search (vector embeddings via Qdrant) for conceptual similarity, BM25 keyword ranking for exact term matching, Temporal Scoring that multiplies relevance scores by recency weight to surface recent facts, Temporal Fact Search that extracts and indexes event dates for 'When did X happen?' queries, and Multi-hop Reasoning that follows relationship chains in the memory graph.
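To make the temporal-scoring idea concrete, here is a minimal sketch of recency weighting. The exponential decay function and the 30-day half-life are illustrative assumptions, not Hypermemory's actual parameters:

```python
import time

def recency_weighted(score, created_at, half_life_days=30.0):
    """Multiply a raw relevance score by an exponential recency weight.

    A memory created `half_life_days` ago keeps half its score; one
    created just now keeps (almost) all of it. The decay shape and
    half-life here are assumptions for illustration.
    """
    age_days = (time.time() - created_at) / 86400.0
    return score * 0.5 ** (age_days / half_life_days)

# Two equally relevant memories: one fresh, one 60 days old.
fresh = recency_weighted(1.0, time.time())
stale = recency_weighted(1.0, time.time() - 60 * 86400)
```

With a 30-day half-life, the 60-day-old memory retains roughly a quarter of its original score, which is what lets yesterday's postponement decision outrank the original launch discussion.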
The results are fused using Reciprocal Rank Fusion (RRF), a rank aggregation algorithm that combines ordered result lists without requiring score normalization. RRF is particularly well-suited here because the five retrievers produce scores on different scales — cosine similarity, BM25 term weights, and raw recency weights are not directly comparable. RRF sidesteps this by working with ranks, not scores.
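RRF itself is only a few lines. A minimal sketch, with k=60 (the constant from the original RRF paper) and illustrative document IDs:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse multiple ranked lists of document IDs into one.

    Each input list is ordered best-first. A document's fused score is
    the sum of 1 / (k + rank) over every list it appears in, so items
    that rank well across several retrievers float to the top.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three retrievers disagree on order, but "b" is consistently near
# the top, so fusion ranks it first.
semantic = ["a", "b", "c"]
keyword  = ["b", "d", "a"]
temporal = ["b", "c", "e"]
fused = reciprocal_rank_fusion([semantic, keyword, temporal])
```

Note that no retriever's raw scores appear anywhere: only positions matter, which is exactly why heterogeneous scorers can be combined without normalization.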
The performance difference is substantial. On the LoCoMo benchmark, single-strategy retrieval baselines score 52–67% depending on the domain. Hypermemory's fusion approach scores 87–94% across all five domains. The 27–36 percentage point improvement is not marginal — it's the difference between a memory system that works in demos and one that works in production.
The implementation is straightforward. Each retrieval strategy runs as an async function; results are merged with RRF; the top-k merged results are returned to the agent. Because the strategies run concurrently, total latency is bounded by the slowest retriever (typically the vector search), not the sum of all retrievers. In practice, Hypermemory returns results in under 50 ms for collections of fewer than 100K memories.
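The overall shape of the pipeline can be sketched as follows. The retriever stubs are placeholders (a real system would query Qdrant, a BM25 index, and so on), and the function names are assumptions, not Hypermemory's actual API:

```python
import asyncio

# Placeholder retrievers — each returns a best-first list of memory IDs.
# In a real system these would hit Qdrant, a BM25 index, a temporal
# index, and the memory graph.
async def semantic_search(query):  return ["a", "b", "c"]
async def bm25_search(query):      return ["b", "d"]
async def temporal_scoring(query): return ["b", "c"]
async def temporal_facts(query):   return ["e"]
async def multi_hop(query):        return ["a", "e"]

def rrf(ranked_lists, k=60):
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

async def retrieve(query, top_k=5):
    # All five strategies run concurrently, so wall-clock latency is
    # bounded by the slowest retriever, not the sum of all five.
    ranked_lists = await asyncio.gather(
        semantic_search(query),
        bm25_search(query),
        temporal_scoring(query),
        temporal_facts(query),
        multi_hop(query),
    )
    return rrf(ranked_lists)[:top_k]

results = asyncio.run(retrieve("example query"))
```

The `asyncio.gather` call is what makes the latency claim hold: the five awaits overlap instead of running back-to-back.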
