World's First Human-Like Memory
We've Engineered A Foundational Memory Layer Powering Next-Gen AI World Models
No credit card required · Free tier available
Works with every agent framework
Hypermemory is a hybrid memory retrieval system for AI agents. It combines semantic search, BM25 keyword matching, temporal scoring, and multi-hop reasoning to give long-running agents persistent, adaptive memory. Unlike context windows that reset, Hypermemory persists facts across sessions and achieves state-of-the-art results on the LoCoMo conversational memory benchmark — scoring 92% on Temporal Reasoning, 94% on Single Hop, and 88% on Multi Hop.
Trusted by developers at
Live Dashboard
Full Visibility Into Every Memory
Search, trace, ingest, and query your memory store. Explore retrieval analytics. Earn XP as you explore.
6
Memories
0.90
Avg Score
5
Active
12ms
Avg Latency
4
Modes Used
Retrieval Radar
Product launch is April 15th — confirmed by CEO in standup
Mode Usage
Most active mode: Semantic
Source Breakdown
24h Activity
Live · Memory ingestions over the last 24h
Score Distribution
Architecture
5 Retrieval Modes, Running in Parallel
Every query fans out across all five strategies simultaneously. Adaptive score fusion returns the best result from whichever path wins.
Finds conceptually similar memories even when exact words differ
Precise recall for exact terms, names, and specific phrases
Weights recent memories higher; detects date-specific queries
Structured recall of who, what, where from extracted facts
Chains related memories across topics for complex queries
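The "adaptive score fusion" step above can be illustrated with Reciprocal Rank Fusion (RRF), the fusion technique Hypermemory's blog references. This is a minimal, illustrative sketch; the function name, the `k` constant, and the memory IDs are stand-ins, not Hypermemory's actual implementation:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists from several retrieval
    modes into one ranking. `rankings` maps a mode name to an ordered
    list of memory IDs, best first. Each appearance at rank r contributes
    1 / (k + r) to that memory's fused score."""
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, mem_id in enumerate(ranked_ids, start=1):
            scores[mem_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Each mode produces its own ranking for the same query.
fused = rrf_fuse({
    "semantic": ["m3", "m1", "m7"],
    "bm25":     ["m1", "m3", "m9"],
    "temporal": ["m7", "m1", "m3"],
})
print(fused[0])  # "m1" wins: it ranks highly in all three lists
```

A memory that appears near the top of several mode rankings beats one that tops a single list, which is why no single strategy has to be perfect.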
Memory Retrieval
Hybrid Retrieval Architecture
Hypermemory gives long-running agents persistent, adaptive memory that improves with every interaction. It enables cost-efficient self-learning, saving developers time, tokens, and money, while supporting temporal, inferential, and open-world reasoning.
Semantic Search
Vector Similarity
Keyword Search
BM25 Ranking
Temporal Reasoning
Date-Aware
Multi-Hop Reasoning
Connected Traversal
Adversarial
Hallucination-Proof
Inferential
Commonsense + World Knowledge
Data Ingestion
API · MCP · SDK
Fact Extraction
Lazy Async
Multi-Modal Index
5 Modes
Query Intelligence
Expand · Filter
Hybrid Retrieval
Score Fusion
Memory Response
Ranked + Proven
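The pipeline stages above (ingestion, lazy fact extraction, indexing, query, retrieval) can be sketched end to end. Everything here is an illustrative stand-in, not Hypermemory's real internals: the store, the token index, and the search logic are deliberately simplified.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    raw: list = field(default_factory=list)       # Data Ingestion: store immediately
    pending: list = field(default_factory=list)   # Fact Extraction: deferred ("Lazy Async")
    index: dict = field(default_factory=dict)     # Multi-Modal Index (here: keywords only)

def ingest(store, content):
    # Accept the memory right away; extraction happens later, off the hot path.
    store.raw.append(content)
    store.pending.append(content)

def extract_pending(store):
    # Stand-in for async fact extraction: index each token of each pending memory.
    while store.pending:
        content = store.pending.pop()
        for token in content.lower().split():
            store.index.setdefault(token, set()).add(content)

def search(store, query):
    # Query stage, crudely: any memory matching any query token is a hit.
    hits = set()
    for token in query.lower().split():
        hits |= store.index.get(token, set())
    return sorted(hits)

store = MemoryStore()
ingest(store, "latency spiked after SDK update")
extract_pending(store)
print(search(store, "SDK latency"))  # ['latency spiked after SDK update']
```

The key design point the diagram encodes: ingestion returns fast because extraction is lazy and asynchronous, while queries hit a pre-built index.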
Developer API
Built for Developers
Simple REST API and MCP integration. Add memory to your AI in minutes.
REST API
Store and retrieve memories with a simple HTTP call
curl -X POST https://api.hypermemory.run/v1/memories \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "support-agent",
    "content": "Arjun from FinexAI: latency spiked to 8s after v2.3.1 SDK update — hotfix needed by 5pm",
    "metadata": { "source": "slack", "customer": "FinexAI", "priority": "high" }
  }'
MCP Integration
Add Hypermemory as a Model Context Protocol server
{
  "mcpServers": {
    "hypermemory": {
      "command": "npx",
      "args": ["-y", "hypermemory-mcp"],
      "env": {
        "HYPERMEMORY_API_KEY": "your-api-key",
        "HYPERMEMORY_AGENT_ID": "support-agent"
      }
    }
  }
}
Integration
One-Line Ingestion.
Infinite Recall.
Add persistent memory to your LLM apps with a single function call.
from hypermemory import Hypermemory

hm = Hypermemory(api_key="your-api-key")

# Store a memory — fact extraction happens automatically
hm.add(
    agent_id="support-agent",
    content="Arjun from FinexAI reported that inference latency spiked to 8 seconds "
            "after the v2.3.1 SDK update. Needs a hotfix by 5pm or escalates to CTO.",
)

# Recall with natural language — multi-modal retrieval kicks in
results = hm.search(
    agent_id="support-agent",
    query="Which customers are affected by the SDK latency regression?",
)

# Returns: relevant memories ranked by semantic similarity,
# temporal recency, and entity-fact matches
Open Source
Open Source at Heart
Built in the open. Join thousands of developers building the future of AI memory.
MIT Licensed
Use it freely in personal projects, startups, or enterprise products. No strings attached — ever.
Fully Auditable
Every line of the memory layer is public. Understand exactly how your data is stored, retrieved, and scored.
Self-Hostable
Deploy on your own infra — on-prem, private cloud, or air-gapped. Zero dependency on our servers.
Shape the Roadmap
Open issues, submit PRs, and vote on features. The community drives what gets built next.
Our commitments
Performance
LoCoMo Benchmark Results
Hypermemory excels across all LoCoMo evaluation domains.
| Domain | Hypermemory | Baseline |
|---|---|---|
| Temporal Reasoning | 92% | 61% |
| Open Domain | 89% | 58% |
| Inferential | 87% | 54% |
| Single Hop | 94% | 67% |
| Multi Hop | 88% | 52% |
Use Cases
AI Memory That Adapts to Your Domain
Hallucination-proof RAG for compliance-critical AI
Patient data retrieval, diagnostics, drug interaction checks
Reduce Readmissions by 40%
Telehealth agents that remember every patient interaction, medication change, and care preference. No more lost context between visits — your AI assistant recalls what matters for better outcomes.
- Cut repeat diagnostic workups by 60%
- Catch medication conflicts before they happen
- Track patient journeys across providers seamlessly
Live example
Use Cases
Built for Every Industry
Hypermemory adapts to your industry — the same retrieval engine, tuned to what matters most in your context.

Agents that remember every patient
Telehealth agents that recall medications, symptoms, allergies, and care preferences across every visit — reducing readmissions and improving outcomes.
40%
fewer readmissions

Tutors that adapt to every learner
AI tutors that track each student's learning pace, knowledge gaps, and preferred explanation style — personalizing every session from day one.
3×
faster concept retention

Shopping assistants with taste memory
Agents that remember what a customer bought, returned, loved, and hated — surfacing the right product before they even search for it.
2.8×
higher conversion

Support that never makes you repeat yourself
Agents with full conversation history across channels. Every ticket, refund, and complaint remembered — so customers never have to explain twice.
65%
reduction in handle time

AI reps that remember every deal detail
Sales agents that track objections, competitor mentions, stakeholder names, and deal history — delivering hyper-personalised follow-ups that close.
31%
higher close rate

Assistants that track regulatory changes
Agents that monitor case law, contract clauses, and compliance requirements over time — with temporal supersession so the current rule always wins.
90%
faster clause retrieval

NPCs with persistent world memory
Game characters that remember player choices, past interactions, and evolving storylines — creating narratives that feel genuinely alive.
4×
player session length

Internal agents that know your org
Knowledge agents that remember org charts, project history, team preferences, and institutional knowledge — making every employee 10× more effective.
55%
reduction in search time
Enterprise
Secure Memory Layer That Cuts LLM Spend and Passes Audits
SOC 2 Type II ready. Deploy anywhere. Full audit trails.
Zero-Trust Security & Compliance
SOC 2 Type II ready. End-to-end encryption, RBAC, and audit logs for every memory operation.
Deploy Anywhere, No Tradeoffs
On-prem, private cloud, or managed SaaS. Same API, same performance, your infrastructure.
Traceable by Default
Full provenance for every memory. Know where data came from, when it was updated, and who accessed it.
Deployment Options
From the Blog
Insights on AI Memory

Why Your AI Agent Forgets Everything — And How to Fix It
Long-running agents break down not because of bad reasoning, but because they can't remember. We explore the root causes of context degradation and the architecture that solves it.

Hybrid Retrieval: Why One Search Strategy Is Never Enough
Semantic search alone misses keywords. BM25 alone misses meaning. Temporal search alone misses context. Here's how fusing all five retrieval modes with RRF produces SOTA results.

One-Line Memory for Any LLM Framework
Whether you're on LangChain, LlamaIndex, CrewAI, or raw OpenAI — adding persistent memory to your agent should take minutes, not weeks. Here's how we built that.

SOTA on LoCoMo: Breaking Down the Benchmark Results
Hypermemory achieves state-of-the-art across all 5 LoCoMo domains. We walk through what each domain tests, where other systems fail, and why our temporal fact engine makes the difference.

Temporal Supersession: Tracking Facts That Change Over Time
"My meeting with Sarah is on Thursday" becomes stale the moment the meeting passes. Here's how Hypermemory's fact graph tracks current state vs. historical state without explicit updates.

Self-Hosting Hypermemory: A Complete Guide
Run Hypermemory entirely on your own infrastructure — on-prem, private cloud, or air-gapped. This guide covers deployment, Qdrant configuration, and production hardening.
Join The Hypermemory Community
Connect with developers building the future of AI memory.