( ● v0.8.3 — token-budget navigation · match_confidence calibrated · provenance-gated · MCP-ready )

Memory that learns
from what actually worked.

Single-plan fusion across vector, BM25, graph, and metadata — inside your existing Postgres. Zero LLM cost per write. Confidence rises when memory actually helped. Every score is inspectable in plain SQL, and every write must prove its source.

Install in 60 seconds → View on GitHub ↗

LongMemEval-S recall@10 = 0.9604 — reproducible benchmarks →

Provenance gate · Hybrid recall · Multi-tenant · Postgres-native

27 ms p50 recall · $0 per write, no LLM at ingest · 100% in your Postgres · Apache-2.0 · zero egress

Don't take our word for it — mark an outcome and watch recall re-rank itself, in SQL.

The feedback loop, live: 1 recall → 2 reinforce → 3 recall again

-- ranking is an inspectable SQL column, not a black box
SELECT content, match_confidence FROM pgmnemo.recall_hybrid(query_text => 'how to avoid a retry storm') LIMIT 3;

Mark an outcome — pgmnemo records it with one reinforce() call, and recall re-ranks itself. Every score is a column you can read.

Release highlights

Your memory learns which lessons actually worked.

v0.8.0 adds token-budget navigation (navigate_locate / navigate_expand) and in-place maintenance (reembed / recompute_content).

v0.7.0 added the outcome-learning loop: every lesson carries a confidence score in [0,1] that reinforce() updates from each run's outcome (+0.10 on success, −0.15 on failure).

v0.6.3 fixes the AmbiguousColumn regression that blocked production use.

v0.6.2 ships sparse-safe RRF: +1.13 pp recall@10 on LongMemEval-S (p=0.017).

v0.6.1 ships as_of_ts point-in-time recall and stress-test benchmarks.

LongMemEval-S recall@10 = 0.9334 for the v0.5.1 embedder baseline (unchanged since v0.4.1).
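As a rough illustration of the v0.7.0 outcome loop, the update rule can be sketched in Python. The function name `reinforce_confidence` is hypothetical, and clamping the result to [0, 1] is an assumption inferred from the stated score range; the extension's own reinforce() may differ in detail.

```python
def reinforce_confidence(confidence: float, success: bool) -> float:
    """Apply one run outcome to a confidence score (illustrative sketch only)."""
    delta = 0.10 if success else -0.15
    # Assumption: the score is clamped so it always stays inside [0, 1].
    return min(1.0, max(0.0, confidence + delta))

score = 0.50
for outcome in (True, True, False):  # two successful runs, then one failure
    score = reinforce_confidence(score, outcome)
print(round(score, 2))
```

Because a failure subtracts more than a success adds, a lesson needs more wins than losses to hold its confidence.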

// new · v0.6.2 · Fix-A

RRF ranking

Vector scores were 0.4–0.9. BM25 scores were 0.005–0.05. The old weighted sum silently ignored BM25. Now we fuse by rank, not by score. Confirmed +1.13 pp recall@10 on LongMemEval-S (paired-t p=0.017, bge-m3 1024d, N=500). Shipped in v0.6.2.

-- v0.5.1: weighted sum (BM25 ignored)
-- v0.6.0: rank fusion
SELECT * FROM pgmnemo.recall_lessons(
    query_text := 'JWT rotation'
);
-- +1.13 pp recall@10 confirmed (v0.6.2)
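To see why fusing by rank changes the outcome, here is a minimal Python sketch of reciprocal rank fusion next to an equal-weight score sum. The scores are made-up values on the two scales described above, `rrf_fuse` is a hypothetical helper, and k = 60 is the constant commonly used with RRF; whether pgmnemo uses that value, and how its sparse-safe variant treats missing documents, is not stated here.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

def by_score(scores):
    """Turn a {doc_id: score} map into a ranking, best first."""
    return sorted(scores, key=lambda d: -scores[d])

# Toy scores (assumed): vector similarity in 0.4-0.9, BM25 in 0.005-0.05.
vec = {"a": 0.90, "b": 0.85, "c": 0.80, "d": 0.40}
bm25 = {"d": 0.050, "b": 0.045, "c": 0.010, "a": 0.005}

# Equal-weight sum: the vector scale dominates, so BM25 cannot change the order.
weighted = sorted(vec, key=lambda d: -(0.5 * vec[d] + 0.5 * bm25[d]))
fused = rrf_fuse([by_score(vec), by_score(bm25)])
print(weighted, fused)
```

In the weighted sum the order is identical to the vector-only ranking. With rank fusion, document "b", which is second in both lists, moves to the top.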

// v0.6.1

ghost_count metric

pgmnemo.stats() gains a ghost_count column — lessons with no commit_sha and no artifact_hash. Alert when it exceeds your threshold, then flip gate_strict on.

SELECT ghost_count FROM pgmnemo.stats();

 ghost_count
-------------
           3

// v0.6.1

dedup NOTICE

When the bitemporal trigger closes and re-creates a row on a duplicate content_hash, ingest() now emits NOTICE: bitemporal close+create fired — closed N prior version(s). The message is available in psycopg2 through connection.notices.

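# psycopg2
import psycopg2

conn = psycopg2.connect("dbname=postgres")  # placeholder DSN
with conn.cursor() as cur:
    cur.execute(
        "SELECT pgmnemo.ingest(p_role := 'dev', p_project_id := 1, "
        "p_topic := 'auth', p_lesson_text := '…')"
    )
print(conn.notices)
# ['NOTICE: bitemporal close+create fired — closed N prior version(s)']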
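If you want the count rather than the raw string, the notice can be parsed with a regular expression. This is a sketch under assumptions: `closed_versions` is a hypothetical helper, the sample list stands in for `conn.notices`, and the pattern relies only on the message wording quoted above.

```python
import re

# Matches the message text quoted above and captures N.
NOTICE_RE = re.compile(
    r"bitemporal close\+create fired.*closed (\d+) prior version\(s\)"
)

def closed_versions(notices):
    """Sum the N values from every matching NOTICE line; ignore other notices."""
    total = 0
    for line in notices:
        match = NOTICE_RE.search(line)
        if match:
            total += int(match.group(1))
    return total

# Stand-in for conn.notices (assumed shape: one string per notice).
sample = [
    "NOTICE:  bitemporal close+create fired \u2014 closed 2 prior version(s)\n",
    "NOTICE:  unrelated message\n",
]
print(closed_versions(sample))
```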

No breaking changes. Run ALTER EXTENSION pgmnemo UPDATE TO '0.8.3'; — safe to rerun.

Full changelog ↗

What is pgmnemo

Memory for AI agents, without the noise.

Lesson. A structured memory record that an AI agent writes to storage, for example "JWT tokens rotate after key compromise", annotated with the artifact that validated it (commit SHA, file hash, or passing test ID). It is the unit pgmnemo operates on.

The problem. AI agents hallucinate. Today's memory systems store whatever the agent says — including hallucinated summaries, made-up commit IDs, and confident-but-wrong conclusions. Three weeks later, a different agent reads that bad lesson and builds on it. The mistake compounds across runs.

The fix. pgmnemo blocks that pattern with a provenance gate. Before any lesson is promoted to long-term memory, it must be attached to a verifiable artifact: a git commit_sha, a file hash, or a passing test ID. A lesson without an artifact enters a staging queue, where it is visible to direct queries but not flagged as verified and not surfaced to other agents. It is promoted to canonical memory the moment an artifact is attached; until then it stays out of the trusted recall path.

Where it lives. Not a separate service. Not a SaaS API. CREATE EXTENSION pgmnemo; in your existing PostgreSQL and you're done. The gate runs as a database-level row policy — application code cannot bypass it. Your data never leaves your server.

How it works

Three SQL calls. That's the whole pipeline.

// step 01

Agent finishes a task

It produces an artifact — a git commit, a saved file, a passing test. That artifact is the proof the work is real.

# Python
sha = git.commit("fix JWT")  # sha = 'abc1234'

// step 02

pgmnemo verifies and stores

ingest() requires commit_sha or artifact_hash. Without it the write is blocked at the database layer — not by your app.

SELECT pgmnemo.ingest( p_lesson_text := 'JWT…', p_commit_sha := 'abc1234' );

// step 03

Future agents recall it

recall_lessons() returns relevant past lessons using hybrid scoring: HNSW vectors + BM25 + recency + importance.

SELECT * FROM pgmnemo.recall_lessons( query_text := 'JWT rotation' ) ORDER BY score DESC;

Who it's for

You probably already have the stack pgmnemo needs.

If three of these describe your project, pgmnemo replaces ~200 lines of ad-hoc memory code with two SQL function calls.

You run a multi-agent pipeline (research → write → review, plan → code → test) and each run starts from zero context.

Your stack already has PostgreSQL + pgvector on Supabase, Neon, or self-hosted. You don't want to add a separate memory service.

You've watched an agent hallucinate a summary and seen that wrong summary poison the next run.

Your data has residency or sovereignty constraints — it can't go through a cloud memory API on someone else's infra.

You want memory that costs nothing beyond your existing Postgres bill. Apache 2.0, no usage tiers, no per-request pricing.

You need multi-tenant isolation at the database layer, not bolted on in your app code.

Why this matters

Built for the three things teams actually need.

// audit-grade memory

Pass any compliance audit

Every memory write is logged at the database layer with role, project, commit, and timestamp. The provenance record is append-only and tamper-evident, and you can query it directly in SQL.

SELECT role, commit_sha, t_valid_from FROM pgmnemo.agent_lesson;

run	role	commit	at	trust
8423	dev	abc1234…	14:22:01	✓
8424	dev	def5678…	14:22:05	✓
8425	qa	NULL	14:22:08	⚠ blocked

// token economy

Memory pays for itself

1M context windows cost $5+ per call. pgmnemo retrieves the right 10K tokens instead of stuffing 1M. Memory ROI is positive after the first day at typical agent workloads.

navigate_locate(): token-budget retrieval

Fetch only what fits the budget · new in v0.8.0
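The idea of fetching only what fits a token budget can be sketched as a greedy selection over scored candidates. This is an illustration, not navigate_locate() itself: `pack_within_budget`, the candidate tuples, and the token counts are all hypothetical, and the real function may select differently.

```python
def pack_within_budget(candidates, budget_tokens):
    """Greedily keep the highest-scoring items whose token counts fit the budget."""
    chosen, used = [], 0
    for text, score, tokens in sorted(candidates, key=lambda c: -c[1]):
        if used + tokens <= budget_tokens:
            chosen.append(text)
            used += tokens
    return chosen, used

# (lesson_text, score, token_count); all values are made up for the example.
candidates = [
    ("rotate JWT after key compromise", 0.92, 400),
    ("add jitter to retries to avoid a retry storm", 0.81, 700),
    ("full incident postmortem", 0.77, 9500),
    ("cache keys must include the tenant id", 0.60, 300),
]
chosen, used = pack_within_budget(candidates, budget_tokens=1500)
print(len(chosen), used)
```

The oversized postmortem is skipped even though it outranks the last lesson, so the budget is spent on items that actually fit.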

// postgres flywheel

Every Postgres feature, free

JSONB → flexible metadata. pg_cron → automatic consolidation. Logical replication → multi-region HA. SaaS memory APIs have to build all this; pgmnemo inherits it.

psql · pgmnemo

postgres=# SELECT cron.schedule('pgm-evict', '0 3 * * *', 'SELECT pgmnemo.evict_expired_lessons()');
 schedule
----------
        7
postgres=# EXPLAIN ANALYZE SELECT * FROM pgmnemo.recall_lessons('JWT');
               QUERY PLAN
─────────────────────────────────────────
 Index Scan using ix_pgm_embed
 Planning Time: 0.087 ms
 Execution Time: 2.831 ms

Why pgmnemo

Four things nobody else does.

Provenance gate

Write-time artifact requirement enforced at the database layer. The only memory product with this — and the reason hallucinations stay out of canonical memory.

Zero new services

No daemons, no sidecars, no cloud accounts. CREATE EXTENSION pgmnemo CASCADE; — that's it. Memory lives where your data already does.

Hybrid recall in one SQL call

HNSW + BM25 + recency + importance — hybrid search and temporal ranking fused by rank. No glue code. No re-ranker microservice. One SELECT.

First-class role isolation

Multi-tenant from day one. role + project_id row-level security enforced inside Postgres — not by your app.

The differentiator

The only memory layer with a write-time provenance gate.

Hallucinated lessons cannot enter long-term memory. Every ingest() requires a verifiable artifact — git commit SHA, file hash, or test ID. Enforced at the database layer.

●VERIFIED LANE

●UNVERIFIED LANE

Write-time enforcement: RLS-level checks, cannot be bypassed in app code.

Cryptographic verification: commit_sha, artifact_hash, test_id.

Audit trail: every gate decision queryable forever.

// Why competitors can't bolt this on

Provenance moat

gate_strict=enforce fires BEFORE INSERT at the database trigger layer. SaaS memory providers (Mem0, Zep, Letta) accept writes via API — they cannot enforce write-time constraints because enforcement must happen at the storage layer, not the API layer. Application-side enforcement is bypassable; database-trigger enforcement is not.

How pgmnemo compares

Six tools. One table.

	pgmnemo	mem0	Zep	pgvector+glue	ParadeDB	LangMem
Deployment	Postgres extension	SaaS API	SaaS API	DIY pipeline	Postgres extension	Python lib
Cost per write	storage only	per-call API	per-call API	storage only	storage only	LLM call per write
Point-in-time queries	✓ as_of_ts	✗	✗	✗ DIY	✗	✗
Provenance gate at write	✓ enforced	✗	✗	✗	✗	✗
Data residency	your DB	Mem0 cloud	Zep cloud	your DB	your DB	in process
License	Apache-2.0	Apache-2.0 + SaaS	Apache-2.0 + SaaS	PostgreSQL	AGPL	MIT

Why ParadeDB is in the table but is not competition: ParadeDB is excellent at full-text search inside Postgres. pgmnemo is retrieval: it decides what an application should recall, with provenance, time validity, and ranking diagnostics. These are different capability classes. pgmnemo sits next to ParadeDB in a Postgres install and answers different questions.

Enterprise wedge

For audit-sensitive AI systems.

pgmnemo turns retrieval into a traceable, replayable, evidence-backed database operation. Every recalled fact carries provenance metadata: commit SHA, artifact hash, verification timestamp. Every retrieval decision produces a per-signal scoring diagnostics breakdown. Point-in-time retrieval via as_of_ts answers the regulator's question: "What did the system know when it made that recommendation?"
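Point-in-time retrieval comes down to filtering rows by their validity interval. The Python sketch below shows the idea on in-memory rows. It assumes each row has a `t_valid_from` and a `t_valid_to` (open-ended when `None`); only `t_valid_from` appears on this page, so `t_valid_to` and the helper `visible_as_of` are hypothetical.

```python
from datetime import datetime

def visible_as_of(rows, as_of_ts):
    """Keep rows whose interval [t_valid_from, t_valid_to) contains as_of_ts."""
    return [
        row for row in rows
        if row["t_valid_from"] <= as_of_ts
        and (row["t_valid_to"] is None or as_of_ts < row["t_valid_to"])
    ]

# Two versions of the same lesson; the second supersedes the first on 2024-06-01.
rows = [
    {"lesson": "rotate JWT yearly",
     "t_valid_from": datetime(2024, 1, 1), "t_valid_to": datetime(2024, 6, 1)},
    {"lesson": "rotate JWT after key compromise",
     "t_valid_from": datetime(2024, 6, 1), "t_valid_to": None},
]
then = [r["lesson"] for r in visible_as_of(rows, datetime(2024, 3, 15))]
now = [r["lesson"] for r in visible_as_of(rows, datetime(2024, 9, 1))]
print(then, now)
```

Asking the same question with two different timestamps returns two different answers, which is what lets an auditor replay what the system knew at decision time.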

Fintech compliance · Healthcare AI · Defense / SBOM · Regulated industries

EU AI Act Articles 13–14 timelines are 2027+. pgmnemo is building the substrate now.

Governed retrieval inside Postgres.

Search finds documents. Retrieval decides what an application should recall — with what confidence, as of when, and why. pgmnemo is building the category of governed retrieval: retrieval that runs inside the database transaction, enforces write-time provenance, respects temporal validity, and can explain its ranking decisions. This is the EXPLAIN ANALYZE moment for AI retrieval — the point where retrieval stops being a black box and becomes a debuggable, auditable database operation.

Roadmap, not shipped

EXPLAIN RECALL, a dedicated function returning per-signal scoring diagnostics, is on the v0.9.x roadmap as explain_recall(). The shipped precursor is the set of per-signal diagnostic columns vec_score, bm25_score, and rrf_score, already present in every recall_lessons() result row.

API

Two function calls. The rest is SQL you already know.

SQL

-- Store with provenance (required)
SELECT pgmnemo.ingest(
    p_role        := 'developer',
    p_project_id  := 1,
    p_topic       := 'auth',
    p_lesson_text := 'Rotate JWT after key compromise.',
    p_commit_sha  := 'abc1234'
);

-- Hybrid recall
SELECT lesson_text, score, vec_score, bm25_score
FROM pgmnemo.recall_lessons(
    query_embedding := embed('JWT rotation'),
    query_text      := 'JWT secret rotation',
    role_filter     := 'developer'
) ORDER BY score DESC LIMIT 10;

Provenance-gated writes
commit_sha or artifact_hash required at insert time. No artifact → no canonical row.

Hybrid scoring formula
Sparse-safe RRF (Cormack 2009): vector rank + BM25 rank fused by rank, not by score, with confidence as an auxiliary tie-breaker. Every signal is visible as rrf_score / vec_score / bm25_score.

Graph traversal
traverse_causal_chain() and traverse_temporal_window() for typed edges between lessons.

Scoring diagnostics
vec_score, bm25_score, rrf_score: per-signal scoring diagnostics showing which retrieval path fired.

LangChain integration
Drop-in retriever in integrations/langchain/. Works with any LangChain agent.

Research

Benchmarks we run against.

We evaluate against two published benchmarks: LongMemEval (arXiv:2410.10813) and LoCoMo (arXiv:2402.17753).

acl 2024

LoCoMo: Evaluating Very Long-Term Conversational Memory of LLM-based Agents

Maharana et al. — our session-level benchmark

LoCoMo measures recall on multi-session conversations. pgmnemo v0.8.3 reaches recall@10 = 0.8409 (+4.15pp vs v0.3.0, p_corr=0.0156). Reproducible with the DRAGON embedder per paper protocol.

Read paper ↗

iclr 2025

LongMemEval: Benchmarking Long-Term Interactive Memory in Chat Assistants

Wu et al. — our retrieval-only benchmark

LongMemEval probes long-horizon retrieval on simulated user sessions. pgmnemo v0.8.3 reaches recall@10 = 0.9604 (sparse-safe RRF, bge-m3 1024d, p=0.017). A 50-line BM25 script reaches 0.9820 on the same dataset, a gap of 2.16 pp in the baseline's favor.

Read paper ↗

We publish both numbers. The provenance gate is original work and is not measured by this benchmark. See ROADMAP.md ↗.

Benchmarks

Real numbers. Published methodology.

LongMemEval-S recall@10 (higher = better)

System	recall@10
pgmnemo v0.8.3	0.9604
BM25 baseline (50-line)	0.9820
pgvector cosine-only	0.5417
random recall	0.0100

A 50-line BM25 script beats us by 2.16 pp. We publish this. The provenance gate is original work — not in this benchmark.

LoCoMo

Maharana et al., ACL 2024

recall@10	0.8409	+4.15pp
recall@5	0.7230	+6.07pp
MRR	0.6365	+7.96pp

LongMemEval-S

Wu et al., ICLR 2025

recall@10 (pgmnemo v0.8.3)	0.9604
MRR (pgmnemo)	0.8472
recall@10 (BM25 baseline)	0.9820	baseline wins
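For readers unfamiliar with the metrics, here is a small Python sketch of recall@k and MRR on toy data, plus the gap arithmetic for the two published recall@10 figures. The hit-style definition of recall@k used here (a query counts if any relevant item is in the top k) is an assumption; the benchmark papers define the exact protocol. The query data is invented for the example.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant id appears in the top k, else 0.0 (hit-style recall)."""
    return 1.0 if set(ranked_ids[:k]) & set(relevant_ids) else 0.0

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant id, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# (ranking returned for a query, set of relevant ids); toy data only.
queries = [
    (["a", "b", "c"], {"a"}),  # hit at rank 1
    (["d", "e", "f"], {"f"}),  # hit at rank 3
    (["g", "h", "i"], {"z"}),  # miss
]
mean_recall_at_2 = sum(recall_at_k(r, rel, 2) for r, rel in queries) / len(queries)
mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)

# Gap between the two published recall@10 numbers, in percentage points.
gap_pp = (0.9604 - 0.9820) * 100
print(round(mean_recall_at_2, 4), round(mrr, 4), round(gap_pp, 2))
```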

Migration

Already on pgvector? Migration is two commands.

pgmnemo doesn't replace pgvector — it sits next to it, calling pgvector's HNSW for vector similarity. Your existing embeddings carry over. Add hybrid recall, temporal validity, and provenance enforcement without rebuilding your pipeline.

// step 1 · install (uses your existing pgvector)

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION pgmnemo CASCADE;

// step 2 · migrate existing embeddings

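-- pgmnemo runs alongside pgvector; bring existing rows in via ingest(). no downtime.
-- note: with the default gate_strict = 'enforce', each row also needs a commit_sha or artifact_hash (see FAQ).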
SELECT pgmnemo.ingest(
  p_role        => 'system',
  p_project_id  => 1,
  p_topic       => 'import',
  p_lesson_text => content,
  p_embedding   => embedding
)
FROM your_embeddings;

Your pipeline keeps working. recall_lessons() becomes available alongside your existing pgvector queries. View full migration guide →

Install

Pick your path. All three take under five minutes.

// pgxn · recommended

One-line install

pgxn install pgmnemo

CREATE EXTENSION pgmnemo
  CASCADE;

// docker · production

Bake into your image

FROM pgvector/pgvector:pg17
ADD pgmnemo-0.8.3.zip /tmp/
RUN apt-get update && apt-get install -y unzip \
 && unzip /tmp/pgmnemo-0.8.3.zip -d /tmp/ \
 && cp -r /tmp/pgmnemo-0.8.3/extension/* \
       $(pg_config --sharedir)/extension/ \
 && rm -rf /tmp/pgmnemo-0.8.3* /var/lib/apt/lists/*

// from source · no docker

Build from source

git clone https://github.com/pgmnemo/pgmnemo.git
cd pgmnemo/extension && make && sudo make install

psql -c "CREATE EXTENSION pgmnemo CASCADE;"

// pypi · mcp · agent runtime

Use from any MCP-compatible agent

MCP server for Claude, Cursor, custom AI agent hosts. No app-level code changes.

View on PyPI ↗

pip install pgmnemo-mcp

PostgreSQL 17 is covered by blocking CI · PG 14 / 15 / 16 support is aspirational · pgvector ≥ 0.7.0 required

Full install guide ↗

FAQ

Questions from real installs.

The things people actually hit when wiring pgmnemo into a stack — embeddings, the provenance gate, upgrades. Answered straight, verified against the shipped extension.

Which embedding model should I use? Is bge-m3 enough?

Yes — with one hard requirement: pgmnemo stores exactly 1024-dimensional vectors. The schema declares embedding vector(1024) and ingest() rejects any other size.

bge-m3 (1024-dim, multilingual) is the validated default — it's what our benchmarks run on. Serve it from LM Studio's /v1/embeddings endpoint and use its dense output.

pgmnemo does not generate embeddings for you — you produce the vector and pass it to ingest() and recall_lessons(). Use the same model for writes and queries, or cosine similarity compares vectors from different spaces.
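Since the vector is produced on the client, it can help to check its size before calling ingest(). The Python sketch below validates the dimension and renders pgvector's text form (`[x1,x2,...]`). The helper name `to_vector_literal` is hypothetical, and the error wording simply mirrors the message quoted in this FAQ.

```python
EXPECTED_DIM = 1024  # from the schema: embedding vector(1024)

def to_vector_literal(embedding):
    """Validate the dimension, then render pgvector's text form, e.g. '[0.1,0.2]'."""
    if len(embedding) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM}, got {len(embedding)}")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# A zero vector stands in for a real bge-m3 embedding.
literal = to_vector_literal([0.0] * 1024)
print(literal[:12], literal.count(",") + 1)
```

Running the same check on both the write path and the query path catches a model mix-up early, before a mismatched vector reaches the database.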

How do I turn off the commit-hash (provenance) check?

That's the provenance gate. By default it requires commit_sha or artifact_hash on insert, so every memory carries provenance. Control it with one setting:

SET pgmnemo.gate_strict = 'off';    -- disable
SET pgmnemo.gate_strict = 'warn';   -- allow, but log a warning

Default is 'enforce'. You can also set it per database with ALTER DATABASE … SET pgmnemo.gate_strict = 'off'. Cleaner than turning it off: just pass commit_sha or artifact_hash on insert.
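The three gate modes can be summarized as a small decision function. This Python sketch only models the behavior described in this answer; `gate_decision` and its return labels are hypothetical, and the real check runs inside the database, not in application code.

```python
def gate_decision(commit_sha=None, artifact_hash=None, gate_strict="enforce"):
    """Model the gate: what happens to a write under each gate_strict mode."""
    if gate_strict not in ("enforce", "warn", "off"):
        raise ValueError(f"unknown gate_strict mode: {gate_strict!r}")
    if commit_sha or artifact_hash:
        return "accept"  # provenance present: every mode accepts the write
    # No provenance: the outcome depends on the mode.
    return {
        "enforce": "reject",
        "warn": "accept_with_warning",
        "off": "accept",
    }[gate_strict]

print(gate_decision(commit_sha="abc1234"))
print(gate_decision(gate_strict="enforce"))
print(gate_decision(gate_strict="warn"))
```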

Does pgmnemo replace pgvector?

No — pgmnemo runs alongside pgvector and uses it for the dense vector column. You need pgvector ≥ 0.7.0 installed; CREATE EXTENSION pgmnemo CASCADE pulls it in automatically.

How do I upgrade to a new version?

Install the new extension files, then run:

ALTER EXTENSION pgmnemo UPDATE;

This applies the migration chain to your existing data in place — no re-ingest needed.

My INSERT is rejected — what's wrong?

Two common causes. First, the provenance gate (above): a lesson with no commit_sha and no artifact_hash while gate_strict='enforce' — supply one of those fields, or relax the gate. Second, an embedding dimension mismatch — the vector must be exactly 1024-dim, or ingest() raises expected 1024, got N.

Don't see your question? Open an issue or read the full SQL reference.

Ask on GitHub ↗

Roadmap

What's coming next.

Direction, not commitments. We ship what we can verify with benchmarks first.

// next · v0.9.x

explain_recall()

Promote per-signal diagnostic columns (vec_score, bm25_score, rrf_score) into a first-class explainer function. Returns an EXPLAIN-style structured breakdown of why each result ranked where it did. Today's shipped precursor is the diagnostic columns already in every recall_lessons() row.

// next · v0.9.x

Per-tenant gate strict modes

gate_strict_per_tenant GUC for hybrid SaaS deployments where some tenants need enforce, others run in warn while their provenance corpus is bootstrapped.

// next · v0.9.x

Graph traversal scoring

Make mem_edge graph proximity a first-class signal in recall_lessons() alongside vector/BM25/temporal. The bitemporal graph layer is already in storage; graph proximity already ships as a multiplicative tie-breaker (v0.8.0); a first-class, equal-weight signal behind a full benchmark gate is next.

No dates. We benchmark before we promise.
