Does Context Swarm Memory beat Hindsight on BEAM 100K?
Yes, in the committed full local accepted-artifact comparison. CSM scores 0.757573 with 342/400 correct rows, versus Hindsight at 0.733658 with 326/400 correct rows.
Open-source memory infrastructure
Bounded read-only shards for cited long-term AI memory.
Memory whose edge grows as it scales.
CSM routes a query through immutable memory shards, probes for relevant evidence, recalls only from useful snapshots, and synthesizes a compact cited packet. Durable memory changes only through explicit Committer-gated writes.
CSM is an R&D memory system for long-running agents. It treats memory as bounded, inspectable shards rather than one ever-growing prompt. The read path is branch-and-discard; the write path is Committer-only.
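The read path described above (route, probe, recall, synthesize, with unhelpful branches discarded) can be sketched roughly as follows. This is an illustrative sketch only, not the repo's code: the names `ShardSnapshot`, `CitedPacket`, `route`, `probe`, `recall`, and `answer` are hypothetical, and plain keyword overlap stands in for whatever routing and probing CSM actually performs.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ShardSnapshot:
    """Hypothetical immutable shard: fields cannot be reassigned after creation."""
    shard_id: str
    topics: frozenset[str]
    records: Mapping[str, str]  # source ID -> text, exposed as a read-only view


@dataclass(frozen=True)
class CitedPacket:
    """Hypothetical compact result: an answer plus the source IDs it cites."""
    answer: str
    cited_source_ids: tuple[str, ...]


def make_snapshot(shard_id: str, topics: set[str], records: dict[str, str]) -> ShardSnapshot:
    # Copy the records and wrap them so query-time code cannot mutate them.
    return ShardSnapshot(shard_id, frozenset(topics), MappingProxyType(dict(records)))


def route(terms: frozenset[str], shards: list[ShardSnapshot]) -> list[ShardSnapshot]:
    # Assumption: routing is approximated by topic/keyword overlap.
    return [s for s in shards if s.topics & terms]


def probe(terms: frozenset[str], shard: ShardSnapshot) -> bool:
    # Cheap check for any relevant evidence before paying for a full recall.
    return any(terms & set(text.lower().split()) for text in shard.records.values())


def recall(terms: frozenset[str], shard: ShardSnapshot) -> list[tuple[str, str]]:
    return [(sid, text) for sid, text in shard.records.items()
            if terms & set(text.lower().split())]


def answer(query: str, shards: list[ShardSnapshot]) -> CitedPacket:
    terms = frozenset(query.lower().split())
    evidence: list[tuple[str, str]] = []
    for shard in route(terms, shards):
        if not probe(terms, shard):
            continue  # branch-and-discard: this shard contributes no context
        evidence.extend(recall(terms, shard))
    # Assumption: synthesis is reduced to joining the evidence text.
    return CitedPacket(
        answer=" ".join(text for _, text in evidence),
        cited_source_ids=tuple(sid for sid, _ in evidence),
    )


shards = [
    make_snapshot("s1", {"deploy"}, {"m1": "deploy happens on friday"}),
    make_snapshot("s2", {"lunch"}, {"m2": "lunch is at noon"}),
]
packet = answer("when is deploy", shards)
print(packet.cited_source_ids)
```

Nothing in this sketch writes to a shard: the only way to change memory would be to build a new snapshot, which is where a Committer-style gate would sit.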
Scope note: the scaling thesis is supported by the synthetic and Gemini scaling runs, where CSM stays stable as corpus size grows while RAG and long-context baselines degrade. The completed BEAM result is a full 100K Hindsight head-to-head, not yet a multi-scale BEAM study.
The public evidence bundle separates the north-star BEAM/Hindsight comparison from synthetic scaling, Gemini cross-model checks, and BABILong diagnostics.
Context Swarm Memory (CSM) beats the accepted local Hindsight BEAM 100K artifact in the committed full comparison: CSM scores 0.757573 with 342/400 correct rows, while Hindsight scores 0.733658 with 326/400 correct rows. CSM uses 38.2% fewer answer-visible context tokens, but retrieval is slower at 29.23s average versus Hindsight at 6.38s. This is a local accepted-artifact comparison, not yet an official leaderboard certification.
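For readers who want the gaps rather than the raw figures, the headline numbers above work out as follows. The inputs are copied from the text; the variable names are illustrative. Note that the score is reported separately from the correct-row count (342/400 is 0.855, not 0.757573), so the two are compared independently here.

```python
# Headline figures quoted in the text above.
csm = {"score": 0.757573, "correct": 342, "retrieval_s": 29.23}
hindsight = {"score": 0.733658, "correct": 326, "retrieval_s": 6.38}
TOTAL_ROWS = 400

score_gap = csm["score"] - hindsight["score"]
row_gap = csm["correct"] - hindsight["correct"]
row_gap_points = 100 * row_gap / TOTAL_ROWS
latency_ratio = csm["retrieval_s"] / hindsight["retrieval_s"]

print(f"score gap: {score_gap:.6f}")                          # score gap: 0.023915
print(f"row gap: {row_gap} rows ({row_gap_points:.1f} pts)")  # row gap: 16 rows (4.0 pts)
print(f"retrieval slowdown: {latency_ratio:.2f}x")            # retrieval slowdown: 4.58x
```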
CSM spends context only after routing finds plausible shards. Shard snapshots are immutable, LLM providers stay behind a seam, and query-time reads do not mutate durable memory.
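One common way to keep LLM providers "behind a seam" is a small interface that the memory code depends on, with concrete providers plugged in at the edge. The sketch below assumes that reading of the phrase; `LLMProvider`, `RecordingProvider`, and `synthesize` are hypothetical names, not identifiers from the repo.

```python
from __future__ import annotations

from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical seam: the only thing memory code knows about a model."""

    def complete(self, prompt: str) -> str: ...


class RecordingProvider:
    """Deterministic stand-in provider, useful for tests and offline runs."""

    def __init__(self, canned_reply: str) -> None:
        self.canned_reply = canned_reply
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)  # keep the prompt so it can be inspected later
        return self.canned_reply


def synthesize(provider: LLMProvider, question: str,
               evidence: list[tuple[str, str]]) -> str:
    # Evidence arrives as (source_id, text) pairs; IDs are kept inline so the
    # model can cite them.
    cited = "\n".join(f"[{sid}] {text}" for sid, text in evidence)
    prompt = f"Question: {question}\nEvidence:\n{cited}\nAnswer with citations."
    return provider.complete(prompt)


provider = RecordingProvider("Friday [m1]")
print(synthesize(provider, "When is deploy?", [("m1", "deploy happens on friday")]))
```

Because `synthesize` only sees the protocol, swapping in a real model client or a different vendor would not touch the routing or recall code.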
Concise answer, cited source IDs, conflict flags, and explicit uncertainty.
These answers mirror the structured data in the page head, keeping the public claim easy to quote while preserving the benchmark limits.
CSM is an open-source LLM memory system using bounded read-only memory shards, manager routing, probe/recall/synthesis, cited answers, and explicit Committer-gated writes.
No, this is not an official leaderboard result. It is a committed full local accepted-artifact comparison against the accepted Hindsight artifact. The repo does not call it official SOTA until independent replication or official chart acceptance exists.
CSM answers more rows correctly and uses fewer AMB-visible answer-context tokens, but retrieval is slower: 29.23s on average versus 6.38s for Hindsight. CSM also spends additional internal tokens on probe, recall, and synthesis.
No. CSM retrieval does not use gold answers, rubrics, query IDs, or hardcoded benchmark answers. Querying memory reads immutable shard snapshots and does not mutate durable memory.
Bounded shards keep individual recall contexts small and route only plausible memory regions before synthesis, reducing whole-corpus context saturation. BEAM is the 100K head-to-head; separate synthetic and Gemini scaling runs support the broader scaling thesis.
The verifier hashes committed evidence rows and recomputes headline metrics, citation F1, McNemar checks, and the BEAM CSM-vs-Hindsight summary.
npm install
npm test
npm run build
npm run verify:published
npm run amb:patch -- --amb-dir /path/to/agent-memory-benchmark
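Two of the verifier checks mentioned above, hashing evidence rows and a McNemar test on paired outcomes, can be sketched as below. This is not the repo's verifier: the canonical-JSON SHA-256 scheme, the function names, and the example discordant counts are all assumptions chosen for illustration, and the counts are not benchmark data.

```python
import hashlib
import json
from math import comb


def row_digest(row: dict) -> str:
    """SHA-256 of a canonical JSON encoding (assumed scheme, not the repo's)."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b = rows only system A got right, c = rows only system B got right.
    Under the null hypothesis the b + c discordant rows split like fair
    coin flips, so the p-value is a doubled binomial tail, capped at 1.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Key order does not affect the digest because keys are sorted first.
print(row_digest({"id": "q1", "correct": True}) == row_digest({"correct": True, "id": "q1"}))

# Hypothetical discordant counts, for illustration only.
print(mcnemar_exact(10, 2))
```

A McNemar check needs the per-row paired outcomes rather than just the two totals, which is why committing the evidence rows matters for recomputation.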