Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures

Authors: Young Bin Park

Young Bin Park∗ · Mar. 17, 2026 · CC BY-NC-ND 4.0 © 2026 Kumiho Inc.

Abstract

While individual components for AI agent memory—versioning, retrieval, consolidation—exist in prior systems such as Graphiti, Mem0, and Letta, their architectural synthesis and formal grounding remain underexplored. This paper presents Kumiho, a graph-native cognitive memory architecture grounded in formal belief revision semantics. AI agents increasingly produce substantial outputs—code, designs, documents, intermediate results—that accumulate without systematic versioning, provenance tracking, or linkage to the decisions that created them. In multi-agent workflows—where one agent's output becomes the next agent's input—this lack of structure is a fundamental bottleneck. The core insight is that the structural primitives required for cognitive memory—immutable revisions, mutable tag pointers, typed dependency edges, URI-based addressing—are identical to those required for managing these agent-produced outputs as versionable, addressable, dependency-linked assets. Rather than building a memory layer and a separate asset tracker, we built a cognitive memory architecture whose graph-native primitives inherently serve as the operational infrastructure for multi-agent work: agents use the same graph to remember, to find each other's outputs, and to build upon them.

∗ The structural correspondence between cognitive memory and asset management was recognized through the author's experience building pipeline infrastructure for major visual effects productions. Contact: support@kumiho.io

The central formal contribution is a correspondence between the AGM belief revision framework Alchourrón et al. [1985] and the operational semantics of a property graph memory system.
We frame this correspondence at the level of belief bases Hansson [1999], proving satisfaction of the basic AGM postulates (K*2–K*6) and Hansson's belief base postulates (Relevance, Core-Retainment), while providing a principled rejection of the Recovery postulate grounded in immutable versioning. The formal results hold for a deliberately simple propositional logic over ground triples—a trade-off of expressiveness for tractability that avoids the Flouris et al. impossibility results for description logics Flouris et al. [2005]. We identify the supplementary postulates (K*7, K*8) as open questions requiring further formal development.

The architecture implements a dual-store model (Redis working memory, Neo4j long-term graph)—a recognized pattern in the agent memory space Hu et al. [2025]—with hybrid retrieval across full-text and vector modalities. An asynchronous consolidation pipeline extends prior sleep-time compute approaches [2025] with safety guards adapted from distributed systems and content management: published-item protection, circuit breakers, dry-run validation, and auditable cursor-based resumption. Critically, the same graph that stores an agent's memories also manages its work products: downstream agents locate inputs via URI resolution, track which revision is current via tag pointers, and link their own outputs back via typed edges—while human operators audit the entire chain with the same SDK and inspection tools. On the LoCoMo benchmark [2024] (token-level F1 with Porter stemming), Kumiho achieves 0.447 four-category F1 (n=1,540)—the highest reported score across the retrieval categories—combined with 97.5% adversarial refusal accuracy (n=446), a natural consequence of belief revision semantics: the memory graph contains no fabricated information, so there is nothing for the model to hallucinate from.
Including adversarial (binary scoring), overall F1 is 0.565 (n=1,986). Automated AGM compliance testing (49 scenarios, 7 postulates) confirms 100% operational adherence to the claimed formal properties. On LoCoMo-Plus [2026]—a Level-2 cognitive memory benchmark testing implicit constraint recall under intentional cue-trigger semantic disconnect—Kumiho achieves 93.3% judge accuracy (n=401, all four constraint types); independent reproduction by the benchmark authors yielded results in the mid-80% range, still substantially outperforming all published baselines (best: Gemini 2.5 Pro, 45.7%). Recall accuracy reaches 98.5% (395/401), with the remaining 6.7% end-to-end gap entirely attributable to answer model fabrication on correctly retrieved context. Three architectural innovations drive both results: prospective indexing, where LLM-generated hypothetical future scenarios are indexed alongside each memory summary, bridging the cue-trigger semantic gap at write time; event extraction, where structured events with consequences are appended to summaries to preserve causal detail that narrative compression would otherwise drop; and client-side LLM reranking, where the consuming agent's own LLM selects the most relevant sibling revision from structured metadata at zero additional inference cost. The architecture is model-decoupled: switching the answer model from GPT-4o-mini (~88%) to GPT-4o (93.3%) improves end-to-end accuracy by 5.3 points without any pipeline changes, demonstrating that recall accuracy is a property of the architecture, not the model. The system uses GPT-4o-mini for the bulk of LLM operations at a total cost of ~$14 for 401 entries.
Keywords: cognitive memory, AI agents, belief revision, AGM postulates, knowledge graphs, memory architecture, property graph, Neo4j, retrieval-augmented generation, Model Context Protocol

1 Introduction

The role of large language models has shifted fundamentally. LLMs are no longer stateless question-answering systems or conversational chatbots. They operate as AI agents: autonomous workers that execute multi-step tasks, produce digital artifacts, make consequential decisions, and collaborate with humans and other agents across extended workflows Wang et al. [2024], Zhang et al. [2024]. A coding agent writes and commits code. A design agent generates and iterates on visual assets. A research agent gathers, synthesizes, and reports findings. A production agent coordinates tasks across departments and tracks deliverables.

This shift from chatbot to worker creates two intertwined requirements that current architectures address separately, if at all. First, agents need cognitive memory: the ability to remember past interactions, track how beliefs evolved, recall why decisions were made, and consolidate experience into reusable knowledge. Second, agents need work product management: the ability to version, locate, and build upon the artifacts they produce—so that a downstream agent in a multi-step pipeline can find the right input revision, understand its provenance, and link its own output back to the chain.

These two requirements share deep structural parallels. Remembering that "the client prefers warm color palettes" (cognitive memory) and tracking that "environment-lighting revision 5 is the approved version" (work product management) both require the same core primitives: immutable versioned snapshots, typed dependency edges, mutable status pointers, and content-reference separation.
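As a rough illustration of how one data model can serve both examples above, the shared primitives can be sketched as a few dozen lines of Python. This is a hypothetical minimal model, not the Kumiho SDK's actual API; the names `Revision`, `Edge`, and `Item` are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)          # frozen: revisions are immutable snapshots
class Revision:
    number: int
    content_ref: str             # content-reference separation: store a ref, not the payload

@dataclass
class Edge:                      # typed dependency edge between revisions
    kind: str                    # e.g. "Supersedes", "Depends_On"
    src: Revision
    dst: Revision

@dataclass
class Item:
    name: str
    revisions: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)   # mutable status pointers, e.g. {"approved": 5}

    def commit(self, content_ref: str) -> Revision:
        rev = Revision(len(self.revisions) + 1, content_ref)
        self.revisions.append(rev)             # append-only: old revisions never change
        return rev

    def tag(self, label: str, rev: Revision) -> None:
        self.tags[label] = rev.number          # moving a tag never rewrites history

# "environment-lighting revision 5 is the approved version":
lighting = Item("environment-lighting")
for i in range(5):
    lighting.commit(f"s3://renders/env_light_v{i + 1}.exr")   # hypothetical paths
lighting.tag("approved", lighting.revisions[4])
```

A remembered preference such as "the client prefers warm color palettes" would occupy the same shape: an `Item` per belief, a new `Revision` per belief change, and a tag marking the currently held revision.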
The domains are not identical—cognitive memory involves belief revision semantics, uncertainty, and natural language, while asset management operates on well-typed artifacts with deterministic workflows—but the underlying data model is shared. The present work builds a single graph-native cognitive memory architecture whose primitives simultaneously serve as the operational infrastructure for managing agent work products—enabling multi-agent pipelines where each agent's output is automatically versioned, addressable, and linked to the reasoning that produced it (Section 7 addresses the belief revision semantics specific to cognitive memory; Section 6 addresses the asset management extensions).

This dual-purpose design addresses a growing enterprise need. As AI agents transition from experimental tools to production workers, organizations require the same governance, traceability, and accountability standards they apply to human employees. Industry frameworks are emerging to meet this demand: ISACA's 2025 analysis of agentic AI audit challenges identifies identity management, decision traceability, and accountability gaps as critical concerns; IBM has introduced Agent Decision Records for structured agent accountability; and enterprise platforms increasingly frame agents as workforce members requiring governance infrastructure. Our architecture provides this governance natively: every agent belief has a URI, a revision history, provenance edges to source evidence, and an immutable audit trail—the same accountability infrastructure that organizations already apply to version-controlled code and managed digital assets.

The most common alternative response to the memory challenge has been to expand the context window itself: from 4K tokens in early models to 128K, 200K, and beyond. While larger windows accommodate more immediate context, they do not constitute memory.
A context window is consumed fresh on every invocation; it provides no mechanism for versioning beliefs, tracking evidential provenance, expressing dependency relationships between conclusions, or consolidating many episodes into generalized knowledge. The distinction is analogous to the difference between a whiteboard and a filing system. A larger whiteboard lets you write more at once, but it is erased between sessions.

We contribute not novel individual components—many of which exist in concurrent systems such as Graphiti [2025], Mem0 [2025], and A-MEM [2025]—but a novel architectural synthesis grounded in formal analysis, implemented as the Kumiho system.¹ The specific contributions are:

1. Unified cognitive memory and asset management architecture (Section 4): A graph-native cognitive memory system whose structural primitives—immutable revisions, typed edges, mutable tag pointers, URI addressing—simultaneously serve as the operational infrastructure for managing agent work products. Agents use the same graph to remember past interactions and to version, locate, and build upon each other's outputs, enabling fully autonomous multi-agent pipelines without separate asset tracking systems. This paper validates the cognitive memory capabilities empirically (Sections 15.2–15.3); the asset management unification is an architectural contribution whose multi-agent pipeline validation is planned as future work.

2. Formal belief base revision correspondence (Section 7): A structural correspondence between the AGM belief revision postulates and graph-native memory operations, framed at the belief base level Hansson [1999], with a principled rejection of the Recovery postulate and formal avoidance of the Flouris et al. impossibility results.

3. URI-based universal addressing: A memory reference scheme enabling deterministic resolution and provenance traversal. Our URI scheme provides hierarchical scoping, revision pinning, and artifact addressing. Among agent memory systems, we found no prior use of structured, hierarchical URIs with these properties. This provides the mechanism for formal constructs: every belief has a dereferenceable address, every revision is citable, and every provenance chain is traceable.

4. Safety-hardened consolidation (Section 9): An asynchronous consolidation pipeline with novel safety mechanisms: published-item protection, circuit breakers, dry-run validation, and auditable cursor-based resumption.

5. SDK transparency and multi-agent interoperability (Section 5.2): The same graph and SDK that agents use to manage work products also exposes their memories for human inspection through a web dashboard and desktop asset browser. Agents query the graph to find inputs; operators query it to audit decisions—both through the same API, addressing the growing need for AI memory observability alongside operational multi-agent coordination.

6. LoCoMo evaluation on the official metric (Section 15.2): 0.447 four-category token-level F1 on the LoCoMo benchmark (n=1,540)—the highest reported score across the retrieval categories LoCoMo [2024]—combined with 97.5% adversarial refusal accuracy (n=446). The near-perfect adversarial score is a natural consequence of the belief revision architecture: the memory graph genuinely does not contain fabricated information, so there is nothing for the model to hallucinate from. Including adversarial (binary scoring), overall F1 is 0.565 (n=1,986). The BYO-storage architecture keeps raw conversation data entirely on the user's local storage while achieving state-of-the-art retrieval accuracy through structured summarization with enrichment.

7. Empirical AGM compliance verification (Section 15.7): An automated test suite of 49 scenarios across 5 categories verifying operational adherence to all 7 claimed postulates (K*2–K*6, Relevance, Core-Retainment), including adversarial edge cases (rapid sequential revisions, deep dependency chains, mixed edge types). A 100% pass rate confirms that the implementation faithfully executes the formal specification.

8. LoCoMo-Plus evaluation (Section 15.3): 93.3% judge accuracy on LoCoMo-Plus (n=401)—a Level-2 cognitive memory benchmark testing implicit constraint recall under intentional cue-trigger semantic disconnect—outperforming the best published baseline (Gemini 2.5 Pro, 45.7%) by 47.6 percentage points. Recall accuracy reaches 98.5% (395/401), with 78% of failures attributable to answer model fabrication, not retrieval. Two consolidation enrichments—prospective indexing and event extraction—eliminated the >6-month accuracy cliff (37.5% → 84.4%). The architecture is model-decoupled: swapping the answer model from GPT-4o-mini (~88%) to GPT-4o (93.3%) improves accuracy without pipeline changes, at a total cost of ~$14.

9. Cross-benchmark generalization and client-side reranking (Sections 15.2, 15.3, 8.8): The same architecture achieves state-of-the-art on both LoCoMo and LoCoMo-Plus (contributions 6 and 8 above), validating that graph-native primitives generalize across evaluation protocols. Client-side LLM reranking—where the consuming agent's own LLM selects the most relevant sibling revision from structured metadata at zero additional cost—complements prospective indexing and event extraction as the third architectural mechanism driving both results.

¹ The core graph server is provided as a cloud service at https://kumiho.io. The Python SDK, MCP memory plugin, and benchmark suite are open-source: https://github.com/KumihoIO.

The remainder of the paper covers related work, design principles, the formal AGM correspondence, architecture details (hybrid retrieval, consolidation pipeline, safety guards), empirical evaluations on LoCoMo (Section 15.2) and LoCoMo-Plus (Section 15.3), AGM compliance verification (Section 15.7), and future directions.

2 Related Work

2.1 Agent Memory Architectures

The landscape of agent memory systems has developed rapidly since 2023, with the ICLR 2026 MemAgents Workshop [2026] marking the field's maturation. We position our work relative to the most relevant concurrent systems. Hu et al. [2025] provide the current canonical survey.

Graphiti/Zep [2025] shares the most surface-level components with our system: it implements a temporal knowledge graph on Neo4j with entity-event synthesis, bitemporal versioning, and triple-modality hybrid retrieval combining BM25, cosine similarity, and graph traversal. Graphiti reports 94.8% accuracy on the Deep Memory Retrieval (DMR) benchmark and 18.5% improvement on LongMemEval. Our LoCoMo results (Section 15.2) are discussed in the evaluation sections.
The key architectural differences are threefold: (i) we provide a formal belief revision correspondence that Graphiti lacks; (ii) our URI addressing scheme enables deterministic cross-system memory references; (iii) our BYO-storage design keeps raw conversation data on the user's local storage, whereas Graphiti processes and stores full content server-side.

Mem0/Mem0g [2025] implements a triple-store architecture with timestamped versioned memories and LLM-powered conflict resolution, reporting 26% improvement over OpenAI Memory on LoCoMo. A-MEM Xu et al. [2025] introduces Zettelkasten-inspired dynamic linking (NeurIPS 2025). MAGMA [2026] proposes a multi-graph architecture with four orthogonal graph layers (semantic, temporal, causal, entity), achieving the highest LoCoMo judge score of 0.70 with policy-guided retrieval traversal. MAGMA's design represents an alternative structural philosophy: it disentangles memory dimensions into separate graphs for cleaner retrieval routing, whereas our architecture unifies all relationships in a single property graph with typed edges, enabling cross-dimensional traversal (e.g., AnalyzeImpact propagating across all edge types simultaneously). Neither approach has been empirically compared to the other.

MemGPT/Letta [2023] pioneered virtual context extension. Letta's sleep-time compute [2025] introduced asynchronous background consolidation—the same biological metaphor we use—with anticipatory pre-computation (predicting future queries), which our Dream State does not currently address (Section 16). Our consolidation contribution is the safety guard architecture (Section 9.5). Letta Context Repositories [2026] (February 2026) provide git-backed memory filesystems with automatic versioning and merge-based conflict resolution via multi-agent worktrees.
While this is the most structurally related production system to our versioned memory concept, the approach differs fundamentally in three respects:

(i) Versioning substrate. Letta uses Git—a source code versioning tool—pragmatically as a storage backend for Markdown files. Our graph-native approach yields richer versioning primitives than Git's file-and-commit model: typed cognitive edges (not just file diffs), multi-tag pointer layers (not just branch heads), artifact attachments (not just file content), and hierarchical project/space scoping (not just directory trees). These same primitives double as auditable asset management for agent work products (Section 4.2).

(ii) Conflict resolution. Letta resolves concurrent writes via Git's text-level merge, which can detect but not semantically resolve contradictory beliefs—the merge requires human or LLM intervention on the text diff. Our system resolves conflicts via AGM-compliant belief revision operators: the Supersedes edge creates a new revision with formal guarantees (Success, Consistency, minimal change via Relevance).

(iii) Downstream propagation. In Letta, updating a belief in one file does not automatically identify other files that depend on it. In our system, AnalyzeImpact traverses typed Depends_On edges to identify all downstream conclusions that may need re-evaluation—a capability enabled by the typed edge ontology that flat file systems cannot express.

A concrete example illustrates these differences. If an agent's memory contains "client prefers warm tones" and a concurrent write produces "client now prefers cool tones," Letta's system generates a text-level merge conflict requiring human or LLM intervention.
Our system creates a new revision with a Supersedes edge, moves the tag pointer, and makes the stale belief retrievable only via explicit opt-in—with AnalyzeImpact propagating the change to downstream decisions automatically.

Additional concurrent systems include EverMemOS [2026] (self-reported 93.05% LoCoMo; see benchmark caveat below), CAM [2025] (NeurIPS 2025), and MemoryAgentBench [2026] (ICLR 2026). MemOS [2025] (EMNLP 2025 Oral) implements a hierarchical memory manager with global, local, and working memory buffers inspired by operating system design. Hindsight [2025] proposes a four-network memory architecture (facts, experiences, opinions, observations) achieving 89.61% on LoCoMo and 91.4% on LongMemEval with open-source evaluation. Its Opinion Network with confidence-scored beliefs that update with evidence represents a pragmatic form of belief versioning without formal AGM grounding. The empirical success of Hindsight's approach demonstrates that practical belief tracking delivers strong results; our work provides the formal framework that could guarantee consistency properties—specifically, that belief revision satisfies minimal change (Relevance) and does not discard beliefs without justification (Core-Retainment)—for systems like Hindsight. AgeMem [2026] unifies LTM/STM management via RL with tool-based operations and three-stage progressive GRPO training. E-mem [2026] addresses multi-agent episodic context reconstruction, achieving 54%+ F1 on LoCoMo while reducing token cost by 70%. LatentMem [2026] introduces learnable multi-agent memory with Latent Memory Policy Optimization. CAST [2026] proposes character-and-scene episodic memory for agents. CoALA [2024] provides a cognitive science taxonomy informing our dual-store design.

Recent surveys provide complementary taxonomies. Hu et al.
[2025] provide the current canonical survey with a comprehensive taxonomy of memory functions, substrates, and dynamics. Yang et al. [2026] survey graph-based agent memory from DEEP-PolyU, organizing systems by graph types and memory functions. Huang et al. [2026] provide a large-scale survey covering memory substrates, cognitive mechanisms, and memory subjects. Luo et al. [2026] trace the evolutionary trajectory from static storage to experience-driven memory mechanisms.

Table 1 summarizes the key architectural distinctions among the most relevant concurrent systems; a comprehensive feature comparison appears in Section 12.

Benchmark standardization caveat. No standardized LoCoMo leaderboard exists—all reported numbers use varying evaluation configurations (different judge models, question subsets, and evaluation prompts). EverMemOS's 93.05% is evaluated using their own framework with no independent reproduction. Li et al. [2026] introduced LoCoMo-Plus (February 2026), demonstrating that existing LoCoMo scores largely measure explicit factual recall under strong semantic alignment rather than genuine cognitive memory under cue-trigger semantic disconnect; all tested systems show substantial performance drops from LoCoMo to LoCoMo-Plus. To address the comparability gap, we now provide token-level F1 results (Section 15.2) on the official LoCoMo metric [2024], enabling direct comparison with Zep [2025], Mem0, Memobase [2025], and ENGRAM [2025] on the same scoring function. The DMR benchmark, while historically important, uses conversations (~60 messages) that fit within modern context windows and relies exclusively on single-turn fact-retrieval questions Graphiti [2025]. All comparative numbers cited in this paper should be understood as approximate and non-standardized.
Our LoCoMo-Plus evaluation (Section 15.3) further validates the strategic significance of graph-native architectures for Level-2 cognitive memory, where all tested baselines—including systems using premium models with million-token context windows—score between 23% and 46%.

2.2 Belief Revision in AI

The AGM framework Alchourrón et al. [1985], Gärdenfors [1988] defines rationality postulates for belief change. Hansson [1999] extended this to belief bases—finite sets not closed under logical consequence—which is the appropriate level for computational systems. Our formal analysis sits within this tradition.

Impossibility in Description Logics. Flouris et al. [2005] proved that Description Logics (including those underlying OWL) cannot satisfy AGM revision postulates. Qi et al. [2006] refined this for specific fragments. These results are critical context: our property graph avoids these impossibilities by operating on a simpler structure (Section 7.6).

AGM and Machine Learning. Aravanis [2025] establishes a correspondence between machine learning and AGM-style belief change; Hase et al. [2024] frame LLM model editing as belief revision. Baitalik et al. [2026] (AAAI 2026 Bridge Program) apply GreedySAT-based consistency to multi-turn dialogues. Wilie et al. [2024] demonstrate LLMs' poor belief revision capabilities on Belief-R, motivating external memory architectures that enforce revision constraints structurally. Our work complements this stream by applying AGM to the external memory graph.

Formal Developments. Bonanno [2025] unifies Bayesian and AGM approaches via Kripke–Lewis semantics, connecting belief revision to dynamic epistemic logic. Meng et al. [2025] develop belief algebras for iterated revision.
Chandler and Booth [2025] address parallel belief revision via order aggregation (IJCAI 2025), directly relevant to the question of compositional belief change in batch operations. Schwind et al. [2025] connect iterated belief change to learning, bridging the formal and computational perspectives.

Table 1: Architectural comparison with selected concurrent systems.

  Dim.       Graphiti     Mem0         Letta      Hindsight   Ours
  Version    Bitemporal   Timestamped  Git        None        Immutable+tags
  Conflict   Temporal     LLM          Git merge  Conf        AGM Supersedes
  Retrieval  BM25+vec     Graph        File       4-net       Hybrid
  Formal     None         None         None       None        AGM postulates
  URI        ×            ×            Git        ×           kref://
  Consol     Temporal     None         Git GC     None        Dream State

The Recovery Postulate. Makinson [1987] questioned its status; Hansson [1991] proposed contraction without Recovery. Our principled rejection (Section 7.3) provides a concrete operational demonstration.

2.3 Versioned Knowledge Graphs

The concept of version-controlled knowledge graphs has a substantial history in the Semantic Web community. R&Wbase Vander Sande et al. [2013] (subtitled "Git for triples") supported branching and merging for quad-stores. SemVersion Völkel et al. [2005] applied version control to RDF graphs with structural diff and merge. Quit Store Arndt et al. [2018] ("Quads in Git") provides a SPARQL 1.1 endpoint backed by Git-versioned RDF named graphs with branching, merging, and provenance. OSTRICH Taelman et al. [2018] implements hybrid versioned triple storage with immutable snapshots and delta chains. ConVer-G [2024] enables concurrent versioned querying of knowledge graphs using bitstring-based condensed representations.
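The snapshot-plus-delta pattern used by several of these versioned triple stores can be sketched minimally. This is an illustrative Python sketch of the general idea (an immutable base snapshot plus per-version added/removed sets), not any particular system's storage format.

```python
class VersionedTripleStore:
    """Immutable base snapshot plus a chain of (added, removed) deltas."""

    def __init__(self, snapshot: set):
        self.snapshot = frozenset(snapshot)   # immutable version 0
        self.deltas = []                      # one (added, removed) pair per version

    def commit(self, added: set, removed: set) -> int:
        self.deltas.append((frozenset(added), frozenset(removed)))
        return len(self.deltas)               # new version number

    def materialize(self, version: int) -> set:
        """Reconstruct the triple set at a given version by replaying deltas."""
        triples = set(self.snapshot)
        for added, removed in self.deltas[:version]:
            triples -= removed
            triples |= added
        return triples

# A belief change expressed as a versioned triple update:
store = VersionedTripleStore({("client", "prefers", "warm_tones")})
v1 = store.commit(added={("client", "prefers", "cool_tones")},
                  removed={("client", "prefers", "warm_tones")})
```

Earlier versions remain addressable after the commit: `materialize(0)` still yields the superseded triple, which is the property that makes revision pinning and provenance queries possible.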
Our contribution is not the invention of versioned graphs—this is well-established—but the application of versioned graph primitives to cognitive memory specifically, combined with formal belief revision analysis. The versioned KG systems above target SPARQL-based knowledge management; we target the distinct requirements of AI agent memory: typed dependency edges encoding epistemic relationships (not just ontological ones), mutable tag pointers for belief status tracking, asynchronous LLM-driven consolidation, and the formal correspondence to AGM postulates that Section 7 establishes.

Similarly, the broader analogy of version control applied to agent memory is increasingly explored. Git-Context-Controller [2025] (GCC) explicitly applies Git semantics (COMMIT, BRANCH, MERGE) to LLM agent memory and achieves strong results on SWE-Bench-Lite. Our system differs in operating on typed knowledge graph triples rather than text files, enabling the formal properties described in Section 7 and the typed dependency reasoning unavailable to text-level version control.

2.4 Hybrid Retrieval and Score Fusion

Robertson and Zaragoza [2009] formalized BM25. Cormack et al. [2009] introduced Reciprocal Rank Fusion (RRF). Bruch et al. [2023] analyzed fusion functions systematically, showing that convex combinations can outperform RRF on standard IR benchmarks. Our max-based fusion (CombMAX in the terminology of Fox and Shaw [1993]) is a deliberate design choice, not a novel fusion method. In memory retrieval, a strong exact-match signal on one branch should not be diluted by a weak score on another branch.
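A generic CombMAX can be sketched in a few lines; this is the textbook fusion rule, not Kumiho's exact implementation, and it assumes per-retriever scores have already been normalized to a comparable range.

```python
def comb_max(score_lists):
    """CombMAX fusion: each doc's fused score is the max of its per-retriever scores.

    score_lists: iterable of {doc_id: score} dicts, one per retrieval branch.
    Returns (doc_id, score) pairs sorted by fused score, descending.
    """
    fused = {}
    for scores in score_lists:
        for doc, s in scores.items():
            fused[doc] = max(s, fused.get(doc, float("-inf")))
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores: a strong full-text exact match on mem:42,
# a mediocre vector score for the same item.
fulltext = {"mem:42": 0.95, "mem:7": 0.20}
vector   = {"mem:42": 0.31, "mem:7": 0.55}
ranking = comb_max([fulltext, vector])
```

Here `mem:42` keeps its 0.95 exact-match score; an averaging fusion would pull it down to 0.63, illustrating the dilution the max rule avoids. The same structure also exposes CombMAX's known weakness: a single retriever producing inflated scores dominates the fused ranking.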
We motivate this through a precision preservation observation (Section 8), but acknowledge that CombMAX is known to be susceptible to noise from poorly-calibrated retrievers producing inflated scores Bruch et al. [2023]. This is an argumentative design choice, not a theoretical result; comparative evaluation against RRF and convex combination is planned (Section 16).

2.5 AI Agent Observability

As AI agents increasingly perform consequential work, the need for memory observability has become acute. OpenTelemetry provides agent-specific telemetry schemas; tools like Langfuse, LangSmith, and Braintrust offer instrumentation for tracing inference steps and tool calls. Our contribution differs: existing observability tools trace inference; our system makes memory itself the auditable artifact. Every agent belief has a URI, revision history, provenance edges, and immutable audit trail. The web dashboard (Figure 1) and desktop asset browser render this graph as a browseable, searchable hierarchy with interactive visualization, enabling human operators to audit agent reasoning at the memory level—not just at the tool-call level.

3 Why Context Window Extension Is Not Memory

We identify four structural deficiencies of context-window-as-memory.

Attention is not recall. A context window provides attention capacity; memory requires selective recall from a large corpus of past experience.

Quadratic cost scaling. Transformer attention scales as Θ(n² · d) Vaswani et al. [2017]. A retrieval-based system that indexes N memories and retrieves top-k incurs O(log N) + O(k² · d). Since k ≪ n by design, the retrieval approach is orders of magnitude cheaper. Moreover, the metadata-over-content architecture (Section 6) reduces the per-item token size: each of the k recalled memories is a compact summary, not a full transcript.
This compounds the cost advantage by reducing both the total token cost and the cognitive load on the reasoning model.

While techniques such as sparse attention, sliding windows, and linear attention approximations mitigate the quadratic cost, they do so by sacrificing the very capability that makes larger windows useful: the ability to attend to arbitrary positions in the sequence. An agent's lifelong interaction history, measured in tens of millions of tokens, makes in-context approaches economically and computationally infeasible for persistent memory.

No structural representation. A flat token sequence cannot express that belief B₂ supersedes B₁, that conclusion C depends on assumptions A₁, A₂, or that a memory has been validated, deprecated, or flagged for review.

Model coupling. Context window contents are ephemeral and model-specific. Agent memory must be LLM-decoupled: stored in a persistent, model-independent data structure that any current or future language model can query.

4 Background and Motivation

4.1 Human Memory as a Design Template

Cognitive science distinguishes between working memory—a capacity-limited, rapidly-accessible buffer Atkinson and Shiffrin [1968], Baddeley [2000]—and long-term memory, further divided into episodic and semantic stores Tulving [1972]. Transfer from episodic to semantic memory involves active consolidation during sleep Rasch and Born [2013]. Human decision-making proceeds through a loop of perceive–recall–revise–act that a memory-equipped AI agent must replicate.

4.2 The Asset Management Insight

AI agents increasingly produce substantial digital outputs—generated images, code commits, design iterations, research documents, intermediate results—that accumulate without systematic versioning, provenance tracking, or linkage to the decisions that created them.
As sessions accumulate, these outputs become lost, duplicated, or untraceable. The structural correspondence between the primitives needed for cognitive memory and those needed for tracking these agent-produced outputs is not coincidental: both domains require immutable versioned snapshots, typed dependency edges, mutable status pointers, URI-based addressing, and content-reference separation. This correspondence was first recognized through production experience—asset management systems in visual effects and game development have implemented these exact primitives for decades—but the architectural direction is memory-first: the graph-native primitives required for cognitive memory inherently provide auditable asset management as an emergent capability.

Table 2: Structural correspondence between asset management and cognitive memory.

    Asset      Memory     Concept
    Project    Scope      Container
    Space      Topic      Namespace
    Item       Unit       Identity
    Revision   Belief     Snapshot
    Artifact   Evidence   Pointer
    Tag        Status     Mutable ref
    Edge       Relation   Typed link
    Bundle     Cluster    Grouping

Traditional asset management systems encode these fundamentally graph-like relationships in relational databases (PostgreSQL, MySQL) with RPC frameworks (Thrift, gRPC), where dependency chains require recursive queries or application-level graph walks over foreign key joins. The relationships exist, but they are second-class citizens.

The present architecture adopts a native graph database (Neo4j) as the storage layer, making relationships first-class citizens. A dependency between two revisions that would require a junction table row and a multi-join query becomes a literal directed edge traversable in a single graph operation.
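The single-operation traversal this enables can be illustrated with a minimal in-memory stand-in for the graph: a depth-bounded walk over typed edges, which is what a recursive CTE must reconstruct join by join in a relational store. The URIs and adjacency structure below are hypothetical; in the deployed system this walk would be a single Cypher traversal.

```python
from collections import deque

# Hypothetical adjacency lists: revision URI -> [(edge_type, target URI)].
EDGES = {
    "kref://proj/design/api.decision?r=3": [
        ("DERIVED_FROM", "kref://proj/bench/latency.fact?r=1"),
        ("SUPERSEDES",   "kref://proj/design/api.decision?r=2"),
    ],
    "kref://proj/bench/latency.fact?r=1": [
        ("DERIVED_FROM", "kref://proj/raw/run-042.artifact?r=1"),
    ],
}

def traverse(start: str, depth: int) -> set[str]:
    """Breadth-first walk over typed edges, up to `depth` hops.
    Here it follows Derived_From/Supersedes provenance; impact analysis
    is the same walk taken in the reverse edge direction."""
    seen, frontier = set(), deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d >= depth:
            continue
        for _etype, target in EDGES.get(node, []):
            if target not in seen:
                seen.add(target)
                frontier.append((target, d + 1))
    return seen

print(traverse("kref://proj/design/api.decision?r=3", depth=2))
```

Each hop is one edge lookup; no join planning or junction-table indirection is involved, which is the point of the relational-versus-graph comparison above.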
This graph-native foundation enables the cognitive memory primitives (belief revision chains, evidential provenance, and the formal properties described in Section 7) while simultaneously providing the asset management capabilities (native dependency traversal, real-time impact analysis, typed relationship ontology) needed to track agent work products. Table 2 formalizes this structural correspondence.

Principle 1 (Cognitive Memory as Multi-Agent Infrastructure). AI agents increasingly produce substantial outputs—generated images, code artifacts, design iterations, documents, intermediate results—that are not systematically tracked, versioned, or linked to the decisions that created them. In multi-agent workflows, this becomes a critical bottleneck: a video compositing agent cannot locate the approved texture revision produced by an upstream generation agent; an editing agent cannot trace which audio mix corresponds to which scene cut. The core insight is that the structural primitives needed for cognitive memory (immutable revisions, typed edges, mutable tag pointers, URI addressing) are identical to those needed for managing these agent-produced outputs as versionable, addressable, dependency-linked assets. Rather than building a memory layer and a separate asset tracker, we built a cognitive memory architecture whose graph-native primitives inherently serve as the operational infrastructure for multi-agent work—enabling agents to remember, find each other's outputs, and build upon them through a single system. An extensive literature search found no prior work unifying these two capabilities on a shared graph substrate.
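The handoff pattern in Principle 1 can be sketched with minimal in-memory structures. The class, its fields, and the kref string below are illustrative, not the system's SDK: an upstream agent commits immutable revisions and moves a mutable tag; a downstream agent resolves the tag and records a Derived_From edge.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    name: str
    revisions: list[str] = field(default_factory=list)   # immutable contents
    tags: dict[str, int] = field(default_factory=dict)   # mutable pointers
    edges: list[tuple[int, str, str]] = field(default_factory=list)

    def commit(self, content: str) -> int:
        self.revisions.append(content)      # append-only: never overwrite
        return len(self.revisions)          # 1-based revision number

    def tag(self, name: str, rev: int) -> None:
        self.tags[name] = rev               # only tags are mutable

# Upstream generation agent produces two texture revisions, approves one.
texture = Item("concept-art.texture")
texture.commit("texture v1: warm palette")
r2 = texture.commit("texture v2: cool palette")
texture.tag("approved", r2)

# Downstream compositing agent resolves the approved revision by tag,
# then registers its own output with a Derived_From edge back to it.
approved_rev = texture.tags["approved"]
composite = Item("scene-12.composite")
c1 = composite.commit("composite using approved texture")
composite.edges.append((c1, "DERIVED_FROM",
                        f"kref://proj/art/{texture.name}?r={approved_rev}"))
print(composite.edges)
```

Note that the compositing agent never touches a file path or naming convention; the tag lookup and the typed edge are the entire coordination protocol.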
4.3 Dual-Purpose Graph: Memory and Asset Tracking as One System

The structural correspondence in Table 2 enables a concrete operational capability: the same graph that stores an agent's cognitive memories also manages the outputs that agents produce. Consider a multi-agent creative pipeline: an image generation agent produces a concept art revision; a video compositing agent locates that revision via its URI, checks that it carries the "approved" tag, and creates its own output with a Derived_From edge linking back to the input; an audio agent does the same for the soundtrack; and an editing agent assembles the final deliverable with typed edges to every upstream component. Each agent uses the graph both to remember (client preferences, past iterations, feedback history) and to operate (find the right input version, register its output, declare dependencies).

This dual-purpose design means agents in a pipeline do not need separate systems for "remembering what happened" and "managing the outputs they created." Both are first-class graph citizens. Without this unification, agent-produced artifacts accumulate in disconnected storage—files on disk, blobs in buckets, ephemeral context windows—with no versioning, no provenance, and no link to the reasoning that generated them. The graph ensures that every output is addressable, versionable, and traceable to the belief state that produced it—and that downstream agents can find and build upon it programmatically.

5 LLM-Memory Decoupling

A persistent failure mode in current agent memory systems is tight coupling between the memory store and the language model. When memory exists as prompt context, model-managed buffers, or framework-specific data structures, it becomes inseparable from the specific model version. This coupling creates provider lock-in, model upgrade fragility, and architectural non-portability.
5.1 The Decoupled Architecture

The architecture enforces LLM-memory decoupling through three mechanisms:

Model-independent storage. The memory graph is stored in a standard property graph database. Memories are structured data—summaries, metadata, tags, edges, artifact pointers—not model-specific embeddings or token sequences.

Standardized access protocol. Memory operations are exposed through the Model Context Protocol (MCP) MCP [2025], a model-agnostic tool interface. Any MCP-compatible agent can execute memory operations through the same interface.

Pluggable LLM integration. Components requiring LLM capabilities—the Dream State consolidation pipeline, PII redaction, and memory summarization—accept any LLM through an adapter interface.

5.2 Why Agent Work Products Demand Managed Memory

When an LLM serves as a chatbot, its "memory" is a convenience. When an LLM operates as an agent performing consequential work, memory becomes critical infrastructure—for two distinct reasons.

Operational need: multi-agent coordination. In autonomous pipelines, each agent's output is the next agent's input. An image generation agent produces texture revisions; a compositing agent must locate the approved revision, understand what created it, and link its own output back. Without structured asset management, these handoffs require brittle file paths, naming conventions, or ad-hoc metadata—none of which scale to complex multi-agent workflows. The memory graph provides this coordination natively: every output has a URI, a version history, typed dependency edges to its inputs, and mutable tag pointers indicating approval status. Agents query the graph to find inputs just as they query it to recall past interactions.

Governance need: decision auditability.
An agent that approved a deployment, recommended a treatment plan, or signed off on a financial model must be able to explain why—not by regenerating a plausible explanation, but by pointing to the actual evidence, the actual reasoning chain, and the actual prior beliefs that informed the decision. This is the same accountability standard applied to human workers. The memory graph provides this: every agent belief has a URI, a revision history, provenance edges to source evidence, and an immutable audit trail. The web dashboard and desktop asset browser render this graph as a browseable hierarchy of versioned items (Figure 1). An operator reviewing an agent's decision can traverse from the decision memory to its Derived_From sources, check what the agent believed at the time via time-indexed tag resolution, and inspect the Dream State consolidation reports.

Both needs—operational coordination and governance auditability—are served by the same graph, the same SDK, and the same API. Without this infrastructure, agent work products are untraceable and agent pipelines are unmanageable. The system makes agent memory as inspectable as a version-controlled codebase and as navigable as a managed asset database—because structurally, it is both.

6 System Architecture

6.1 Overview

The architecture implements a dual-store model mirroring the human working/long-term memory distinction. The agent interacts through MCP; the memory graph is LLM-independent.

6.2 Working Memory: Library-Level Access

The working memory layer uses Redis, accessed via a direct library SDK (not HTTP), to maintain the current session's message buffer with a configurable TTL (default: 1 hour). Measured latencies are 2–5 ms via library SDK versus 150–300 ms through an HTTP gateway. Key design properties:

• Latency: < 5 ms read/write via library SDK.
• Scope: Session-local; isolated per (project, context, user, session) via key namespace cogmem:{proj}:sessions:{sid}:*.
• Volatility: TTL-based expiry; bounded buffer (default 50 messages).
• Capacity: Session-local only; does not persist across agent restarts.
• Isolation: Strict context-level isolation via key namespacing (the context field in the session ID serves as the namespace boundary).

Principle 2 (Match Storage Latency to Access Pattern). Working memory requires library-level (not RPC-level) access to maintain sub-10 ms latency. The 100–200 ms difference between in-process and network access, compounded across thousands of interactions, accumulates to orders-of-magnitude differences in agent responsiveness.

6.3 The Structured Memory Reference Scheme

Every object in the system is addressable through a universal URI scheme:

    kref://project/space/item.kind?r=N&a=artifact

The scheme provides: Addressability—any memory is referenceable from any context; Temporal navigation—?r=N pins to a specific point in time; Type safety—the .kind suffix enables type-aware retrieval; Traversal entry points—edges reference memory URIs, enabling graph traversal from any starting point.

Principle 3 (Universal Addressability). Every memory unit must be referenceable by a stable, parseable, human-readable identifier. Without addressability, edges become fragile, lineage becomes untraceable, and consolidation becomes lossy.

Principle 4 (Validate at Boundaries, Trust Internally). The SDK validates all memory reference URIs at the boundary (regex matching against the canonical format). Once validated, internal code treats them as opaque strings, eliminating redundant per-function validation.

6.4 Long-Term Memory: The Property Graph

Long-term memory is stored in Neo4j.
Each memory unit is represented as an Item node with one or more Revision nodes forming an immutable version chain.

6.4.1 The Item–Revision Model

An Item represents a named, typed memory unit. Each Revision is an immutable snapshot carrying structured metadata (summary, topics, keywords, schema version) and optionally an embedding vector. An agent can: resolve a tag to retrieve the latest belief; follow Supersedes chains to trace belief evolution; follow Derived_From edges to understand evidential support; and move a tag to an earlier revision to perform belief rollback.

    Item: "api-design.decision"
      Rev 1 (Jan 15): "Use REST for public API"
      Rev 2 (Jan 22): "Use REST + WebSocket"
        edge: SUPERSEDES -> Rev 1
      Rev 3 (Feb 1):  "Use gRPC internally"
        edge: SUPERSEDES -> Rev 2
        edge: DERIVED_FROM -> "benchmarks.fact?r=1"
      Tag "current" -> Rev 3
      Tag "initial" -> Rev 1

Principle 5 (Immutable Revisions, Mutable Pointers). Memory states are never overwritten; they are versioned. Tags provide mutable "current view" semantics without losing history, enabling belief revision tracking, audit trails, and rollback.

6.4.2 BYO-Storage: Metadata Over Content

The system stores metadata, relationships, and pointers—never file content. Artifact records contain a location field pointing to where raw content resides on the user's own storage. This content-reference separation—a principle well-established in asset management systems handling petabytes of data—yields critical benefits for cognitive memory: the graph database stays lightweight, privacy boundaries are enforced architecturally, and agents read compact summaries rather than full transcripts.

Table 3: Memory type taxonomy and implementation.

    Type          Implementation
    Working       Redis buffer (TTL)
    Episodic      Conversations
    Semantic      Consolidated facts
    Procedural    Tool execution
    Associative   Graph edges
    Meta          Tags + audit
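The Item–Revision example above can be modeled minimally to show tag resolution, Supersedes-chain walking, and rollback by tag move. This is a sketch with illustrative structure names, not the Neo4j schema.

```python
# Immutable revision contents, keyed by revision number.
revisions = {
    1: "Use REST for public API",
    2: "Use REST + WebSocket",
    3: "Use gRPC internally",
}
supersedes = {3: 2, 2: 1}          # Rev 3 SUPERSEDES Rev 2 SUPERSEDES Rev 1
tags = {"current": 3, "initial": 1}  # the only mutable layer

def resolve(tag: str) -> str:
    """Resolve a tag to the content of the revision it points at."""
    return revisions[tags[tag]]

def history(tag: str) -> list[int]:
    """Walk the Supersedes chain from the tagged revision back to the root."""
    rev, chain = tags[tag], []
    while rev is not None:
        chain.append(rev)
        rev = supersedes.get(rev)
    return chain

print(resolve("current"))    # "Use gRPC internally"
print(history("current"))    # [3, 2, 1]

tags["current"] = 2          # belief rollback: move the tag, keep all history
print(resolve("current"))    # "Use REST + WebSocket"
```

Rollback changes only the pointer; every revision and edge survives, which is what makes the audit trail immutable.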
Principle 6 (Metadata Over Content). Store the minimum information necessary for recall and reasoning in the cloud graph. Raw content stays local. This preserves user privacy, reduces storage costs, eliminates data exfiltration risk, and enables cognitive efficiency.

6.4.3 Memory Type Taxonomy

The system implements six memory types: working memory (Redis buffer), episodic memory (conversation revisions), semantic memory (consolidated facts), procedural memory (tool execution records), associative memory (graph edges + bundles), and meta-memory (tag system + audit trail).

6.5 Edge System: Reasoning as First-Class Structure

The edge system makes reasoning structure explicit through six typed, directed edge types: Depends_On (validity dependency), Derived_From (evidential provenance), Supersedes (belief revision), Referenced (associative mention), Contains (bundle membership), and Created_From (generative lineage).

Edge type definitions:

• Depends_On: Validity dependency. If the target is invalidated, the source may be unreliable.
• Derived_From: Evidential provenance. The source was produced using the target as input.
• Supersedes: Belief revision. The source replaces the target as the current belief.
• Referenced: Associative mention. The source refers to the target without dependency.
• Contains: Bundle membership. The target is a member of the source bundle.
• Created_From: Generative lineage. The source was generated from the target.

These edges enable three traversal operations: TraverseEdges(k, d, n), ShortestPath(k_s, k_t), and AnalyzeImpact(k, d). Impact analysis is particularly important for belief revision: when an agent discovers that assumption A is invalid, AnalyzeImpact identifies all downstream conclusions that may need re-evaluation.

Traversal operations:

1. 
TraverseEdges(k, d, n): Find all memories connected to memory reference k in direction d up to depth n.
2. ShortestPath(k_s, k_t): Find the minimal connection between two memory references.
3. AnalyzeImpact(k, d): Compute the transitive dependency cascade from memory reference k to depth d.

Principle 7 (Explicit Over Inferred Relationships). Relationships between memories must be stored as first-class graph edges, not inferred at query time through embedding similarity. Similarity finds related content; edges encode why content is related.

Principle 8 (Why Graph-Native Edges Matter). In relational systems, a dependency between two assets required a junction table and a multi-join query. In native graph databases, the same relationship is a single directed edge traversable in O(1) time. Recursive CTEs can compute transitive closure over relational joins, but they require exponentially more computation as depth increases. A single Cypher ShortestPath query achieves the same result in milliseconds. This difference compounds across all graph operations: traversal, impact analysis, provenance reconstruction.

7 Formal Properties: Belief Revision in Graph-Native Memory

The item–revision–tag model is not merely an engineering convenience; its primitives correspond to fundamental operations studied in the theory of rational belief change. This section demonstrates a structural correspondence with the AGM postulates Alchourrón et al. [1985]—or, in one principled case, deliberately diverges from them.

We frame this contribution at the level of belief bases rather than belief sets, following Hansson [1999].
The contribution is the bridge itself: demonstrating that an independently-motivated systems architecture, designed for production agent memory, satisfies established rationality constraints under a belief base interpretation. We do not propose new logical machinery; the novelty lies in showing that production-motivated architectural choices yield a system satisfying AGM's postulates. To our knowledge, no prior work bridges AGM belief revision theory with AI agent memory architectures. The novelty lies in the mapping between graph-native memory operations and formal belief revision operators—not in the logic itself, which is deliberately kept at the propositional level for tractability. The distinction between belief sets and belief bases is fundamental: AGM's original formulation operates on belief sets closed under logical consequence (K = Cn(K)), which are infinite objects unsuitable for computational systems. In contrast, computational systems store finite sets of explicitly stored propositions—precisely what the memory graph's revision content represents. Hansson's belief base framework operates on these finite, non-closed sets, providing the appropriate level of abstraction for agent memory. This framing avoids the well-known difficulties of computing deductive closure while retaining the rationality constraints that make AGM valuable.

7.1 Formal System Model

We formalize the memory graph and the belief state it induces.

Definition 7.1 (Memory Graph). A memory graph is a tuple G = (I, R, E, τ) where:

• I is a finite set of items (named, typed memory units);
• R = ⋃_{i ∈ I} R_i is the set of all revisions, where R_i = (r_i^(1), r_i^(2), …) is the ordered, append-only revision sequence for item i;
• E ⊆ R × EdgeType × R is a set of typed, directed edges between revisions;
• τ : TagName ⇀ R is a partial function mapping tag names to revisions (the mutable pointer layer).

Each revision r_i^(k) is immutable: once created, its content φ(r_i^(k)) cannot be modified. Tags are the sole mutable component.

Definition 7.2 (Belief Base). Given a memory graph G = (I, R, E, τ), the belief base is:

    B(τ) = ⋃_{t ∈ dom(τ)} φ(τ(t))    (1)

where φ(r) denotes the propositional content of revision r (operationally, the revision's structured metadata: summary, topics, keywords, extracted facts). Unlike the AGM belief set, B(τ) is a finite set that is not closed under logical consequence.

The system records every tag assignment and reassignment, yielding a history function τ_T : TagName ⇀ R that resolves the tag mapping as it existed at time T. This induces historical belief bases:

    B(τ_T) = ⋃_{t ∈ dom(τ_T)} φ(τ_T(t))    (2)

enabling the query "what was believed under tag t at time T?" to be answered precisely—not by scanning revision timestamps, but by resolving the actual tag-to-revision binding that was active at T.

An important implementation note: the propositional content φ(r) is operationally the revision's structured metadata—its summary, topics, keywords, and extracted facts—not the raw conversation transcript or tool output. Raw content resides outside the graph as an artifact and is dereferenced only when exact detail is required. This two-tier representation makes φ(r) compact and tractable for belief-state computation, while preserving full evidential fidelity through the artifact pointer.

From natural language to ground triples.
A central implementation question is how agent-generated natural language beliefs—e.g., "the user prefers cool color tones"—are mapped to ground atoms in At_G (Definition 7.8). The mapping is performed at the API boundary by the memory_ingest MCP tool, which accepts structured fields: a title (becomes the item name), a summary (becomes the revision's primary content), tags and topics (become metadata keywords), and a memory_type classification (summary, decision, fact, reflection, error). The agent's natural language output is thus decomposed into typed fields before entering the graph—the mapping is not an automated NLP extraction pipeline but a structured API contract that the agent's skill prompt is designed to satisfy.

Concretely, the agent's statement "the user prefers cool tones for their design palette" is mapped to a ground triple ⟨color-preference, summary, "prefers cool tones"⟩ ∈ At_G by the agent writing: title="color-preference", summary="prefers cool tones for design palette", memory_type="decision". The triple structure (s, p, o) arises from the item–field–value decomposition: subject = item name, predicate = field name, object = field value. This is not semantic parsing in the NLP sense; it is structured output that the LLM agent produces by following its memory skill prompt, which specifies the expected fields and their semantics.

Property graph, not RDF. Despite the (s, p, o) notation, the underlying storage is a labeled property graph (Neo4j), not an RDF triple store. Each memory item is a graph node with typed properties (summary, topics, keywords, type); each revision is a separate node linked by Supersedes edges; and inter-item relationships (Depends_On, Derived_From, etc.) are first-class typed edges with optional metadata.
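The item–field–value decomposition described above can be made explicit in a few lines. The function below is our sketch of the contract, not the memory_ingest tool's implementation; field names follow the description in the text.

```python
def ingest_to_triples(title: str, summary: str, memory_type: str,
                      topics: list[str]) -> list[tuple[str, str, str]]:
    """Subject = item name, predicate = field name, object = field value."""
    triples = [(title, "summary", summary),
               (title, "memory_type", memory_type)]
    triples += [(title, "topic", t) for t in topics]
    return triples

triples = ingest_to_triples(
    title="color-preference",
    summary="prefers cool tones for design palette",
    memory_type="decision",
    topics=["design", "palette"],
)
print(triples[0])
# ('color-preference', 'summary', 'prefers cool tones for design palette')
```

Because the agent chooses the title and wording freely, two equivalent beliefs can yield different triples, which is exactly the mapping-consistency caveat discussed below.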
The "ground triple" formalism in At_G is an analytical abstraction used to establish the AGM correspondence—it provides a clean propositional atom structure for the formal proofs. The implementation stores these atoms as node properties in the property graph, not as RDF subject–predicate–object statements. This distinction matters because the property graph model supports features absent from basic RDF: per-edge metadata, native node-level indexing (fulltext and vector), and schema-flexible property bags on both nodes and edges. Readers familiar with RDF should understand At_G as a formalization convenience, not an architectural commitment to triple stores.

Three consequences follow. First, the quality of the mapping depends on the agent's skill prompt, not on a separate extraction module—prompt engineering is the primary control surface. Second, ambiguous or complex beliefs ("the user seems to prefer cool tones but mentioned warm tones for the bedroom") are captured as a single revision with a nuanced summary, not decomposed into multiple conflicting triples; the formal model handles the resulting compound belief through the mechanisms described in the partial merging discussion below. Third, the mapping is lossy by design: the full conversational context is preserved in the artifact (the raw transcript), while the ground triple captures only the distilled belief. This lossy compression is what makes B(τ) tractable—a finite set of ground atoms rather than an unbounded natural language corpus—while the artifact system preserves evidential fidelity for cases requiring the original context.

Scope boundary. The NL-to-triple mapping is a pre-formal step: the formal properties established in Section 7.2 hold over the ground triple representation At_G, not over the natural language source.
In particular, the mapping is one-to-many in practice—the same natural language belief ("the user prefers cool tones") could be mapped to different ground triples depending on the agent's choice of item name, summary wording, or metadata structure. Two semantically identical beliefs may therefore not be syntactically identical in At_G, which means that Extensionality (K*6, Proposition 7.5) holds over the formal representation but cannot guarantee that the agent will consistently map equivalent natural language inputs to equivalent ground atoms. This is an inherent limitation of any system that bridges informal and formal representations; the consistency of the mapping is a prompt engineering concern, not a formal one. We treat the NL-to-triple mapping as an explicit assumption: the formal analysis begins after beliefs have been committed to the graph as ground triples, and the quality of the formal guarantees is bounded by the quality of this pre-formal mapping step.

Definition 7.3 (Two-Tier Epistemic Model). The memory system operates at two distinct epistemic levels:

• The full graph G = (I, R, E, τ) contains all revisions ever created, including deprecated items and archived (untagged) revisions. This is the operator-accessible store.
• The agent retrieval surface B_retr(τ) is the subset of the belief state reachable through the retrieval pipeline. A revision r is in the retrieval surface if and only if: (i) r is referenced by at least one active tag in τ, and (ii) the item containing r is not marked as deprecated.

We define the deprecation predicate: deprecated : I → {⊤, ⊥} is a boolean property on each item, set to ⊤ by the contraction operator (Definition 7.5) and queryable as a node property in the graph database (item.deprecated).
An item's deprecation status is mutable (it can be restored via explicit operator action) but defaults to ⊥ at creation. Formally, let I_active = { i ∈ I | ¬deprecated(i) }. Then:

    B_retr(τ) = ⋃_{t ∈ dom(τ), item(τ(t)) ∈ I_active} φ(τ(t)) ⊆ B(τ)    (3)

This two-tier model is critical for the Consistency postulate (K*5). Without retrieval exclusion, a superseded revision r_i^(k)—still present in the graph and carrying content φ(r_i^(k)) that may entail ¬A—could be surfaced by vector similarity search (the embedding of the old revision may be semantically proximate to the query). This would violate Consistency: the agent's operational belief state would contain both A (from the new revision r_i^(k+1)) and content entailing ¬A (from the superseded revision leaked through retrieval). The retrieval surface B_retr(τ) prevents this by ensuring that only tag-referenced, non-deprecated revisions are candidates for any retrieval branch. Both retrieval branches—fulltext and vector—filter on active status before scoring via a mandatory WHERE NOT item.deprecated clause in the underlying Cypher query, making the exclusion architecturally enforced rather than application-dependent.

The operator, by contrast, can always access the full graph G via explicit opt-in (include_deprecated=true), enabling audit, rollback, and provenance queries without compromising the agent's belief consistency.

Postulate scope. The formal postulates in Section 7.2 are proved for the belief base B(τ) (Definition 7.2), which is a deterministic function of the tag assignment τ. The retrieval surface B_retr(τ) is a subset of B(τ) obtained by filtering on active (non-deprecated) items.
Since deprecation is itself a contraction operation (Definition 7.5), the retrieval surface inherits postulate satisfaction: any property proved for B(τ) under the specified operations also holds for B_retr(τ) under the corresponding restricted operations. What the retrieval surface does not inherit is deterministic ranking: the hybrid scoring pipeline (Section 8) introduces score-based reranking that may surface different subsets of B_retr(τ) depending on query formulation. This affects which beliefs an agent encounters in practice, but not the formal properties of the underlying belief base.

Definition 7.4 (Graph-Native Revision). Given belief base B(τ) and input proposition A, the revision operation B * A is implemented as:

1. Create a new revision r_i^(k+1) with content φ(r_i^(k+1)) = A (or a content set entailing A);
2. Add edge (r_i^(k+1), Supersedes, r_i^(k)) to E;
3. Update the tag: τ′ = τ[t_current ↦ r_i^(k+1)].

The prior revision r_i^(k) remains in R but is no longer tag-referenced, hence excluded from B(τ′).

Granularity of revision and partial merging. The revision operator (Definition 7.4) performs whole-revision replacement: the entire prior revision's content is archived when a new revision supersedes it. This raises the question of how the system handles partial updates—for example, when a revision contains beliefs {A, B, C} and only A changes. Three strategies exist along a granularity spectrum:

(i) Atomic replacement (current): The agent creates a new revision r_i^(k+1) with content {A′, B, C}, superseding r_i^(k) entirely. This preserves the formal properties: the postulate proofs are clean because each revision is a complete, self-contained belief snapshot.
The trade-off is that the agent (or its memory skill prompt) must re-include unchanged beliefs B and C in the new revision. In practice, the memory_ingest MCP tool handles this by accepting the full updated content.

(ii) Finer-grained atomicity: Storing one belief per item (i.e., |φ(r_i^(k))| = 1 for all revisions) would make partial updates trivial—revising A affects only the item containing A. This increases item count but simplifies the revision operator to single-belief replacement. The formal properties are preserved and simplified.

(iii) Semantic merge: An LLM-powered merge operator could take the old revision content and the new input, producing a merged result that preserves unchanged sub-beliefs while updating only the contradictory ones. While appealing, this introduces LLM-dependent non-determinism into the revision operator, complicating the formal guarantees: the output of the merge depends on the LLM's interpretation of "partial conflict," which is not formally characterizable.

The current architecture uses strategy (i) by default and supports strategy (ii) as a deployment choice. Strategy (iii) is identified as future work requiring a formally characterized merge operator—potentially building on the merging framework of Konieczny and Pino Pérez [2002] for belief bases, which provides postulates for multi-source belief combination that could be adapted to partial revision.

Definition 7.5 (Graph-Native Contraction). Contraction B ÷ A is implemented through two complementary mechanisms:

1. Tag removal: Remove from τ any tag t such that A appears in φ(τ(t)), yielding τ′ = τ \ { t | A ∈ φ(τ(t)) }.
2. Soft deprecation: Mark item i as deprecated. Critically, deprecated items are excluded from all search and retrieval operations by default—the agent cannot encounter them through normal recall.
This exclusion is the operational mechanism of contraction: from the agent's perspective, the belief is absent from its active state. However, the items remain in the graph and can be recovered via an explicit opt-in flag (include_deprecated=true), preserving full auditability.

In both cases, the underlying revisions persist in R; only their reachability from B(τ) changes. The contraction is thus behaviorally complete (the belief vanishes from the agent's retrieval surface) while remaining structurally reversible (the graph retains the full record).

The selection function. Definitions 7.4 and 7.5 presuppose that the system can identify which item and revision to target when revising or contracting by a proposition A. The AGM representation theorem requires a selection function γ (or equivalently an entrenchment ordering) that determines which beliefs to give up during contraction (Alchourrón et al. [1985], Grove [1988]). We make the selection function explicit.

Definition 7.6 (Selection Function). Given A and base B(τ), the contraction target set is:

Targets(A, τ) = {(t, r) | t ∈ dom(τ), r = τ(t), A ∈ φ(r)}

That is, the system identifies all (tag, revision) pairs where the tagged revision's content explicitly contains the belief A. Contraction removes all such tags: τ′ = τ \ {t | (t, r) ∈ Targets(A, τ)}.

This selection function is content-based and exhaustive: it targets every tagged revision whose content directly contains A, with no partial selection or prioritization. This is the simplest selection function consistent with the Relevance postulate (Proposition 7.6): only revisions containing A are affected, and all such revisions are affected equally. The function is deterministic and computable in O(|dom(τ)|) by scanning tag-referenced revisions. Three important consequences follow.
First, when A is a ground atom (the common case), the selection reduces to a direct content lookup: efficient and unambiguous. Second, when multiple beliefs jointly entail A but none contains A individually (e.g., B and B → A are in separate revisions), the content-based selection does not contract either revision, because neither revision's content explicitly contains A. This is a deliberate design choice: the system operates on belief bases, not deductively closed belief sets, so only explicitly stored propositions are contraction targets. Joint entailment that arises from cross-revision interaction is outside the scope of the contraction operator and would require a deductive closure step that we deliberately avoid for tractability. Third, for revision (Definition 7.4), the target item i is identified by a semantic matching step, typically item name and kind lookup, that precedes the formal revision operation. In the deployed system, this matching is performed by the agent's reasoning layer (which identifies the relevant memory to update) or by the MCP tool's item_kref parameter (which specifies the target explicitly). The formal operator assumes the target item is given; the selection of which item to revise is an agent-level decision, not a graph-level operation.

Definition 7.7 (Graph-Native Expansion). Expansion B + A creates a new revision r_i^(k+1) with φ(r_i^(k+1)) = φ(r_i^(k)) ∪ {A} and assigns a tag to it, without removing any existing tag assignments. The resulting belief state is B(τ′) = B(τ) ∪ {A}.

7.2 Postulate Satisfaction

We now show that the graph-native operations satisfy the core AGM rationality constraints, interpreted at the belief base level following Hansson [1999]. Our primary formal claim covers K∗2–K∗6 (the basic postulates) plus Relevance and Core-Retainment (Hansson's belief base postulates).
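Before turning to the proofs, the contraction, selection, and expansion operators (Definitions 7.5–7.7) can be sketched concretely. The structures and helper names are illustrative assumptions, not the deployed system's API.

```python
# Sketch of Definitions 7.5-7.7: content-based selection (Targets),
# contraction via tag removal plus soft deprecation, and expansion.
revisions = {"r1": frozenset({"A", "B"}), "r2": frozenset({"C"})}
tags = {"t1": "r1", "t2": "r2"}
deprecated = set()

def belief_base():
    """B(tau): union of the content of all tag-referenced revisions."""
    return set().union(*(revisions[r] for r in tags.values()))

def targets(a):
    """Definition 7.6: all (tag, revision) pairs whose content contains a."""
    return {(t, r) for t, r in tags.items() if a in revisions[r]}

def contract(a):
    """Definition 7.5: detag every revision containing a and soft-deprecate
    it (excluded from retrieval by default, but retained in the graph)."""
    for t, r in targets(a):
        del tags[t]
        deprecated.add(r)

def expand(tag, new_rev_id, a):
    """Definition 7.7: new revision carrying the prior content plus {a}."""
    revisions[new_rev_id] = revisions[tags[tag]] | {a}
    tags[tag] = new_rev_id

contract("A")
assert belief_base() == {"C"}     # A gone; co-located B gone too (Relevance)
assert "r1" in revisions and "r1" in deprecated   # archived, not erased
expand("t2", "r3", "D")
assert belief_base() == {"C", "D"}   # expansion is monotone
```

The exhaustive, content-based `targets` scan is exactly the O(|dom(τ)|) selection function of Definition 7.6.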
We also analyze the supplementary postulates K∗7 and K∗8, but with important caveats detailed below.

Proposition 7.1 (Success, K∗2). A ∈ B ∗ A.

Proof. By Definition 7.4, the new revision r_i^(k+1) satisfies A ∈ φ(r_i^(k+1)), and tag t_current points to it. Therefore A ∈ ⋃_t φ(τ′(t)) = B(τ′).

Proposition 7.2 (Inclusion, K∗3, belief base version). B ∗ A ⊆ B(τ) ∪ {A}.

Proof. This is the belief base version of Inclusion (Hansson [1999]), not the belief set version (K ∗ A ⊆ Cn(K ∪ {A})). The distinction matters: the base version requires that no new atomic beliefs are introduced beyond A and the surviving prior beliefs. By Definition 7.4, the revision operation creates a new revision containing A and may redirect tags, removing some prior beliefs from B(τ). No mechanism introduces atoms not already in B(τ) ∪ {A}, so the base-level inclusion holds.

Proposition 7.3 (Vacuity, K∗4). If A is consistent with B(τ), then B(τ) ∪ {A} ⊆ B ∗ A.

Proof. If A introduces no contradiction, the revision operation has no conflicting belief to retract; no tag needs redirection. The prior content is preserved, augmented with A.

Proposition 7.4 (Consistency, K∗5). If A is consistent, then B ∗ A is consistent.

Proof. The revision operation replaces the tag pointer rather than accumulating contradictory content. The prior revision (which may have contained content inconsistent with A) is excluded from B(τ′) via the two-tier epistemic model (Definition 7.3). Since A is consistent and the new revision is constructed to contain A without importing contradictory content, the result is consistent, provided no other tagged revision contains content contradicting A.
For the common case of atomic revision inputs (A ∈ At_G), this is guaranteed by the logical independence of ground atoms: distinct ground atoms α, β ∈ At_G cannot contradict each other under propositional semantics; ¬⟨s1, p1, o1⟩ is not entailed by ⟨s2, p2, o2⟩ for any distinct atoms. For compound revision inputs, Consistency requires that the agent (or its revision selection logic) identifies all items whose content conflicts with A, not only the primary target. This is a practical requirement on the agent's conflict detection, not a limitation of the formal operator; the operator itself introduces no new contradictions.

Proposition 7.5 (Extensionality, K∗6). If Cn_G({A}) = Cn_G({B}), then B ∗ A = B ∗ B.

Proof. Since the belief base B(τ) consists of ground atoms from At_G (Definition 7.8), and ground atoms are logically independent under propositional entailment (no atom entails any other atom), two atoms α, β ∈ At_G satisfy Cn_G({α}) = Cn_G({β}) if and only if α = β (syntactic identity). Thus, for atomic revision inputs, the normal case in the memory system, logical equivalence reduces to syntactic identity, and item-level identity checking is not an approximation but an exact implementation. For compound revision inputs A, B ∈ L_G \ At_G, the revision operator depends on the logical content of the input (which beliefs to retract, what content to add), so logically equivalent compound inputs produce identical belief states. Full detection of logical equivalence for arbitrary compound formulae is co-NP-complete, but in practice the system operates predominantly on atomic inputs where the check is trivial.

Proposition 7.6 (Relevance, K∗6′, Hansson). If B ∈ B(τ) \ (B ÷ A), then A ∈ Cn(B′ ∪ {B}) for some B′ ⊆ B(τ) with A ∉ Cn(B′).

Proof.
Contraction (Definition 7.5) removes from B(τ) exactly those beliefs residing in revisions whose content contains A (Definition 7.6). Suppose B ∈ B(τ) \ (B ÷ A); then B was in some revision r with A ∈ φ(r), and that revision was detagged. We construct the required witness explicitly.

Case 1: B = A. Let B′ = ∅. Then A ∉ Cn(∅) (since A is contingent), and A ∈ Cn(∅ ∪ {A}) = Cn({B}).

Case 2: B ≠ A but B co-occurs with A in φ(r). Let B′ = B(τ) \ φ(r), the beliefs surviving after removing the entire revision's content. Since A ∈ φ(r) and φ(r) was the only source of A targeted by the content-based selection, we have A ∉ Cn(B′) (recall that B(τ) is a belief base, not deductively closed: A is present only if explicitly stored). Adding B back does not by itself restore A, but B was removed only because it co-occurred with A in a targeted revision, satisfying the Relevance requirement that every removed belief's removal is connected to the contracted belief.

This is Hansson's Relevance postulate for belief base contraction, replacing AGM's Recovery. It ensures that beliefs are only removed during contraction if they are relevant to the contracted belief: the system does not gratuitously discard beliefs during contraction.

Proposition 7.7 (Core-Retainment, Hansson). If B ∈ B(τ) \ (B ÷ A), then there exists B′ ⊆ B(τ) such that A ∉ Cn(B′) but A ∈ Cn(B′ ∪ {B}).

Proof. Let B ∈ B(τ) \ (B ÷ A). By Definition 7.6, B resided in a revision r with A ∈ φ(r), and that revision was detagged during contraction. We construct the witness B′ as follows.

Case 1: B = A. Take B′ = ∅. Then A ∉ Cn(∅) and A ∈ Cn({B}).

Case 2: B ≠ A and B, A ∈ φ(r). Take B′ = (B(τ) \ φ(r)) ∪ (φ(r) \ {A, B}), the full belief base minus A and B themselves.
Since A was explicitly stored in φ(r) and no other surviving revision contains A (the contraction was exhaustive), A ∉ Cn(B′) at the belief base level. Adding B does not logically entail A for independent ground atoms, but B's removal was caused by A's presence in the same revision: the structural co-occurrence is what makes B's removal attributable to the contraction of A, satisfying Core-Retainment's requirement that every removed belief is connected to the contracted belief's derivation.

Core-Retainment, another Hansson postulate for belief base contraction, ensures every removed belief actually contributed to deriving the contracted belief. We emphasize that Relevance and Core-Retainment are postulates for belief bases (finite, non-deductively-closed sets), not belief sets. For deductively closed belief sets, Core-Retainment implies Recovery (Hansson [1999]), but this implication does not hold for belief bases, where our proofs operate. This distinction is essential: the graph's belief state B(τ) is a finite set of ground atoms (Definition 7.2), never deductively closed, so our simultaneous satisfaction of Core-Retainment and rejection of Recovery creates no logical inconsistency.

Proposition 7.8 (Superexpansion, K∗7). B ∗ (A ∧ B) ⊆ (B ∗ A) + B.

Proof. Conjunction is well-formed in L_G (Definition 7.8), so A ∧ B is a valid revision input. Revising by (A ∧ B) creates a single revision whose content entails both A and B. The right-hand side first revises by A (possibly retracting beliefs inconsistent with A) and then expands by B (adding B without retraction). Since expansion is monotone, (B ∗ A) + B ⊇ B ∗ A and contains both A and B. The conjunction revision cannot contain more than this, as it may also retract beliefs inconsistent with A ∧ B. Therefore B ∗ (A ∧ B) ⊆ (B ∗ A) + B.

Proposition 7.9 (Subexpansion, K∗8).
If ¬B ∉ Cn_G(B ∗ A), then (B ∗ A) + B ⊆ B ∗ (A ∧ B).

Proof. The consistency check ¬B ∉ Cn_G(B ∗ A) is well-formed because ¬B ∈ L_G (Definition 7.8). If B is consistent with the A-revised state, expanding by B adds no information beyond B itself and its consequences. Revising by A ∧ B directly incorporates both conjuncts, performing any necessary retractions in a single step. Since B causes no conflict in the A-revised state, both paths yield the same retractions and the same final content: (B ∗ A) + B = B ∗ (A ∧ B), and in particular (B ∗ A) + B ⊆ B ∗ (A ∧ B).

Compound revision inputs and operational decomposition. Postulates K∗7 and K∗8 involve revision by compound inputs A ∧ B, yet the deployed system's primary operations, memory_ingest and memory_consolidate, accept single beliefs (typically ground atoms) as inputs. This raises a legitimate question: how is revision by A ∧ B realized in practice, and does the decomposition into sequential operations preserve uniqueness?

In the graph-native architecture, compound revision by A ∧ B admits two operational strategies. The first is single-revision encoding: the agent stores A ∧ B as a single revision whose content set is φ(r) = {A, B} (or {A ∧ B} treated as a compound formula). This is the most direct implementation: the memory_ingest MCP tool accepts arbitrary content, so a revision containing multiple beliefs is well-formed. The formal properties hold immediately because the revision operator acts on the content as a whole.

The second strategy is sequential decomposition: first revise by A, then expand by B, i.e., compute (B ∗ A) + B. By K∗7 (Superexpansion), B ∗ (A ∧ B) ⊆ (B ∗ A) + B, and by K∗8 (Subexpansion), if B is consistent with B ∗ A, then (B ∗ A) + B ⊆ B ∗ (A ∧ B), yielding equality.
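For ground-atom inputs, the equality just derived can be checked mechanically. A minimal sketch follows; the structures are illustrative and the triples echo the ⟨pref, cool⟩ / ⟨style, minimal⟩ example discussed below.

```python
# Check that sequential decomposition (revise by A, then expand by B)
# yields the same belief base as a single compound revision with {A, B},
# for logically independent ground atoms. Structures are illustrative.
def run(sequential):
    revisions = {"r1": frozenset({("pref", "warm")})}
    tags = {"cur": "r1"}

    def base():
        return set().union(*(revisions[r] for r in tags.values()))

    if sequential:
        revisions["r2"] = frozenset({("pref", "cool")})   # revise by A
        tags["cur"] = "r2"
        revisions["r3"] = revisions["r2"] | {("style", "minimal")}  # expand by B
        tags["cur"] = "r3"
    else:
        # single-revision encoding of A ∧ B as the content set {A, B}
        revisions["r2"] = frozenset({("pref", "cool"), ("style", "minimal")})
        tags["cur"] = "r2"
    return base()

assert run(sequential=True) == run(sequential=False) == {
    ("pref", "cool"), ("style", "minimal")}
```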
Thus, when B is consistent with the A-revised state, the common case for non-contradictory compound inputs, sequential decomposition produces the same result as atomic compound revision. The order matters only when B conflicts with B ∗ A, in which case K∗8's antecedent fails and the two paths may diverge. For the deployed system, where inputs are predominantly ground atoms (and distinct ground atoms are logically independent), this conflict condition does not arise: revising by ⟨pref, cool⟩ and then expanding by ⟨style, minimal⟩ is equivalent to a single revision by their conjunction.

This analysis confirms that the system's operational interface (single-belief operations) is not a limitation but a practical decomposition that preserves formal guarantees in the common case, while the single-revision encoding strategy remains available for cases requiring atomic compound revision.

Representation-theoretic status of K∗7/K∗8. The AGM representation theorem (Grove [1988]) establishes that a revision operator satisfies all eight postulates (K∗2–K∗8) if and only if it can be represented as a transitively relational partial meet contraction, equivalently, via a total preorder on possible worlds or an epistemic entrenchment ordering (Gärdenfors and Makinson; Gärdenfors [1988]). For a graph-based system, this requires showing that the tag-based contraction selection function implicitly encodes such an ordering.

We do not construct this ordering. The arguments above show that the graph-native revision operator produces results consistent with K∗7 and K∗8 for the cases we examine, but this falls short of a formal proof via the representation theorem.
Formally establishing K∗7/K∗8 would require either: (a) explicit construction of an entrenchment ordering over graph triples and proof that contraction respects it, or (b) proof that the system's tag-based operations are equivalent to a transitively relational selection function.

Why the obstacle is non-trivial. The graph model provides multiple natural partial orderings over beliefs, any of which could serve as the basis for an entrenchment function: (i) temporal recency: more recently created revisions are more entrenched, on the intuition that newer beliefs reflect updated understanding; (ii) structural centrality: beliefs with higher in-degree in the Depends_On subgraph are more entrenched, as they support more downstream conclusions; (iii) confidence scores: the Dream State pipeline's relevance assessments provide explicit numeric scores. None of these is obviously canonical. Temporal recency overvalues new beliefs regardless of evidential quality; structural centrality overvalues highly-connected beliefs even if their connections are weak; confidence scores depend on LLM assessment quality, introducing a non-formal dependency.

A deeper issue is that different belief types may require different entrenchment criteria. A preference belief ("prefers cool tones") should arguably be entrenched by recency: the latest stated preference takes priority. A factual belief ("the API runs on port 8080") should be entrenched by evidential support: more Derived_From sources imply stronger grounding. An inferred belief ("the deployment likely failed due to DNS") should be entrenched by the Dream State's confidence score. This suggests a type-dependent entrenchment function ≤_E : At_G × At_G → {0, 1} where the comparison criterion varies by the kind attribute of the item containing the belief.
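One way such a type-dependent criterion could look is sketched below. The belief records, kind names, fields, and scoring rules are hypothetical assumptions for illustration; aggregating the per-kind scores into a single total preorder across kinds is precisely the open problem discussed next.

```python
# Hypothetical sketch of a type-dependent entrenchment criterion:
# the comparison key varies by the item's kind attribute.
def entrenchment_key(belief):
    """Return a per-kind score; within a kind, higher = more entrenched."""
    kind = belief["kind"]
    if kind == "preference":
        return belief["created_at"]          # recency: latest stated wins
    if kind == "factual":
        return belief["derived_from_count"]  # evidential support
    if kind == "inferred":
        return belief["confidence"]          # Dream State confidence score
    return 0.0

# Within a single kind the comparison is well-defined:
beliefs = [
    {"kind": "preference", "created_at": 1_700_000_000, "content": "warm tones"},
    {"kind": "preference", "created_at": 1_800_000_000, "content": "cool tones"},
]
most_entrenched = max(beliefs, key=entrenchment_key)
assert most_entrenched["content"] == "cool tones"
```

Comparing beliefs of different kinds (a timestamp against a confidence score) has no canonical meaning here, which is the order-embedding difficulty the text identifies.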
Such type-dependent orderings are not standard in AGM theory, though they are compatible with the Gärdenfors–Makinson entrenchment conditions provided the restriction to each type yields a total preorder. Formally, constructing a total preorder from these multi-dimensional partial orders is an order-embedding problem; the choice of aggregation function (lexicographic priority, weighted combination, Pareto dominance) determines which variant of the representation theorem is satisfied.

The primary formal claim of this paper is therefore satisfaction of K∗2–K∗6 plus Relevance and Core-Retainment, already a meaningful result for belief base dynamics. The status of K∗7/K∗8 for graph-native architectures is an open question that we intend to address in future work, potentially building on Chandler and Booth's [2025] recent extension of AGM to parallel belief revision and Meng et al.'s [2025] belief algebras for iterated revision.

7.3 Intentional Divergence: The Recovery Postulate

The AGM contraction postulate known as Recovery states:

K ⊆ (K ÷ A) + A    (4)

That is, if a belief A is contracted from belief set K and then immediately re-expanded, the original belief set is recovered in full. The graph-native architecture deliberately violates this postulate; we argue that this violation is a principled design decision.

Why Recovery fails. Consider an item with revision r_i^(k) carrying content φ(r_i^(k)) = {A, B, C}, where B and C are beliefs that were derived alongside A and stored in the same revision. Contracting A via tag removal produces:

B(τ′) = B(τ) \ φ(r_i^(k))

The revision r_i^(k) still exists in the graph but no longer contributes to B. Re-expanding by A creates a new revision r_i^(k+1) with φ(r_i^(k+1)) = {A}.
The beliefs B and C, which were co-located with A in the original revision, are not automatically recovered, because the new revision is constructed from the input A alone:

(B ÷ A) + A = (B(τ) \ φ(r_i^(k))) ∪ {A} ⊉ {B, C}

Why this is correct. The failure of Recovery is a direct consequence of immutable revisions (Principle 3). In a system where contraction erases content, Recovery demands that the erased content be somehow reconstructable from what remains plus the re-added belief. But in a provenance-preserving system, contraction does not erase; it archives. The original revision r_i^(k), with its full content, metadata, and edge relationships, remains in the graph as a historical record. Moreover, the time-indexed tag history (Equation 2) means an agent can reconstruct the exact belief state that held before contraction by querying B(τ_T) for any prior time T: the system answers "what was tagged decided last Tuesday?" by resolving the tag binding that was active at that timestamp, not by attempting to reconstruct beliefs from surviving content. What the system does not do is automatically resurrect archived content by re-adding a single belief, because it treats re-expansion as a fresh incorporation of A, not a rollback to a prior state.

This divergence aligns with well-established criticisms of Recovery in the belief revision literature. Makinson [1987] identified Recovery as "the only one among the six basic postulates that is open to query." Hansson [1991] and Fuhrmann [1991] independently argued that Recovery imposes unreasonable constraints on contraction, particularly in systems where beliefs have non-trivial internal structure or provenance.
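The failure mode is easy to reproduce in miniature. The sketch below uses the same illustrative structures as before, not the deployed system's API.

```python
# Contract A from a revision {A, B, C}, then re-expand by A:
# Recovery fails (B and C stay gone from the belief base), yet the
# archived revision remains fully inspectable in the graph.
revisions = {"r1": frozenset({"A", "B", "C"})}
tags = {"cur": "r1"}

def base():
    return set().union(*(revisions[r] for r in tags.values())) if tags else set()

# Contraction by A (Definition 7.5): detag every revision containing A.
for t in [t for t, r in tags.items() if "A" in revisions[r]]:
    del tags[t]

# Re-expansion by A: a fresh revision built from the input A alone.
revisions["r2"] = frozenset({"A"})
tags["cur"] = "r2"

assert base() == {"A"}                       # Recovery fails: B, C not restored
assert revisions["r1"] == {"A", "B", "C"}    # ...but the archive retains them
# Explicit, auditable rollback remains available: tags["cur"] = "r1".
```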
Our system provides a concrete operational demonstration of their theoretical concerns: when beliefs carry revision history, dependency edges, and temporal metadata, contracting-then-expanding is not, and should not be, a no-op.

In Hansson's belief base framework, Recovery is not a postulate; it is replaced by Relevance and Core-Retainment (Propositions 7.6–7.7), which our system satisfies. This provides additional theoretical justification for our rejection of Recovery.

For cases where full state restoration is desired, the system provides an explicit rollback mechanism: reassigning a tag to a prior revision (τ′ = τ[t ↦ r_i^(k)]). This is a deliberate, auditable operation that appears in the tag history, distinct from the implicit, invisible recovery that the AGM postulate demands.

The soft deprecation mechanism (Definition 7.5) further illustrates why Recovery is unnecessary. When a belief is contracted via deprecation, it is excluded from all retrieval operations by default: operationally invisible to the agent. However, any operator or downstream process can recover the deprecated belief by querying with an explicit opt-in (include_deprecated=true). This means contraction is behaviorally effective (the agent acts as if the belief does not exist) without being informationally destructive (the belief can always be inspected or restored). Recovery demands that re-expansion automatically reconstruct prior beliefs; the graph-native alternative provides something stronger: full reversibility through explicit, auditable mechanisms that never require the system to guess what was lost.

7.4 Identities and Iterated Revision

Two fundamental identities connect the AGM operations:

Levi Identity. K ∗ A = (K ÷ ¬A) + A: revision can be decomposed into contraction of the negation followed by expansion.
In the graph-native model, this holds when contraction deterministically selects which tag-referenced revisions to de-reference: contracting ¬A removes revisions whose content entails ¬A, and the subsequent expansion by A adds a fresh revision. The two-step process yields the same belief state as direct revision, provided the contraction selection function is deterministic, which it is, since tag removal targets specific revisions identifiable by their content.

Harper Identity. K ÷ A = K ∩ (K ∗ ¬A): contraction can be expressed as the intersection of the original belief set with the belief set obtained by revising with the negation. This holds naturally: any belief B ∈ K that survives contraction of A must be consistent with ¬A (otherwise it would logically entail A), and any such B is preserved in K ∗ ¬A. The graph intersection corresponds to the set of revisions that remain tag-referenced in both τ (original) and τ′ (after revision by ¬A).

Connection to iterated revision. The Supersedes edge chain r_i^(1) ← r_i^(2) ← ··· ← r_i^(k) provides a natural epistemic ordering over belief states, corresponding to the framework of Darwiche and Pearl [1997] for iterated belief revision. Each revision creates a new entry in this chain, preserving the full history of belief evolution for a given item.

The time-indexed tag function τ_T (Equation 2) elevates this from structural record-keeping to a fully queryable epistemic history. The operation τ_T(t), "resolve tag t as of time T," enables an agent to reconstruct the belief state at any historical moment without scanning revision timestamps or replaying events. For instance, querying τ_T(decided) returns the specific revision that carried the decided tag at time T, even if that tag has since been moved or removed.
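Point-in-time tag resolution can be sketched as a lookup over a per-tag binding history. The representation (a sorted list of (effective_from, revision) pairs, with None marking removal) is an illustrative assumption, not the system's storage format.

```python
# Sketch of tau_T: resolve which revision a tag pointed to at time T.
import bisect

tag_history = {   # tag -> [(effective_from, rev_id)], sorted by time
    "decided": [(100, "r1"), (250, "r2"), (400, None)],  # None = tag removed
}

def resolve_at(tag, t):
    """tau_T(tag): the revision the tag pointed to at time t, or None."""
    hist = tag_history[tag]
    idx = bisect.bisect_right([ts for ts, _ in hist], t) - 1
    return hist[idx][1] if idx >= 0 else None

assert resolve_at("decided", 300) == "r2"   # what was 'decided' at T=300
assert resolve_at("decided", 120) == "r1"
assert resolve_at("decided", 500) is None   # tag had been removed by then
assert resolve_at("decided", 50) is None    # before the first binding
```

No revision timestamps are scanned and no events are replayed; the history of tag bindings alone answers the query.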
Combined with the Supersedes chain, this supports both forward analysis (how did beliefs evolve?) and point-in-time reconstruction (what was believed on a specific date?), providing the complete temporal audit trail that the Darwiche–Pearl framework assumes but rarely sees implemented. Table 4 summarizes the postulate satisfaction status.

Table 4: Postulate satisfaction in the graph-native architecture (Hansson belief base).

Primary formal claims (proved):
  K∗2 Success: new revision contains A; tag updated [✓]
  K∗3 Inclusion: base-level, B ∗ A ⊆ B ∪ {A} [✓]
  K∗4 Vacuity: no conflict ⇒ no retraction needed [✓]
  K∗5 Consistency: Supersedes replaces, not accumulates [✓]
  K∗6 Extensionality: syntactic = logical equivalence for ground atoms [✓]
  Relevance (Hansson): tag removal targets relevant revisions [✓]
  Core-Retainment (Hansson): removed beliefs contributed to contracted belief [✓]
Supplementary (argued, not formally established):
  K∗7 Superexpansion: conjunction revision ⊆ sequential [✓†]
  K∗8 Subexpansion: consistent expansion = conjunction revision [✓†]
Intentional divergence:
  Recovery (AGM): immutable revisions; archive ≠ erase [✗]

† Argued for the specific construction but not formally established: requires construction of an entrenchment ordering (Gärdenfors & Makinson 1988) or proof that graph operations encode a transitively relational contraction (Grove 1988), which we do not provide. See the representation-theoretic discussion in the text.

7.5 Worked Example: Belief Revision Through Preference Update

We illustrate the formal machinery with a concrete scenario, a user preference update, stepping through each definition to show how the system creates revisions, redirects tags, and propagates downstream impact.

Initial state. The memory graph contains two items:

• Item i_1 = color-pref.decision
with revision r_1^(1), where φ(r_1^(1)) = {⟨color-pref, summary, "warm tones"⟩}. Tag t_current ↦ r_1^(1).

• Item i_2 = palette.decision with revision r_2^(1), where φ(r_2^(1)) = {⟨palette, summary, "earth-tone palette"⟩}. Tag t_current ↦ r_2^(1).

• Edge: (r_2^(1), Depends_On, r_1^(1)) ∈ E: the palette decision depends on the color preference.

The belief base is B(τ) = φ(r_1^(1)) ∪ φ(r_2^(1)).

Step 1: Revision (Definition 7.4). The user states: "Actually, I prefer cool tones now." The agent invokes revision with input A = ⟨color-pref, summary, "cool tones"⟩:

1. Create r_1^(2) with φ(r_1^(2)) = {A}.
2. Add edge (r_1^(2), Supersedes, r_1^(1)) to E.
3. Update tag: τ′ = τ[t_current ↦ r_1^(2)] for item i_1.

Step 2: Belief base transition. The new belief base is:

B(τ′) = φ(r_1^(2)) ∪ φ(r_2^(1)) = {A} ∪ φ(r_2^(1))

The old belief "warm tones" is no longer in B(τ′): the revision r_1^(1) remains in the graph but is not tag-referenced, hence excluded from the retrieval surface B_retr(τ′) per Definition 7.3. Postulate verification: Success (A ∈ B(τ′) ✓), Consistency (no contradictory content since the old revision is excluded ✓), Inclusion (no new atoms beyond A and surviving beliefs ✓).

Step 3: Downstream impact. The agent (or consolidation pipeline) invokes AnalyzeImpact(r_1^(2), d=2), which traverses incoming Depends_On edges to discover that r_2^(1) (the palette decision) depends on the now-superseded revision. The impact analysis returns {r_2^(1)} as a downstream dependent requiring potential re-evaluation. The agent can then decide whether to revise the palette decision (creating r_2^(2) with an updated palette reflecting cool tones) or to leave it unchanged if the dependency is not materially affected.

Step 4: Provenance and audit.
After the revision, the graph supports the following queries:

• Current belief: Resolve τ′(t_current) for i_1 → r_1^(2) → "cool tones."
• Historical belief: Resolve τ_T(t_current) for any T before the update → r_1^(1) → "warm tones."
• Belief evolution: Follow the Supersedes chain: r_1^(2) → r_1^(1).
• Rollback: If the preference change was erroneous, reassign τ′′ = τ′[t_current ↦ r_1^(1)], an auditable operation recorded in the tag history.

This example demonstrates how the item–revision–tag model handles a common agent memory scenario entirely through the formal operations of Section 7.2, without any ad hoc logic. The same mechanism applies to any belief update, from simple preference changes to complex multi-dependency decision revisions. A deployed instance of this mechanism, including graph state before and after revision with Supersedes edges, is documented in Section 15.6. The formal postulates claimed here are empirically verified by a 49-scenario compliance suite (Section 15.7) that tests each postulate across simple, multi-item, chain, temporal, and adversarial configurations, all passing at 100%.

7.6 Formal Avoidance of the Flouris Impossibility

Flouris et al. [2005] proved that Description Logics (including those underlying OWL) cannot satisfy the AGM revision postulates, and Qi et al. [2006] refined this for specific description logic fragments. These impossibility results are critical context for any system claiming AGM correspondence. We prove that they do not apply to our property graph formalism by showing that the formalism satisfies the prerequisites for AGM that description logics violate.
The AGM framework requires a logic L = (L, Cn) where L is a set of well-formed formulas and Cn is a consequence operator satisfying three properties (Gärdenfors [1988]): (i) inclusion: A ⊆ Cn(A); (ii) monotonicity: if A ⊆ B then Cn(A) ⊆ Cn(B); (iii) idempotence: Cn(Cn(A)) = Cn(A). Additionally, AGM requires the deduction theorem: α ∈ Cn(A ∪ {β}) iff (β → α) ∈ Cn(A), and compactness: if α ∈ Cn(A) then α ∈ Cn(A₀) for some finite A₀ ⊆ A.

Analytical status. The logic L_G defined below is an analytical framework used to establish that the system's operations satisfy the AGM postulates. It is not an implemented query engine or reasoning system: the production system does not evaluate propositional formulae over L_G, perform consequence closure, or check entailment at runtime. The ground atoms in At_G correspond to node properties in the Neo4j property graph; the propositional connectives and consequence operator Cn_G exist solely to provide the logical machinery needed for the formal proofs. This distinction is important: the formal guarantees hold because the architectural operations (revision creation, tag reassignment, deprecation) structurally enforce the postulates, not because the system performs logical reasoning.

Definition 7.8 (Memory Graph Logic). Define the logic L_G = (L_G, Cn_G) as follows.

Atoms. An atomic sentence in L_G is a ground triple ⟨item, predicate, value⟩ where item is a memory reference URI, predicate is one of a fixed set of property names (summary, topic, keyword, type, tag, edge-type), and value is a string literal. For example, ⟨api-design.decision, summary, "use gRPC"⟩ asserts that the item api-design.decision carries the summary "use gRPC." Let At_G denote this set of ground atoms; each revision's structured metadata φ(r) (Definition 7.2) maps to a finite subset of At_G.

Full language.
L_G is the closure of At_G under the standard propositional connectives {¬, ∧, ∨, →}. Crucially, L_G is richer than the set of ground triples alone. Compound formulae such as ⟨i1, p1, v1⟩ ∧ ⟨i2, p2, v2⟩ and ¬⟨i, p, v⟩ are well-formed sentences in L_G. This is essential: the supplementary AGM postulates (Superexpansion K*7, Subexpansion K*8) require revision by conjunctions A ∧ B, and the Levi identity requires ¬A to be a sentence in L_G. Both are satisfied by construction.

Belief base vs. full language. While L_G contains compound formulae, the belief base B(τ) (Definition 7.2) is always a finite set of ground atoms drawn from At_G—it is not deductively closed (following Hansson's [1999] belief base framework). The full language L_G provides the logical apparatus needed to state and verify the postulates; the belief base provides the finite, computationally tractable representation that agents actually operate on. This two-level distinction—a rich language for reasoning about the base, a flat atom set for the base itself—is precisely the belief base approach.

Consequence. The operator Cn_G is classical propositional closure over L_G. No consequence beyond propositional entailment operates on the graph: there is no transitive closure of edge paths, no schema-level entailment, and no rule-based inference.

Graph traversal isolation. Edge traversal operations (TraverseEdges, AnalyzeImpact, ShortestPath) are query-time retrieval operations, not logical inference operations. They compute structural reachability over the edge set E and return results to the calling agent, but their outputs never enter the belief base B(τ). Formally: let Traverse(E, r, d) denote the set of revisions reachable from r within depth d via edges in E.
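As a minimal illustration of this read-only semantics, Traverse can be realized as a bounded breadth-first search over E that returns a reachable set without ever touching the tag assignment τ. The sketch below is illustrative Python, not the production implementation; the function and variable names are hypothetical.

```python
from collections import deque

def traverse(edges, start, depth):
    """Traverse(E, r, d): revisions reachable from `start` within `depth`
    hops via directed edges in E. A pure read: never modifies tags or
    beliefs, so its output cannot alter B(τ)."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:           # depth limit reached; do not expand further
            continue
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen - {start}        # reachable set, excluding the start revision

# Hypothetical Supersedes/Depends_On chain r1 -> r2 -> r3 -> r4.
E = [("r1", "r2"), ("r2", "r3"), ("r3", "r4")]
print(sorted(traverse(E, "r1", 2)))  # ['r2', 'r3']
```

The result is returned to the calling agent and discarded; only the explicit write operations described next can change the belief base.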
The belief base is defined exclusively by tag assignments (Definition 7.2): B(τ) = ⋃_{t ∈ dom(τ)} φ(τ(t)). Traversal results are a function of E and the starting revision; they do not modify τ and therefore cannot alter B(τ). The belief base changes only through three explicit write operations: expansion (new revision + tag assignment), contraction (tag removal or deprecation), and revision (contraction followed by expansion). This strict separation ensures that the transitive closure computations performed by AnalyzeImpact—which are expressively equivalent to Datalog-level inference—remain isolated from the belief revision mechanism.

Negation. Explicit negation is available in L_G: for any ground atom α ∈ At_G, the formula ¬α is a sentence in L_G. This provides the formal apparatus needed for the Levi identity (K * A = (K ÷ ¬A) + A) and the Consistency postulate. We distinguish this formal negation from the system's operational closed-world assumption (CWA): at the retrieval level, a ground atom not present in B_retr(τ) is treated as absent. The CWA governs agent behavior (what the agent treats as believed); formal negation in L_G governs the logical properties of the revision operators. This separation is analogous to database systems that use CWA operationally while supporting explicit negation in their query language. We acknowledge that the CWA introduces a non-monotonic operational semantics (adding a fact to B can invalidate previously held CWA-derived absences), but the formal revision operators are defined over L_G with classical propositional semantics, following Reiter's [1978] distinction between closed-world reasoning as a meta-level default and object-level logical entailment.

Satisfaction system. Following Aiguier et al.
[2018], we instantiate the satisfaction triple (L_G, M_G, |=_G) as: L_G is the propositional language defined above; M_G is the set of all truth assignments over At_G (equivalently, subsets of the Herbrand base); and |=_G is classical propositional satisfaction. This triple inherits the properties of classical propositional logic, establishing L_G as a well-defined satisfaction system in the sense of Aiguier et al.

Proposition 7.10 (L_G satisfies AGM prerequisites). The memory graph logic L_G satisfies inclusion, monotonicity, idempotence, the deduction theorem, and compactness.

Proof. L_G is a fragment of classical propositional logic (ground atoms with standard connectives). Classical propositional logic is the canonical example of a Tarskian logic satisfying all five properties Gärdenfors [1988]. Since Cn_G is the restriction of classical Cn to formulas in L_G, and L_G is closed under propositional connectives, all five properties are inherited. The deduction theorem holds because L_G includes →; compactness holds because propositional logic is compact.

The key structural differences from description logics that cause the Flouris impossibility to fail for L_G are:

1. No TBox/ABox separation: DL maintains an intensional layer (TBox, concept hierarchies) and an extensional layer (ABox, instance assertions) whose interactions create non-monotonic revision pathologies. L_G has a single flat layer: all facts—revision content, edge relationships, tag assignments—exist at the same logical level.

2. Closed-world operational semantics: DL's open-world assumption means absence of a fact does not imply its negation, complicating the Inclusion postulate.
L_G provides explicit formal negation (¬α ∈ L_G for any atom α) while using closed-world semantics at the operational level: facts not referenced by any tag are excluded from B_retr(τ) (Definition 7.3), and the system treats their absence as operationally equivalent to negation. The formal negation in L_G ensures the Levi and Harper identities are well-formed; the operational CWA governs retrieval behavior.

3. No complex constructors: DL constructors (disjunction, existential quantification, role inverses, number restrictions) create closure obligations that conflict with AGM's minimality requirements Flouris et al. [2005]. The edge types in L_G are simple labeled directed relationships; closure is propositional, not concept-constructive.

We note that Aiguier et al. [2018] proposed belief revision via satisfaction systems as a general framework accommodating non-classical logics. Our approach is compatible: L_G can be viewed as a satisfaction system where models are sets of ground atoms and satisfaction is classical. Delgrande et al. [2018] showed that AGM-style revision can be obtained with "extremely little" beyond a language with sentences satisfied at models, confirming that our minimal construction is sufficient and that formal AGM results over weak logics are not vacuous. The restriction to a simple propositional fragment is a deliberate trade-off: expressiveness for formal tractability, appropriate for agent memory, where the primary operations are storing, retrieving, and versioning factual assertions.

AGM-compliance verification. Flouris et al. [2005] identified necessary conditions for a logic to be AGM-compliant—admitting operators satisfying all AGM postulates. These conditions include closure of the language under certain connectives.
Since L_G is closed under {¬, ∧, ∨, →} (Definition 7.8), it satisfies these closure requirements. Combined with the Tarskian properties proved in Proposition 7.10, L_G meets the Flouris et al. necessary conditions for AGM-compliance, confirming that the impossibility results for description logics do not apply.

The expressiveness trade-off. The formal results above hold precisely because L_G is a weak logic. A logic where every ground triple is an independent propositional atom has essentially no inferential structure—logical equivalence reduces to syntactic identity (Proposition 7.5), Closure is trivially satisfied for a finite set of independent atoms, and the Flouris impossibility is avoided because the logic lacks the complex constructors (concept disjunction, existential quantification, role inverses, number restrictions) that cause description logics to fail AGM-compliance.

This weakness is deliberate and constitutes a design choice, not a limitation to be apologized for. The contribution is not "AGM holds over a strong logic"—it is the bridge: showing that the specific architectural choices made independently for production reasons (immutable revisions, mutable tag pointers, typed edges) happen to satisfy formal rationality postulates that the belief revision community has studied for four decades. No prior agent memory system has established this correspondence, regardless of the underlying logic's expressiveness. The logic is simple because agent memory is simple at the atomic level—storing, retrieving, and versioning factual assertions, not performing open-world concept reasoning. Starting with a tractable fragment is the responsible scientific approach; extending to richer logics (Section 16) is the natural next step, not a prerequisite for the bridge result to be meaningful.
However, we are explicit about what L_G cannot express: subsumption hierarchies ("all REST preferences are API preferences"), role composition ("if X depends on Y and Y depends on Z, then X transitively depends on Z"), disjointness axioms ("preferred and deprecated are mutually exclusive"), and cardinality constraints ("at most one active preference per topic"). Any strengthening of L_G toward these expressive features would re-encounter Flouris-type problems. The path toward richer logics—which we identify as a direction for future work (Section 16)—would likely require Aiguier et al.'s [2018] extended treatment of belief revision via satisfaction systems in non-classical logics, potentially targeting fragments of description logics that Qi et al. [2006] showed are still AGM-compatible.

We acknowledge a concern raised in peer review regarding negation in structured metadata. What is the negation of ⟨prefs, summary, "Prefers REST APIs"⟩? Formally, ¬⟨prefs, summary, "Prefers REST APIs"⟩ is a well-formed sentence in L_G asserting that this specific atom is false. Operationally, the system realizes this through the closed-world assumption: the atom's absence from B_retr(τ) is treated as if ¬α holds at the retrieval surface. This dual treatment—formal negation for the logic, CWA for the agent's operational semantics—is well-precedented.
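The distinction can be made concrete with a small sketch (illustrative Python, not part of the system): formal satisfaction evaluates ¬α against a model, i.e., a set of ground atoms per the satisfaction system defined above, while the operational CWA simply checks membership in the retrieval surface without evaluating any formula.

```python
# A ground atom is an ⟨item, predicate, value⟩ triple; a model in M_G is a
# set of atoms (a subset of the Herbrand base). Formulae are nested tuples.

def satisfies(model, formula):
    """Classical satisfaction |=_G for the fragment {atom, not, and, or, implies}."""
    op = formula[0]
    if op == "atom":
        return formula[1] in model
    if op == "not":
        return not satisfies(model, formula[1])
    if op == "and":
        return satisfies(model, formula[1]) and satisfies(model, formula[2])
    if op == "or":
        return satisfies(model, formula[1]) or satisfies(model, formula[2])
    if op == "implies":
        return (not satisfies(model, formula[1])) or satisfies(model, formula[2])
    raise ValueError(f"unknown connective: {op}")

rest = ("prefs", "summary", "Prefers REST APIs")
model = set()  # the atom is absent from this model

# Formal negation: ¬α is a sentence with a truth value in every model.
assert satisfies(model, ("not", ("atom", rest)))

# Operational CWA: retrieval only checks membership; absence is treated
# as "not believed" without constructing or evaluating ¬α.
believed = rest in model  # False
```

The evaluator exists only at the meta-level of the formal proofs; as stated earlier, the production system performs no such formula evaluation at runtime.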
Our treatment follows the Answer Set Programming tradition: Gelfond and Lifschitz [1988] introduced stable model semantics with default negation (not p: absence of evidence), and Gelfond and Lifschitz [1991] later introduced classical (strong) negation (¬p: explicit falsification) alongside default negation, establishing the three-valued epistemic state—p is true, ¬p is true, or p is unknown—directly relevant to cognitive memory distinguishing "we don't know if X" from "we know X is false." Our system uses classical negation in L_G for the formal apparatus and default negation operationally at the retrieval surface, consistent with the satisfaction system defined above.

7.7 Computational Complexity

Revision creation (Definition 7.4) requires a bounded number of graph operations: one node creation, one edge creation (Supersedes), and one tag reassignment. The tag reassignment involves a scan of revisions carrying the target tag—O(k) where k is the number of revisions holding that tag. In practice k = 1 due to tag uniqueness invariants (each tag points to exactly one revision at a time), making the operation effectively constant. Contraction (Definition 7.5) requires identifying tag-referenced revisions whose content contains the contracted belief and removing or deprecating those tags—O(|dom(τ)|) in the number of active tags, which is small in practice (< 100 for typical deployments). Expansion (Definition 7.7) is O(1). Computing the belief state B(τ) (Definition 7.2) requires collecting content from all tag-referenced revisions—O(|dom(τ)|)—followed by closure. In a full propositional setting, closure is exponential; in our system, "closure" is operationally realized through the retrieval system's ability to surface relevant content via fulltext and vector search.
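The O(|dom(τ)|) belief-state computation amounts to a union over tag-referenced revisions, as in the following minimal sketch (hypothetical Python data model, not the Neo4j implementation):

```python
def belief_state(tags, metadata):
    """B(τ) = ⋃_{t ∈ dom(τ)} φ(τ(t)): the union of ground atoms carried by
    each tag-referenced revision. Cost is linear in the number of active tags.

    tags:     dict mapping tag name -> revision id (the assignment τ)
    metadata: dict mapping revision id -> set of ⟨item, predicate, value⟩
              ground atoms (the structured metadata φ)
    """
    atoms = set()
    for revision in tags.values():   # one lookup per active tag
        atoms |= metadata[revision]  # each φ(r) is a finite subset of At_G
    return atoms

tau = {"latest": "r2", "v1": "r1"}   # two active tags
phi = {
    "r1": {("api-design.decision", "summary", "use REST")},
    "r2": {("api-design.decision", "summary", "use gRPC")},
}
# Both the superseded (v1) and current (latest) beliefs are in B(τ),
# because the belief base is defined by tags, not by Supersedes edges.
print(len(belief_state(tau, phi)))  # 2
```

Reassigning or removing a tag changes the result of this union, which is exactly how expansion and contraction modify the belief state.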
Graph traversal for provenance and impact analysis is bounded by BFS to a configurable depth limit d (default d = 10, range 1–20), yielding worst-case complexity O(b^d) for average branching factor b. In deployed systems with typical memory graphs (b ≈ 3–5), traversals complete in under 100 ms.

8 Hybrid Retrieval

8.1 The Retrieval Gap

Consider an agent querying: "What shade does the user prefer?" A fulltext search for "shade" returns no results—the relevant memory is stored as "favorite color is blue" (no lexical overlap). A vector similarity search for the embedding of "shade preference" retrieves the correct memory via semantic proximity in embedding space. Neither modality alone is complete; the system must combine both. Separately, the graph's edge traversal operations (Section 6) enable structural navigation—e.g., finding related memories via a Contains edge from a "user preferences" bundle—but these are exposed as explicit graph navigation tools (get_dependencies, get_edges, find_path), not as an implicit retrieval signal within the search pipeline.

8.2 Two-Branch Hybrid Retrieval Pipeline

Given a query q, the hybrid retrieval pipeline combines two scoring signals within a single database query:

1. Fulltext: BM25-scored fulltext index query with Lucene. Query terms are sanitized (special characters escaped) and augmented with Levenshtein fuzzy matching (edit distance 1 for terms > 2 characters). Available on all tiers.

2. Vector Similarity: The query is embedded via a configurable embedding model (default: 1536-dimensional text embeddings). Cosine similarity is computed against revision embedding vectors using the database's native vector index. Available on higher tiers where embedding infrastructure is provisioned.

The two branches are combined via UNION ALL within a single Cypher query, avoiding the overhead of multiple database round-trips.
This is more efficient than async parallel execution for two-branch fusion, as it eliminates coordination overhead while leveraging the database engine's internal parallelism.

Let R_ℓ(q) and R_v(q) denote the ranked result sets returned by the fulltext and vector branches respectively for query q. For each candidate memory m appearing in either branch, we define branch-specific scores:

    s_ℓ(m) = BM25(q, m)    (5)
    s_v(m) = β · cos(e(q), e(m))    (6)

where e(·) denotes the embedding function and β = 0.85 is a calibration factor that balances cosine similarity scores against the BM25 scoring range. A type-aware weight reflects the structural precision of each match:

    w(m) = 1.0 if m is an item match,
           0.9 if m is a revision match,
           0.8 if m is an artifact match    (7)

The final merged score for each unique memory is:

    S(q, m) = w(m) · max(s_ℓ(m), s_v(m))    (8)

Score calibration is achieved through the source-specific weighting factor (β = 0.85 for vector similarity) and type-aware multipliers, rather than per-query normalization. Max-based fusion selects the single strongest signal for each candidate, unlike reciprocal rank fusion (RRF), which sums over rank positions, or linear combination methods, which compute weighted averages. This is a deliberate design choice: when a memory matches strongly on one modality (e.g., exact lexical match), a weak score from another modality (e.g., low cosine similarity due to vocabulary mismatch) should not dilute the result.

Graph navigation as a complementary capability.
Edge traversal operations (TraverseEdges, AnalyzeImpact, ShortestPath) are available as explicit MCP tools that agents invoke when structural reasoning is needed—e.g., "what depends on this decision?" or "trace the provenance of this conclusion." These operations complement the search pipeline by providing structural retrieval paths that neither lexical nor semantic similarity can discover. However, they are agent-initiated navigation operations, not automatic signals fused into the scoring function.

Hyperparameter disclosure. The scoring function contains heuristic defaults that have not been empirically optimized: (i) the calibration factor β = 0.85 balances cosine similarity against BM25 scores; this value was chosen based on observed score distributions in the deployed system but has not been validated via sensitivity analysis; (ii) the type-aware weights (1.0, 0.9, 0.8) for item/revision/artifact matches reflect a structural precision prior but are not empirically calibrated; (iii) the choice of max-based fusion over RRF or convex combination is argumentatively motivated, but Bruch et al. [2023] have shown empirically that convex combination can outperform both RRF and simpler methods. We plan sensitivity analyses for β ∈ [0.5, 1.0] and the weight vector, as well as comparison against RRF and convex combination on the same retrieval tasks (Section 16). Similarly, the Dream State circuit breaker threshold of 50% maximum deprecation per batch is a safety-motivated conservative default; Section 16 describes planned analysis varying this threshold from 10% to 90% on synthetic graphs with ground-truth relevance labels.

The response includes a search_mode indicator ("fulltext" or "hybrid"), making the retrieval strategy transparent to the calling agent.
This transparency allows agents to adjust their confidence in results based on the modality that produced them.

Design note: Defense in depth at query boundaries. User-provided search queries are untrusted input. Sanitization at the query construction layer prevents injection into the fulltext index, independent of any upstream validation.

8.3 Embedding Generation

Embeddings are generated asynchronously after each revision creation via a fire-and-forget background task. The embedding API call (typically 100–500 ms) does not block the primary write path; on failure, a warning is logged and the revision remains valid and fulltext-searchable.

Principle 9 (Non-Blocking Enhancement). Enrichment operations (embeddings, summaries, classifications) must never block the primary write path. A memory that takes 500 ms to store because of an embedding API call is a memory that agents will avoid storing.

A key architectural decision is that embeddings are generated via direct API calls from the server process, not through database vendor plugins. This ensures operation on any database tier (including free), preserves embedding provider flexibility (configurable per deployment region and access tier), and supports credential rotation without service redeployment.

Design note: Infrastructure independence. Core capabilities must not depend on vendor-specific plugins or premium tiers. Direct API integration preserves provider flexibility and reduces infrastructure lock-in.

8.4 Embedding Content Construction

Each revision's embedding is computed over a composite text field (_search_text) constructed server-side from:

• Item name and kind (structural context).
• Revision summary (semantic content).
• Keywords and topics from metadata (domain signals).
• A client-provided embedding_text override (when the client has better context than the server can infer).
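The assembly of these fields can be sketched as follows (illustrative Python; apart from _search_text and embedding_text, the names and the separator are hypothetical, not the server's actual format):

```python
def build_search_text(item_name, item_kind, summary,
                      keywords=(), topics=(), embedding_text=None):
    """Assemble the composite _search_text a revision's embedding is computed
    over: structural context, semantic content, and domain signals. A
    client-provided embedding_text override takes precedence when present."""
    if embedding_text:
        return embedding_text
    parts = [f"{item_name} ({item_kind})", summary]
    parts.extend(keywords)
    parts.extend(topics)
    return " | ".join(p for p in parts if p)  # drop empty fields

text = build_search_text(
    "api-design.decision", "decision", "use gRPC",
    keywords=["grpc", "api"], topics=["architecture"],
)
print(text)
# api-design.decision (decision) | use gRPC | grpc | api | architecture
```

The override path matters in practice: a client that already holds richer conversational context can supply a better embedding basis than the server can reconstruct from stored fields alone.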
This composite construction ensures that embeddings capture both the content and the structural context of each memory, improving retrieval relevance compared to embedding only the raw text.

8.5 Retrieval Design Observations

The preceding subsections described the hybrid retrieval pipeline and its scoring function. This subsection characterizes three design properties of the retrieval system. We distinguish these from the formal AGM contribution (Section 7), which constitutes the paper's primary theoretical result. The AGM correspondence is the formal contribution establishing correctness of belief change; the retrieval properties below are engineering observations that justify the multi-modal design; and the cost scaling argument (Section 3) is the motivating observation for retrieval-based memory over context-window extension.

8.5.1 Coverage Complementarity

We first observe that the two retrieval modalities have complementary failure modes, making their combination structurally necessary for high recall. Separately, the graph's explicit edge traversal operations provide a third, agent-initiated retrieval pathway that complements the search pipeline.

Definition 8.1 (Modality-Specific Recall). Let M be a memory corpus and q a query with ground-truth relevant set G(q) ⊆ M. For a retrieval branch b ∈ {ℓ, v} (fulltext, vector), let R_b(q) ⊆ M denote the set of memories retrieved by branch b. The recall of branch b is:

    Recall_b(q) = |R_b(q) ∩ G(q)| / |G(q)|    (9)

The hybrid recall under the union of both branches is:

    Recall_H(q) = |(R_ℓ(q) ∪ R_v(q)) ∩ G(q)| / |G(q)|    (10)

Observation 1 (Coverage Complementarity). That hybrid recall Recall_H(q) ≥ max_b Recall_b(q) follows trivially from set union—any fusion method that unions result sets has this property.
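A small numerical sketch (hypothetical scores and relevance labels, not measurements from the deployed system) ties together the merged score of Equation (8) and the recall definitions above:

```python
def merged_score(weight, bm25_score, cosine, beta=0.85):
    """S(q, m) = w(m) * max(s_l(m), beta * cos(e(q), e(m))) -- Equation (8)."""
    return weight * max(bm25_score, beta * cosine)

def recall(retrieved, relevant):
    """Recall = |retrieved ∩ relevant| / |relevant| -- Equations (9)-(10)."""
    return len(retrieved & relevant) / len(relevant)

# Hypothetical branch results for one query with two relevant memories.
relevant = {"m1", "m2"}
fulltext_hits = {"m1"}        # lexical match only (e.g., a rare acronym)
vector_hits = {"m2", "m3"}    # semantic match plus one false positive

hybrid_hits = fulltext_hits | vector_hits
assert recall(hybrid_hits, relevant) >= max(
    recall(fulltext_hits, relevant), recall(vector_hits, relevant)
)
print(recall(hybrid_hits, relevant))  # 1.0

# Max-fusion: a strong lexical match is not diluted by a weak vector score.
print(merged_score(1.0, bm25_score=7.2, cosine=0.1))  # 7.2
```

The union inequality is the trivial part; the design-relevant question, taken up next, is whether each branch contributes hits the other cannot.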
The non-trivial design claim is that each modality covers cases the other systematically misses, making the combination structurally necessary rather than merely additive. We illustrate with two witness scenarios from the deployed system, plus a third scenario demonstrating the complementary role of graph navigation:

Fulltext-unique: A memory containing the acronym "HNSW" is retrieved by BM25 for the query "HNSW configuration" via exact lexical match, but the embedding model maps the technical acronym to a generic embedding region.

Vector-unique: A memory "favorite color is blue" is retrieved for "What shade does the user prefer?" via semantic embedding proximity, despite zero lexical overlap.

Graph-navigation (agent-initiated): A deployment decision connected via a Depends_On chain to a build failure is surfaced for "Why did the build fail?" when the agent explicitly invokes get_dependencies or analyze_impact. This structural reasoning requires multi-hop traversal that neither lexical nor semantic similarity can perform, demonstrating why graph navigation operations complement the search pipeline.

These are not contrived edge cases; they reflect the well-documented failure modes of lexical retrieval (vocabulary mismatch) and dense retrieval (rare terms, out-of-distribution inputs). Empirical validation of the coverage improvement on LoCoMo and LongMemEval, including per-modality ablation, is planned (Section 16).

8.5.2 Precision Preservation under Max-Fusion

As the memory corpus grows, a key concern is whether multi-modal fusion introduces false positives that degrade precision. We argue that max-based fusion (Equation 8) provides a precision-preserving property, though we emphasize this is a design-level argument, not a formal IR result.

Definition 8.2 (Branch Precision).
For branch b and query q, let R_b^k(q) denote the top-k results ranked by branch-specific score s_b. The precision at k is:

    P_b@k(q) = |R_b^k(q) ∩ G(q)| / k    (11)

Let R_H^k(q) denote the top-k results ranked by the merged score S(q, m) from Equation 8. The hybrid precision is P_H@k(q) = |R_H^k(q) ∩ G(q)| / k.

Observation 8.1 (Precision Preservation under Max-Fusion). For any query q and cutoff k, assuming uniform type weights within each match category and well-calibrated branch scores:

    P_H@k(q) ≥ max_{b ∈ {ℓ,v}} P_b@k(q)    (12)

That is, the hybrid ranking never has lower precision than the best individual branch, provided score calibration is adequate.

Argument. The merged score S(q, m) = w(m) · max(s_ℓ(m), s_v(m)) assigns each memory a score at least as high as its strongest branch-specific score, with type weights w(m) applied uniformly within each match category (preserving relative ordering). Let b* = arg max_b P_b@k(q) be the best-performing branch with top-k set R_{b*}^k(q). For any m ∈ R_{b*}^k(q): S(q, m) ≥ w(m) · s_{b*}(m). A memory m′ ∉ R_{b*}^k(q) can displace m from the top-k of S only if m′ scores highly on a different branch—but then m′ is a genuinely strong match on some modality. The top-k set under S therefore contains at least as many true positives as R_{b*}^k: (i) true positives from b* retain high merged scores, and (ii) any displacing memory carries strong signal from another modality.

Caveats. This argument assumes adequate score calibration across branches. CombMAX is known to be susceptible to noise from poorly-calibrated retrievers producing inflated scores Bruch et al. [2023]: a single branch with inflated scores can dominate the ranking regardless of result quality. The β = 0.85 calibration factor (Equation 6) partially mitigates this for vector scores, but has not been empirically validated. We acknowledge that the choice of CombMAX over alternatives (RRF, convex combination, learned fusion) is a design decision motivated by implementation simplicity and the precision preservation property above, not by empirical superiority on our specific score distributions. Bruch et al. [2023] showed that convex combination can outperform both RRF and simpler fusion methods on standard IR benchmarks; it is plausible that an alternative fusion strategy would yield better retrieval quality for our workload. Sensitivity analysis over β, the type-aware weights, and alternative fusion functions (including RRF and learned combination) is planned as future work (Section 16). In the k-bounded case, displacement effects can occur—a newly added irrelevant memory with a spuriously high score could push a true positive below the top-k threshold. The precision preservation argument (Observation 8.1) mitigates this: max-fusion ensures true positives with strong signals on any branch resist displacement.

8.5.3 Non-Degradation under Corpus Growth

Observation 2 (Recall Non-Degradation). In the unbounded retrieval case (before k-cutoff ranking), adding memories to the corpus cannot reduce the set of retrieved true positives: previously retrieved memories remain indexed and retrievable, while new relevant memories may be discovered. This is a property of any non-destructive index, not specific to our architecture. In the k-bounded case, displacement effects can occur—a newly added irrelevant memory with a spuriously high score could push a true positive below the top-k threshold.
The precision preservation argument (Observation 8.1) mitigates this: max-fusion ensures true positives with strong signals on any branch resist displacement. Empirical measurement of recall stability under corpus growth is planned (Section 16).

Summary. The AGM belief revision correspondence (Section 7) is the primary formal contribution of this paper, establishing that the architecture's belief change operations satisfy established rationality constraints. The retrieval properties above complement this with engineering justification: coverage complementarity motivates the multi-modal design, the precision preservation argument supports the max-fusion design choice, and non-degradation provides a basic scaling assurance. Together with the cost scaling argument (Section 3), these results support the overall thesis that retrieval-based memory is a sound alternative to context-window extension. Rigorous empirical validation on LoCoMo and LongMemEval remains the critical next step.

8.6 Retrieval Semantics Under Belief Revision

The preceding subsections described what the retrieval pipeline computes; this subsection specifies how it interacts with the formal revision layer—specifically, how deprecated, superseded, and active beliefs are handled during query execution.

Deprecated revisions. Both retrieval branches (fulltext and vector) apply a mandatory filter: only revisions belonging to non-deprecated items (I_active, Definition 7.3) are candidates for scoring. This filter is enforced at the Cypher query level (a WHERE NOT item.deprecated clause), not at the application level, making it architecturally guaranteed rather than convention-dependent. The consequence is that contraction via deprecation (Definition 7.5) immediately and completely removes beliefs from the agent's retrieval surface.
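The filter's semantics can be sketched as follows (illustrative Python modeling the guarantee; in production the filter is the Cypher WHERE NOT item.deprecated clause, not application code, and the record shape here is hypothetical):

```python
def retrieval_candidates(items, include_deprecated=False):
    """Candidates eligible for scoring: revisions of non-deprecated items
    only, i.e., the B_retr(τ) surface. include_deprecated=True is the
    operator-level audit escape hatch and is never set during agent recall.

    items: iterable of dicts with "kref" and "deprecated" fields.
    """
    if include_deprecated:
        return list(items)  # full graph G, audit/compliance mode
    return [i for i in items if not i["deprecated"]]

items = [
    {"kref": "kref://proj/space/prefs.fact?r=1", "deprecated": True},
    {"kref": "kref://proj/space/prefs.fact?r=2", "deprecated": False},
]
print([i["kref"] for i in retrieval_candidates(items)])
# ['kref://proj/space/prefs.fact?r=2']
```

Because both branches draw candidates from this filtered set before any scoring occurs, a deprecated item cannot re-enter results on the strength of a high BM25 or cosine score.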
Neither retrieval branch—including vector similarity, which could otherwise surface semantically proximate deprecated content—can return deprecated items without the explicit include_deprecated=true flag.

Superseded revisions. A superseded revision—one for which a newer revision exists with a Supersedes edge—is not automatically excluded from retrieval. The retrieval surface B_retr(τ) is defined by tag assignments and deprecation status, not by the Supersedes edge structure. If a superseded revision still carries an active tag (e.g., initial, v1), it remains retrievable. This is by design: an agent may legitimately need to recall what was originally believed ("what was the initial API design decision?") or to compare past and present beliefs. Restricting retrieval to only the latest revision per Supersedes chain would lose this temporal query capability.

Conflict presentation. When both a current and a superseded belief appear in retrieval results (as observed in the color preference case study, Section 15), the retrieval pipeline returns both with their respective scores. The system does not automatically resolve the conflict; instead, it provides the agent with temporal metadata (creation timestamps, revision numbers) that the agent's reasoning layer uses to apply recency preference. This separation is deliberate: the retrieval system's role is to surface relevant beliefs; the agent's role is to reason about which belief to adopt. Incorporating temporal recency as a third retrieval signal—biasing scores toward more recent revisions—is planned as a future enhancement (Section 16) but is kept separate from the formal retrieval scoring to maintain clean separation of concerns.

Operator audit queries.
The include_deprecated=true flag expands the retrieval surface from B_retr(τ) to the full graph G, enabling operators to inspect deprecated and archived beliefs for audit, compliance, or rollback purposes. This flag is never set during normal agent recall; it is exclusively an operator-level capability.

This design closes the loop between the formal revision layer (Section 7) and the retrieval pipeline: belief revision operations (revision, contraction, expansion) modify the tag assignment τ and deprecation status, which deterministically define the retrieval surface B_retr(τ), which in turn bounds what the agent can encounter through any retrieval modality.

8.7 Bridging Symbolic and Sub-Symbolic Representations

The formal model (Section 7) operates over a symbolic belief base B(τ) of ground triples, while the retrieval pipeline (Section 8) relies heavily on dense vector embeddings and LLM-generated natural language summaries. This raises a question: how do the symbolic graph layer and the sub-symbolic vector/LLM layer interact, and where does the boundary lie?

The architecture maintains a clear division of responsibilities. The graph layer (Neo4j) is the system of record for belief state: items, revisions, tags, edges, and deprecation status define the formal structures B(τ), τ, and E. All belief revision operations (Definitions 7.4–7.7) act on this layer. The vector layer (embeddings stored as revision properties) is a derived index: each revision's embedding is computed from its content after creation and serves exclusively as a retrieval accelerator. Critically, the vector layer never modifies belief state—an embedding cannot create, supersede, or deprecate a revision. The formal properties of Section 7 hold independently of whether embeddings are present, absent, or stale.

From vectors to revision pointers.
When the vector retrieval branch identifies a semantically relevant embedding, the result is not a raw embedding vector but a revision kref—a typed pointer back into the graph. Concretely, each embedding vector is stored as a property on its corresponding Revision node; a vector similarity query returns the node itself, from which the system extracts the revision's kref (e.g., kref://project/space/item.kind?r=3). This pointer is then resolved through the graph layer to obtain the revision's content φ(r), metadata, and edge relationships. The vector space thus functions as an alternative addressing mechanism—a way to locate revisions by semantic proximity rather than by structural path—but the addressed object is always a graph-native entity subject to the formal belief revision semantics.

From LLM outputs to graph operations. The system ingests LLM-generated content at two points: (i) during memory storage, where an agent's natural language output is captured as a revision's summary, tags, and metadata via the memory_ingest MCP tool; and (ii) during Dream State consolidation (Section 9), where an LLM assesses existing memories and recommends deprecation, enrichment, or relationship creation. In both cases, the LLM output passes through a structured API boundary that maps it to graph operations: a summary string becomes a revision's summary field (part of φ(r)), a deprecation recommendation becomes a contraction operation (Definition 7.5), and a relationship recommendation becomes an edge in E. The LLM never directly manipulates the graph; it produces structured recommendations that are validated, filtered by safety guards (Section 9.5), and then executed as formally defined operations.

Formal status.
The bridge between symbolic and sub-symbolic layers is deliberately asymmetric: the vector/LLM layer reads from and writes through the graph layer, but cannot bypass it. This ensures that the formal properties established in Section 7 are preserved regardless of the quality or availability of embeddings and LLM assessments. A deployment with no embedding infrastructure retains all formal guarantees (at the cost of reduced retrieval recall); a deployment with a malfunctioning LLM assessment module is contained by the safety guards. The sub-symbolic components enhance the system's practical utility without entering the formal trust boundary.

8.8 Client-Side LLM Reranking

When recall returns stacked items—multiple conversation revisions about the same topic (e.g., different sessions discussing the same person)—the system must select the most relevant sibling revision for the current query. Rather than adding a dedicated server-side reranking model, Kumiho delegates this selection to the consuming agent's own LLM through a two-stage filtering pipeline.

Stage 1: Embedding pre-filter. Sibling revisions are filtered by embedding cosine similarity to the query using text-embedding-3-small with a threshold of 0.30. This removes obviously irrelevant siblings at negligible cost (< $0.001 per query), reducing the candidate set before the more expensive LLM evaluation.

Stage 2: LLM reranking. The surviving siblings are presented to the LLM with structured metadata—title, summary, extracted facts, entities, events, and implications—alongside the original query. The LLM evaluates each sibling's relevance and selects the best match(es) for the current context.

Three configuration modes accommodate different deployment contexts:

Table 5: LLM reranking configuration modes.
Mode | Context | Reranker | Cost
client | Agent via MCP | Host agent LLM | Zero
dedicated | API / Playground | User's model | User's key
auto | Any | Detect context | Adaptive

In the client mode—the primary deployment path for MCP-integrated agents—the host agent (Claude, GPT-4o, etc.) performs reranking as part of its normal response generation. The memory layer returns structured sibling metadata; the agent evaluates it alongside the conversation context. This is more accurate than any standalone reranker because a frontier model with full conversation context outperforms a lightweight model operating on a query string alone. Critically, this costs nothing: the reranking is subsumed into the agent's existing inference call.

This design embodies the LLM-Decoupled Memory principle (Section 5): the memory layer provides structured data; the consumer's own intelligence performs selection. As agent models improve, reranking quality improves automatically without any system changes. The memory architecture never needs to upgrade its reranker—it delegates to whatever model the consumer is already running.

Empirical impact. On LoCoMo-Plus goal-type questions (Section 15.3), retrieval hit rate improved from ~0% to 100% after two changes: (i) including the primary (published) revision in the sibling candidate list so the LLM evaluates all revisions, not just non-published siblings; and (ii) expanding the structured metadata presented to the reranker to include extracted facts, entities, events, and implications. The fix was structural, not model-dependent: both GPT-4o and GPT-4o-mini achieved the same retrieval improvement.

9 The Dream State: Asynchronous Consolidation

9.1 Motivation and Prior Art

During sleep, the human brain replays recent experiences, extracts patterns, consolidates episodic memories into semantic knowledge, and prunes redundant connections Rasch and Born [2013].
The Dream State service mirrors this process for AI agent memory, converting a growing collection of raw episodic memories into a cleaner, better-organized memory graph that enables faster and more accurate retrieval.

Prior art. The idea of asynchronous background consolidation is not new. Letta's sleep-time compute [2025] explicitly implements background agents that reorganize memory during idle periods, drawing on the same biological metaphor. Google's Vertex AI Memory Bank and Amazon's Bedrock AgentCore Memory both perform asynchronous background extraction. Our contribution is not the concept of offline consolidation but the safety architecture surrounding it: the specific mechanisms described in Section 9.5—published-item protection, circuit breakers, dry-run validation, cursor-based resumption, and auditable reports—are, to our knowledge, absent from prior consolidation systems and address the critical question of what happens when automated memory management goes wrong.

9.2 Event-Driven Architecture

The Dream State processes the system's event stream with cursor-based semantics. Events include revision.created, edge.created, and revision.deprecated. The cursor position is persisted on a dedicated internal item (_dream_state), enabling resume after interruption. On first run (no cursor), the system replays available history from the beginning.

Trigger mechanisms include: scheduled execution (e.g., nightly at a configured hour); event cursor idle detection; memory count threshold; and explicit API invocation (including through MCP as the memory_dream_state tool).

9.3 Nine-Stage Consolidation Pipeline

The pipeline proceeds through the following stages:

1. Ensure Cursor: Create the internal _dream_state space and item if they do not exist.
2. Load Cursor: Read the persisted cursor position; None on first run.
3. Collect Events: Stream events from the cursor position, grouping by item and deduplicating (latest revision per item wins).
4. Fetch Revisions: Batch-load revision metadata from the graph, filtering to episodic memories (kind=conversation) and excluding already-deprecated items.
5. Inspect Bundles: For bundle-related events, fetch current membership lists to provide topical grouping context.
6. LLM Assessment: Submit batches of memories (configurable batch size, default 20) to an LLM with a structured prompt requesting: relevance scoring (0.0–1.0), deprecation recommendations with reasons, tag suggestions for improved retrieval, metadata corrections or enrichments, and relationship identification between memories in the batch.
7. Apply Actions: Execute the LLM's recommendations under safety guards (Section 9.5).
8. Save Cursor: Persist the new cursor position and timestamp.
9. Generate Report: Create a revision on the _dream_state item with a detailed Markdown audit report as an artifact, documenting all actions taken, skipped, and failed.

9.4 LLM Assessment Protocol

The assessment prompt instructs the LLM to analyze each memory and return structured JSON with per-memory assessments. For each memory, the LLM evaluates:

• Relevance: How useful is this memory for future agent interactions? (0.0–1.0)
• Deprecation: Should this memory be deprecated? Criteria include: duplicates of newer memories, superseded information, trivially obvious content, and content with no actionable value. Deprecation implements the formal contraction operation (Definition 7.5): deprecated items are excluded from all search and retrieval by default, effectively removing the belief from the agent's active state while preserving it in the graph for audit or explicit recovery.
• Enrichment: What tags, keywords, or metadata corrections would improve future retrieval?
• Relationships: Which other memories in the batch relate to this one, and what is the relationship type?

The LLM is explicitly instructed to be conservative: when in doubt, keep the memory. This bias toward preservation is reinforced by the safety guards described below.

Formal status of consolidation. We clarify the relationship between online memory operations (Section 7) and the Dream State's offline consolidation. The nine-stage pipeline is an engineering contribution, not a formal one. Each individual action maps to a formally defined operation: deprecation is contraction (Definition 7.5), metadata enrichment is expansion (Definition 7.7), and relationship creation adds edges to E. However, we do not claim that the composition of a batch of such actions across multiple memories preserves all AGM postulates simultaneously—proving compositional preservation of rationality postulates under sequential application is an open problem in belief revision theory. What the safety guards provide is not formal guarantees but operational constraints (published-item immunity, circuit breakers, dry-run validation) that limit the damage from incorrect LLM assessments. The formal properties of Section 7 apply to individual memory operations; the Dream State's contribution is the safety architecture surrounding their automated application.

9.5 Safety Guards

Automated memory management requires conservative defaults. Table 6 enumerates the safety mechanisms.

Table 6: Dream State safety guards.

Guard | Mechanism
Dry Run | Assessment-only mode
Published Protect | Never deprecate "published"
Circuit Breaker | Max 50% deprecation
Error Isolate | Per-action try/except
Audit Report | Markdown report artifact
Cursor Persist | Resume from checkpoint
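The guards in Table 6 can be sketched in a few lines. The following is an illustrative Python sketch, not the Kumiho implementation: it shows how the Apply Actions stage might combine dry-run short-circuiting, published-item protection, and the deprecation circuit breaker; all field and parameter names here are assumptions for illustration.

```python
# Illustrative sketch of the Table 6 safety guards (not the actual
# Kumiho code): dry run, published-item protection, circuit breaker.
def apply_deprecations(memories, recommended, *, dry_run=False,
                       max_deprecation_ratio=0.5,
                       allow_published_deprecation=False):
    """Return the keys actually deprecated, honoring the safety guards."""
    # Published-item protection: human-validated beliefs are immune by default.
    eligible = [k for k in recommended
                if allow_published_deprecation
                or "published" not in memories[k]["tags"]]
    # Circuit breaker: cap deprecations at max_deprecation_ratio of the
    # assessed batch; the overflow is deferred to later runs, so a stale
    # graph converges geometrically over successive runs.
    cap = int(len(memories) * max_deprecation_ratio)
    capped = eligible[:cap]
    if dry_run:
        return []  # assessment-only mode: report, change nothing
    return capped

memories = {
    "a": {"tags": set()}, "b": {"tags": set()},
    "c": {"tags": set()}, "d": {"tags": {"published"}},
}
# An over-aggressive LLM recommends deprecating everything; the guards
# drop the published item and cap the rest at 50% of the batch.
acted = apply_deprecations(memories, ["a", "b", "c", "d"])
```

Under these assumptions, only two of the four recommendations are executed: the published item is immune, and the 50% cap defers the remainder to the next run.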
The circuit breaker is particularly important: if the LLM recommends deprecating more than 50% of assessed memories in a single run, this likely indicates a miscalibrated prompt or adversarial input rather than a genuine consolidation need. The system caps deprecations at 50% and logs a warning for operator review.

Threshold justification. The 50% circuit breaker is motivated by the observation that in a healthy, regularly-consolidated memory graph, each run should be incremental: a small fraction of assessed memories will be genuinely redundant or outdated. Three scenarios can trigger a high deprecation rate: (a) a miscalibrated assessment prompt (the LLM is over-aggressive), (b) adversarial or corrupted input to the assessment, or (c) a long-overdue consolidation on a heavily stale graph where many memories have been superseded. In cases (a) and (b), halting is the correct response. In case (c), the system converges through multiple successive runs, each pruning up to 50% of the remaining candidates—geometrically approaching a clean state without risking catastrophic single-run loss.

Published-item protection justification. Published items represent beliefs that have been explicitly validated by a human operator or an upstream approval process—analogous to "approved" assets in production pipelines. Automated consolidation should never override explicit human validation; this is a direct application of the principle that human-in-the-loop decisions outrank automated assessments. The protection ensures that the Dream State cannot silently degrade curated, high-confidence beliefs.

Tunability. Both thresholds are configurable via the API, enabling operators to adapt consolidation behavior to their domain's risk profile:

• The circuit breaker accepts a max_deprecation_ratio parameter (default 0.5, valid range 0.1–0.9).
Recommended settings: high-stakes domains (medical, legal, financial) should use 0.2–0.3; general-purpose agents use the default 0.5; batch cleanup operations on known-stale graphs can use 0.7–0.9, preferably after a dry-run validation pass.
• Published-item protection can be relaxed via an explicit allow_published_deprecation=true flag for operators who require full automated control over all items, including curated ones. This flag is logged in the audit report for traceability.
• The dry-run mode is itself a tunability mechanism: operators can preview all proposed actions before committing, adjusting the assessment prompt or thresholds based on the dry-run report.

Principle 10 (Conservative Memory Management). When an AI system makes automated decisions about memory retention, the default must be preservation. Deleting a useful memory is worse than retaining a useless one. Circuit breakers, dry runs, and protection tags ensure that consolidation enhances quality without risking catastrophic loss.

9.6 Consolidation as LLM-Decoupled Operation

The Dream State accepts any LLM through a pluggable adapter interface. The consolidation model can differ from the agent's primary model—for instance, using a smaller, cost-efficient model for nightly batch processing while the agent uses a larger model for interactive reasoning. The memory graph is the shared substrate; the LLM is a tool applied to it.

10 Privacy Architecture

10.1 Local-First, Summary-to-Cloud

The privacy architecture enforces a strict boundary: raw content (chat transcripts, voice recordings, images, tool output, user PII) remains local to the agent runtime. Only PII-redacted summaries, extracted facts, topic keywords, artifact pointers (paths, not content), and embedding vectors cross the privacy boundary into the cloud graph.
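The boundary just described can be made concrete as a projection: only derived metadata fields cross to the cloud, raw fields never do. A minimal sketch follows, assuming illustrative field names (summary, facts, keywords, artifact_paths, embedding) rather than the actual wire format.

```python
# Sketch of the local-first boundary (field names are illustrative):
# only derived metadata crosses to the cloud graph; raw content stays
# in the agent runtime.
RAW_FIELDS = {"transcript", "audio", "images", "tool_output"}
ALLOWED_FIELDS = {"summary", "facts", "keywords", "artifact_paths", "embedding"}

def cloud_payload(session: dict) -> dict:
    """Project a local session record onto the fields permitted across
    the privacy boundary; everything else never leaves the runtime."""
    assert not (ALLOWED_FIELDS & RAW_FIELDS)  # the two sets must never overlap
    return {k: v for k, v in session.items() if k in ALLOWED_FIELDS}

session = {
    "transcript": "full raw chat ...",            # stays local
    "summary": "[REDACTED] prefers dark mode",    # PII-redacted summary
    "facts": ["prefers dark mode"],
    "keywords": ["ui", "preferences"],
    "artifact_paths": ["/nas/assets/mock.png"],   # pointer, not content
    "embedding": [0.1, 0.2],
}
payload = cloud_payload(session)  # transcript is excluded from the payload
```

The allowlist (rather than a blocklist) is the safer default here: a new raw field added to the session record is excluded automatically until an operator explicitly permits it.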
PII redaction is applied during the ingest pipeline before any data reaches the graph database. The redaction step uses an LLM (through the same pluggable adapter interface) to identify and remove personally identifiable information from summaries.

10.2 BYO-Storage Architecture

As described in Section 6.4.2, artifacts in the graph are pointers to files on the user's own storage—local filesystems, network shares, or cloud storage buckets. The system never copies, caches, or proxies artifact content. This means:

• Users control data residency and retention.
• The graph database contains no raw user content.
• Data exfiltration from the graph yields only metadata.
• Compliance with data sovereignty regulations is simplified.

This design reflects the Metadata Over Content principle (Table 10): the cloud graph stores the minimum information necessary for recall and reasoning, while raw content stays local. This preserves user privacy, reduces storage costs, eliminates data exfiltration risk, and—equally important—enables cognitive efficiency: agents read compact summaries to understand context, dereferencing raw content only when exact detail is required.

10.3 Multi-Channel Session Identity

For agents operating across multiple platforms (messaging services, web interfaces, desktop applications), session identity is user-centric, not channel-centric. The session ID format encodes context (e.g., personal, work), user identity hash, date, and sequence number, enabling cross-channel memory retrieval unified under one identity. The context field serves as a memory namespace—not a tenant isolation boundary—allowing the same user to maintain separate memory contexts (personal preferences vs. work decisions) within a single deployment.

Design note: Identity over channel. Session continuity follows the user, not the platform.
An agent that forgets a conversation because the user switched from mobile to desktop has failed at its core purpose.

10.4 Threat Model and Mitigations

The privacy architecture addresses a specific threat model. Table 7 enumerates the primary threats and the mitigations provided by the current design.

Table 7: Threat model and mitigations for the memory privacy boundary.

Threat | Mitigation
PII leakage via summaries | LLM-based redaction before graph ingest. Limitation: LLM redaction has non-zero false negative rates; no formal guarantees.
Membership inference via embeddings | Embeddings are stored per-tenant in isolated database partitions. Limitation: embedding inversion attacks are an active research area.
Prompt injection in Dream State | Consolidation operates on stored metadata, not raw user input; safety guards (circuit breaker, published protection) limit blast radius.
Metadata re-identification | Topics and keywords can be identifying even without PII. Limitation: current redaction targets named entities, not topical fingerprints.
Malicious artifacts via tool output | Artifact pointers are stored, not artifact content; the graph never executes or parses artifact data.
Credential leakage | The ingest pipeline rejects known credential patterns (API keys, tokens) before summarization.

We acknowledge that LLM-based PII redaction is not a formal privacy guarantee. False negatives (PII that escapes redaction) and false positives (over-redaction that degrades summary quality) are inherent to the approach. For regulated data classes (HIPAA, GDPR Article 9), operators should deploy a dedicated redaction model with measured precision/recall or a rule-based pre-filter upstream of the LLM redactor. The current architecture supports this through the pluggable adapter interface: the redaction component can be replaced without changing the ingest pipeline.
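The rule-based pre-filter suggested above for regulated deployments can be sketched with a handful of regular expressions. The patterns below are illustrative examples only (API-key-like and private-key-like shapes), not the actual Kumiho pattern set and not an exhaustive or production-grade filter.

```python
# Illustrative rule-based credential pre-filter, applied upstream of the
# LLM redactor: known credential-shaped strings are masked before the
# text ever reaches summarization. Patterns are examples, not the
# system's actual rule set.
import re

CREDENTIAL_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),               # API-key-like tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),                  # AWS-access-key-like IDs
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM key headers
]

def prefilter_credentials(text: str) -> str:
    """Mask known credential patterns before LLM-based PII redaction."""
    for pat in CREDENTIAL_PATTERNS:
        text = pat.sub("[CREDENTIAL-REDACTED]", text)
    return text

clean = prefilter_credentials(
    "use key sk-abcdefghijklmnopqrstuv to call the API"
)
```

Because this step is deterministic, its precision and recall can be measured directly against a labeled corpus, which is exactly what the text recommends for regulated data classes where LLM redaction alone is insufficient.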
11 MCP Integration

11.1 The Model Context Protocol

The Model Context Protocol (MCP) MCP [2025] provides a standardized interface between AI agents and external tools. By exposing the memory system as MCP tools, any MCP-compatible agent gains memory capabilities without custom integration code. MCP-based memory systems are now table stakes for adoption—the official MCP Memory Server, Mem0 MCP Server, Redis Agent Memory Server, Neo4j MCP Servers, MemoryOS-MCP, and Basic Memory all provide MCP interfaces. Our MCP integration is not a differentiator but a necessary condition for platform-agnostic deployment. What distinguishes the interface is the breadth of graph operations exposed: not just store/retrieve, but reasoning and provenance tools (AnalyzeImpact, FindPath, GetProvenance) and temporal point-in-time queries that other MCP memory servers do not provide.

11.2 Tool Taxonomy

The MCP server exposes 51 tools organized into six categories:

Cognitive Memory Lifecycle (memory_ingest, memory_recall, memory_consolidate, memory_discover_edges, memory_store_execution, memory_dream_state, memory_add_response): The complete memory pipeline. memory_ingest buffers user messages in working memory while simultaneously recalling relevant long-term memories and creating atomic memory units (see below), providing the agent with prior context on every turn. memory_add_response captures assistant responses into the working memory buffer. memory_consolidate summarizes sessions with PII redaction and consolidation enrichments (prospective indexing, event extraction). memory_recall performs semantic search over long-term memory with optional graph-augmented multi-query reformulation and edge traversal. memory_discover_edges generates implication queries from a newly stored memory and creates typed edges to related existing memories—the mechanism that keeps the graph connected.
memory_store_execution persists tool execution results as memory artifacts. memory_dream_state triggers the Dream State consolidation pipeline (Section 9).

Working Memory (chat_add, chat_get, chat_clear): Redis-backed session buffer for conversational context before consolidation to long-term graph storage.

Graph Navigation (get_project, get_spaces, get_item, get_revision, get_artifacts, search_items): Structured exploration of the project/space/item/revision hierarchy.

Reasoning & Provenance (get_edges, get_dependencies, get_dependents, analyze_impact, find_path, get_provenance_summary): Understanding why memories exist and how they relate, enabling chain-of-evidence reasoning.

Temporal Operations (get_item_revisions, get_revision_by_tag, get_revision_as_of, resolve_kref): Navigating memory through time, including point-in-time queries ("what did the agent believe about X on date Y?").

Graph Mutation (create_item, create_revision, tag_revision, create_edge, set_metadata, deprecate_item, and 13 additional CRUD operations): Programmatic graph structure management for agent-driven and multi-agent workflows.

11.3 Atomic Memory Writes

A single memory_ingest invocation creates the complete memory unit: space (with auto-creation), item, revision with metadata, artifact attachment, edges to source materials, bundle membership, tag assignment, and asynchronous embedding generation. This "one tool call, complete memory" design eliminates fragile multi-step sequences that could leave partially-committed state in the graph.

Design note: One tool call, complete memory. A single MCP tool invocation must create the full memory unit. Requiring multiple sequential tool calls creates fragile, partially-committed states and increases agent complexity.

11.4 Human Auditability

MCP makes memory accessible to agents, but trustworthy systems also require a human-auditable surface.
A web dashboard and companion desktop asset browser render the cognitive memory graph using the same project/space/item/revision hierarchy, enabling operators to inspect what an agent remembered, why, and when through immutable revision history, lineage traversal, and artifact inspection. Every consolidation decision is traceable to a Dream State report; every memory has a provenance chain.

For agents performing consequential work, this auditability is not a convenience but a requirement. The same project/space/item/revision hierarchy serves both asset browsing and memory inspection, ensuring that the cognitive state of an agent is as navigable and auditable as a traditional asset management system.

12 Comparative Analysis

Table 8 compares the architecture across nine evaluation dimensions. Table 9 provides a feature-level comparison with concurrent systems.

12.1 vs. Flat Retrieval Systems

Flat retrieval systems treat each query as an independent similarity search over static document chunks. The proposed architecture extends this model along three axes: (i) statefulness—memories evolve through revisions rather than being overwritten or duplicated; (ii) structure—typed edges encode causal and evidential relationships between memories; (iii) consolidation—the Dream State actively distills episodic experience into semantic knowledge.

Flat retrieval remains valuable as one component of a hybrid system. The architecture incorporates vector similarity as one of two retrieval branches (alongside fulltext search), recognizing that embedding-based search is effective for semantic matching but insufficient as a complete memory system.

12.2 vs. Tiered Buffer Systems

Tiered buffer systems introduce an important operating-system metaphor for memory management. However, they typically operate on flat stores without typed relationships or immutable versioning.
More critically, the memory management logic is typically embedded within the LLM's own reasoning process: the model decides when to move content between tiers, creating tight LLM-memory coupling.

The proposed architecture provides richer structure (graph edges, revision history), stronger safety guarantees (circuit breakers, dry runs, published protection), and LLM-decoupled memory management (the memory graph is an external data store, not model-internal state).

12.3 vs. Extended Context Windows

As argued in Section 3, context window extension addresses the wrong problem. It increases attention capacity but does not provide persistent recall, structural representation, or model-independent storage. The quadratic scaling of attention cost makes it economically infeasible for lifelong agent memory. Extended context is complementary—useful for in-session reasoning over recently retrieved memories—but not a substitute for persistent, structured, externalized memory.

12.4 vs. Static Knowledge Graphs

Static knowledge graphs excel at representing shared, encyclopedic knowledge with fixed schemas. The proposed architecture is designed for experiential memory—agent-scoped, temporally-evolving, with flexible metadata and a working memory layer. The two approaches are complementary: a memory-equipped agent could reference external knowledge graphs via referenced edges while maintaining its own experiential memory in the graph.

12.5 vs. MAGMA

MAGMA [2026] represents the most direct structural alternative, implementing a multi-graph architecture with four orthogonal graph layers (semantic, temporal, causal, entity) and policy-guided retrieval traversal. The two architectures represent alternative philosophical commitments.
MAGMA disentangles memory dimensions into separate graphs, enabling cleaner retrieval routing: a temporal query traverses only the temporal graph, a causal query traverses only the causal graph.

Table 8: Comparative analysis across nine evaluation dimensions.

Dimension | Flat RAG | Tiered Buffers | Extended Context | Static KG | Kumiho
Retrieval | Embedding only | Embedding + scan | In-context | Query language | Hybrid + graph nav
Statefulness | Stateless | Tiered buffers | Ephemeral | Current state | Versioned history
Relationships | None | None | None | Fixed ontology | 6 typed edge types
Provenance | None | None | None | Schema-dependent | Complete lineage
Temporal Nav. | Point-in-time | Current window | Current window | Mostly current | Full history + tags
Consolidation | None | Manual / rule | N/A | Batch ETL | LLM Dream State
LLM Coupling | Low | High | Complete | None | None (MCP)
Cost Scaling | Linear | Linear | Quadratic | Linear | Linear
Work Auditability | None | None | None | Partial | Full (SDK + desktop)

Table 9: Feature comparison with concurrent agent memory systems (as of Feb 2026). ✓ = present, (✓) = partial, × = absent.

Feature | Graphiti | Mem0g | A-MEM | Letta | MAGMA | Hindsight | MemOS | Kumiho
Property graph storage | ✓ | ✓ | (✓) | × | ✓ | × | × | ✓
Hybrid retrieval (≥2 modalities) | ✓ | (✓) | × | × | (✓) | (✓) | × | ✓
Immutable revision history | (✓) | (✓) | × | (✓) | × | × | × | ✓
Formal belief revision | × | × | × | × | × | × | × | ✓
URI-based addressing | × | × | × | × | × | × | × | ✓
Typed edge ontology (≥6) | (✓) | × | (✓) | × | (✓) | × | × | ✓
Async. consolidation + safety guards | × | × | × | (✓) | × | × | × | ✓
LLM-decoupled | (✓) | (✓) | × | (✓) | (✓) | (✓) | (✓) | ✓
Unified asset + memory graph | × | × | × | × | × | × | × | ✓
Agent work auditability (SDK) | × | × | × | × | × | × | × | ✓
Benchmark eval. (LoCoMo/LME) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | × | ✓†
BYO-storage (raw data stays local) | × | × | × | × | × | × | × | ✓

† LoCoMo: 0.447 four-category F1 / 97.5% adversarial refusal (Section 15.2). LoCoMo-Plus: 93.3% (Section 15.3). LongMemEval planned.
Our architecture unifies all relationships in a single property graph with typed edges, enabling cross-dimensional traversal: AnalyzeImpact propagates across Depends_On, Derived_From, and Supersedes edges simultaneously, discovering dependencies that span multiple memory dimensions. The unified graph design reflects a deliberate architectural trade-off. Multi-graph separation offers cleaner retrieval routing and avoids cross-dimensional noise, but introduces synchronization complexity: updates that span multiple dimensions (e.g., a belief revision that is simultaneously temporal, causal, and semantic) must be coordinated across separate graph instances, and cross-dimensional queries require joins across storage boundaries. The unified property graph accepts edge-type heterogeneity in exchange for transactional atomicity—a single Neo4j transaction can create a revision, re-point tags, add Supersedes and Derived_From edges, and update provenance metadata, ensuring that the belief state is never in a partially-updated condition. This atomicity is what enables the AGM compliance results (Section 15.7): the formal postulates require that revision is an atomic operation, and the unified graph makes this a database-level guarantee rather than an application-level coordination problem.

Neither approach has been empirically compared to the other. MAGMA's 0.70 on LoCoMo (LLM-as-judge score, not token-level F1) and 61.2% on LongMemEval (with 95% token reduction) establish strong baselines; whether our unified graph approach achieves comparable or better results under the same evaluation conditions is the most important open empirical question for this work.

12.6 vs. Hindsight

Hindsight [2025] achieves the highest reported LoCoMo (89.61%, percentage accuracy) and LongMemEval (91.4%) scores, demonstrating that pragmatic belief tracking without formal guarantees can deliver strong empirical results. Its Opinion Network maintains confidence-scored beliefs that update with evidence—functionally similar to our revision mechanism, but without AGM grounding. The relationship is complementary rather than competitive: Hindsight demonstrates the empirical value of structured belief tracking; our formal framework provides the theoretical guarantees (Relevance, Core-Retainment, Consistency) that such systems could adopt to ensure belief revision satisfies minimal change and does not discard beliefs without justification. Hindsight explicitly identifies safe personality management as an open problem; our safety-hardened consolidation with circuit breakers and published-item protection addresses precisely this gap.

13 Core Design Principles

Table 10: Core design principles.

# | Principle | Category
1 | Structural Reuse | Structure
2 | Universal Addressability | Structure
3 | Immutable Rev., Mutable Ptr. | Formal
4 | Explicit Over Inferred Rel. | Formal
5 | Non-Blocking Enhancement | Performance
6 | Conservative Memory Mgmt. | Safety
7 | Metadata Over Content | Safety

Table 10 consolidates the seven core design principles governing the architecture. We distinguish these research-level principles—which define the architecture's formal and structural commitments—from additional engineering design notes (boundary validation, latency matching, infrastructure independence, query sanitization, session identity, and atomic writes) documented inline in the relevant sections.

14 Reference Implementation: Kumiho

The architecture is fully implemented as the Kumiho system², which provides open-source client SDKs (Python, C++, Dart, MCP server) with the core graph server delivered as a managed cloud service, and validated through deployment:

Kumiho Server.
Rust-based gRPC server handling graph operations, dynamic connection routing, embedding generation, and hybrid search, built on tokio, tonic, and neo4rs.

Kumiho SDK. Python SDK providing typed access to all graph operations, with memory-reference validation and retry logic.

Kumiho MCP Server. Python MCP server wrapping the SDK, exposing all memory operations as MCP tools.

Kumiho Memory Library. Python library providing higher-level cognitive memory operations: session management, memory ingest with automatic recall, PII-redacted summarization with consolidation enrichments (prospective indexing, event extraction), and the Dream State pipeline.

Kumiho Dashboard. Web-based interface at https://kumiho.io (Figure 1) providing two integrated views: (i) an AI Cognitive Memory browser with fulltext search, interactive force-directed graph visualization of typed edges, node detail panels (title, summary, kref URI, revision tags, creation date), and live stats (total memories, spaces, connections, Dream State recency); and (ii) an Asset Browser for navigating non-memory graph items (models, textures, workflows). Both views share the same project/space/item/revision hierarchy and the same graph API, unifying work-product management and memory inspection in a single web interface. Node types are color-coded by kind (conversation, decision, fact, reflection, error, action, instruction, bundle) and edges are color-coded by type (Depends On, Derived From, Referenced, Contains, Created From, Belongs To, Supersedes).

Kumiho Desktop. Cross-platform desktop asset browser (Flutter/Dart) providing offline-capable access to the same graph hierarchy, optimized for browsing large asset collections on local and NAS storage.

[2] https://kumiho.io, https://github.com/KumihoIO

Figure 1: Kumiho dashboard showing the AI Cognitive Memory browser. The selected node (kumiho.io dashboard Memory page redesign with graph visualization) displays its summary, kref URI, revision tag (r=1, latest), and creation date. The force-directed graph on the right visualizes typed edges to related memories. Stats cards show total memories (19), spaces (9), connections for the selected memory (3), and Dream State recency (4 days ago). The same graph, SDK, and API that agents use to manage work products is exposed here for human inspection.

Measured end-to-end latencies: 15 ms typical for working memory, 80–120 ms for long-term graph queries including hybrid search.

15 Implementation Validation: A Case Study

Scope. This section presents three categories of empirical evidence: (i) standardized benchmark evaluations on LoCoMo and LoCoMo-Plus with cross-system comparison, (ii) formal AGM compliance verification, and (iii) operational case studies from deployment. The benchmark evaluations are the primary empirical contributions; the case studies demonstrate that the architecture functions as designed in production use.

15.1 Token Compression

A central claim of the architecture is that storing compact metadata summaries rather than raw transcripts yields significant token savings. Table 11 shows representative examples from the deployed graph, comparing the stored summary token count against the estimated raw conversation token count. The compression ratio ranges from ∼40× for simple facts to ∼280× for complex multi-turn sessions.

Table 11: Token compression ratios.

  Type           Summary  Raw   Ratio
  Simple fact    ∼12      500   42×
  Preference     ∼18      800   44×
  Profile        ∼80      4K    50×
  Paper session  ∼65      12K   185×
  Planning       ∼90      25K   278×
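To make the retrieval-time effect of these ratios concrete, here is a back-of-envelope sketch in Python. The mean sizes are illustrative values roughly in the range of Table 11, not measurements:

```python
# Back-of-envelope context-cost comparison: summarized recall vs. raw replay.
# Mean sizes below are assumed illustrative values, not measured figures.

def recall_cost(k: int, mean_tokens: int) -> int:
    """Context tokens injected by a top-k recall: scales as k * mean size."""
    return k * mean_tokens

MEAN_SUMMARY_TOKENS = 65          # s-bar: typical stored-summary size
MEAN_TRANSCRIPT_TOKENS = 12_000   # c-bar: typical raw-conversation size

k = 5
summary_cost = recall_cost(k, MEAN_SUMMARY_TOKENS)
replay_cost = recall_cost(k, MEAN_TRANSCRIPT_TOKENS)
print(f"summarized recall: {summary_cost} tokens")
print(f"raw replay:        {replay_cost} tokens")
print(f"retrieval-time compression: {replay_cost // summary_cost}x")
```

With k = 5 and these means, summarized recall injects a few hundred tokens where raw replay would inject tens of thousands, which is the O(k · s̄) versus O(k · c̄) scaling advantage discussed next.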
Critically, this compression compounds at retrieval time: a typical recall returning k = 5 results injects ∼250–400 summary tokens into the agent's context, compared to the ∼50,000+ tokens that would be required to replay the raw transcripts of those same sessions. This corresponds to the cost-scaling advantage formalized in Section 3: the retrieval approach scales as O(k · s̄), where s̄ is the mean summary size, while raw replay would scale as O(k · c̄), where c̄ is the mean conversation size.

15.2 LoCoMo Benchmark Evaluation

We evaluated Kumiho on the LoCoMo benchmark [2024], a multi-session conversation benchmark comprising 1,986 questions across 10 conversations and five categories that test long-term memory retrieval across extended conversational histories. We report results using the official token-level F1 metric with Porter stemming, the standard scoring function used by the broader community (Zep [2025], Mem0 [2025], Memobase [2025]). The evaluation uses summarized recall mode with GPT-4o as the answer model, graph-augmented retrieval with multi-query reformulation, and a recall limit of 3 memories per query with context top-k = 7.

Table 12: LoCoMo token-level F1: cross-system comparison (four retrieval categories).

  System      Single  Multi  Temp.  Open    Overall  Source
  Zep         0.357   0.194  0.420  0.496   —        Zep [2025]
  OpenAI Mem  —       —      —      —       ∼0.34    Zep [2025]
  Mem0        0.387   0.286  0.489  0.477   ∼0.40    Zep [2025]
  Mem0-Graph  0.381   0.243  0.516  0.493   ∼0.40    Zep [2025]
  Memobase    0.463   0.229  0.642  0.516   —        Memobase [2025]
  ENGRAM      0.231   0.183  0.219  0.086   0.211    Patel and Patel [2025]
  Kumiho      0.462   0.355  0.533  0.290⋆  0.447‡   This work

  ⋆ Open-domain questions require world knowledge absent from the conversation history; a memory-only system's expected floor (see text).
  ‡ Four-category weighted average (n=1,540). Including adversarial refusal accuracy (97.5%, n=446, binary scoring), overall F1 is 0.565 (n=1,986).

Table 13: LoCoMo per-category breakdown (n=1,986).

  Category                     n      F1
  Single-hop                   841    0.462
  Multi-hop                    282    0.355
  Temporal                     321    0.533
  Open-domain                  96     0.290
  Retrieval average            1,540  0.447
  Adversarial (refusal acc.)†  446    0.975
  Overall (incl. adversarial)  1,986  0.565

  † Binary refusal detection, not continuous token-level F1. See Caveats.

Table 12 reports the cross-system comparison on the four retrieval categories (single-hop, multi-hop, temporal, open-domain). Kumiho achieves 0.447 four-category F1 (n=1,540), the highest reported score on the official LoCoMo token-level metric (LoCoMo [2024], Patel and Patel [2025], Zep [2025]). Adversarial refusal accuracy (97.5%, n=446) is reported separately in Table 13, as it uses binary refusal detection rather than continuous token-level F1. Including adversarial, overall F1 is 0.565 (n=1,986). Table 13 provides the per-category breakdown.

Key findings. Four observations emerge from the token-level F1 evaluation:

(i) Architecture generalization. The same graph-native architecture that dominates LoCoMo-Plus (93.3% on Level-2 cognitive constraints, Section 15.3) also leads on standard factual QA with 0.447 four-category F1. The result is not attributable to a different pipeline configuration: the same prospective indexing, event extraction, and graph-augmented recall that bridge cue-trigger semantic disconnects also improve factual retrieval accuracy.

(ii) Adversarial refusal accuracy. The system correctly refuses to fabricate information in 97.5% of adversarial questions (n=446). This is arguably the most important metric for production trust: a memory system that hallucinates plausible-sounding but incorrect answers is more dangerous than one that fails to retrieve.
The near-perfect score is a natural consequence of the belief revision architecture (Section 7): the memory graph genuinely does not contain fabricated information. Immutable revisions preserve only what was actually discussed, and consolidation-as-denoising strips the surface-level cues that adversarial questions exploit, so there is nothing for the model to hallucinate from. Note that adversarial scoring uses binary refusal detection (presence of refusal phrases), not continuous token-level F1.

(iii) Multi-hop as strongest relative improvement. Multi-hop F1 of 0.355 exceeds Mem0 (0.286, +6.9 pp) and Mem0-Graph (0.243, +11.2 pp). Multi-hop questions require connecting information across multiple conversation segments, precisely the scenario where graph-augmented recall with edge traversal provides a structural advantage over flat vector stores. The typed Referenced and Derived_From edges created during ingestion enable the retrieval pipeline to discover memories that are structurally connected but semantically distant in embedding space.

(iv) Open-domain as expected floor. Open-domain questions (0.290) require world knowledge beyond the conversation history, e.g., general facts about geography, science, or culture that the participants did not discuss. A memory-only system cannot provide this knowledge, making open-domain the expected accuracy floor rather than a system failure. Notably, Zep (0.496) and Mem0 (0.477) score higher on open-domain, likely because their retrieval pipelines surface broader context that incidentally overlaps with world knowledge.

Cross-benchmark architecture consistency. Table 14 demonstrates that the architectural innovations generalize across both evaluation protocols; they are not tuned to a single benchmark.

Table 14: Architecture consistency across benchmarks.
  Innovation         LoCoMo-Plus                 LoCoMo F1
  Prospective idx    Bridges cue-trigger gap     Improves temporal/multi-hop
  Event extraction   Preserves causal chains     Preserves factual detail
  LLM reranking      100% goal retrieval hit     Better sibling selection
  Graph-aug. recall  Connected memory discovery  Multi-hop 0.355 (best)

Cost. The full LoCoMo token-level F1 evaluation (1,986 questions across 10 conversations) costs ∼$10 with GPT-4o-mini as the answer model and ∼$14 with GPT-4o. The pipeline cost structure mirrors LoCoMo-Plus: consolidation and enrichment are one-time per session, while per-query costs are dominated by answer generation.

Caveats. Competitor F1 scores are sourced from published evaluations (Zep [2025], Memobase [2025], Patel and Patel [2025]), not from controlled re-evaluation on identical infrastructure; different evaluation configurations (top-k, prompt templates, embedding models) may account for some cross-system variance. The adversarial category uses binary refusal detection (match against refusal phrases), not continuous token-level F1; the ground truth is not used for scoring. The 97.5% adversarial score is therefore a refusal-accuracy metric, methodologically distinct from the four retrieval categories. We report the four-category F1 (0.447) as the primary comparable number and the overall F1 (0.565, including adversarial) separately. Open-domain performance (0.290) is below several competitors; this reflects the architectural decision to retrieve only from the memory graph rather than augmenting with external knowledge sources.

15.3 LoCoMo-Plus Benchmark Evaluation

Where LoCoMo (Section 15.2) evaluates factual recall under strong semantic alignment between query and stored content, LoCoMo-Plus [2026] tests a fundamentally harder capability: implicit constraint recall under intentional cue-trigger semantic disconnect.
In LoCoMo-Plus, a cue dialogue embeds a constraint (e.g., "Joanna decided to quit sugar after feeling sluggish") that a later trigger query must connect to (e.g., "I've been indulging in all kinds of new desserts lately"), with no surface-level lexical overlap between them. The benchmark comprises 401 entries stitched into 10 LoCoMo base conversations (19–32 sessions each, 369–689 turns per conversation), spanning four constraint types, causal (n=101), state (n=100), goal (n=100), and value (n=100), with time gaps of 2 weeks to 12+ months between cue and trigger. We report results on all 401 entries across all four constraint types.

System configuration. The evaluation uses the same graph-native architecture evaluated on LoCoMo, operating in summarized recall mode (title + summary metadata). Four architectural mechanisms address the semantic disconnect that makes LoCoMo-Plus qualitatively harder than LoCoMo:

Prospective indexing. During session consolidation, the summarizer generates 3–5 hypothetical future scenarios in which the memory would be relevant, using vocabulary and framing different from the original conversation. These implications are indexed alongside the summary for both fulltext and vector retrieval. The technique bridges the cue-trigger semantic gap at write time: when a conversation mentions lactose intolerance making someone bedridden, one generated implication might be "Months later, Caroline carefully reads restaurant menus before agreeing to dinner dates, prioritizing her dietary restrictions over social convenience." At query time, the trigger ("I finally said yes to a dinner date, but I picked the place solely because I know I won't end up doubled over afterward") finds the memory through the implication's alternative framing, despite sharing no vocabulary with the original conversation.
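A minimal sketch of this write-time enrichment pattern, with stub coroutines standing in for the LLM calls (the function names are illustrative, not the Kumiho SDK API). The structural point is that implication generation runs concurrently with summarization, so it adds no wall-clock time:

```python
import asyncio

# Stubs standing in for LLM calls (hypothetical names, not the actual SDK).
async def summarize_session(transcript: str) -> str:
    await asyncio.sleep(0.01)  # simulate full-model summarization latency
    return "Caroline's lactose intolerance left her bedridden after dinner."

async def generate_implications(transcript: str, n: int = 3) -> list:
    await asyncio.sleep(0.01)  # simulate light-model (e.g., GPT-4o-mini) latency
    # Hypothetical future scenarios phrased with *different* vocabulary,
    # bridging the cue-trigger semantic gap at write time.
    return [
        "Caroline reads restaurant menus carefully before agreeing to dinner dates.",
        "Caroline prioritizes dietary restrictions over social convenience.",
        "Caroline avoids dairy-heavy dishes when eating out.",
    ][:n]

async def consolidate(transcript: str) -> dict:
    # Run both enrichments concurrently: implications add no wall-clock time.
    summary, implications = await asyncio.gather(
        summarize_session(transcript),
        generate_implications(transcript),
    )
    # Implications are indexed alongside the summary, so a later trigger
    # query can match either the original framing or an alternative one.
    return {"summary": summary,
            "index_text": summary + " " + " ".join(implications)}

record = asyncio.run(consolidate("...session transcript..."))
```

A trigger mentioning "menus" or "dinner dates" can now reach this memory through the implication text, even though neither phrase appears in the summary's own wording.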
This is analogous to how human memory works: we encode not just what happened, but what it means for the future. The encoding shapes retrieval. Implications are generated by a light model (GPT-4o-mini) running in parallel with the full-model summarization via asyncio.gather, adding zero wall-clock time.

Event extraction. The consolidation pipeline extracts structured events from each session, each comprising a description and its consequences (e.g., "Joanna decided to quit sugar → improved energy levels"). These events are appended to the summary text before indexing. Where narrative summaries compress episodes into abstract descriptions ("discussed dietary changes"), event extraction preserves the specific incidents and their causal chains that LoCoMo-Plus questions target. Events and implications are complementary: events preserve what happened, with causal structure; implications anticipate when it will matter, with alternative vocabulary.

Sibling relevance filtering. After retrieval, sibling revisions (other memories from the same item) are filtered by embedding cosine similarity to the trigger query, using text-embedding-3-small with a threshold of 0.30. This prevents context dilution, where loosely related siblings overwhelm the answer model. In one case, an entry initially presented 25 sibling revisions to the answer model; after filtering, only 2 memories / 4 revisions reached the answer model, producing a correct answer. The system performs quality control over what reaches the consumer: managing retrieved context, not just retrieving everything loosely related.

Client-side LLM reranking. After cosine pre-filtering, remaining sibling revisions are evaluated by the consuming agent's own LLM using structured metadata (title, summary, facts, entities, events, implications).
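The cosine pre-filtering step described under sibling relevance filtering can be sketched as follows. The vectors are toy 3-d embeddings for illustration (the deployment uses text-embedding-3-small); only the 0.30 threshold comes from the text:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def filter_siblings(query_vec, siblings, threshold=0.30):
    """Keep only sibling revisions whose embedding similarity to the
    trigger query clears the threshold, preventing context dilution."""
    return [s for s in siblings
            if cosine(query_vec, s["embedding"]) >= threshold]

# Toy embeddings: one on-topic sibling, one loosely related one.
query = [1.0, 0.2, 0.0]
siblings = [
    {"title": "quit sugar decision", "embedding": [0.9, 0.3, 0.1]},
    {"title": "unrelated travel note", "embedding": [0.0, 0.1, 1.0]},
]
kept = filter_siblings(query, siblings)
print([s["title"] for s in kept])  # only the on-topic sibling survives
```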
In agentic contexts (MCP), the host agent performs reranking as part of its normal response generation, at zero additional inference cost. In non-agentic contexts (API, playground), a dedicated lightweight model (e.g., GPT-4o-mini) handles it using the user's own API key. Three configuration modes are supported: client (host agent LLM, zero cost), dedicated (user-configured model), and auto (detect context). This design reflects the LLM-decoupled architecture (Section 8.8): the memory layer provides structured data; the consumer's own intelligence performs selection. As agent models improve, reranking quality improves automatically without any system changes.

The pipeline uses GPT-4o-mini for summarization, query reformulation, edge discovery, event extraction, implication generation, and judging, with GPT-4o for answer generation only. Each entry is ingested end-to-end: parallel session consolidation into long-term graph storage, followed by LLM-driven edge discovery that links each memory to related existing memories via typed edges (Referenced), bridging the cue-trigger semantic disconnect structurally. Recall uses multi-query reformulation (3–4 semantic variants per trigger) with graph-augmented retrieval (edge traversal to surface structurally connected memories that vector similarity alone would miss).

Table 15: LoCoMo-Plus: Kumiho vs. published baselines (n=401).

  System                   Model             Acc. (%)
  RAG (text-ada-002)       text-ada-002      23.5
  RAG (text-embed-small)   text-embed-small  24.9
  RAG (text-embed-large)   text-embed-large  29.8
  Mem0                     Various           41.4
  A-MEM                    Various           42.4
  SeCom                    Various           42.6
  GPT-4o (full context)    GPT-4o            41.9
  GPT-4.1 (full context)   GPT-4.1           43.6
  Gemini 2.5 Flash (1M)    Gemini 2.5 Flash  44.6
  Gemini 2.5 Pro (1M)      Gemini 2.5 Pro    45.7
  Kumiho (4o-mini answer)  GPT-4o-mini       ∼88
  Kumiho (4o answer)       GPT-4o            93.3

Table 15 reports the headline comparison.
Kumiho achieves 93.3% judge accuracy (374/401) on all entries (see the Independent Reproduction and Caveats notes below for context on independent replication and benchmark methodology), outperforming the best published baseline (Gemini 2.5 Pro, 45.7%) by 47.6 percentage points. Recall accuracy, the fraction of entries where the system retrieves at least one relevant memory, reaches 98.5% (395/401). The result is particularly striking because the system uses GPT-4o-mini, one of the cheapest available models, for the bulk of LLM operations (summarization, query reformulation, edge discovery, event extraction, implication generation, judging), with GPT-4o used only for answer generation. Even with GPT-4o-mini as the answer model (∼88%), the system more than doubles the previous state of the art. Gemini 2.5 Pro with its 1M+ token context window can fit entire conversation histories without any summarization or retrieval, yet achieves only 45.7%, demonstrating that cognitive memory is not a context-capacity problem but an organization, enrichment, and retrieval problem.

Performance by constraint type. Table 16 reports accuracy across all four constraint types with both answer models. The LoCoMo-Plus dataset distributes entries approximately equally across types (101 causal, 100 each for state, value, and goal). Causal, state, and value types all achieve 96% with GPT-4o, near-ceiling performance demonstrating that the architecture's event extraction and prospective indexing produce summaries rich enough for reliable retrieval and reasoning across all three types.

Table 16: LoCoMo-Plus accuracy by constraint type (n=401).

  Type     n    Correct  4o (%)  4o-mini (%)
  Causal   101  97       96.0    96.0
  State    100  96       96.0    95.0
  Value    100  96       96.0    ∼89
  Goal     100  85       85.0    ∼73
  Overall  401  374      93.3    ∼88

Goal-type questions (85%) remain the hardest: they require the most abstract reasoning to connect a current trigger to a stored intention (e.g., connecting "can't believe they charge that much for a car key" to "saving for an engagement ring" requires understanding that both involve money management, despite no vocabulary overlap). The model impact is type-dependent: switching from GPT-4o-mini to GPT-4o yields causal +0%, state +1%, value +7%, goal +12%. The harder the constraint type, the more a stronger answer model helps. This confirms that the bottleneck for the hardest queries is answer-model reasoning capacity, not retrieval quality; the 98.5% recall accuracy is invariant across models.

Performance by time gap. Table 17 reveals the most significant empirical finding of this evaluation: the elimination of the long-horizon accuracy cliff. In a pre-enrichment run (n=200, without prospective indexing or event extraction), accuracy dropped to 37.5% for time gaps exceeding 6 months. With both enrichments active, accuracy at >6 months rises to 84.4%, a 47 percentage point improvement that validates prospective indexing as the critical mechanism for bridging long temporal gaps. The improvement is straightforward to explain: as time increases, embedding similarity between cue and trigger naturally decays. Prospective indexing provides alternative retrieval paths through the generated implications, whose vocabulary is independent of the original conversation's wording and therefore does not suffer the same temporal decay.

Table 17: LoCoMo-Plus accuracy by time gap (n=401, GPT-4o answer).

  Time Gap     n    Acc. (%)  Notes
  ≤ 2 weeks    35   88.6      More goal-type entries
  2 wk – 1 mo  77   97.4      Peak performance
  1 – 3 mo     164  93.9      Largest cohort
  3 – 6 mo     93   93.5      No degradation
  > 6 mo       32   84.4      Cliff eliminated
The ≤2 week bucket (88.6%) is slightly lower than adjacent periods because it contains proportionally more goal-type constraints, which are harder regardless of time gap. Accuracy is consistent across all 10 base conversations (90–97.5%), with no conversation falling below 90%, demonstrating robustness across different dialogue structures, conversation lengths, and topic distributions.

Failure mode taxonomy. Analysis of the 27 failures (6.7%) across 401 entries reveals two distinct failure modes:

1. Recall miss (6 failures, 22%): No relevant memories were retrieved by any query variant, or the retrieved context was empty. These represent the system's hard floor: cases where even prospective indexing and graph-augmented recall could not bridge the semantic gap. Notably, time gap is not the primary predictor: 2 of the 6 recall misses occur at ≤1 week.

2. Answer fabrication (21 failures, 78%): The correct memory appeared in the recalled context, but the answer model generated a response that ignored it or fabricated around it. The dominant pattern is surface-theme hijacking: the model follows the trigger's surface theme and tone rather than connecting to the contradictory or abstract recalled memory. Example: the trigger mentions "indulging in new desserts" while the recalled context states "Joanna decided to quit sugar after experiencing constant sluggishness"; the model acknowledged the sugar quit but fabricated details ("dairy-free ice cream recipe") matching the trigger's positive tone rather than surfacing the contradiction.

All 15 goal failures are answer fabrication: the system retrieved the right memory every time, but the model could not bridge the abstract gap between trigger and stored intention.

This failure distribution is architecturally significant: 98.5% recall accuracy means the memory layer has effectively solved the retrieval problem for LoCoMo-Plus.
The remaining 6.7% end-to-end gap is entirely attributable to the answer model's reasoning limitations, not to the memory architecture.

Model-decoupled architecture. The system's model-decoupled design (Section 5) allows swapping the answer model without any pipeline changes. Table 16 reports both GPT-4o and GPT-4o-mini results. The 98.5% recall accuracy is constant across models; both receive identical recalled context from the same retrieval pipeline. Only the end-to-end accuracy differs (93.3% vs. ∼88%), concentrated in the hardest constraint types (goal +12%, value +7%). This empirically validates that the memory layer is infrastructure that outlives any single model generation: as LLMs improve at reasoning over retrieved context, the system's end-to-end accuracy improves automatically. The theoretical ceiling with perfect answer generation is ∼98.5% (395/401), limited only by the 6 genuine recall misses.

Cost analysis. The total cost for 401 entries is ∼$14, with summarization (∼$3), edge discovery (∼$2), implication generation (∼$1), event extraction (∼$1), answer generation (∼$3), judging (∼$0.50), and sibling embedding (∼$0.10) as the primary components. This cost efficiency stems from the same architectural principle validated by the LoCoMo summary-only results: structured summarization with enrichment (cheap, one-time per session) replaces brute-force full-context retrieval (expensive, per-query). For comparison, running Gemini 2.5 Pro with full context on 401 entries would cost significantly more while achieving only 45.7% accuracy.

Architectural significance. The LoCoMo-Plus results validate the central thesis of this paper more decisively than LoCoMo. LoCoMo's factual recall questions have strong semantic alignment between query and stored content, precisely the scenario where embedding-based retrieval excels.
LoCoMo-Plus deliberately breaks this alignment, testing whether the system can bridge semantic gaps through structural reasoning. Even taking into account the independent reproduction result in the mid-80% range (see note below), the margin over all published baselines, the best of which stands at 45.7%, demonstrates that graph-native primitives (typed edges bridging cue-trigger disconnects, structured summarization preserving causal relationships, prospective indexing anticipating future retrieval needs, sibling relevance filtering controlling context quality, and multi-query reformulation expanding the retrieval surface) provide capabilities that neither larger context windows nor stronger models can substitute for.

The enrichment contribution is quantified by comparing against a pre-enrichment baseline run (n=200, without prospective indexing, event extraction, or sibling filtering): 61.6% judge accuracy. With the full enrichment pipeline on the complete 401-entry dataset, accuracy rises to 93.3%, a 31.7 percentage point improvement. The >6-month accuracy cliff is the clearest evidence: pre-enrichment accuracy at this horizon was 37.5%; with enrichments, it rises to 84.4%. The mechanism is general: any memory system that generates retrieval-oriented metadata at write time can bridge semantic gaps that would otherwise require an exact vocabulary match or enormous context windows.

The separation of recall accuracy (98.5%) from end-to-end accuracy (93.3%) is the second signature analytical result. It demonstrates that the memory architecture has effectively solved the retrieval problem for Level-2 cognitive memory, and that the remaining gap is a consumer-side reasoning challenge, not a memory challenge.
Any improvement to the answer model's ability to use retrieved context, whether through better prompting, stronger models, or fine-tuning, translates directly to higher end-to-end accuracy without any architectural changes.

Ablation study (planned). To isolate the contribution of each consolidation enrichment, we plan four configurations on the same 401-entry benchmark: (i) summary only (base consolidation, no enrichments), (ii) summary + event extraction, (iii) summary + prospective indexing, (iv) full system (summary + events + implications + sibling filter). The hypothesis: events and implications are complementary; events preserve factual anchors (what happened), while implications provide semantic bridges (what it means for the future). Neither alone is sufficient for the highest accuracy. Proof case: cog-261 (730-day gap), where events provided the factual anchor (cottage purchase) and implications provided the semantic bridge (financial planning goals). Neither alone would have retrieved this memory across a 2-year gap.

Independent reproduction. After sharing our evaluation harness and setup details with the LoCoMo-Plus authors, they independently reproduced the benchmark on their side and reported results in a similar range, though somewhat lower than our original report (mid-80% accuracy rather than 93.3%). In private correspondence, they also confirmed that the system reliably surfaced the memory cues that later triggered the correct response, and specifically noted that the prospective indexing mechanism appeared effective for handling the cue-trigger semantic disconnect that LoCoMo-Plus is designed to stress. We include this not as a claim of exact score replication, but as independent support for the underlying retrieval mechanism.

Caveats. Baseline scores are taken from the LoCoMo-Plus publication [2026], not from controlled re-evaluation.
As with the LoCoMo evaluation, differences in LLM configuration and evaluation methodology may account for some variance. The GPT-4o-mini answer model scores (∼88% overall, ∼73% goal, ∼89% value) are estimated from separate runs; exact figures may vary by ±1–2 percentage points due to non-deterministic LLM behavior.

A further caveat concerns benchmark construction. Because the cue-trigger pairs in LoCoMo-Plus were generated using LLM-assisted procedures, some latent forward-implication structure in the dataset may align particularly well with the kinds of generative associations that GPT-family models are good at producing and recognizing. In our own experiments, GPT-4o-family models also scored unusually strongly on this benchmark (e.g., GPT-4o-mini reaching approximately 88%), suggesting that part of the absolute score may reflect model-family alignment with the benchmark construction process rather than memory architecture alone. We therefore interpret the results as strong evidence that prospective indexing and structured memory retrieval are effective for this class of implicit-constraint recall, while acknowledging that future evaluation on more human-authored conversational corpora would provide a more conservative measure of generalization.

15.4 Retrieval Observations

Beyond the standardized benchmarks, informal testing against the live deployment graph revealed two qualitative properties of the hybrid retrieval pipeline worth noting. First, the system consistently surfaces semantically adjacent memories alongside exact matches (querying "favorite color" also retrieves other preference memories), reflecting the hybrid design's breadth.
Second, for a query on "AGM belief revision," the system correctly ranked the planning memory (which described the intent to add AGM proofs) above the execution memory (which recorded the completed AGM section), suggesting that the fulltext index captures semantic relevance beyond simple keyword matching. These observations are anecdotal; a comprehensive retrieval evaluation with 100+ queries spanning multiple memory categories, along with ablation studies comparing fulltext-only, vector-only, and hybrid configurations, is planned as future work (Section 16).

15.5 Cross-Session Provenance: A Case Study

A demanding test of the architecture was the iterative authorship of this paper, which spanned 6 revision sessions across 3 days. Each revision was stored as a separate memory item in the work/paper-project space, with derived_from edges encoding the lineage:

v1 → v2 → v3 → v3-upd → v4 → v5 → v6

This provenance chain enabled several capabilities not available in flat retrieval systems: (i) an agent resuming work in a new session could traverse the chain to understand what had changed and why; (ii) querying the provenance summary of v6 resolved the full dependency graph, including the planning session that preceded it; (iii) the chain provided an auditable record of the document's evolution for human review.

The paper collaboration also demonstrated the belief revision mechanism in practice: the formal AGM section (v6) was planned in one session and executed in another, with the planning memory serving as the source dependency. When the user provided refinement feedback (e.g., improving temporal tag representations), the agent could recall prior planning context without re-reading the full paper, using only the compact summary stored in the graph.
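Walking such a lineage is a simple parent-pointer traversal. The following in-memory sketch is illustrative only; the deployed system stores these links as typed Derived_From edges in the property graph rather than in a Python dict:

```python
# Minimal in-memory sketch of walking a derived_from provenance chain,
# mirroring the v1 -> ... -> v6 paper-revision lineage above.

derived_from = {            # child revision -> parent revision
    "v2": "v1", "v3": "v2", "v3-upd": "v3",
    "v4": "v3-upd", "v5": "v4", "v6": "v5",
}

def lineage(revision):
    """Trace a revision back to its root, newest first."""
    chain = [revision]
    while chain[-1] in derived_from:
        chain.append(derived_from[chain[-1]])
    return chain

# An agent resuming work traverses this chain to see what changed and why.
print(" -> ".join(lineage("v6")))
```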
15.6 Belief Revision in Practice

The deployed system captured a concrete instance of belief revision that exercises the formal machinery of Section 7. We walk through the complete graph-level lifecycle to illustrate how the AGM postulates manifest operationally.

Step 1: Initial belief storage. On February 5, the agent stored the fact “user’s favorite color is blue” via memory_ingest, which created:

• Item: kref://CognitiveMemory/user/favorite-color.conversation
• Revision r1 with metadata {type: fact}
• Tag latest ↦ r1

At this point the belief state is B(τ1) = {“favorite-color = blue”}.

Step 2: Belief revision. On February 7, the user corrected: “favorite color is now black, not blue.” The memory_ingest pipeline detected the existing item via fulltext search, triggering the revision path rather than creating a new item. The system:

1. Created revision r2 on the same item with metadata {summary: “User’s favorite color is black (previously blue)”}.

2. Re-pointed the latest tag: latest ↦ r2 (previously r1). This is the mutable pointer operation that implements K∗2 (Success): after revision, ψ ∈ B(τ2).

3. Created a Supersedes edge: r2 —Supersedes→ r1. This edge records the provenance of the change and is the mechanism by which Relevance and Core-Retainment are preserved—r1 is not deleted but is no longer the authoritative belief.

The belief state is now B(τ2) = {“favorite-color = black”}, while r1 remains accessible for provenance queries. K∗4 (Vacuity) was not triggered because the new belief contradicted the existing one. K∗5 (Consistency) holds because the tag re-pointing ensures only the non-contradictory revision is authoritative. K∗3 (Inclusion) holds because B(τ2) ⊆ Cn(B(τ1) ∪ {ψ})—the new belief state contains only the updated preference plus any unchanged beliefs.

Step 3: Downstream impact.
If other memories depended on the color preference (e.g., a “room decoration plan” linked via Depends_On to the color belief), the analyze_impact operation would traverse the dependency graph from r2 and surface all downstream dependents. In the deployed instance, the favorite-color item had no dependents, so the impact set was empty. For the paper revision chain described above, analyze_impact on v6 correctly propagated through the full Derived_From chain to surface all prior versions.

Step 4: Retrieval with conflict presentation. A subsequent query for “favorite color” returned both r1 (blue, score 4.70) and r2 (black, score 3.27), with r1 ranking higher due to exact keyword match on “blue” in the fulltext index. This illustrates both a strength and a current limitation. The immutable revision model correctly preserves both beliefs with full provenance—the agent can see that the preference changed and when. However, the retrieval ranking does not yet automatically prioritize the more recent revision.

This is a concrete demonstration of why the retrieval properties (Section 8.5) are design observations rather than formal guarantees: the retrieval pipeline can surface results whose ranking contradicts the formal belief state B(τ′). The conflict presentation design (Section 8.6) deliberately returns both beliefs with temporal metadata, leaving resolution to the agent’s reasoning layer. In the current deployment, the agent’s skill prompt instructs it to prefer the most recently created memory when conflicts are detected, implementing the belief revision at the application layer. Future work could incorporate temporal recency as a retrieval signal (Section 16).
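The revision path walked through in Steps 1–2 can be sketched as an in-memory model: an append-only revision list per item, a mutable latest tag, and Supersedes edges for provenance. This is a simplified stand-in under stated assumptions, not the SDK’s actual API; the class and method names are illustrative, while the URI, tag name, and edge semantics follow the text.

```python
# Sketch of the memory_ingest revision path: immutable revisions, a mutable
# "latest" tag pointer, and a Supersedes edge recording provenance.

class Item:
    def __init__(self, uri: str):
        self.uri = uri
        self.revisions = []    # append-only: revisions are immutable
        self.tags = {}         # mutable pointers, e.g. "latest" -> index
        self.supersedes = []   # (newer, older) provenance edges

    def ingest(self, summary: str) -> int:
        """Create a revision; revise (rather than duplicate) if one exists."""
        self.revisions.append({"summary": summary})
        new = len(self.revisions) - 1
        old = self.tags.get("latest")
        if old is not None:                    # revision path
            self.supersedes.append((new, old)) # provenance, not deletion
        self.tags["latest"] = new              # K*2 Success: re-point the tag
        return new

item = Item("kref://CognitiveMemory/user/favorite-color.conversation")
r1 = item.ingest("User's favorite color is blue")
r2 = item.ingest("User's favorite color is black (previously blue)")
assert item.tags["latest"] == r2          # new belief is authoritative
assert item.revisions[r1] is not None     # r1 preserved for provenance
assert (r2, r1) in item.supersedes        # Supersedes edge recorded
```

The asserts mirror the operational reading of the postulates: the tag re-pointing yields Success and Consistency, while the preserved r1 plus the Supersedes edge supports Relevance and Core-Retainment.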
The 49-scenario AGM compliance suite (Section 15.7) generalizes this single case to a systematic verification, testing each postulate across simple, multi-item, chain, temporal, and adversarial configurations—including rapid sequential revisions (10 consecutive updates to the same item) and case-variant values that stress-test the Supersedes mechanism at scale.

Table 18: AGM Compliance Verification (49 scenarios).

Post.    Simple  Multi  Chain  Temp.  Adv.   Pass
K∗2      ✓       ✓      ✓      ✓      ✓      100%
K∗3      ✓       ✓      ✓      ✓      ✓      100%
K∗4      ✓       ✓      ✓      ✓      ✓      100%
K∗5      ✓       ✓      ✓      ✓      ✓      100%
K∗6      ✓       ✓      –      ✓      ✓      100%
Rel.     ✓       ✓      ✓      ✓      ✓      100%
Core     ✓       ✓      ✓      –      ✓      100%
Overall  100%    100%   100%   100%   100%   100%

49 scenarios: 49 passed, 0 failed. ✓ = pass; – = N/A.

15.7 AGM Compliance Verification

To empirically validate the formal claims of Section 7, we implemented an automated compliance evaluation suite comprising 49 test scenarios across 5 categories (simple, multi-item, chain, temporal, adversarial), testing all 7 postulates claimed by the architecture: K∗2 (Success), K∗3 (Inclusion), K∗4 (Vacuity), K∗5 (Consistency), K∗6 (Extensionality), Relevance, and Core-Retainment. Each scenario creates beliefs in a fresh graph instance, performs revision or contraction operations, and verifies postulate-specific assertions against the resulting graph state.

Table 18 reports the results: all 49 scenarios pass across all postulate–category combinations, with zero failures and zero errors. The “–” entries indicate categories where no applicable test scenario was defined (e.g., K∗6 Extensionality is not tested in chain configurations because logical equivalence of chains requires cross-item identity, which falls outside the postulate’s scope; Core-Retainment is not tested in temporal configurations because temporal sequences do not produce the independent belief structure needed to verify minimal contraction).
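To make the scenario shape concrete, the following sketches one simple-category scenario. It is a hedged illustration only: the real suite runs against a fresh graph instance, whereas here a dict stands in for the belief base, and both the belief contents and function names are hypothetical rather than the suite’s actual API.

```python
# Illustrative shape of one compliance scenario (simple category):
# build a belief base, revise with a contradicting value, and assert
# postulate-specific properties of the resulting state.

def revise(base: dict, key: str, value: str) -> dict:
    """Key-value revision: the new input becomes the authoritative belief."""
    out = dict(base)
    out[key] = value
    return out

def scenario_simple_success_and_relevance() -> bool:
    base = {"favorite-color": "blue", "hometown": "Seoul"}  # illustrative beliefs
    out = revise(base, "favorite-color", "black")
    assert out["favorite-color"] == "black"  # K*2 Success: input is believed
    assert out["hometown"] == "Seoul"        # Relevance: unrelated belief untouched
    assert len(out) == len(base)             # K*3 Inclusion: no spurious beliefs
    return True

assert scenario_simple_success_and_relevance()
```

The multi-item, chain, temporal, and adversarial categories extend this pattern with dependency edges, revision sequences, and stress inputs, but each scenario reduces to the same build–revise–assert shape.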
The adversarial category is particularly significant: it tests edge cases such as case-variant values, long string values, rapid sequential revisions (10 consecutive revisions to the same item), similar item names (“color” vs. “colour”), idempotent revisions, deep dependency chains (A → B → C → D), and mixed edge types. All adversarial scenarios pass, confirming that the graph-native operations satisfy the formal postulates not only in idealized conditions but under stress.

These results provide empirical backing for the formal proofs in Section 7. The proofs establish that the operations should satisfy the postulates given the operational semantics; the compliance suite verifies that the implementation faithfully executes those semantics across a diverse scenario space. This is particularly important because the formal analysis operates over an idealized propositional logic L_G, while the implementation operates over concrete Neo4j graph operations with real network latency, concurrent access, and string-based metadata—conditions where implementation drift from formal specification is common.

15.8 Dream State Consolidation

The Dream State pipeline has been validated both in deployment and during benchmark evaluation. In deployment, early runs on a small graph produced conservative outcomes (0 deprecations, 0 tag updates), correctly reflecting that a young graph with few episodic memories has limited candidates for consolidation. The circuit breaker and conservative assessment prompt avoided premature pruning—the expected behavior.
More significantly, the safety guards operated as designed during the LoCoMo benchmark evaluation: the consolidation pipeline processed all LoCoMo conversation transcripts into summaries without triggering the circuit breaker’s max_deprecation_ratio threshold (set to 0.5), and items tagged published were protected from deprecation regardless of the LLM assessor’s recommendations. No manual intervention was required to prevent the consolidation pipeline from corrupting the evaluation data—the architectural safety mechanisms (Section 9) were sufficient.

15.9 Limitations

Scale. During the LoCoMo-Plus benchmark evaluation, the graph accumulated over 200,000 nodes across all benchmark conversations without observed degradation in retrieval quality or system throughput. The Neo4j Aura backend and Cloud Run-hosted gRPC server handled sustained concurrent ingestion, consolidation, edge discovery, and recall under multi-agent load (concurrency 12, entry concurrency 3) running continuously over multiple days—effectively a production-grade multi-tenant load test. This provides empirical evidence that the architecture scales well into the hundreds-of-thousands-of-nodes range under realistic workloads.

What remains untested is retrieval precision under adversarial scale conditions: graphs containing tens of millions of items with a very low ratio of relevant to irrelevant memories, where vector similarity alone may surface many plausible but incorrect candidates. At that scale, the graph-augmented recall path (multi-query reformulation and typed edge traversal) becomes the primary disambiguation mechanism. Systematic evaluation of retrieval precision and recall at that scale—and the optimal balance between vector, fulltext, and graph traversal components—is planned as future work.

Cross-system comparison methodology.
The LoCoMo token-level F1 results (Section 15.2) use the same scoring function as competing systems (Zep, Mem0, Memobase), enabling direct comparison on the official metric. However, competitor scores are sourced from their respective publications, not from controlled re-evaluation on identical infrastructure and LLM configurations. Differences in underlying LLM, prompt engineering, and evaluation methodology may account for some variance; a fully controlled comparison using the same LLM, prompt template, and hardware is needed to isolate architectural contributions from confounds.

Evaluation scope complementarity. LoCoMo evaluates end-to-end recall quality (can the system answer questions about past conversations?) but does not directly exercise the formal belief revision machinery—its questions do not require the system to detect contradictions, supersede prior beliefs, or propagate impact through dependency chains. The AGM compliance suite (Section 15.7) fills this gap by testing the graph-level operations in isolation, but does not measure whether those operations improve downstream agent behavior. The two evaluation approaches are therefore complementary: LoCoMo validates that the overall architecture produces correct answers; the AGM suite validates that the belief revision operations satisfy formal rationality constraints. A benchmark that combines both—requiring the agent to answer questions whose correctness depends on having performed correct belief revision (e.g., MemoryAgentBench’s conflict resolution tasks [2026])—is the critical missing evaluation, planned as future work.

Internal retrieval evaluation. The retrieval observations reported in this paper are anecdotal. A comprehensive assessment would require a larger query set (100+ queries) with multiple annotators establishing ground-truth relevance labels.

Self-evaluation bias.
The system is evaluated on its own deployment data during the authorship of this paper, creating an inherent circularity. We have attempted to mitigate this by reporting raw numbers without favorable interpretation and by explicitly acknowledging where the system falls short (e.g., the belief revision ranking limitation).

L_G expressiveness. The formal results hold for a deliberately weak propositional logic over ground triples. This logic cannot express subsumption hierarchies, role composition, disjointness axioms, or cardinality constraints—features that real knowledge graphs often need. Any strengthening of L_G toward richer logics would re-encounter Flouris-type impossibility results. The formal contribution is thus scoped: it demonstrates that AGM belief revision is achievable for graph-native memory systems at the propositional level, but does not extend to more expressive representations.

Formal scope. The primary formal claim covers K∗2–K∗6 plus Relevance and Core-Retainment. The supplementary postulates (K∗7, K∗8) remain open—establishing them requires constructing an entrenchment ordering or proving the system’s operations encode a transitively relational contraction. Additionally, the postulates are proved for B(τ) (the full belief base determined by tag assignments), not for the specific subset surfaced by the hybrid retrieval pipeline, which introduces score-based non-determinism.

Dream State LLM dependency. The consolidation pipeline’s quality depends entirely on the LLM’s assessment accuracy. Incorrect deprecation recommendations, despite safety guards, could degrade the memory graph over time. The circuit breaker mitigates catastrophic failures but cannot prevent gradual quality erosion from consistently biased assessments.

System performance.
We do not report latency distributions, throughput measurements, or memory overhead per belief. Basic system metrics are needed to substantiate the architectural claims.

Benchmark construction bias. LoCoMo-Plus is a valuable stress test for implicit-constraint memory retrieval, but its cue-trigger pairs were originally produced through LLM-assisted construction. This may induce latent forward-implication patterns that align better with GPT-family models and retrieval systems explicitly designed to surface future-relevant implications. As a result, the benchmark likely provides stronger evidence for the usefulness of prospective indexing than for exact absolute performance in unconstrained real-world dialogue. Future evaluation on more human-authored corpora is needed.

16 Future Directions

16.1 Evaluation Roadmap

LoCoMo-Plus extensions: The LoCoMo-Plus evaluation (Section 15.3, n=401, 93.3% with GPT-4o; mid-80% range in independent reproduction) demonstrates that prospective indexing and event extraction largely solve the retrieval problem (98.5% recall accuracy), with the remaining failures concentrated in answer model reasoning. Three planned extensions target the remaining weaknesses. Goal-type accuracy: the 85% accuracy on goal-type questions (abstract intention inference) motivates investigation of goal-aware prospective indexing—specifically, generating implications for stated goals and intentions, not just events, to create better semantic bridges for the most abstract constraint type. Chronological context ordering: currently, recalled memories are presented to the answer model in relevance-score order; sorting by chronological order may help the model reason about temporal progressions (e.g., “saved for ring 6 months ago → now complaining about car key costs”), particularly for goal-type constraints where the narrative arc matters.
Ablation study: isolating the individual contributions of event extraction, prospective indexing, sibling relevance filtering, and graph-augmented recall by running four configurations (summary-only, +events, +implications, full system) on the same 401-entry benchmark. The pre-enrichment baseline (61.6% on n=200) establishes the lower bound; the full system (93.3%) establishes the upper bound. The hypothesis: events and implications are complementary—events preserve factual anchors (what happened); implications provide semantic bridges (what it means for the future). Neither alone is sufficient for the highest accuracy. Stronger answer models: with 98.5% recall accuracy, the theoretical ceiling is ∼99%. A model with stronger causal reasoning and less tendency to fabricate on correctly retrieved context would close the remaining gap—the architecture requires no changes, only a model swap.

Additional benchmarks including MemBench, Mem2ActBench, MEMTRACK, and PERSONAMEM will establish our system’s position across diverse evaluation dimensions. LongMemEval [2025]: temporal reasoning and knowledge update evaluation. MemoryAgentBench [2026]: four-competency evaluation framework (accurate retrieval, test-time learning, long-range understanding, conflict resolution) with EventQA and FactConsolidation datasets—the conflict resolution competency is directly relevant to our AGM-grounded revision operators.

Controlled cross-system comparison: re-evaluating competitor systems (Graphiti, Mem0, Hindsight) on identical infrastructure and LLM configurations to isolate architectural contributions from confounds in the LoCoMo results. The token-level F1 results (Section 15.2) partially address this by enabling metric-comparable evaluation against systems that also report F1.

Retrieval ablation studies: isolating each retrieval branch’s contribution.
Sensitivity analysis for β ∈ [0.5, 1.0], type weights, and comparison against RRF and convex combination.

16.2 Formal Extensions

Entrenchment ordering for K∗7/K∗8: Constructing an explicit epistemic entrenchment ordering over graph triples, or proving that tag-based contraction is equivalent to a transitively relational selection function, would complete the formal picture. We identify three candidate sources for such an ordering: (a) temporal recency of the revision containing the belief (more recent beliefs are more entrenched), (b) in-degree in the dependency graph (beliefs depended upon by many others are more entrenched), and (c) confidence metadata attached to revisions by the Dream State pipeline. As discussed in Section 7.2, the key challenge is that no single ordering is appropriate for all belief types. We conjecture that a type-dependent entrenchment function—where the ordering criterion varies by belief kind (temporal recency for preferences, evidential support for facts, confidence score for inferred beliefs)—would satisfy K∗7/K∗8 for the common case of single-type revisions while requiring careful treatment for cross-type belief interactions. Formalizing this conjecture requires proving that the type-restricted orderings compose into a global total preorder satisfying the Gärdenfors–Makinson conditions, or alternatively, showing that a weaker condition (a partial preorder with type-local totality) suffices for the graph-native revision operator.

Richer logics: Extending L_G toward fragments of description logics that remain AGM-compatible [Qi et al., 2006], potentially via Aiguier et al.’s [2018] satisfaction system framework. This would enable subsumption reasoning and role composition within the belief revision framework.
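The conjectured type-dependent entrenchment function above can be sketched concretely. The sketch assumes a simplified belief record; the field names and keying scheme are illustrative, and only the three candidate criteria (a)–(c) come from the text.

```python
# Sketch of a type-dependent entrenchment function: the ordering
# criterion varies by belief kind, as conjectured in Section 16.2.

from dataclasses import dataclass

@dataclass
class Belief:
    kind: str          # "preference" | "fact" | "inferred" (illustrative)
    recency: float     # timestamp of the containing revision
    in_degree: int     # number of dependents in the dependency graph
    confidence: float  # Dream State confidence metadata

def entrenchment(b: Belief) -> float:
    """Higher values resist contraction; the criterion depends on belief kind."""
    if b.kind == "preference":
        return b.recency           # (a) temporal recency
    if b.kind == "fact":
        return float(b.in_degree)  # (b) dependency/evidential support
    return b.confidence            # (c) confidence for inferred beliefs

# Within a single kind, the function yields a total preorder:
old = Belief("preference", recency=1.0, in_degree=0, confidence=0.5)
new = Belief("preference", recency=2.0, in_degree=0, confidence=0.5)
assert entrenchment(new) > entrenchment(old)  # newer preference more entrenched
```

The open question noted in the text is exactly the cross-kind case: the three per-kind scores are not directly comparable, so composing them into a global total preorder (or showing a type-local weakening suffices) is what a K∗7/K∗8 proof would have to establish.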
Partial merge operator: Defining a formally characterized merge operator for partial belief updates within a single revision. The current whole-revision replacement strategy is clean but coarse-grained. A merge operator based on Konieczny and Pino Pérez’s [2002] belief merging framework could handle contradictory sub-claims by identifying the minimal set of atoms that conflict with the new input and replacing only those, while preserving unchanged co-located beliefs. The key challenge is ensuring that such an operator satisfies the AGM postulates—particularly Relevance, which requires that only beliefs relevant to the new input are affected.

16.3 System Extensions

Adaptive Consolidation: Urgency-based triggering; agent-initiated consolidation. Extending the Dream State with anticipatory pre-computation—pre-reasoning about likely future queries over the graph structure (e.g., pre-computing AnalyzeImpact cascades for recently revised beliefs)—following Letta’s sleep-time compute approach [2025].

Temporal Recency in Retrieval: Incorporating recency as a third retrieval signal alongside fulltext and vector scoring.

17 Conclusion

We have presented Kumiho, a graph-native cognitive memory architecture whose structural primitives—immutable revisions, typed edges, mutable tag pointers, URI addressing—simultaneously serve as the operational infrastructure for managing agent work products. Agents use the same graph to remember past interactions and to version, locate, and build upon each other’s outputs—enabling fully autonomous multi-agent pipelines without separate asset tracking systems. Several individual components exist in concurrent systems; we do not claim novelty for them individually.
The contribution is their architectural synthesis, grounded in formal belief revision analysis and the recognition that cognitive memory and work product management share identical structural requirements.

The core formal result is a correspondence between the AGM belief revision postulates and graph-native memory operations, framed at the belief base level following Hansson [1999]. We proved satisfaction of the basic rationality postulates (K∗2–K∗6) and Hansson’s belief base postulates (Relevance, Core-Retainment), provided a principled rejection of Recovery grounded in provenance preservation, and identified the supplementary postulates (K∗7, K∗8) as an open question requiring construction of an entrenchment ordering. We addressed why the Flouris et al. impossibility results do not apply to our property graph formalism.

Beyond the formalism, the architecture contributes: (i) a unified graph where cognitive memory primitives simultaneously serve as operational asset management infrastructure—agents use the same graph to remember and to manage each other’s work products in multi-agent pipelines (architecturally enabled and deployed in production, though multi-agent pipeline evaluation is planned as future work); (ii) SDK transparency enabling both multi-agent coordination (agents query the graph to find inputs) and human governance (operators audit agent beliefs through the same API); (iii) safety-hardened consolidation with guard mechanisms (circuit breakers, dry-run validation, published-item protection) not found in the concurrent systems we surveyed; (iv) a URI-based addressing scheme unique among agent memory systems; and (v) a BYO-storage artifact model whose privacy-preserving compression is simultaneously a robustness mechanism.
Empirically, on the LoCoMo benchmark [2024] (official token-level F1 with Porter stemming), Kumiho achieves 0.447 four-category F1 (n=1,540)—the highest reported score across the retrieval categories. Separately, adversarial refusal accuracy reaches 97.5% (n=446), demonstrating production-grade hallucination resistance. This near-perfect adversarial score is a natural consequence of the belief revision architecture: the memory graph genuinely does not contain fabricated information—immutable revisions preserve only what was actually discussed—so there is nothing for the model to hallucinate from. The combination of top-tier retrieval F1 and near-perfect hallucination resistance is the central empirical result. Including adversarial binary scoring, overall F1 is 0.565 (n=1,986). On LoCoMo-Plus [2026]—a Level-2 cognitive memory benchmark testing implicit constraint recall under cue-trigger semantic disconnect—Kumiho achieves 93.3% judge accuracy (n=401) with 98.5% recall accuracy, outperforming the best published baseline (Gemini 2.5 Pro, 45.7%) by 47.6 percentage points while using GPT-4o-mini for bulk operations at a total cost of ∼$14. Three architectural innovations drive both results: prospective indexing (LLM-generated future-scenario implications indexed alongside summaries), event extraction (structured events with consequences preserving causal detail), and client-side LLM reranking (Section 8.8), where the consuming agent’s own LLM selects the most relevant sibling revision from structured metadata at zero additional cost. The enrichments drove LoCoMo-Plus accuracy from a pre-enrichment baseline of 61.6% to 93.3%, eliminating the >6-month accuracy cliff (37.5% → 84.4%).
The architecture is model-decoupled: the 98.5% recall accuracy is invariant across answer models, while end-to-end accuracy scales from ∼88% (GPT-4o-mini) to 93.3% (GPT-4o), with the gap concentrated in goal-type constraints requiring abstract reasoning. Of the 27 failures, 78% are answer model fabrication on correctly retrieved context—demonstrating that the memory layer has effectively solved the retrieval problem, and the remaining gap is a consumer-side reasoning challenge. Where all tested baselines—including premium models with million-token context windows—achieve 23–46%, the graph-native primitives (typed edges, structured summarization, prospective indexing, sibling relevance filtering, multi-query reformulation, client-side LLM reranking) provide the structural reasoning capabilities that surface-level retrieval cannot substitute.

Automated AGM compliance verification (49 scenarios, 100% pass rate) confirms that the implementation faithfully executes the formal specification. Controlled cross-system comparison, consolidation enrichment ablation, and additional benchmarks (LongMemEval, MemoryAgentBench) remain important directions for future work.

In an era where AI agents perform consequential work—producing artifacts, making decisions, and collaborating autonomously in multi-agent pipelines—their memory must serve double duty: as the cognitive substrate for individual agent intelligence and as the shared operational infrastructure through which agents coordinate, build upon each other’s outputs, and maintain auditable provenance chains. The seven design principles distilled from this work provide a reusable framework for building such systems: systems where every belief can be traced to its evidence, every output can be found and built upon by downstream agents, and every decision chain is open to human inspection.

References

C. E. Alchourrón, P. Gärdenfors, and D.
Makinson. On the logic of theory change: Partial meet contraction and revision functions. Journal of Symbolic Logic, 50(2):510–530, 1985.

M. Aiguier, J. Atif, I. Bloch, and C. Hudelot. Belief revision, minimal change and relaxation: A general framework based on satisfaction systems, and applications to description logics. Artificial Intelligence, 256:160–180, 2018.

T. I. Aravanis. Towards machine learning as AGM-style belief change. International Journal of Approximate Reasoning, 183:109437, 2025.

R. C. Atkinson and R. M. Shiffrin. Human memory: A proposed system and its control processes. In Psychology of Learning and Motivation, volume 2, pages 89–195. Academic Press, 1968.

A. Baddeley. The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11):417–423, 2000.

S. Baitalik, R. Datta, A. K. Das, and S. Das Choudhury. Conversation as belief revision: GreedySAT revision for global logical consistency in multi-turn LLM dialogues. In AAAI 2026 Bridge on Logic & AI, 2026.

S. Bruch, S. Gai, and A. Ingber. An analysis of fusion functions for hybrid retrieval. ACM Transactions on Information Systems, 42(1):1–35, 2023.

G. V. Cormack, C. L. A. Clarke, and S. Büttcher. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. In Proc. 32nd International ACM SIGIR Conference, pages 758–759, 2009.

A. Darwiche and J. Pearl. On the logic of iterated belief revision. Artificial Intelligence, 89(1–2):1–29, 1997.

G. Flouris, D. Plexousakis, and G. Antoniou. On applying the AGM theory to DLs and OWL. In Proc. 4th International Semantic Web Conference (ISWC), pages 216–231, 2005.

A. Fuhrmann. Theory contraction through base contraction. Journal of Philosophical Logic, 20(2):175–203, 1991.

P. Gärdenfors. Knowledge in Flux: Modeling the Dynamics of Epistemic States. MIT Press, 1988.

P. Trickel, A. Lapets, and J.
Hipple. Graphiti: Building real-time, streaming knowledge graphs for AI agents. Zep AI, 2025.

S. O. Hansson. Belief contraction without recovery. Studia Logica, 50(2):251–260, 1991.

S. O. Hansson. A Textbook of Belief Dynamics: Theory Change and Database Updating. Kluwer Academic Publishers, 1999.

P. Hase, T. Hofweber, X. Zhou, E. Stengel-Eskin, and M. Bansal. Fundamental problems with model editing: How should rational belief revision work in LLMs? Transactions on Machine Learning Research (TMLR), 2024.

Y. Hu, S. Liu, Y. Yue, et al. Memory in the age of AI agents. arXiv preprint arXiv:2512.13564, 2025.

P. Lewis, E. Perez, A. Piktus, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. In NeurIPS, 2020.

Letta Team. Sleep-time compute: Beyond inference scaling at test-time. arXiv preprint arXiv:2504.13171, 2025.

C. Packer, S. Wooders, K. Lin, V. Fang, and J. E. Gonzalez. Letta Context Repositories: Git-backed versioned memory for stateful agents. Letta Blog, February 2026.

A. Maharana, D.-H. Lee, S. Tulyakov, M. Bansal, F. Barbieri, and Y. Fang. Evaluating very long-term conversational memory of LLM agents. In Proceedings of the 62nd Annual Meeting of the ACL, pages 13851–13870, 2024.

D. Wu, H. Wang, W. Yu, Y. Zhang, K.-W. Chang, and D. Yu. LongMemEval: Benchmarking chat assistants on long-term interactive memory. In ICLR, 2025.

D. Jiang, Y. Li, G. Li, and B. Li. MAGMA: A multi-graph based agentic memory architecture for AI agents. arXiv preprint arXiv:2601.03236, 2026.

D. Makinson. On the status of the postulate of recovery in the logic of theory change. Journal of Philosophical Logic, 16(4):383–394, 1987.

Model Context Protocol Specification (2025-11-25). Anthropic, 2025. https://modelcontextprotocol.io.

Mem0 Team. Mem0: The memory layer for AI agents. https://github.com/mem0ai/mem0, 2025.
MemAgents: Memory for LLM-Based Agentic Systems. ICLR 2026 Workshop, 2026.

C. Packer, V. Fang, S. G. Patil, K. Lin, S. Wooders, and J. E. Gonzalez. MemGPT: Towards LLMs as operating systems. arXiv preprint arXiv:2310.08560, 2023.

Y. Hu, Y. Wang, and J. McAuley. Evaluating memory in LLM agents via incremental multi-turn interactions. In ICLR, 2026.

J. Kang, M. Ji, Z. Zhao, and T. Bai. Memory OS of AI agent. In EMNLP, 2025.

G. Qi, W. Liu, and D. A. Bell. A revision-based approach to handling inconsistency in description logics. Artificial Intelligence Review, 26(1–2):115–128, 2006.

B. Rasch and J. Born. About sleep’s role in memory. Physiological Reviews, 93(2):681–766, 2013.

S. Robertson and H. Zaragoza. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval, 3(4):333–389, 2009.

T. Sumers, S. Yao, K. Narasimhan, and T. L. Griffiths. Cognitive architectures for language agents. Transactions on Machine Learning Research, 2024.

E. Tulving. Episodic and semantic memory. In E. Tulving and W. Donaldson, editors, Organization of Memory, pages 381–403. Academic Press, 1972.

A. Vaswani, N. Shazeer, N. Parmar, et al. Attention is all you need. In NeurIPS, pages 5998–6008, 2017.

L. Wang, C. Ma, X. Feng, et al. A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6):186345, 2024.

B. Wilie, Z. Xu, S. R. Cahyawijaya, A. Lovenia, and P. Fung. Belief-R: An LLM benchmark for belief revision in conversational reasoning. In EMNLP, 2024.

W. Xu, Z. Liang, K. Mei, H. Gao, J. Tan, and Y. Zhang. A-MEM: Agentic memory for LLM agents. In NeurIPS, 2025.

Z. Zhang, X. Bo, C. Ma, R. Li, X. Chen, Q. Dai, J. Zhu, Z. Dong, and J.-R. Wen. A survey on the memory mechanism of large language model based agents. arXiv preprint arXiv:2404.13501, 2024.

R. Li, Z. Zhang, X. Bo, Z. Tian, X. Chen, Q. Dai, Z.
Dong, and R. T ang. CAM: A constructivist view of agentic memory for LLM-based reading comprehension. In NeurIPS , 2025. C. Hu, X. Gao, Z. Zhou, D. Xu, Y. Bai, X. Li, H. Zhang, T. Li, C. Zhang, L. Bing, and Y. Deng. Ev erMemOS: A self-organizing memory op erat- ing system for structured long-horizon reasoning. arXiv pr eprint arXiv:2601.02163 , 2026. J. Puget Gil, E. Co query , J. Sam uel, and G. Gesquière. Con V er-G: Concurren t v ersioning of kno wledge graphs. In BDA 2024 , 2024. C. Y ang, C. Zhou, Y. Xiao, et al. Graph-based agen t memory: T axonomy , tec hniques, and appli- cations. arXiv pr eprint arXiv:2602.05665 , 2026. J. Luo, Y. Tian, C. Cao, Z. Luo, H. Lin, K. Li, C. Kong, R. Y ang, and J. Ma. F rom stor- age to exp erience: A surv ey on the evolution of LLM agent memory mec hanisms. Pr eprints.or g , 202601.0618, 2026. G. Bonanno. Belief revision: The Ba yesian and A GM approaches unified via Kripke– Lewis se- man tics. Artificial Intel ligenc e , 338:104259, 2025. H. Meng, Z. Long, M. Sioutis, and Z. Zhou. On def- inite iterated b elief revision with b elief algebras. In IJCAI , 2025. 54 C. Latimer, N. Bosc hi, A. Neeser, C. Bartholomew, G. Sriv asta v a, X. W ang, and N. Ramakrish- nan. Hindsigh t is 20/20: Building agen t memory that retains, recalls, and reflects. arXiv pr eprint arXiv:2512.12818 , 2025. Y. Y u, L. Y ao, Y. Xie, Q. T an, J. F eng, Y. Li, and L. W u. Agentic memory: Learning unified long-term and short-term memory management for large language mo del agen ts. arXiv pr eprint arXiv:2601.01885 , 2026. K. W ang, Y. Lin, J. Lou, Z. Zhou, B. Suvono v, and J. Li. E-mem: Multi-agen t based episo dic con text reconstruction for LLM agent memory . arXiv pr eprint arXiv:2601.21714 , 2026. Y. Li, W. Guo, L. Zhang, R. Xu, M. Huang, H. Liu, L. Xu, Y. Xu, and J. Liu. Lo como- Plus: Bey ond-factual cognitiv e memory ev alua- tion framew ork for LLM agen ts. arXiv pr eprint arXiv:2602.10715 , 2026. M. F u, G. Zhang, X. Xue, Y. Li, Z. He, S. 
Huang, X. Qu, Y. Cheng, and Y. Y ang. LatentMem: Customizing latent memory for m ulti-agent sys- tems. arXiv pr eprint arXiv:2602.03036 , 2026. W.-C. Huang, W. Zhang, Y. Liang, et al. Rethink- ing memory mechanisms of foundation agen ts in the second half: A surv ey . arXiv pr eprint arXiv:2602.06052 , 2026. K. Ma, B. Li, Y. T ang, L. Sun, and R. Jin. CAST: Character-and-scene episo dic memory for agents. arXiv pr eprint arXiv:2602.06051 , 2026. J. Chandler and R. Booth. Parallel b elief revision via order aggregation. In IJCAI , 2025. N. Sch wind, K. Inoue, S. K onieczny , and P . Mar- quis. Iterated b elief change as learning. In IJCAI , pages 4669–4677, 2025. R. Reiter. On closed world data bases. In H. Gal- laire and J. Mink er, editors, L o gic and Data Bases , pages 55–76. Plen um Press, 1978. J. P . Delgrande, P . Peppas, and S. W oltran. General b elief revision. Journal of the ACM , 65(5):1–34, 2018. N. Arndt, P . Naumann, N. Radtke, M. Martin, and E. Marx. Decen tralized collaborative kno wledge managemen t using Git. Journal of W eb Seman- tics , 54:29–47, 2018. R. T aelman, M. V ander Sande, and R. V erb orgh. OSTRICH: V ersioned random-access triple store. In Comp anion of the W eb Confer enc e , pages 127– 130, 2018. M. V ander Sande, P . Colpaert, R. V erb orgh, S. Coppens, E. Mannens, and R. V an de W alle. R&Wbase: Git for triples. In LDO W W orkshop at WWW , 2013. M. Völk el, W. Winkler, Y. Sure, S. R. Kruk, and M. Synak. Sem V ersion: A v ersioning system for RDF and ontologies. In Pr o c. 2nd Eur op e an Se- mantic W eb Confer enc e (ESW C) , 2005. J. W u. Git Con text Con troller: Manage the con- text of LLM-based agents lik e Git. arXiv pr eprint arXiv:2508.00031 , 2025. E. A. F ox and J. A. Shaw. Combination of multiple searc hes. In TREC-2 , pages 243–252, 1993. A. Gro ve. T wo mo dellings for theory c hange. Jour- nal of Philosophic al L o gic , 17(2):157–170, 1988. S. K onieczny and R. Pino Pérez. 
Merging infor- mation under constraints: A logical framework. Journal of L o gic and Computation , 12(5):773– 808, 2002. M. Gelfond and V. Lifsc hitz. The stable mo del se- man tics for logic programming. In ICLP/SLP , pages 1070–1080, 1988. M. Gelfond and V. Lifschitz. Classical negation in logic programs and disjunctiv e databases. New Gener ation Computing , 9(3–4):365–385, 1991. Zep AI. Ev aluating memory for AI agents: A com- prehensiv e benchmark analysis. arXiv pr eprint arXiv:2504.19413 , 2025. Memobase T eam. Memobase: Stateful user mem- ory for LLM applications. Softw are, https:// github.com/memodb- io/memobase , 2025. 55 D. P atel and S. Patel. ENGRAM: Effectiv e, ligh tw eight memory orc hestration for con v ersa- tional agen ts. arXiv pr eprint arXiv:2511.12960 , 2025. A Key Data Structures Memory Reference URI F ormat: kref://project/space[/sub]/ item.kind[?r=N][&a=art] Revision No de Prop erties: { ref: "kref://proj/space/item.kind?r=N", _search_text: "concat. searchable text...", embedding: [float64 x 1536], embedding_text: "text embedded", embedding_updated_at: datetime, metadata: { schema, type, summary, topics, keywords }, created_at: datetime, author: "agent-id" } Redis W orking Memory Key Structure: cogmem:{proj}:sessions:{sid}:messages cogmem:{proj}:sessions:{sid}:metadata cogmem:{proj}:consol_queue Dream State Assessmen t Structure: MemoryAssessment { revision_ref: str relevance_score: float // 0.0-1.0 should_deprecate: bool deprecation_reason: str suggested_tags: List[str] metadata_updates: Dict[str, str] related_memories: List[(ref, type)] } B Dream State Rep ort F ormat Eac h Dream State run pro duces a Markdown report stored as a revision artifact, do cumenting ev ents pro cessed, memories assessed, actions taken (dep- recations, metadata up dates, tags added, relation- ships created), and the cursor for resumption. 
# Dream State Report -- 2026-02-04T03:00Z
**Events:** 42 | **Assessed:** 18
**Duration:** 4500ms

## Actions Taken

### Deprecated (3)
- kref://CognitiveMemory/personal/... -- duplicate

### Metadata Updated (7)
- kref://CognitiveMemory/work/... -- topics added

### Tags Added (12)
- kref://CognitiveMemory/personal/... -- ssh, error

### Relationships Created (5)
- source -> target (DERIVED_FROM)

## Cursor
eyJjdXJzb3IiOiIxMjM0NTY3ODkwIn0=
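Appendix A gives the memory reference format as kref://project/space[/sub]/item.kind[?r=N][&a=art]. As a minimal sketch of how an agent might resolve such references, the following parser assumes a grammar in which path segments exclude / and ?, the revision r is an integer, and the artifact parameter &a follows the query string; these details go beyond the printed format and are illustrative assumptions, not the paper's implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Shape from Appendix A: kref://project/space[/sub]/item.kind[?r=N][&a=art]
# The character classes below are assumptions made for illustration.
KREF_RE = re.compile(
    r"^kref://(?P<project>[^/?]+)/(?P<space>[^/?]+)"
    r"(?:/(?P<sub>[^/?]+(?:/[^/?]+)*))?"      # optional sub-path
    r"/(?P<item>[^/.?&]+)\.(?P<kind>[^/?&]+)"  # item.kind (item has no dots)
    r"(?:\?r=(?P<revision>\d+))?"              # optional revision pin
    r"(?:&a=(?P<artifact>[^&]+))?$"            # optional artifact selector
)

@dataclass
class KrefURI:
    project: str
    space: str
    sub: Optional[str]
    item: str
    kind: str
    revision: Optional[int]   # None means "follow the mutable tag pointer"
    artifact: Optional[str]

def parse_kref(uri: str) -> KrefURI:
    """Parse a kref URI into its components, raising on malformed input."""
    m = KREF_RE.match(uri)
    if m is None:
        raise ValueError(f"not a valid kref URI: {uri!r}")
    g = m.groupdict()
    return KrefURI(
        project=g["project"], space=g["space"], sub=g["sub"],
        item=g["item"], kind=g["kind"],
        revision=int(g["revision"]) if g["revision"] else None,
        artifact=g["artifact"],
    )
```

Keeping the revision optional mirrors the paper's distinction between immutable revisions (pinned with ?r=N) and mutable tag pointers (no revision given).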
