QUICK REVIEW

[Paper Review] Zep: A Temporal Knowledge Graph Architecture for Agent Memory

Preston Rasmussen, Pavlo Paliychuk | arXiv.org | 2025-01-20

Graph Theory and Algorithms · Citations: 4

One-Line Summary

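Zep introduces a memory layer built on Graphiti, a temporally aware knowledge graph, achieving state-of-the-art results on memory benchmarks and substantially reducing latency on LongMemEval. It combines episodic memory, semantic memory, and community summaries to handle dynamic data from multiple sources.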

ABSTRACT

We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, in the Deep Memory Retrieval (DMR) benchmark. Additionally, Zep excels in more comprehensive and challenging evaluations than DMR that better reflect real-world enterprise use cases. While existing retrieval-augmented generation (RAG) frameworks for large language model (LLM)-based agents are limited to static document retrieval, enterprise applications demand dynamic knowledge integration from diverse sources including ongoing conversations and business data. Zep addresses this fundamental limitation through its core component Graphiti -- a temporally-aware knowledge graph engine that dynamically synthesizes both unstructured conversational data and structured business data while maintaining historical relationships. In the DMR benchmark, which the MemGPT team established as their primary evaluation metric, Zep demonstrates superior performance (94.8% vs 93.4%). Beyond DMR, Zep's capabilities are further validated through the more challenging LongMemEval benchmark, which better reflects enterprise use cases through complex temporal reasoning tasks. In this evaluation, Zep achieves substantial results with accuracy improvements of up to 18.5% while simultaneously reducing response latency by 90% compared to baseline implementations. These results are particularly pronounced in enterprise-critical tasks such as cross-session information synthesis and long-term context maintenance, demonstrating Zep's effectiveness for deployment in real-world applications.

Research Motivation and Goals

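Argues for dynamic agents with memory that integrate conversational and business data rather than relying on a static corpus.
Proposes Zep, a graph-based memory layer built on Graphiti, which supports temporally accurate, non-lossy memory representation.
Demonstrates improved retrieval accuracy and reduced latency on memory benchmarks relevant to enterprise use cases.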

Proposed Method

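Introduces a three-tier temporal knowledge graph: an episode subgraph of raw messages (episodes), a semantic entity subgraph of extracted entities and facts, and a community subgraph of high-level summaries.
Ingests episodes under a bi-temporal model that tracks both the event timeline T and the transactional (ingestion) timeline T', and keeps non-lossy links from extracted data back to the source episodes to guarantee traceability.
Extracts entities and facts using embeddings, entity resolution, and temporal edge invalidation, and manages evolving knowledge, including hyper-edges for facts that involve multiple entities.
Builds communities with dynamic label propagation, keeping community summaries current and retrieval scalable.
Implements a memory retrieval pipeline of search, reranker, and constructor stages that combines cosine similarity, BM25 full-text search, and breadth-first graph search; the reranker uses RRF, MMR, and cross-encoder scores.
Compares accuracy and latency on the DMR (MemGPT) benchmark with gpt-4-turbo and gpt-4o-mini, and on LongMemEval with gpt-4o-mini and gpt-4o.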
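The bi-temporal model and temporal edge invalidation from the method list can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the class `FactEdge`, its fields `valid_at`/`invalid_at` (timeline T) and `created_at`/`expired_at` (timeline T'), and the toy contradiction rule (same source and relation, different target) are hypothetical and are not Graphiti's API. Zep uses an LLM to decide whether facts contradict each other; the point shown is only that an outdated fact is closed off in time rather than deleted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FactEdge:
    """Hypothetical fact edge (source) -[relation]-> (target) with bi-temporal metadata."""
    source: str
    relation: str
    target: str
    valid_at: datetime                      # timeline T: when the fact became true
    invalid_at: Optional[datetime] = None   # timeline T: when it stopped being true
    created_at: Optional[datetime] = None   # timeline T': when the system ingested it
    expired_at: Optional[datetime] = None   # timeline T': when the system retired it

    def is_valid_at(self, t: datetime) -> bool:
        """True if the fact held at event time t on timeline T."""
        return self.valid_at <= t and (self.invalid_at is None or t < self.invalid_at)


def contradicts(old: FactEdge, new: FactEdge) -> bool:
    # Simplifying assumption: same source and relation but a different target is a
    # contradiction. Zep delegates this judgment to an LLM instead.
    return (old.source == new.source and old.relation == new.relation
            and old.target != new.target)


def add_edge(edges: List[FactEdge], new: FactEdge, now: datetime) -> None:
    """Append `new`, invalidating overlapping contradicting edges instead of deleting them."""
    new.created_at = now
    for old in edges:
        overlaps = old.valid_at < new.valid_at and (
            old.invalid_at is None or old.invalid_at > new.valid_at)
        # Only the forward case (the new fact starts later) is handled in this sketch.
        if old.expired_at is None and overlaps and contradicts(old, new):
            old.invalid_at = new.valid_at  # T: the old fact ended when the new one began
            old.expired_at = now           # T': when the system learned this
    edges.append(new)


edges: List[FactEdge] = []
add_edge(edges, FactEdge("alice", "WORKS_AT", "Acme", datetime(2023, 1, 1)),
         now=datetime(2024, 1, 1))
add_edge(edges, FactEdge("alice", "WORKS_AT", "Globex", datetime(2024, 6, 1)),
         now=datetime(2024, 6, 2))

print([e.target for e in edges if e.is_valid_at(datetime(2024, 7, 1))])  # ['Globex']
print([e.target for e in edges if e.is_valid_at(datetime(2023, 7, 1))])  # ['Acme']
```

Because the Acme edge is invalidated rather than removed, a query about mid-2023 still finds it, which is the kind of historical traceability the bi-temporal design is meant to preserve.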
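Label propagation, which the method list names for community building, is a standard community-detection algorithm in which each node repeatedly adopts the label most common among its neighbors. The Python sketch below pairs a full asynchronous run with an incremental step where a newly added entity simply joins the community held by the plurality of its neighbors. That incremental rule is assumed here as one plausible reading of "dynamic" label propagation; the function names, the smallest-label tie-breaking, and the iteration cap are likewise illustrative choices, not details from the paper.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Set


def label_propagation(adj: Dict[str, Set[str]], max_iters: int = 20,
                      seed: int = 0) -> Dict[str, str]:
    """Asynchronous label propagation: each node adopts its neighbors' most common label."""
    rng = random.Random(seed)
    labels = {node: node for node in adj}  # every node starts in its own community
    order = list(adj)
    for _ in range(max_iters):
        rng.shuffle(order)
        changed = False
        for node in order:
            if not adj[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in adj[node])
            top = max(counts.values())
            # Break ties by the smallest label so runs are reproducible.
            new_label = min(label for label, c in counts.items() if c == top)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def add_node_incrementally(adj: Dict[str, Set[str]], labels: Dict[str, str],
                           node: str, neighbors: Iterable[str]) -> str:
    """Hypothetical incremental step: join the community held by most neighbors."""
    neighbors = list(neighbors)
    adj[node] = set(neighbors)
    for n in neighbors:
        adj.setdefault(n, set()).add(node)
    counts = Counter(labels[n] for n in neighbors if n in labels)
    labels[node] = counts.most_common(1)[0][0] if counts else node
    return labels[node]


# Two disconnected triangles should form two communities.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
}
labels = label_propagation(adj)
new_label = add_node_incrementally(adj, labels, "d", ["a", "b"])
print(labels)
```

The incremental step avoids rerunning propagation over the whole graph for every new entity, at the cost that assignments can drift from what a full run would produce; a periodic full recomputation is a natural complement.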
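Of the rerankers listed in the method description, reciprocal rank fusion (RRF) is the simplest to show: it merges the candidate lists from cosine similarity, BM25, and BFS using only each item's rank in each list. The Python sketch below uses the common default constant k = 60 from the RRF literature; the value Zep actually uses is not stated in this review, and the function name and entity IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse several ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical candidate lists from the three search methods.
cosine_hits = ["e1", "e2", "e3"]
bm25_hits = ["e2", "e4", "e1"]
bfs_hits = ["e2", "e3"]

fused = reciprocal_rank_fusion([cosine_hits, bm25_hits, bfs_hits])
print([item for item, _ in fused])  # ['e2', 'e1', 'e3', 'e4']
```

Here e2 comes first because it ranks near the top of all three lists. Since RRF works on ranks alone, cosine similarities and BM25 scores, which live on different scales, never need to be normalized against each other.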

Experimental Results

Research Questions

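RQ1. Can a temporally aware knowledge-graph memory layer improve retrieval accuracy over long-running conversations and enterprise data compared with static-document RAG approaches?
RQ2. How does Graphiti-based memory (with episodic and semantic subgraphs and communities) affect latency and scalability in real deployments?
RQ3. How do temporal extraction and edge invalidation affect the ability to keep memory accurate and up to date over time?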

Key Results

Benchmark	Memory	Model	Score	Latency	Latency IQR	Avg. context tokens
DMR	Recursive Summarization	gpt-4-turbo	35.3%	–	–	–
DMR	Conversation Summaries	gpt-4-turbo	78.6%	–	–	–
DMR	MemGPT	gpt-4-turbo	93.4%	–	–	–
DMR	Full-conversation	gpt-4-turbo	94.4%	–	–	–
DMR	Zep	gpt-4-turbo	94.8%	–	–	–
DMR	Conversation Summaries	gpt-4o-mini	88.0%	–	–	–
DMR	Full-conversation	gpt-4o-mini	98.0%	–	–	–
DMR	Zep	gpt-4o-mini	98.2%	–	–	–
LongMemEval	Full-context	gpt-4o-mini	55.4%	31.3 s	8.76 s	115k
LongMemEval	Zep	gpt-4o-mini	63.8%	3.20 s	1.31 s	1.6k
LongMemEval	Full-context	gpt-4o	60.2%	28.9 s	6.01 s	115k
LongMemEval	Zep	gpt-4o	71.2%	2.58 s	0.684 s	1.6k

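On DMR, Zep reaches 94.8% accuracy with gpt-4-turbo and 98.2% with gpt-4o-mini, surpassing the MemGPT baseline (93.4% with gpt-4-turbo).
On LongMemEval, Zep reaches 63.8% accuracy at 3.20 s latency with gpt-4o-mini (full context: 55.4% at 31.3 s) and 71.2% accuracy at 2.58 s with gpt-4o (full context: 60.2% at 28.9 s).
Zep cuts latency by roughly 90% relative to the full-context baseline, using about 1.6k instead of 115k context tokens on average, while still achieving higher accuracy on complex question types.
The largest gains appear on temporal-reasoning and multi-session tasks, highlighting Zep's strengths in enterprise-style scenarios.
The authors also point out the limitations of current benchmarks and call for more enterprise-oriented memory benchmarks that evaluate how conversation history is synthesized with structured business data.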
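The roughly 90% latency claim can be checked directly against the LongMemEval rows of the results table. The short Python script below recomputes the reductions from the table's own figures; the dictionary layout and variable names are illustrative only.

```python
# LongMemEval numbers copied from the results table (latency in seconds, accuracy in %).
longmemeval = {
    "gpt-4o-mini": {"full_latency": 31.3, "zep_latency": 3.20, "full_acc": 55.4, "zep_acc": 63.8},
    "gpt-4o": {"full_latency": 28.9, "zep_latency": 2.58, "full_acc": 60.2, "zep_acc": 71.2},
}

summary = {}
for model, r in longmemeval.items():
    summary[model] = {
        "latency_cut": 1 - r["zep_latency"] / r["full_latency"],  # fraction of latency removed
        "acc_gain_pts": r["zep_acc"] - r["full_acc"],             # absolute gain, percentage points
        "acc_gain_rel": r["zep_acc"] / r["full_acc"] - 1,         # relative gain
    }
    s = summary[model]
    print(f"{model}: latency -{s['latency_cut']:.1%}, "
          f"accuracy +{s['acc_gain_pts']:.1f} pts ({s['acc_gain_rel']:+.1%} relative)")

# Average context size: about 115k tokens for full context vs. about 1.6k for Zep.
token_cut = 1 - 1.6e3 / 115e3
print(f"context tokens -{token_cut:.1%}")
```

This works out to about an 89.8% latency reduction and an 8.4-point accuracy gain with gpt-4o-mini, and about 91.1% and 11.0 points (roughly 18.3% relative) with gpt-4o. The average context shrinks by about 98.6%, which plausibly accounts for most of the latency gap, since the model has far fewer tokens to read.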


This review was generated by AI and reviewed by a human editor.