QUICK REVIEW

[Paper Review] Needle in the Haystack for Memory Based Large Language Models

Elliot Nelson, Γεώργιος Κόλλιας|arXiv (Cornell University)|2024. 07. 01.

Topic Modeling | Citations: 5

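One-Line Summary

Summary: This paper shows that Larimar, a 1.3B-parameter model with an external memory held on the CPU, supports long-context recall (up to 100K+ tokens) and outperforms baseline models on passkey and needle-in-the-haystack tasks without task-specific training.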

ABSTRACT

Current large language models (LLMs) often perform poorly on simple fact retrieval tasks. Here we investigate if coupling a dynamically adaptable external memory to a LLM can alleviate this problem. For this purpose, we test Larimar, a recently proposed language model architecture which uses an external associative memory, on long-context recall tasks including passkey and needle-in-the-haystack tests. We demonstrate that the external memory of Larimar, which allows fast write and read of an episode of text samples, can be used at test time to handle contexts much longer than those seen during training. We further show that the latent readouts from the memory (to which long contexts are written) control the decoder towards generating correct outputs, with the memory stored off of the GPU. Compared to existing transformer-based LLM architectures for long-context recall tasks that use larger parameter counts or modified attention mechanisms, a relatively smaller size Larimar is able to maintain strong performance without any task-specific training or training on longer contexts.

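Research Motivation and Goals

Motivates the need to improve long-context retrieval in LLMs.
Proposes and evaluates the external memory mechanism coupled to Larimar as a way to adapt to long contexts at test time.
Demonstrates that memory read/write operations can scale to very long contexts without increasing GPU memory usage.
Checks whether memory readouts condition the decoder to generate correct outputs.
Discusses the practicality and limitations of a CPU-based external memory for long-context tasks.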

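Proposed Method

Uses the Larimar architecture with an external associative memory that is written via a least-squares memory update.
Writes context segments to memory using their encodings and write keys.
Reads from memory using read keys derived from a prefix or query encoding.
Computes the memory readout as z_read = w M and conditions the decoder on this readout.
Controls the keys through a fixed key memory so that a nearest-neighbor mapping links writes to reads.
Runs the memory on the CPU while decoding runs on the GPU, so the memory can grow with the context.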
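
The write and read steps above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the random vectors stand in for encoder outputs, the fixed random keys and the sizes are hypothetical, and the function names `write_memory`, `read_memory`, and `nearest_key` are invented for this example. The write solves the least-squares problem min ||W M - Z|| with a pseudo-inverse, and the read is the product z_read = w M.

```python
import numpy as np

def write_memory(keys, encodings):
    """Least-squares write: the M that minimises ||keys @ M - encodings||."""
    return np.linalg.pinv(keys) @ encodings

def read_memory(key, memory):
    """Readout z_read = w M for a single read key w."""
    return key @ memory

def nearest_key(query_encoding, prefix_encodings, keys):
    """Return the write key whose stored prefix encoding is closest to the query."""
    distances = np.linalg.norm(prefix_encodings - query_encoding, axis=1)
    return keys[int(np.argmin(distances))]

rng = np.random.default_rng(0)
num_segments, num_slots, latent_dim = 8, 16, 32   # hypothetical sizes

# Stand-ins for encoder outputs (assumption: one latent vector per segment).
Z = rng.normal(size=(num_segments, latent_dim))
prefix_encodings = rng.normal(size=(num_segments, latent_dim))

# Fixed keys, one per written segment (assumption: random and fixed up front).
W = rng.normal(size=(num_segments, num_slots))

M = write_memory(W, Z)

# Reading with the original write key recovers the written encoding.
z_read = read_memory(W[3], M)
recall_error = float(np.max(np.abs(z_read - Z[3])))

# Reading via a slightly perturbed prefix encoding: the nearest stored prefix
# selects the matching key, which then addresses the memory.
query = prefix_encodings[3] + 0.01 * rng.normal(size=latent_dim)
z_from_prefix = read_memory(nearest_key(query, prefix_encodings, W), M)
```

Because there are fewer segments than memory slots here and the keys are linearly independent, the least-squares solution reproduces every written encoding exactly. With more segments than slots the write becomes a best-fit approximation.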

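Experimental Results

Research Questions

RQ1 Can an externally stored, dynamically updated memory improve an LLM's long-context recall without task-specific training?
RQ2 Does prefix-based key computation enable reliable retrieval over very long contexts (100K–1M tokens)?
RQ3 What are the trade-offs among memory size, latency, and GPU usage when the memory is offloaded to the CPU for long-context tasks?
RQ4 How does Larimar compare with baseline long-context retrieval models on the passkey and needle-in-the-haystack tasks?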

Key Results

Model and context length	3 digits	4 digits	SF
Larimar 137K	0.95	0.64	1.0
Larimar (no prefix) 137K	0.88	0.14	0.0
Mistral 7B v0.2 24K	0.66	0.62	0.80
Phi-3-mini-128K 100K	0.27	0.26	0.37
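
To make scores like those in the table concrete, here is a minimal sketch of how a passkey test case can be built and scored. It assumes the common passkey format (a single sentence stating a random number hidden among filler sentences) and assumes the score is the fraction of trials whose output contains the passkey. The filler text, the needle wording, and the names `make_passkey_case` and `recall` are hypothetical; this review does not give the paper's exact prompts or scoring rule.

```python
import random

def make_passkey_case(num_digits, num_filler, rng):
    """Build one test case: filler segments with a single passkey sentence."""
    passkey = "".join(rng.choice("0123456789") for _ in range(num_digits))
    filler = "The grass is green. The sky is blue."        # hypothetical filler
    needle = f"The pass key is {passkey}. Remember it."    # hypothetical needle
    segments = [filler] * num_filler
    segments.insert(rng.randrange(num_filler + 1), needle)
    return segments, passkey

def recall(outputs, passkeys):
    """Fraction of trials whose model output contains the correct passkey."""
    hits = sum(key in out for out, key in zip(outputs, passkeys))
    return hits / len(passkeys)

rng = random.Random(0)
cases = [make_passkey_case(num_digits=4, num_filler=1000, rng=rng) for _ in range(20)]

# Scoring two made-up model outputs against their passkeys.
example_score = recall(["The pass key is 1234.", "I do not know."], ["1234", "5678"])
```

Each segment would then be written to memory, and the model's answer to a passkey query compared against the stored passkey.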

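Larimar, a 1.3B-parameter model, maintains strong recall on contexts longer than 100K tokens without task-specific training.
Memory readouts steer the decoder toward the correct output even when the entire context is stored off the GPU.
Compared with the baselines (Mistral 7B, Phi-3-Mini-128K), Larimar achieves better recall on the passkey and needle-in-the-haystack tasks with a model that is no larger than either baseline.
The prefix-based approach to computing keys improves recall for longer and more complex needles.
The CPU-based external memory scales to long contexts without increasing GPU memory usage.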
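
The scaling claim can be illustrated with simple arithmetic. The sketch below assumes one stored latent vector per context segment and uses hypothetical values for segment length, latent dimension, and numeric precision, none of which are given in this review. Under those assumptions the host-side memory grows linearly with context length, while the readout passed to the decoder stays a single latent vector.

```python
def host_memory_bytes(num_tokens, tokens_per_segment, latent_dim, bytes_per_value=4):
    """Bytes held on the CPU, assuming one latent vector per context segment."""
    num_segments = -(-num_tokens // tokens_per_segment)  # ceiling division
    return num_segments * latent_dim * bytes_per_value

def readout_bytes(latent_dim, bytes_per_value=4):
    """Bytes moved to the GPU per read: a single latent readout vector."""
    return latent_dim * bytes_per_value

SEGMENT_TOKENS = 64   # hypothetical segment length
LATENT_DIM = 768      # hypothetical latent dimension

for num_tokens in (10_000, 100_000, 1_000_000):
    on_cpu = host_memory_bytes(num_tokens, SEGMENT_TOKENS, LATENT_DIM)
    to_gpu = readout_bytes(LATENT_DIM)
    print(f"{num_tokens:>9} tokens: {on_cpu / 1e6:8.2f} MB on CPU, {to_gpu} bytes per readout")
```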


This review was generated by AI and reviewed by a human editor.