QUICK REVIEW

[Paper Review] Towards Continual Knowledge Learning of Language Models

Joel Jang, Seonghyeon Ye | arXiv (Cornell University) | 2021-10-07

Topic Modeling | References: 56 | Citations: 40

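One-Line Summary

This paper defines Continual Knowledge Learning (CKL), in which the world knowledge stored in language models (LMs) is renewed through continual pretraining. It introduces a CKL benchmark built from the InvariantLAMA, UpdatedLAMA, and NewLAMA datasets, analyzes CKL methods based on regularization, rehearsal, and parameter expansion, and highlights that parameter expansion is the most robust approach despite its memory-efficiency drawbacks.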

ABSTRACT

Large Language Models (LMs) are known to encode world knowledge in their parameters as they pretrain on a vast amount of web corpus, which is often utilized for performing knowledge-dependent downstream tasks such as question answering, fact-checking, and open dialogue. In real-world scenarios, the world knowledge stored in the LMs can quickly become outdated as the world changes, but it is non-trivial to avoid catastrophic forgetting and reliably acquire new knowledge while preserving invariant knowledge. To push the community towards better maintenance of ever-changing LMs, we formulate a new continual learning (CL) problem called Continual Knowledge Learning (CKL). We construct a new benchmark and metric to quantify the retention of time-invariant world knowledge, the update of outdated knowledge, and the acquisition of new knowledge. We adopt applicable recent methods from literature to create several strong baselines. Through extensive experiments, we find that CKL exhibits unique challenges that are not addressed in previous CL setups, where parameter expansion is necessary to reliably retain and learn knowledge simultaneously. By highlighting the critical causes of knowledge forgetting, we show that CKL is a challenging and important problem that helps us better understand and train ever-changing LMs. The benchmark datasets, evaluation script, and baseline code to reproduce our results are available at https://github.com/joeljang/continual-knowledge-learning.

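Research Motivation and Goals

Define CKL as continual pretraining that updates an LM's world knowledge while preserving time-invariant facts.
Construct a benchmark that measures retention of invariant knowledge, updating of outdated knowledge, and acquisition of new knowledge.
Propose FUAR, a metric that captures the trade-off between forgetting and knowledge update/acquisition.
Evaluate CKL methods across architectures and identify the challenges unique to CKL compared with traditional CL.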

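Proposed Method

Formulate CKL and introduce three benchmark datasets: InvariantLAMA (time-invariant knowledge), UpdatedLAMA (outdated knowledge that needs updating), and NewLAMA (new knowledge found in D1).
Build a new text corpus D1 (CC-RecentNews) and define a zero-shot LAMA probing framework for evaluating knowledge.
Propose the FUAR metric to measure the trade-off between forgetting and update plus acquisition.
Evaluate baseline CKL methods, grouped into regularization, rehearsal, and parameter expansion (e.g., RecAdam, Mix-Review, LoRA, Kadapters, Modular), on an encoder-decoder model (T5).
Analyze the effects of multiple CKL phases, data duplication, learning rate, and cross-architecture transfer of CKL methods.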
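
The zero-shot probing step above can be pictured as scoring a model's fill-in answers against gold answers. The following is a minimal Python sketch, not the authors' evaluation script (that lives in the repository linked in the abstract). The exact-match criterion, the normalization rules, the names `normalize`, `exact_match_score`, and `toy_predict`, and the two toy probes are all assumptions made for illustration.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace before comparing."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match_score(probes, predict):
    """Return the percentage of probes answered exactly (after normalization).

    `probes` is a list of {"query": str, "answer": str} dicts and `predict`
    is any callable mapping a query string to the model's answer string.
    """
    if not probes:
        return 0.0
    hits = sum(
        normalize(predict(p["query"])) == normalize(p["answer"]) for p in probes
    )
    return 100.0 * hits / len(probes)


# Made-up probes, for illustration only; not taken from any LAMA dataset.
TOY_PROBES = [
    {"query": "The capital of France is _.", "answer": "Paris"},
    {"query": "Water is made of hydrogen and _.", "answer": "oxygen"},
]


def toy_predict(query):
    """Hypothetical stand-in for a language model: knows one fact only."""
    known = {"The capital of France is _.": "paris."}
    return known.get(query, "unknown")


if __name__ == "__main__":
    print(exact_match_score(TOY_PROBES, toy_predict))
```

Because no fine-tuning happens between pretraining and probing, a score computed this way reflects only what the model already stores in its parameters, which is the property the three LAMA-style datasets are meant to isolate.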

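Experimental Results

Research Questions

RQ1: How does continual pretraining on a new corpus affect an LM's retention of time-invariant knowledge?
RQ2: How effectively does CKL update outdated information and acquire new knowledge without catastrophic forgetting?
RQ3: Which CKL methods best balance forgetting against update/acquisition, and how do they scale across multiple CKL phases?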

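Key Results

IL, UL, NL, and NLE are scores on InvariantLAMA, UpdatedLAMA, NewLAMA, and NewLAMA-Easy (higher is better); a lower FUAR is better.

Method	Trainable / total parameters	IL	UL	NL	NLE	FUAR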
T5-Initial	0M / 737M	24.17	1.62	1.88	10.32	-
T5-Vanilla	737M / 737M	12.89	10.17	3.77	17.75	1.08
T5-RecAdam	737M / 737M	13.20	12.55	4.02	17.85	0.84
T5-MixReview	737M / 737M	13.92	6.49	2.89	14.86	1.74
T5-LoRA	403M / 738M	16.58	12.77	4.52	19.56	0.55
T5-Kadapters (k=2)	427M / 762M	19.59	12.34	5.03	18.75	0.33
T5-Kadapters (k=3)	440M / 775M	19.76	12.66	4.02	19.00	0.33
T5-Modular	438M / 773M	20.29	12.66	4.65	19.24	0.28
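
One simplified single-phase reading of FUAR, the points lost on IL divided by the points gained on UL plus NL relative to T5-Initial, reproduces the FUAR column above. The sketch below checks that arithmetic in Python. The function names are hypothetical, the formula is an assumption inferred from the table rather than the paper's general multi-phase definition, and returning infinity when nothing is gained is likewise a choice made here.

```python
# (IL, UL, NL) scores copied from the results table.
SCORES = {
    "T5-Initial": (24.17, 1.62, 1.88),
    "T5-Vanilla": (12.89, 10.17, 3.77),
    "T5-RecAdam": (13.20, 12.55, 4.02),
    "T5-MixReview": (13.92, 6.49, 2.89),
    "T5-LoRA": (16.58, 12.77, 4.52),
    "T5-Kadapters (k=2)": (19.59, 12.34, 5.03),
    "T5-Kadapters (k=3)": (19.76, 12.66, 4.02),
    "T5-Modular": (20.29, 12.66, 4.65),
}


def fuar_single_phase(initial, updated):
    """Simplified FUAR: IL points forgotten per point gained on UL + NL.

    Assumption: a single CKL phase, with T5-Initial as the reference model.
    """
    il0, ul0, nl0 = initial
    il1, ul1, nl1 = updated
    forgotten = max(0.0, il0 - il1)
    gained = max(0.0, ul1 - ul0) + max(0.0, nl1 - nl0)
    if gained == 0.0:
        # Assumption: with no gain at all the ratio is treated as unbounded.
        return float("inf")
    return forgotten / gained


def fuar_table(scores=SCORES, baseline="T5-Initial"):
    """Compute the simplified FUAR for every method except the baseline."""
    base = scores[baseline]
    return {
        name: round(fuar_single_phase(base, row), 2)
        for name, row in scores.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, value in fuar_table().items():
        print(f"{name}\t{value:.2f}")
```

Read this way, a FUAR of 1.08 for T5-Vanilla means slightly more than one point of invariant knowledge is forgotten for each point of updated or new knowledge gained, while T5-Modular gives up about 0.28 points per point gained.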

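CKL methods generally reduce forgetting of invariant knowledge and improve update/acquisition compared with vanilla continued pretraining.
Parameter-expansion methods (e.g., Kadapters, Modular) achieve the best performance on UpdatedLAMA and NewLAMA, but their memory efficiency drops as the parameter count grows.
Rehearsal methods (e.g., MixReview) underperform on UpdatedLAMA and NewLAMA because they yield too little update/acquisition gain.
The analysis identifies memory efficiency and repeated exposure to the same data as key factors in forgetting, and shows that the learning rate and the number of CKL phases strongly affect performance.
CKL results transfer across LM architectures, although the trends vary with the method and with architectural details.
The study provides a standard set of baseline architectures and training strategies tailored to CKL and highlights non-trivial differences from traditional CL.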


This review was generated by AI and reviewed by a human editor.