QUICK REVIEW

[Paper Review] Towards Continual Knowledge Learning of Language Models

Joel Jang, Seonghyeon Ye | arXiv (Cornell University) | Oct 7, 2021

Topic Modeling | References: 56 | Cited by: 40

One-Sentence Summary

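The paper formalizes Continual Knowledge Learning (CKL) as updating a language model's world knowledge through continual pretraining, proposes a CKL benchmark comprising the InvariantLAMA, UpdatedLAMA, and NewLAMA datasets, and analyzes CKL methods based on regularization, rehearsal, and parameter expansion, stressing that parameter expansion remains the most robust approach despite its memory issues.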

ABSTRACT

Large Language Models (LMs) are known to encode world knowledge in their parameters as they pretrain on a vast amount of web corpus, which is often utilized for performing knowledge-dependent downstream tasks such as question answering, fact-checking, and open dialogue. In real-world scenarios, the world knowledge stored in the LMs can quickly become outdated as the world changes, but it is non-trivial to avoid catastrophic forgetting and reliably acquire new knowledge while preserving invariant knowledge. To push the community towards better maintenance of ever-changing LMs, we formulate a new continual learning (CL) problem called Continual Knowledge Learning (CKL). We construct a new benchmark and metric to quantify the retention of time-invariant world knowledge, the update of outdated knowledge, and the acquisition of new knowledge. We adopt applicable recent methods from literature to create several strong baselines. Through extensive experiments, we find that CKL exhibits unique challenges that are not addressed in previous CL setups, where parameter expansion is necessary to reliably retain and learn knowledge simultaneously. By highlighting the critical causes of knowledge forgetting, we show that CKL is a challenging and important problem that helps us better understand and train ever-changing LMs. The benchmark datasets, evaluation script, and baseline code to reproduce our results are available at https://github.com/joeljang/continual-knowledge-learning.

Research Motivation and Objectives

Define Continual Knowledge Learning (CKL) as continual pretraining to refresh world knowledge while preserving time-invariant facts.
Construct a benchmark to measure retention of invariant knowledge, updating of outdated knowledge, and acquisition of new knowledge.
Propose FUAR, a metric capturing the trade-off between forgetting and acquiring/updating knowledge.
Evaluate CKL methods across architectures and identify challenges unique to CKL compared to traditional continual learning.
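
The three measurements named in the list above (retention, updating, acquisition) can each be scored by probing a model with cloze-style prompts and counting correct answers. The sketch below is a minimal illustration of that idea, assuming exact-match scoring and a model exposed as a plain callable; the names `normalize`, `exact_match_accuracy`, `toy_probes`, and `toy_model` are hypothetical, and the two probes are toy examples rather than benchmark entries.

```python
from typing import Callable, Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as misses."""
    return " ".join(text.lower().strip().split())


def exact_match_accuracy(
    predict: Callable[[str], str],
    probes: Iterable[Tuple[str, str]],
) -> float:
    """Return the percentage of probes whose prediction matches the gold answer.

    `predict` stands in for any zero-shot model call (assumption: it maps a
    cloze prompt to a single answer string).
    """
    probe_list = list(probes)
    if not probe_list:
        raise ValueError("probe set is empty")
    hits = sum(
        normalize(predict(prompt)) == normalize(answer)
        for prompt, answer in probe_list
    )
    return 100.0 * hits / len(probe_list)


# Toy probe set (illustrative only, not taken from any benchmark).
toy_probes = [
    ("The capital of France is _X_.", "Paris"),
    ("The chemical symbol for gold is _X_.", "Au"),
]

# Toy stand-in for a language model: one right answer, one wrong answer.
toy_answers = {
    "The capital of France is _X_.": "paris ",
    "The chemical symbol for gold is _X_.": "Ag",
}


def toy_model(prompt: str) -> str:
    return toy_answers.get(prompt, "")


score = exact_match_accuracy(toy_model, toy_probes)
```

Running the same scorer on an invariant, an updated, and a new probe set, before and after continual pretraining, would yield the six numbers needed to compare retention against gains.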

Proposed Method

Introduce the CKL formulation and three benchmark datasets: InvariantLAMA (time-invariant knowledge), UpdatedLAMA (outdated knowledge to be updated), and NewLAMA (new knowledge from D1).
Construct the new text corpus D1 (CC-RecentNews) and define a zero-shot LAMA probing framework for knowledge evaluation.
Propose FUAR metric to quantify Forgetting / (Updated + Acquired) trade-offs across sequential corpora.
Evaluate baseline CKL methods categorized as regularization, rehearsal, and parameter-expansion (e.g., RecAdam, Mix-Review, LoRA, K-Adapter, Modular) on encoder-decoder models (T5).
Analyze effects of multiple CKL phases, data duplication, learning rate, and cross-architecture transfer of CKL methods.
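
The FUAR ratio in the list above, Forgetting / (Updated + Acquired), can be sketched in a few lines. This is a simplified two-stage reading (scores before and after a single round of continual pretraining), not the paper's full definition over sequential corpora; the function name `fuar`, its arguments, and the example scores are illustrative assumptions and are not values reported in the paper.

```python
from typing import Optional


def fuar(
    invariant_before: float,
    invariant_after: float,
    updated_before: float,
    updated_after: float,
    new_before: float,
    new_after: float,
) -> Optional[float]:
    """Simplified Forgotten / (Updated + Acquired) ratio.

    All inputs are probe accuracies on the same scale (assumption).
    Returns None when nothing was updated or acquired, since the ratio
    is undefined in that case.
    """
    # Only drops on invariant knowledge count as forgetting.
    forgotten = max(0.0, invariant_before - invariant_after)
    # Only improvements count as updated or acquired knowledge.
    updated = max(0.0, updated_after - updated_before)
    acquired = max(0.0, new_after - new_before)
    gained = updated + acquired
    if gained == 0.0:
        return None
    return forgotten / gained


# Made-up scores: 5 points forgotten, 3 points updated, 2 points acquired.
example = fuar(
    invariant_before=25.0, invariant_after=20.0,
    updated_before=2.0, updated_after=5.0,
    new_before=1.0, new_after=3.0,
)
```

Under this reading, a value of 1 means one unit of invariant knowledge is lost for each unit updated or acquired, and lower values indicate a better trade-off.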

Experimental Results

Research Questions

RQ1: How does continual pretraining on a new corpus affect retention of time-invariant knowledge in LMs?
RQ2: How effectively can CKL update outdated information and acquire new knowledge without catastrophic forgetting?
RQ3: Which CKL methods best balance forgetting with updating and acquiring knowledge, and how do they scale with multiple CKL phases?

Key Findings

CKL methods generally reduce forgetting of invariant knowledge and improve updating/acquisition versus vanilla continued pretraining.
Parameter-expansion methods (e.g., K-Adapter, Modular) achieve the best results on UpdatedLAMA and NewLAMA but become memory-inefficient as their parameter count grows.
Rehearsal methods (e.g., Mix-Review) underperform on UpdatedLAMA and NewLAMA, showing weaker gains in updating and acquiring knowledge.
The analysis points to memory efficiency and repeated exposure to the same data as key factors in forgetting; the learning rate and the number of CKL phases also significantly affect performance.
CKL results transfer across LM architectures, though trends vary by method and architecture details.
The study provides a baseline suite of architectures and training strategies tailored for CKL, highlighting nontrivial differences from traditional CL.
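
To illustrate why parameter expansion can protect existing knowledge while costing memory, here is a minimal plain-Python sketch of a LoRA-style layer: the pretrained weight stays frozen and only a small low-rank pair of matrices is added and trained. It is a toy illustration of the general idea, not the paper's implementation; the class `LowRankExpandedLinear`, its methods, and the tiny matrices are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LowRankExpandedLinear:
    """Frozen weight W plus a trainable low-rank update (up @ down)."""

    def __init__(self, frozen_weight: Matrix, rank: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        out_dim, in_dim = len(frozen_weight), len(frozen_weight[0])
        # Pretrained weight: never modified during continual pretraining.
        self.frozen_weight = frozen_weight
        # New parameters: `down` is rank x in_dim, `up` is out_dim x rank.
        self.down = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        # Zero initialization means the layer starts out identical to the original.
        self.up = [[0.0] * rank for _ in range(out_dim)]

    def effective_weight(self) -> Matrix:
        delta = matmul(self.up, self.down)
        return [
            [w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.frozen_weight, delta)
        ]

    def forward(self, x: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, x)) for row in self.effective_weight()]

    def num_added_parameters(self) -> int:
        """Extra memory cost introduced by the expansion."""
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])


layer = LowRankExpandedLinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
output_before_training = layer.forward([1.0, 1.0])

# Simulate a training step that touches only the added parameters.
layer.up = [[1.0], [0.0]]
layer.down = [[0.5, 0.5]]
output_after_training = layer.forward([1.0, 1.0])
```

Because only `up` and `down` change, the original weights can always be recovered, but every additional set of such parameters adds to the memory footprint, which mirrors the trade-off noted in the findings above.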

This review was generated by AI and reviewed by human editors.