QUICK REVIEW

[Paper Review] Divide and not forget: Ensemble of selectively trained experts in Continual Learning

Grzegorz Rypeść, Sebastian Cygert|arXiv (Cornell University)|2024. 01. 18.

Domain Adaptation and Few-Shot Learning | Citations: 11

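One-line summary

SEED is an exemplar-free continual learning method that ensembles several experts and fine-tunes only one expert per new task. It uses Gaussian class representations to select the best expert and makes ensemble predictions in both task-agnostic and task-aware settings.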

ABSTRACT

Class-incremental learning is becoming more popular as it helps models widen their applicability while not forgetting what they already know. A trend in this area is to use a mixture-of-expert technique, where different models work together to solve the task. However, the experts are usually trained all at once using whole task data, which makes them all prone to forgetting and increasing computational burden. To address this limitation, we introduce a novel approach named SEED. SEED selects only one, the most optimal expert for a considered task, and uses data from this task to fine-tune only this expert. For this purpose, each expert represents each class with a Gaussian distribution, and the optimal expert is selected based on the similarity of those distributions. Consequently, SEED increases diversity and heterogeneity within the experts while maintaining the high stability of this ensemble method. The extensive experiments demonstrate that SEED achieves state-of-the-art performance in exemplar-free settings across various scenarios, showing the potential of expert diversification through data in continual learning.

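Motivation and objectives

Address exemplar-free class-incremental learning (CIL), reducing forgetting while preserving plasticity.
Propose SEED, an ensemble with a fixed number of experts in which only one expert is fine-tuned per task, to minimize forgetting.
Represent each expert's classes as multivariate Gaussians in its latent space, which enables both expert selection and inference.
Improve performance under distribution shift and across tasks by promoting diversity among the experts.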

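Proposed method

SEED uses K deep network experts g_k ∘ f that share the initial layers f. f is frozen after the first task.
Each expert holds one Gaussian G_k^c = (μ_k^c, Σ_k^c) for every class c in its own latent space.
Inference computes the log-likelihood of the input's latent representation under each expert's class Gaussians, then averages the softmaxed log-likelihoods across experts to produce the prediction.
During training on a new task t, SEED selects the expert whose latent class distributions overlap the least (measured by symmetric KL divergence) and fine-tunes only that expert, using a loss that combines cross-entropy with feature distillation (L_KD).
Expert selection uses a KL-based criterion that maximizes the distance between class distributions within the task's class set.
The full SEED pipeline consists of (i) computing per-class Gaussian distributions in each expert's latent space, (ii) selecting the best expert to fine-tune for the new task, (iii) updating the selected expert's Gaussian distributions, and (iv) freezing f after the first task to prevent cross-task drift.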
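The selection and inference steps described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation. All function names (`fit_gaussian`, `select_expert`, `predict`) are hypothetical, and the `eps` covariance regularizer is an assumption added to keep the matrices invertible. The sketch also assumes that the symmetric KL divergence is the sum of both directions, that no softmax temperature is used, and that every expert stores a Gaussian for every class in the same order.

```python
import numpy as np

def fit_gaussian(feats, eps=1e-3):
    """Mean and covariance of one class's latent features (rows = samples)."""
    mu = feats.mean(axis=0)
    # eps * I is an assumed regularizer so the covariance stays invertible
    sigma = np.cov(feats, rowvar=False) + eps * np.eye(feats.shape[1])
    return mu, sigma

def log_likelihood(z, mu, sigma):
    """Log density of latent vector z under N(mu, sigma)."""
    d = z - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = d @ np.linalg.solve(sigma, d)
    return -0.5 * (maha + logdet + len(z) * np.log(2 * np.pi))

def kl(mu0, s0, mu1, s1):
    """Closed-form KL( N(mu0, s0) || N(mu1, s1) )."""
    d = mu1 - mu0
    _, ld0 = np.linalg.slogdet(s0)
    _, ld1 = np.linalg.slogdet(s1)
    trace_term = np.trace(np.linalg.solve(s1, s0))
    return 0.5 * (trace_term + d @ np.linalg.solve(s1, d) - len(mu0) + ld1 - ld0)

def symmetric_kl(g0, g1):
    """Sum of both KL directions between two (mu, sigma) pairs (assumed form)."""
    return kl(*g0, *g1) + kl(*g1, *g0)

def select_expert(task_gaussians_per_expert):
    """Pick the expert whose new-task class Gaussians overlap the least.

    task_gaussians_per_expert[k] is a list of (mu, sigma), one per new class,
    computed in expert k's latent space before any fine-tuning.
    """
    scores = []
    for gaussians in task_gaussians_per_expert:
        n = len(gaussians)
        total = sum(symmetric_kl(gaussians[i], gaussians[j])
                    for i in range(n) for j in range(i + 1, n))
        scores.append(total)
    return int(np.argmax(scores))

def softmax(x):
    e = np.exp(x - x.max())  # shift by the max for numerical stability
    return e / e.sum()

def predict(latents, experts):
    """Average softmaxed log-likelihoods over experts and return the class index.

    latents[k] is the input's latent vector in expert k's space;
    experts[k] is that expert's list of per-class (mu, sigma).
    """
    probs = [softmax(np.array([log_likelihood(z, mu, s) for mu, s in gaussians]))
             for z, gaussians in zip(latents, experts)]
    return int(np.argmax(np.mean(probs, axis=0)))
```

In use, `select_expert` would be called once per new task with Gaussians fitted on that task's data in each expert's latent space, and `predict` would be called at test time with one latent vector per expert.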

Figure 1: Exemplar-free Class Incremental Learning methods evaluated on CIFAR100 divided into eleven tasks for two different data distributions.

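Experimental results

Research questions

RQ1: Can an exemplar-free CIL method reach state-of-the-art accuracy by selectively training a single expert per task?
RQ2: Does enforcing diversity within a fixed ensemble of experts improve the stability-plasticity trade-off across different task splits and domain shifts?
RQ3: How do the Gaussian class representations inside each expert help with expert selection and with robust inference across tasks?
RQ4: How do the shared feature layers and the number of experts affect performance and parameter efficiency?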

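Key results

SEED achieves state-of-the-art accuracy among exemplar-free CIL methods across multiple benchmarks and task splits.
It outperforms competing methods by a large margin in equal-split task scenarios and under domain shift (DomainNet).
A 5-expert SEED configuration with shared layers and selective fine-tuning performs strongly with relatively few parameters.
The multivariate Gaussian representation and the KL-based expert selection are central to SEED's performance, and the full design gives the best results.
Diversity emerges naturally: each expert specializes in different tasks, and the ensemble consistently outperforms the best single expert.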

Figure 2: SEED comprises $K$ deep network experts $g_{k}\circ f$ (here $K=2$ ), sharing the initial layers $f$ for higher computational performance. $f$ are frozen after the first task. Each expert contains one Gaussian distribution per class $c\in C$ in his unique latent space. In this example, we

This review was generated by AI and reviewed by a human editor.