QUICK REVIEW

[Paper Review] Representational Continuity for Unsupervised Continual Learning

Divyam Madaan, Jaehong Yoon|arXiv (Cornell University)|2021. 10. 13.

Domain Adaptation and Few-Shot Learning | Citations: 28

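One-Line Summary

This paper shows that unsupervised continual learning (UCL) produces more robust representations and forgets less than supervised continual learning (SCL), and it introduces LUMP, a simple mixup-based technique that further mitigates forgetting in UCL.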

ABSTRACT

Continual learning (CL) aims to learn a sequence of tasks without forgetting the previously acquired knowledge. However, recent CL advances are restricted to supervised continual learning (SCL) scenarios. Consequently, they are not scalable to real-world applications where the data distribution is often biased and unannotated. In this work, we focus on unsupervised continual learning (UCL), where we learn the feature representations on an unlabelled sequence of tasks and show that reliance on annotated data is not necessary for continual learning. We conduct a systematic study analyzing the learned feature representations and show that unsupervised visual representations are surprisingly more robust to catastrophic forgetting, consistently achieve better performance, and generalize better to out-of-distribution tasks than SCL. Furthermore, we find that UCL achieves a smoother loss landscape through qualitative analysis of the learned representations and learns meaningful feature representations. Additionally, we propose Lifelong Unsupervised Mixup (LUMP), a simple yet effective technique that interpolates between the current task and previous tasks' instances to alleviate catastrophic forgetting for unsupervised representations.

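Research Motivation and Objectives

Motivate unsupervised continual learning as a scalable alternative to supervised continual learning for real-world, unannotated data streams.
Systematically analyze how unsupervised representations behave in a sequential task setting, and explain why they may be more robust to forgetting.
Evaluate how well UCL representations generalize and transfer to out-of-distribution tasks and few-shot scenarios.
Propose a simple and effective technique (LUMP) that mitigates forgetting without additional hyperparameters or major modifications to existing methods.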

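Proposed Method

Extends the SimSiam and BarlowTwins self-supervised learning objectives to the UCL setting, and studies Finetune and DER-style baselines adapted for unsupervised learning.
Proposes Lifelong Unsupervised Mixup (LUMP), which reduces forgetting by interpolating between current-task instances and instances drawn from a replay buffer of past tasks.
Compares UCL against supervised continual learning baselines (regularization-, architecture-, and rehearsal-based) on Split CIFAR-10, CIFAR-100, and Tiny-ImageNet, using a fixed ResNet-18 backbone and KNN evaluation.
Uses CKA (centered kernel alignment) and parameter-space distance analysis to understand the differences in robustness and loss landscape between UCL and SCL.
Provides UCL-DER, an unsupervised adaptation of DER that uses replay-buffer examples to regularize the trajectory of the representations.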
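The interpolation step behind LUMP can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lump_mixup`, the `alpha` parameter of the Beta distribution (borrowed from standard mixup), the per-sample mixing coefficient, and the use of plain Python lists as stand-ins for image tensors are all hypothetical choices made for the example.

```python
import random


def lump_mixup(current_batch, replay_buffer, alpha=0.4, rng=None):
    """Mix each current-task sample with a randomly chosen replay-buffer sample.

    current_batch and replay_buffer are lists of equal-length float vectors
    (stand-ins for flattened images). alpha is an assumed Beta(alpha, alpha)
    parameter, as in standard mixup; it is not taken from the review.
    """
    rng = rng or random.Random()
    if not replay_buffer:
        # No stored past samples yet (first task): return the batch unchanged.
        return [list(x) for x in current_batch]

    mixed = []
    for x in current_batch:
        past = rng.choice(replay_buffer)      # sample from a previous task
        lam = rng.betavariate(alpha, alpha)   # mixing coefficient in [0, 1]
        mixed.append([lam * a + (1.0 - lam) * b for a, b in zip(x, past)])
    return mixed


if __name__ == "__main__":
    current = [[0.0, 0.0], [1.0, 1.0]]
    buffer_samples = [[1.0, 1.0]]
    print(lump_mixup(current, buffer_samples, rng=random.Random(0)))
```

The mixed batch would then be passed through the usual self-supervised objective (for example SimSiam or BarlowTwins) in place of the raw current-task batch, so no labels are needed at any point.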

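Experimental Results

Research Questions

RQ1: Does unsupervised continual learning produce representations that are more robust to forgetting than supervised continual learning across standard CL benchmarks?
RQ2: How well do UCL representations transfer to out-of-distribution tasks and few-shot scenarios compared with SCL?
RQ3: Can simple rehearsal-based strategies be improved for UCL in the absence of labels, and how much does mixup-based interpolation contribute to reducing forgetting?
RQ4: What do feature similarity (CKA) and loss landscape analyses reveal about the nature of the representations learned by UCL versus SCL?
RQ5: Does LUMP effectively mitigate forgetting in UCL across multiple datasets and tasks?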

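Key Results

Across Split CIFAR-10, CIFAR-100, and Tiny-ImageNet, unsupervised representations show lower forgetting and competitive or higher accuracy compared with supervised representations.
Finetune with UCL often outperforms many SCL strategies, and LUMP provides additional gains (e.g., a 2.8% accuracy improvement on CIFAR-100 and 5.9% on Tiny-ImageNet in certain settings).
UCL representations based on BarlowTwins and SimSiam show markedly lower forgetting than SCL baselines.
CKA analysis suggests that UCL models show high feature similarity in the lower layers, that UCL and SCL representations differ mainly in the higher layers, and that UCL tends to learn features closer to human perception.
UCL yields a flatter and smoother loss landscape than SCL, suggesting greater optimization stability and generalization.
LUMP, a simple mixup-based interpolation between current-task instances and the replay buffer, effectively mitigates forgetting without additional hyperparameters and outperforms several baselines on multiple datasets.
UCL representations generalize better to out-of-distribution datasets (MNIST, FMNIST, SVHN) and offer advantages in few-shot scenarios, with LUMP maintaining strong performance.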
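The feature-similarity comparison mentioned above relies on CKA. A minimal sketch of the linear variant is shown here, assuming each representation is given as a list of per-example feature vectors for the same examples; the function names are hypothetical, and the review does not say which CKA kernel the authors used, so the linear form is an assumption.

```python
import math


def _center_columns(mat):
    """Subtract the per-feature mean so every column has zero mean."""
    n = len(mat)
    means = [sum(col) / n for col in zip(*mat)]
    return [[v - m for v, m in zip(row, means)] for row in mat]


def _cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for row-major a (n x p) and b (n x q)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(feats_x, feats_y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features.

    Raises ZeroDivisionError if either representation is constant across examples.
    """
    x = _center_columns(feats_x)
    y = _center_columns(feats_y)
    numerator = _cross_frobenius_sq(y, x)
    denominator = math.sqrt(_cross_frobenius_sq(x, x) * _cross_frobenius_sq(y, y))
    return numerator / denominator


if __name__ == "__main__":
    layer_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    layer_b = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]  # same features, rescaled
    print(linear_cka(layer_a, layer_b))
```

Values near 1 indicate that two layers (or the same layer before and after training on a new task) encode similar structure, and values near 0 indicate unrelated features. The score is unchanged by isotropic rescaling of either representation.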


This review was generated by AI and checked by a human editor.