QUICK REVIEW

[Paper Review] An Empirical Study of Example Forgetting during Deep Neural Network Learning

Mariya Toneva, Alessandro Sordoni | arXiv (Cornell University) | 2018-12-12
Domain Adaptation and Few-Shot Learning | References: 37 | Citations: 199
One-Line Summary

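This paper defines and analyzes forgetting events for individual training examples during SGD. It identifies unforgettable and forgettable examples, shows that these categories are stable across architectures, and finds that removing a large share of the rarely forgotten examples often preserves generalization.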

ABSTRACT

Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks. Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. We define a "forgetting event" to have occurred when an individual training example transitions from being classified correctly to incorrectly over the course of learning. Across several benchmark data sets, we find that: (i) certain examples are forgotten with high frequency, and some not at all; (ii) a data set's (un)forgettable examples generalize across neural architectures; and (iii) based on forgetting dynamics, a significant fraction of examples can be omitted from the training data set while still maintaining state-of-the-art generalization performance.

Motivation and Objectives

  • Investigate whether a forgetting phenomenon occurs within a single-task learning process similar to catastrophic forgetting.
  • Characterize the distribution and properties of forgetting events across datasets and architectures.
  • Assess whether removing forgettable or unforgettable examples impacts generalization and data efficiency.

Proposed Method

  • Define forgetting events as moments when an example transitions from correct to incorrect during SGD training.
  • Compute per-example forgetting statistics as training progresses, using mini-batch updates.
  • Evaluate across MNIST, permuted MNIST, and CIFAR-10 with CNN, ResNet, and WideResNet architectures.
  • Analyze correlation between forgetting events and misclassification margin.
  • Experiment with removing subsets of data ordered by forgetting events to test data efficiency.
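
The first two steps above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a hypothetical boolean matrix `correct` of shape (T, N), where `correct[t, i]` records whether example i was classified correctly the t-th time it was evaluated. The function name `forgetting_counts` and the toy `history` array are made up for this example.

```python
import numpy as np

def forgetting_counts(correct):
    """Count forgetting events per example.

    correct: array of shape (T, N); correct[t, i] is True when example i
    was classified correctly at its t-th evaluation (hypothetical layout).
    """
    correct = np.asarray(correct, dtype=bool)
    # A forgetting event: correct at step t, incorrect at step t + 1.
    forgot = correct[:-1] & ~correct[1:]
    counts = forgot.sum(axis=0)
    # Examples that were classified correctly at least once.
    learned = correct.any(axis=0)
    # Unforgettable: learned at some point and never forgotten afterwards.
    unforgettable = learned & (counts == 0)
    return counts, learned, unforgettable

# Toy history: 5 evaluations of 3 examples (columns).
history = np.array([
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
], dtype=bool)

counts, learned, unforgettable = forgetting_counts(history)
print(counts, learned, unforgettable)
```

In this toy history the first example is forgotten once, the second is unforgettable, and the third is never learned. Never-learned examples have zero forgetting events by this count, so they need separate handling (for instance, via the `learned` mask) before any ranking by forgettability.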

Experimental Results

Research Questions

  • RQ1: Do neural networks exhibit forgetting events for individual training examples within a single task?
  • RQ2: Are some examples unforgettable across seeds and architectures, and do forgetting patterns generalize across models?
  • RQ3: Can forgetting dynamics distinguish informative examples from noisy or outlier examples, and how does removing such examples affect generalization?

Key Findings

  • Many examples are unforgettable (never forgotten once learned); this set is stable across random seeds and correlated across architectures.
  • The most forgettable examples often have noisy labels or uncommon features and are visually ambiguous.
  • Removing a large fraction of the least-forgotten examples does not harm generalization; choosing which examples to remove by forgetting count degrades performance less than removing the most-forgotten examples.
  • For CIFAR-10, up to 30-35% of data can be removed based on forgetting without significant performance loss.
  • Forgettable examples tend to lie near the decision boundary, playing a role similar to support vectors in an SVM.
  • Forgetting statistics remain stable across epochs and architectures, enabling transfer of forgetting orderings between models.
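
The pruning result above can be illustrated with a short sketch. It assumes per-example forgetting counts are already available as an array, and that never-learned examples have been given a large count beforehand so they rank as most forgettable. The helper name `keep_after_pruning` and the toy counts are hypothetical, not taken from the paper.

```python
import numpy as np

def keep_after_pruning(counts, remove_fraction):
    """Return sorted indices of examples kept after dropping the
    least-forgotten `remove_fraction` of the training set."""
    counts = np.asarray(counts)
    n_remove = int(round(len(counts) * remove_fraction))
    # Stable sort: least-forgotten first, ties broken by original index.
    order = np.argsort(counts, kind="stable")
    return np.sort(order[n_remove:])

# Toy forgetting counts for 10 examples.
counts = np.array([0, 3, 0, 1, 5, 0, 2, 0, 0, 4])
kept = keep_after_pruning(counts, 0.3)
print(kept)
```

Here the three removed examples all have zero forgetting events. The kept indices would then be used to build the reduced training set, and a model retrained on it can be compared against one trained on a random subset of the same size.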

This review was generated by AI and reviewed by a human editor.