QUICK REVIEW

[Paper Review] Overfitting Mechanism and Avoidance in Deep Neural Networks

Shaeke Salman, Xiuwen Liu|arXiv (Cornell University)|2019. 01. 19.

Neural Networks and Applications · References: 19 · Citations: 107

One-Line Summary

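This paper analyzes overfitting in deep neural networks as driven by continuous gradient updates and the scaling of softmax inputs. It then proposes a consensus-based classification algorithm that uses multiple models to identify and reject ambiguously classified samples, improving accuracy even with small training sets.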

ABSTRACT

Assisted by the availability of data and high performance computing, deep learning techniques have achieved breakthroughs and surpassed human performance empirically in difficult tasks, including object recognition, speech recognition, and natural language processing. As they are being used in critical applications, understanding underlying mechanisms for their successes and limitations is imperative. In this paper, we show that overfitting, one of the fundamental issues in deep neural networks, is due to continuous gradient updating and scale sensitiveness of cross entropy loss. By separating samples into correctly and incorrectly classified ones, we show that they behave very differently, where the loss decreases in the correct ones and increases in the incorrect ones. Furthermore, by analyzing dynamics during training, we propose a consensus-based classification algorithm that enables us to avoid overfitting and significantly improve the classification accuracy especially when the number of training samples is limited. As each trained neural network depends on extrinsic factors such as initial values as well as training data, requiring consensus among multiple models reduces extrinsic factors substantially; for statistically independent models, the reduction is exponential. Compared to ensemble algorithms, the proposed algorithm avoids overgeneralization by not classifying ambiguous inputs. Systematic experimental results demonstrate the effectiveness of the proposed algorithm. For example, using only 1000 training samples from MNIST dataset, the proposed algorithm achieves 95% accuracy, significantly higher than any of the individual models, with 90% of the test samples classified.
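The abstract's claim that consensus reduces extrinsic factors exponentially for statistically independent models can be illustrated with a simplified calculation. As an illustrative assumption (not the paper's derivation), suppose each model independently reaches a particular extrinsic, initialization-driven decision on an input with probability q; then all k models share that decision with probability q^k. The function name `prob_all_share_error` is hypothetical.

```python
def prob_all_share_error(q: float, k: int) -> float:
    """Probability that k statistically independent models all make the same
    extrinsic decision, assuming each does so with probability q (illustrative model)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability in [0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Independence means the joint probability is the product of k identical terms.
    return q ** k


# The shared-error probability shrinks geometrically as models are added.
for k in (1, 2, 4, 8):
    print(k, prob_all_share_error(0.1, k))
```

Real networks trained on the same data are not fully independent, so in practice the reduction is expected to be weaker than this idealized q^k.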

Research Motivation and Goals

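Explains how overfitting arises in deep neural networks from factors other than the amount of data.
Shows that continuous gradient updates and the resulting scaling of softmax inputs cause the validation loss to increase.
Proposes a consensus-based classification algorithm that avoids overfitting by rejecting ambiguous samples.
Shows that consensus among multiple models reduces extrinsic factors and improves intrinsic accuracy, especially with small training sets.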

Proposed Method

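Empirically analyzes the abundance of good solutions by interpolating between network solutions trained on MNIST.
Observes and analyzes training dynamics, showing that the validation loss increases while the training loss decreases because of the scaling effect on softmax inputs.
Develops a consensus-based classification algorithm (Algorithm 1) that uses the probabilities from multiple models to decide whether to classify a sample or reject it as ambiguous.
Evaluates intrinsic (consistently classified) versus extrinsic (driven by random factors) classification across various architectures and datasets.
Evaluates how the threshold p_t affects intrinsic accuracy and the proportion of consistently classified samples (CCS).
Compares against single-model performance and examines how dropout affects the CCS results.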
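The review does not spell out the exact decision rule of Algorithm 1, so the following is only a minimal sketch under an assumed rule: a sample is classified only if every model predicts the same label and each model's top softmax probability is at least p_t; otherwise it is rejected as ambiguous. `consensus_classify` and the `REJECT = -1` sentinel are hypothetical names, and NumPy is assumed.

```python
import numpy as np

REJECT = -1  # hypothetical sentinel meaning "sample not classified"


def consensus_classify(probs: np.ndarray, p_t: float) -> np.ndarray:
    """Assumed consensus rule (sketch, not the paper's exact Algorithm 1).

    probs: softmax outputs with shape (n_models, n_samples, n_classes).
    p_t:   confidence threshold that every model's top probability must meet.
    Returns labels of shape (n_samples,), with REJECT for ambiguous samples.
    """
    if probs.ndim != 3:
        raise ValueError("probs must have shape (n_models, n_samples, n_classes)")
    preds = probs.argmax(axis=2)                  # each model's predicted label
    top_probs = probs.max(axis=2)                 # each model's confidence in that label
    agree = np.all(preds == preds[0], axis=0)     # all models pick the same label
    confident = np.all(top_probs >= p_t, axis=0)  # every model clears the threshold
    return np.where(agree & confident, preds[0], REJECT)


# Two hypothetical models, three samples, two classes.
probs = np.array([
    [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]],  # model A
    [[0.8, 0.2], [0.3, 0.7], [0.1, 0.9]],  # model B
])
print(consensus_classify(probs, p_t=0.75))  # the middle sample is rejected (models disagree)
```

Unanimous agreement is the strictest form of consensus; a looser variant such as a majority vote would classify more samples but filter out fewer extrinsic decisions.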

Experimental Results

Research Questions

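RQ1: Why does overfitting occur in deep neural networks despite overparameterization and an abundance of good solutions?
RQ2: Can a consensus-based approach across multiple models identify and reject overgeneralization, thereby improving generalization with limited data?
RQ3: How do the training dynamics of correctly and incorrectly classified samples differ, in relation to softmax input scaling and the cross-entropy loss?
RQ4: How do different architectures and regularization (dropout) affect intrinsic classification accuracy?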
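RQ3 rests on one property of softmax cross-entropy: multiplying the logits by a positive factor leaves every prediction unchanged but changes the loss. The toy sketch below, which is not the paper's experiment, scales fixed logits by a hypothetical factor `scale` standing in for the growth of softmax inputs under continued gradient updates, and reports the mean loss separately for correctly and incorrectly classified samples. The helper names are hypothetical and NumPy is assumed.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-sample softmax cross-entropy, computed stably via log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def group_losses(logits: np.ndarray, labels: np.ndarray, scale: float):
    """Mean loss over correctly and incorrectly classified samples after scaling logits."""
    scaled = scale * logits
    # Scaling by a positive factor never changes the argmax, so the split is fixed.
    correct = scaled.argmax(axis=1) == labels
    losses = cross_entropy(scaled, labels)
    return losses[correct].mean(), losses[~correct].mean()


logits = np.array([[2.0, 1.0, 0.5],   # true label 0: correctly classified
                   [2.0, 1.0, 0.5]])  # true label 2: misclassified
labels = np.array([0, 2])
for s in (1.0, 2.0, 4.0):
    print(s, group_losses(logits, labels, s))
```

In this example the correct sample's loss shrinks toward zero as `scale` grows while the misclassified sample's loss keeps rising, so an average over both groups can increase even though no prediction changes.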

Key Findings

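Continued gradient updates increase the magnitude of the softmax inputs, which can produce overfitting: the training loss decreases while the validation loss increases.
Misclassified samples drive the increase in validation loss, while the loss on correctly classified samples decreases.
Consensus-based classification with multiple models classifies consistently classified samples and rejects ambiguous ones, improving intrinsic accuracy, especially with small training sets.
Using the threshold parameter p_t increases both intrinsic accuracy and the proportion of CCS (consistently classified samples) relative to a single model.
In data-limited settings the gains are substantial (e.g., with only 1,000 MNIST training samples, 95% accuracy with 90% of test samples classified), and the method is robust across architectures.
Dropout and ensemble-like dynamics affect the CCS, but the consensus-based method can still outperform single models under different regularization settings.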
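Intrinsic accuracy and the CCS ratio can be read as accuracy on the accepted samples and the fraction of samples accepted. The sketch below computes both under that reading, gating hypothetical consensus predictions with a threshold p_t on a hypothetical per-sample confidence score. All data values and the `selective_metrics` name are illustrative assumptions, not results from the paper.

```python
import numpy as np


def selective_metrics(preds: np.ndarray, labels: np.ndarray, reject: int = -1):
    """Return (accuracy on accepted samples, fraction of samples accepted)."""
    accepted = preds != reject
    coverage = float(accepted.mean())
    if not accepted.any():
        return float("nan"), 0.0  # accuracy is undefined when nothing is accepted
    accuracy = float((preds[accepted] == labels[accepted]).mean())
    return accuracy, coverage


preds = np.array([0, 1, 1, 2, 0])             # hypothetical consensus labels
labels = np.array([0, 1, 0, 2, 1])            # hypothetical ground truth
conf = np.array([0.95, 0.9, 0.55, 0.8, 0.6])  # hypothetical per-sample confidence
for p_t in (0.5, 0.7, 0.99):
    gated = np.where(conf >= p_t, preds, -1)  # reject samples below the threshold
    print(p_t, selective_metrics(gated, labels))
```

On this toy data, raising p_t trades coverage for accuracy, the same kind of trade-off behind reporting accuracy together with the fraction of test samples classified.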


This review was generated by AI and reviewed by a human editor.