QUICK REVIEW

[Paper Review] On the importance of single directions for generalization

Ari S. Morcos, David G. T. Barrett|arXiv (Cornell University)|2018. 03. 19.

Advanced Vision and Imaging | References: 19 | Citations: 195

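One-line summary

Networks that memorize tend to rely more heavily on single activation directions, and better generalization correlates with reduced single-direction reliance. Batch normalization reduces this reliance, and class selectivity is not a good predictor of unit importance.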

ABSTRACT

Despite their ability to memorize large datasets, deep neural networks often achieve good generalization performance. However, the differences between the learned solutions of networks which generalize and those which do not remain unclear. Additionally, the tuning properties of single directions (defined as the activation of a single unit or some linear combination of units in response to some input) have been highlighted, but their importance has not been evaluated. Here, we connect these lines of inquiry to demonstrate that a network's reliance on single directions is a good predictor of its generalization performance, across networks trained on datasets with different fractions of corrupted labels, across ensembles of networks trained on datasets with unmodified labels, across different hyperparameters, and over the course of training. While dropout only regularizes this quantity up to a point, batch normalization implicitly discourages single direction reliance, in part by decreasing the class selectivity of individual units. Finally, we find that class selectivity is a poor predictor of task importance, suggesting not only that networks which generalize well minimize their dependence on individual units by reducing their selectivity, but also that individually selective units may not be necessary for strong network performance.

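Research motivation and goals

Investigate whether a network's generalization performance is related to its reliance on single directions in activation space.
Examine how perturbations that suppress single directions (ablations) affect networks trained with different levels of label corruption and different architectures.
Evaluate how regularizers such as dropout and batch normalization affect single-direction reliance.
Evaluate whether the class selectivity of a single direction predicts its importance to the network output.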

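Proposed method

Define a single direction as the activation of an individual unit, or of a linear combination of units, in response to an input.
Perform ablations in activation space by clamping the selected directions to zero, and measure the drop in performance across subsets of directions.
Add Gaussian noise to the units to test reliance on random directions, with the noise scaled by the variance of each unit's activation.
Quantify how selectively each unit responds to classes using a class selectivity index inspired by neuroscience.
Compare networks trained on datasets with corrupted labels and with uncorrupted labels across architectures (an MLP on MNIST, a CNN on CIFAR-10, and a ResNet on ImageNet).
Analyze how batch normalization and dropout affect single-direction reliance and class selectivity.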
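
The ablation and noise procedures listed above can be sketched on a toy network. This is a minimal illustration rather than the authors' code: the one-hidden-layer MLP, the random weights, the helper names (`predict`, `cumulative_ablation_curve`, `noise_curve`, `make_toy_problem`) and the choice of noise variance = scale × unit variance are all assumptions made for the example. Labels are simply the intact network's own predictions, so the curves measure how much the output changes, not true test accuracy. Plain standard-library Python is used to keep the sketch dependency-free.

```python
import random
import statistics


def hidden_activations(x, w1):
    """ReLU activations of the hidden layer for one input vector."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]


def predict(x, w1, w2, ablated=frozenset(), noise_std=None, rng=None):
    """Predicted class of a one-hidden-layer MLP under optional perturbations."""
    hidden = hidden_activations(x, w1)
    if noise_std is not None:
        # Gaussian noise per unit; each std is derived from that unit's variance.
        hidden = [h + rng.gauss(0.0, s) for h, s in zip(hidden, noise_std)]
    # Ablation: clamp the selected units to zero.
    hidden = [0.0 if j in ablated else h for j, h in enumerate(hidden)]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return max(range(len(logits)), key=logits.__getitem__)


def accuracy(xs, ys, w1, w2, **perturb):
    """Fraction of inputs whose (possibly perturbed) prediction matches the label."""
    hits = sum(predict(x, w1, w2, **perturb) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def cumulative_ablation_curve(xs, ys, w1, w2, rng):
    """Accuracy after ablating 0, 1, ..., all hidden units in a random order."""
    order = list(range(len(w1)))
    rng.shuffle(order)
    return [accuracy(xs, ys, w1, w2, ablated=frozenset(order[:k]))
            for k in range(len(order) + 1)]


def noise_curve(xs, ys, w1, w2, scales, rng):
    """Accuracy as noise grows; noise variance = scale * unit variance (assumed)."""
    acts = [hidden_activations(x, w1) for x in xs]
    unit_var = [statistics.pvariance(col) for col in zip(*acts)]
    curve = []
    for scale in scales:
        std = [(scale * v) ** 0.5 for v in unit_var]
        curve.append(accuracy(xs, ys, w1, w2, noise_std=std, rng=rng))
    return curve


def make_toy_problem(seed=0, n_in=4, n_hidden=8, n_classes=3, n_samples=200):
    """Hypothetical setup: random weights, labels = the intact network's outputs."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_classes)]
    xs = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_samples)]
    ys = [predict(x, w1, w2) for x in xs]
    return xs, ys, w1, w2, rng


xs, ys, w1, w2, rng = make_toy_problem()
print("ablation:", cumulative_ablation_curve(xs, ys, w1, w2, rng))
print("noise:   ", noise_curve(xs, ys, w1, w2, [0.0, 0.5, 1.0, 2.0], rng))
```

In a real experiment the curves would be computed on held-out or training data for trained networks, and a network that relies more on single directions would show a steeper drop as more units are ablated or as the noise scale grows.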

Figure 1: Memorizing networks are more sensitive to cumulative ablations. Networks were trained on MNIST (2-hidden layer MLP, a), CIFAR-10 (11-layer convolutional network, b), and ImageNet (50-layer ResNet, c). In a, all units in all layers were ablated, while in b and c, only feature maps in t

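Experimental results

Research questions

RQ1: Does memorization increase a network's reliance on single activation directions, compared with generalization that learns structure?
RQ2: Can single-direction reliance serve as a proxy for generalization, for early stopping, or for hyperparameter selection?
RQ3: How do dropout and batch normalization affect single-direction reliance and the class selectivity of units?
RQ4: Is class selectivity a reliable predictor of a unit's importance to the network output?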

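Key results

Memorizing networks are more sensitive to cumulative ablation of single directions than networks that generalize well.
Networks that generalize better rely less on single directions, and this relationship holds across architectures and regardless of label corruption.
Batch normalization reduces single-direction reliance and lowers the class selectivity of individual feature maps, while increasing their mutual information.
Dropout delays memorization, but it does not fully prevent reliance on single directions beyond the dropout fraction used during training.
The class selectivity of a single direction is not a good predictor of its importance to the network output: highly selective units do not always have a larger impact.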
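
To make the selectivity result concrete, the sketch below computes a class selectivity index for one unit. It follows the form we understand the paper to use, (μ_max − μ_−max) / (μ_max + μ_−max), where μ_max is the largest class-conditional mean activation and μ_−max is the mean over the remaining classes. The function name `class_selectivity`, the choice to average the other classes' means, and returning 0 for a zero denominator are assumptions made for this illustration.

```python
from collections import defaultdict


def class_selectivity(activations, labels):
    """Selectivity index of one unit from its activations and the input labels.

    Assumes non-negative activations (e.g. post-ReLU) and at least two classes,
    so the index falls between 0 (no selectivity) and 1 (fires for one class only).
    """
    per_class = defaultdict(list)
    for a, y in zip(activations, labels):
        per_class[y].append(a)
    # Class-conditional mean activation for every class.
    means = sorted((sum(v) / len(v) for v in per_class.values()), reverse=True)
    mu_max = means[0]
    # Assumption: mu_-max is the average of the other classes' means.
    mu_rest = sum(means[1:]) / len(means[1:])
    denom = mu_max + mu_rest
    if denom == 0:
        return 0.0  # silent unit: treated as non-selective in this sketch
    return (mu_max - mu_rest) / denom


labels = [0, 0, 1, 1]
print(class_selectivity([1.0, 1.0, 0.0, 0.0], labels))  # responds to class 0 only
print(class_selectivity([2.0, 2.0, 2.0, 2.0], labels))  # responds equally to both
print(class_selectivity([3.0, 3.0, 1.0, 1.0], labels))  # partially selective
```

To probe the finding above, one would compute this index for every unit and compare it against the accuracy drop observed when that unit alone is ablated. A weak relationship between the two is what the review reports.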

Figure 2: Memorizing networks are more sensitive to random noise. Networks were trained on MNIST (2-hidden layer MLP, a), and CIFAR-10 (11-layer convolutional network, b). Noise was scaled by the empirical variance of each unit on the training set. Error bars represent standard deviation across 10


This review was generated by AI and reviewed by a human editor.