QUICK REVIEW

[Paper Review] Rethinking ImageNet Pre-training

Kaiming He, Ross Girshick, Piotr Dollár | arXiv (Cornell University) | 2018-11-21

Advanced Neural Network Applications | References: 39 | Citations: 99

One-Line Summary

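This paper shows that, given suitable normalization, a longer training schedule, and properly tuned hyper-parameters, object detection and instance segmentation models trained from scratch on COCO can match or exceed their ImageNet pre-trained counterparts. ImageNet pre-training mainly speeds up early convergence and is not always needed for final accuracy.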

ABSTRACT

We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization. The results are no worse than their ImageNet pre-training counterparts even when using the hyper-parameters of the baseline system (Mask R-CNN) that were optimized for fine-tuning pre-trained models, with the sole exception of increasing the number of training iterations so the randomly initialized models may converge. Training from random initialization is surprisingly robust; our results hold even when: (i) using only 10% of the training data, (ii) for deeper and wider models, and (iii) for multiple tasks and metrics. Experiments show that ImageNet pre-training speeds up convergence early in training, but does not necessarily provide regularization or improve final target task accuracy. To push the envelope we demonstrate 50.9 AP on COCO object detection without using any external data, a result on par with the top COCO 2017 competition results that used ImageNet pre-training. These observations challenge the conventional wisdom of ImageNet pre-training for dependent tasks and we expect these discoveries will encourage people to rethink the current de facto paradigm of 'pre-training and fine-tuning' in computer vision.

Motivation and Objectives

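Questions whether ImageNet pre-training is necessary for object detection and segmentation on COCO.
Evaluates whether training from scratch can reach final performance comparable to or better than pre-training under standard baselines and hyper-parameters.
Identifies which adjustments to normalization and training length are needed to make training from scratch work across architectures and data regimes.
Assesses how data scale (full COCO versus a reduced subset) affects the relative benefit of pre-training.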

Proposed Method

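Uses Mask R-CNN with ResNet/ResNeXt backbones and FPN, trained on COCO train2017 and evaluated on val2017 with bbox and mask AP.
Replaces frozen BatchNorm with GroupNorm (GN) or SyncBN to enable stable training from scratch.
Increases the number of training iterations (up to a 6× schedule) so that models trained from scratch can converge.
Uses training-time scale augmentation and data augmentation to study robustness across data regimes.
Compares training from scratch with ImageNet pre-training across architectures, data scales, and task-specific metrics (bbox AP, mask AP, keypoint AP).
Demonstrates large-scale training from scratch (X152 with GN) that reaches high AP without pre-training.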
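
The normalization change above can be illustrated with a small sketch. This is not the paper's implementation: it is a pure-Python toy, with the hypothetical helper names `group_norm` and `batch_norm`, that omits the learnable scale and shift and treats each image as a `[channels][positions]` list. It only shows why GroupNorm statistics do not depend on how many images share the batch, whereas BatchNorm statistics do.

```python
from statistics import fmean, pvariance


def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample shaped [channels][positions] per channel group.

    Statistics come from this sample only, so the output is the same
    whatever else is in the batch. Learnable scale/shift are omitted.
    """
    num_channels = len(sample)
    if num_channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    per_group = num_channels // num_groups
    out = []
    for g in range(num_groups):
        group = sample[g * per_group:(g + 1) * per_group]
        values = [v for channel in group for v in channel]
        mean = fmean(values)
        scale = (pvariance(values, mu=mean) + eps) ** -0.5
        out.extend([[(v - mean) * scale for v in channel] for channel in group])
    return out


def batch_norm(batch, eps=1e-5):
    """Normalize each channel with statistics pooled over the whole batch."""
    num_channels = len(batch[0])
    stats = []
    for c in range(num_channels):
        values = [v for sample in batch for v in sample[c]]
        mean = fmean(values)
        stats.append((mean, (pvariance(values, mu=mean) + eps) ** -0.5))
    return [
        [[(v - m) * s for v in sample[c]] for c, (m, s) in enumerate(stats)]
        for sample in batch
    ]


# Two toy "images" with 4 channels and 2 spatial positions each.
image_a = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
image_b = [[5.0, 5.0], [7.0, 9.0], [0.0, 1.0], [2.0, 3.0]]

gn_alone = group_norm(image_a, num_groups=2)
bn_alone = batch_norm([image_a])[0]
bn_paired = batch_norm([image_a, image_b])[0]

print("BN output for image_a changes with batch contents:", bn_alone != bn_paired)
```

In this toy setting, `group_norm(image_a, ...)` never sees `image_b`, while the BatchNorm output for `image_a` shifts once `image_b` joins the batch. Detection training typically fits only a few high-resolution images per GPU, which is the situation where per-batch statistics become unreliable.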

Experimental Results

Research Questions

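RQ1: Can object detection and instance segmentation on COCO, trained from scratch, reach performance on par with ImageNet pre-trained models?
RQ2: Which normalization techniques are needed to make training from scratch possible?
RQ3: How does training duration affect the convergence and final accuracy of models trained from scratch, relative to pre-training?
RQ4: Does ImageNet pre-training improve generalization, especially with limited data, or does it mainly speed up early convergence?
RQ5: How do models trained from scratch perform on localization-sensitive metrics and on keypoint detection?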

Main Results

Model	Schedule	AP_bbox (val2017)
R50 (random init)	2×	36.8
R50 (random init)	3×	39.5
R50 (random init)	4×	40.6
R50 (random init)	5×	40.7
R50 (random init)	6×	41.3
R50 (with pre-train)	2×	40.3
R50 (with pre-train)	3×	40.8
R50 (with pre-train)	4×	40.9
R50 (with pre-train)	5×	40.9
R50 (with pre-train)	6×	41.1
R101 (random init)	2×	38.2
R101 (random init)	3×	41.0
R101 (random init)	4×	41.8
R101 (random init)	5×	42.2
R101 (random init)	6×	42.7
R101 (with pre-train)	2×	41.8
R101 (with pre-train)	3×	42.3
R101 (with pre-train)	4×	42.3
R101 (with pre-train)	5×	41.9
R101 (with pre-train)	6×	42.2
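
To make the trend in the table easier to read, the sketch below computes, for each schedule, the AP of the randomly initialized model minus the AP of the pre-trained one. The numbers are copied from the table; the dictionary layout and the helper name `scratch_gap` are illustrative assumptions.

```python
# AP_bbox on val2017, copied from the table above; keys are schedule multipliers.
AP = {
    "R50": {
        "random init": {2: 36.8, 3: 39.5, 4: 40.6, 5: 40.7, 6: 41.3},
        "with pre-train": {2: 40.3, 3: 40.8, 4: 40.9, 5: 40.9, 6: 41.1},
    },
    "R101": {
        "random init": {2: 38.2, 3: 41.0, 4: 41.8, 5: 42.2, 6: 42.7},
        "with pre-train": {2: 41.8, 3: 42.3, 4: 42.3, 5: 41.9, 6: 42.2},
    },
}


def scratch_gap(backbone):
    """Return {schedule: random-init AP minus pre-trained AP}, rounded to 0.1."""
    scratch = AP[backbone]["random init"]
    pretrained = AP[backbone]["with pre-train"]
    return {k: round(scratch[k] - pretrained[k], 1) for k in sorted(scratch)}


for backbone in AP:
    gaps = scratch_gap(backbone)
    print(backbone, " ".join(f"{k}x:{gap:+.1f}" for k, gap in gaps.items()))
```

The gap narrows from -3.5 (R50) and -3.6 (R101) at the 2× schedule to +0.2 and +0.5 at 6×, so the randomly initialized models start behind and finish slightly ahead.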

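With GN/SyncBN and an extended training schedule, training from scratch on COCO matches or exceeds the accuracy of ImageNet pre-trained models across several baselines.
ImageNet pre-training speeds up early convergence but does not necessarily improve final target-task accuracy; with longer training (5×–6×), models trained from scratch reach similar or better AP.
Training from scratch remains competitive with only 10% of the COCO data, and with a larger backbone (X152) it reaches roughly 50.9 bbox AP and 43.2 mask AP on val2017.
Localization-sensitive metrics and keypoint detection often show little or no benefit from ImageNet pre-training; at stricter overlap thresholds and on keypoint tasks, models trained from scratch sometimes perform better.
Appropriate normalization and longer optimization matter at every data scale; pre-training helps less when data is plentiful or when the task emphasizes localization over classification.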
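
As a reminder of what a localization-sensitive metric rewards, the sketch below computes intersection over union (IoU) for two toy detections and checks them against the 0.5 and 0.75 thresholds used by COCO's AP50 and AP75. The boxes and variable names are made up for illustration and are not results from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0.0, 0.0, 100.0, 100.0)
loose = (10.0, 10.0, 110.0, 110.0)  # hypothetical detection, off by 10 px
tight = (2.0, 2.0, 102.0, 102.0)    # hypothetical detection, off by 2 px

for name, box in (("loose", loose), ("tight", tight)):
    overlap = iou(ground_truth, box)
    print(name, round(overlap, 3),
          "match@0.5:", overlap >= 0.5,
          "match@0.75:", overlap >= 0.75)
```

The loosely placed box counts as a match at IoU 0.5 but not at 0.75, while the tightly placed box counts at both. A metric evaluated at the stricter threshold therefore rewards precise localization, which is the regime where the review reports pre-training helping least.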


This review was generated by AI and checked by a human editor.