QUICK REVIEW

[Paper Review] Drawing Early-Bird Tickets: Towards More Efficient Training of Deep Networks

Haoran You, Chaojian Li|arXiv (Cornell University)|Sep 26, 2019

Advanced Neural Network Applications | References: 42 | Citations: 50

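One-line summary

This paper shows that Early-Bird (EB) tickets can be identified very early in training using low-cost training schemes and a mask distance metric, and introduces EB Train, which delivers large energy savings while keeping accuracy comparable or better.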

ABSTRACT

(Frankle & Carbin, 2019) shows that there exist winning tickets (small but critical subnetworks) for dense, randomly initialized networks, that can be trained alone to achieve comparable accuracies to the latter in a similar number of iterations. However, the identification of these winning tickets still requires the costly train-prune-retrain process, limiting their practical benefits. In this paper, we discover for the first time that the winning tickets can be identified at the very early training stage, which we term as early-bird (EB) tickets, via low-cost training schemes (e.g., early stopping and low-precision training) at large learning rates. Our finding of EB tickets is consistent with recently reported observations that the key connectivity patterns of neural networks emerge early. Furthermore, we propose a mask distance metric that can be used to identify EB tickets with low computational overhead, without needing to know the true winning tickets that emerge after the full training. Finally, we leverage the existence of EB tickets and the proposed mask distance to develop efficient training methods, which are achieved by first identifying EB tickets via low-cost schemes, and then continuing to train merely the EB tickets towards the target accuracy. Experiments based on various deep networks and datasets validate: 1) the existence of EB tickets, and the effectiveness of mask distance in efficiently identifying them; and 2) that the proposed efficient training via EB tickets can achieve up to 4.7x energy savings while maintaining comparable or even better accuracy, demonstrating a promising and easily adopted method for tackling cost-prohibitive deep network training. Code available at https://github.com/RICE-EIC/Early-Bird-Tickets.

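Motivation and objectives

Show that Early-Bird (EB) tickets, which emerge early in training, exist across models and datasets.
Show that EB tickets can be identified with low-cost training schemes and a practical mask distance metric.
Develop EB Train, a training framework that uses EB tickets to cut training energy and FLOPs while maintaining accuracy.
Evaluate EB Train against state-of-the-art pruning-based training methods on CIFAR and ImageNet.
Provide insight into how large learning rates and low-precision training affect the emergence of EB tickets.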

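Proposed method

Define EB tickets as subnetworks, given by a pruning mask drawn at an early iteration t << i (far earlier than the end of full training), that match or exceed the accuracy of the dense model when trained.
Prune the dense network at an early stage using channel-wise pruning driven by batch-normalization (BN) scaling factors, with the result expressed as a binary mask m.
Introduce a mask distance metric, such as the Hamming distance between ticket masks, to detect when EB tickets emerge.
Keep a FIFO queue of mask distances and trigger EB ticket identification once the recent distances fall below a threshold ε (e.g., 0.1).
Implement EB Train in two steps: (a) search for EB tickets with low-cost training (large learning rates and 8-bit precision during the search), then (b) retrain only the EB tickets towards the target accuracy.
Compare EB Train variants: FF (full-precision search and retraining), re-initialization, LF (low-precision search, full-precision retraining), and LL (low-precision search and retraining).
Demonstrate energy and FLOPs savings for PreResNet101, VGG16, and ResNet18/50 on CIFAR-10/100 and ImageNet.
Following the benefit observed in the experiments, retrain from the weights inherited from the EB ticket instead of rewinding to the initialization.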
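The detection steps above (draw a mask from BN scaling factors, measure the Hamming distance between consecutive masks, and watch a FIFO queue of distances) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `draw_mask`, `mask_distance`, and `EarlyBirdDetector` are hypothetical, the BN scaling factors are assumed to arrive as one flat list per epoch, and the queue length of 5 is an assumed default that the summary does not specify. Only the threshold ε = 0.1 comes from the text.

```python
from collections import deque


def draw_mask(bn_scales, prune_ratio):
    """Return a binary channel mask: 1 keeps a channel, 0 prunes it.

    Assumption: the `prune_ratio` fraction of channels with the smallest
    |scaling factor| is pruned, ranked over one flat list of factors.
    """
    n_prune = int(len(bn_scales) * prune_ratio)
    order = sorted(range(len(bn_scales)), key=lambda i: abs(bn_scales[i]))
    mask = [1] * len(bn_scales)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def mask_distance(mask_a, mask_b):
    """Normalized Hamming distance between two binary masks (0.0 to 1.0)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    return sum(a != b for a, b in zip(mask_a, mask_b)) / len(mask_a)


class EarlyBirdDetector:
    """Tracks distances between consecutive-epoch masks in a FIFO queue."""

    def __init__(self, prune_ratio, queue_len=5, epsilon=0.1):
        self.prune_ratio = prune_ratio
        self.epsilon = epsilon
        self.prev_mask = None
        # deque with maxlen drops the oldest distance automatically (FIFO).
        self.distances = deque(maxlen=queue_len)

    def update(self, bn_scales):
        """Call once per epoch; True means the mask has stabilized."""
        mask = draw_mask(bn_scales, self.prune_ratio)
        if self.prev_mask is not None:
            self.distances.append(mask_distance(self.prev_mask, mask))
        self.prev_mask = mask
        queue_full = len(self.distances) == self.distances.maxlen
        return queue_full and max(self.distances) < self.epsilon


# Toy run: the mask flips once, then stays fixed for three epochs.
detector = EarlyBirdDetector(prune_ratio=0.5, queue_len=2)
epochs = [
    [0.1, 0.9, 0.2, 0.8],
    [0.9, 0.1, 0.8, 0.2],
    [0.9, 0.1, 0.8, 0.2],
    [0.9, 0.1, 0.8, 0.2],
]
print([detector.update(scales) for scales in epochs])
# [False, False, False, True]
```

In a real training loop, `update` would be fed the scaling factors collected from the model's BN layers at the end of each epoch, and the search phase would stop at the first `True`, handing `prev_mask` to the retraining phase.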

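Experimental results

Research questions

RQ1: Do EB tickets exist consistently across popular models and datasets?
RQ2: Can EB tickets be identified reliably with low-cost training and the mask distance metric, without running full training?
RQ3: Does training only the EB tickets (EB Train) deliver substantial energy and FLOPs savings over conventional prune-and-retrain pipelines while preserving accuracy?
RQ4: How do large learning rates and low-precision training affect the emergence and usefulness of EB tickets?
RQ5: How does EB Train perform against state-of-the-art baselines on large-scale datasets such as ImageNet and on ResNet-family architectures?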

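Key findings

EB tickets emerge consistently at a very early stage (around epoch 20 of 160) and sometimes outperform tickets drawn after full training.
The mask distance between tickets from consecutive epochs stabilizes early; an EB ticket is identified once the maximum distance within the window falls below ε (0.1).
On CIFAR with PreResNet101/VGG16, EB Train achieves a 2.2–2.4x reduction in FLOPs relative to the baselines, with comparable or better accuracy.
EB Train with low-precision (8-bit) search and retraining achieves 5.8–24.6x energy savings and 1.1–5.0x FLOPs savings in many settings while maintaining or improving accuracy.
On ImageNet (ResNet18/50), EB Train cuts training FLOPs by 51.5–74.0% and training energy by 46.5–70.9%, with accuracy gains of up to +2.34% in some configurations.
Inheriting the EB ticket's weights, rather than randomly re-initializing them, improves retraining performance, which gives the EB ticket approach an edge over random re-initialization.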
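The findings above mix two conventions: multiplicative factors (e.g., 2.2–2.4x) and percentage reductions (e.g., 51.5–74.0%). The short sketch below converts between them so the CIFAR and ImageNet figures can be read on one scale. The function names are hypothetical, and the conversion assumes each figure is measured against its own dense-training baseline, which the summary does not spell out.

```python
def reduction_to_factor(reduction_pct):
    """Convert 'cost reduced by p percent' into an 'N times cheaper' factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


def factor_to_reduction(factor):
    """Convert an 'N times cheaper' factor into a percentage reduction."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return 100.0 * (1.0 - 1.0 / factor)


# ImageNet FLOPs reductions of 51.5% and 74.0% expressed as factors.
print(round(reduction_to_factor(51.5), 2), round(reduction_to_factor(74.0), 2))
# 2.06 3.85

# CIFAR FLOPs factors of 2.2x and 2.4x expressed as percentage reductions.
print(round(factor_to_reduction(2.2), 1), round(factor_to_reduction(2.4), 1))
# 54.5 58.3
```

Read this way, the reported ImageNet FLOPs reductions correspond to roughly 2.1–3.8x, which is in the same range as the CIFAR factors.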

This review was generated by AI and checked by a human editor.