QUICK REVIEW

[Paper Review] A Survey of Label-noise Representation Learning: Past, Present and Future

Bo Han, Quanming Yao | arXiv (Cornell University) | Nov 9, 2020

Machine Learning and Data Classification | References: 149 | Citations: 101

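One-line summary

A comprehensive survey that defines label-noise representation learning (LNRL), reviews the theory, the taxonomy, and the methods for robustly training deep models under noisy labels, and outlines future directions.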

ABSTRACT

Classical machine learning implicitly assumes that labels of the training data are sampled from a clean distribution, which can be too restrictive for real-world scenarios. However, statistical-learning-based methods may not train deep learning models robustly with these noisy labels. Therefore, it is urgent to design Label-Noise Representation Learning (LNRL) methods for robustly training deep models with noisy labels. To fully understand LNRL, we conduct a survey study. We first clarify a formal definition for LNRL from the perspective of machine learning. Then, via the lens of learning theory and empirical study, we figure out why noisy labels affect deep models' performance. Based on the theoretical guidance, we categorize different LNRL methods into three directions. Under this unified taxonomy, we provide a thorough discussion of the pros and cons of different categories. More importantly, we summarize the essential components of robust LNRL, which can spark new directions. Lastly, we propose possible research directions within LNRL, such as new datasets, instance-dependent LNRL, and adversarial LNRL. We also envision potential directions beyond LNRL, such as learning with feature-noise, preference-noise, domain-noise, similarity-noise, graph-noise and demonstration-noise.

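Motivation and objectives

Define label-noise representation learning (LNRL) and its scope.
Explain, through learning theory and empirical study, how noisy labels affect deep models.
Provide a unified taxonomy of LNRL methods based on data, objective, and optimization.
Survey existing methods that use noise transition matrices, loss correction, and optimization techniques.
Propose future research directions and datasets for LNRL, as well as directions beyond label noise.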
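For readers unfamiliar with the noise transition matrix mentioned above, the standard definition can be written as follows. This is a sketch under the common class-conditional (instance-independent) noise assumption, which this review does not state explicitly; the symbols $T$, $y$, $\tilde{y}$, $C$, and $x^{(i)}$ are notation chosen here for illustration, not taken from the review.

```latex
% T_{ij}: probability that a sample with clean label i is observed with noisy label j
T_{ij} = P(\tilde{y} = j \mid y = i), \qquad i, j \in \{1, \dots, C\}

% Under class-conditional noise, the noisy posterior is a mixture of clean posteriors
P(\tilde{y} = j \mid x) = \sum_{i=1}^{C} T_{ij} \, P(y = i \mid x)

% If x^{(i)} is an anchor point for class i, i.e. P(y = i \mid x^{(i)}) = 1, then
T_{ij} = P(\tilde{y} = j \mid x^{(i)})
```

The last line is why anchor points are useful: on such a point the noisy posterior directly reveals one row of $T$.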

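Proposed approach

Formalize LNRL in a general problem setting where the training labels are corrupted.
Examine the theoretical foundations from the perspectives of data, objective, and optimization.
Build a unified taxonomy of methods along three axes: data (noise transition matrix), objective (noise-robust losses), and optimization (memorization-based strategies).
Discuss anchor points, transition matrices, and loss correction as core tools.
Highlight the memorization effect and early stopping as guiding principles for optimization.
Outline future directions within LNRL and potential directions beyond it.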
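To make the idea of loss correction with a transition matrix concrete, here is a minimal NumPy sketch of "forward" correction: the model's clean-class probabilities are pushed through a transition matrix before the loss is computed on the noisy labels. It assumes class-conditional noise and a known matrix `T` with `T[i, j] = P(noisy = j | clean = i)`. The function names and the toy numbers are hypothetical and are not taken from the survey.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def forward_corrected_nll(logits, noisy_labels, T):
    """Negative log-likelihood of the noisy labels after forward correction.

    T[i, j] is assumed to be P(noisy label = j | clean label = i).
    """
    clean_probs = softmax(logits)        # model's estimate of P(y | x), shape (n, C)
    noisy_probs = clean_probs @ T        # implied P(noisy y | x), shape (n, C)
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return float(-np.log(picked + 1e-12).mean())


# Toy example (illustrative values only): two classes, asymmetric noise.
T = np.array([[0.8, 0.2],
              [0.3, 0.7]])
logits = np.array([[2.0, 0.0],
                   [0.0, 2.0]])
noisy_labels = np.array([0, 1])

loss_corrected = forward_corrected_nll(logits, noisy_labels, T)
loss_plain = forward_corrected_nll(logits, noisy_labels, np.eye(2))  # identity T = ordinary cross-entropy
```

With the identity matrix the function reduces to ordinary cross-entropy, which makes it easy to check the correction against an uncorrected baseline. In practice `T` is usually unknown and has to be estimated, for instance from anchor points.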

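Results

Research questions

RQ1: What is the formal definition and scope of label-noise representation learning (LNRL)?
RQ2: From the perspectives of learning theory and empirical study, why do noisy labels affect deep models?
RQ3: How can LNRL methods be categorized, and what are the pros and cons of each category?
RQ4: What are the essential components of robust LNRL, and what are its future directions, including new datasets and adversarial settings?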

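Key findings

LNRL combines the data, objective, and optimization perspectives to learn robustly from noisy labels.
Estimating and exploiting the label-noise transition matrix is central to many approaches.
Noise-robust losses and classifier-consistent estimators help bridge the noisy and clean distributions.
Optimization strategies that exploit the memorization effect and early stopping improve robustness.
The unified taxonomy clarifies the strengths and trade-offs of the different LNRL strategies.
Future directions include instance-dependent noise, adversarial LNRL, and learning under other kinds of noise (for example feature-noise or graph-noise).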
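The memorization effect noted above is often exploited through small-loss selection: because deep networks tend to fit clean patterns before noisy ones, samples with a small loss early in training are treated as more likely to be correctly labeled. The following is a minimal sketch of that heuristic, not the survey's own formulation. The helper name `select_small_loss`, the `keep_ratio` parameter, and the loss values are hypothetical.

```python
import numpy as np


def select_small_loss(per_sample_loss, keep_ratio):
    """Return the (sorted) indices of the keep_ratio fraction of samples with the smallest loss."""
    n_keep = max(1, int(round(keep_ratio * len(per_sample_loss))))
    order = np.argsort(per_sample_loss)   # ascending: smallest loss first
    return np.sort(order[:n_keep])


# Illustrative per-sample losses for one mini-batch of six examples.
losses = np.array([0.10, 2.30, 0.25, 1.90, 0.05, 0.40])
kept = select_small_loss(losses, keep_ratio=0.5)
# Only the samples in `kept` would contribute to the gradient update for this batch.
```

A typical design choice is to start with `keep_ratio` near 1 and lower it over the first epochs toward one minus the estimated noise rate, so that selection becomes stricter as the network starts to memorize noisy labels.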

This review was generated by AI and checked by a human editor.