QUICK REVIEW

[Paper Review] Incremental Self-training for Semi-supervised Learning

Jifeng Guo, Zhulin Liu|arXiv (Cornell University)|Apr 14, 2024

Domain Adaptation and Few-Shot Learning|Citations: 5

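One-line summary

Incremental Self-training (IST) processes unlabeled data by cluster and in sequential batches, prioritizing high-confidence samples to improve both accuracy and training speed. On image classification tasks it outperforms the baselines and some state-of-the-art methods.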

ABSTRACT

Semi-supervised learning provides a solution to reduce the dependency of machine learning on labeled data. As one of the efficient semi-supervised techniques, self-training (ST) has received increasing attention. Several advancements have emerged to address challenges associated with noisy pseudo-labels. Previous works on self-training acknowledge the importance of unlabeled data but have not delved into their efficient utilization, nor have they paid attention to the problem of high time consumption caused by iterative learning. This paper proposes Incremental Self-training (IST) for semi-supervised learning to fill these gaps. Unlike ST, which processes all data indiscriminately, IST processes data in batches and assigns pseudo-labels first to unlabeled samples with high certainty. Then, it processes the data around the decision boundary after the model is stabilized, enhancing classifier performance. Our IST is simple yet effective and fits existing self-training-based semi-supervised learning methods. We verify the proposed IST on five datasets and two types of backbones, effectively improving the recognition accuracy and learning speed. Significantly, it outperforms state-of-the-art competitors on three challenging image classification tasks.

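Motivation and Objectives

Motivation: reduce the dependence of SSL on labeled data and make efficient use of unlabeled data.
Proposes IST, which uses clustering-based certainty to distinguish among unlabeled samples.
Enables batch-wise, sequential data processing to accelerate convergence.
Demonstrates that IST applies to existing self-training backbones and datasets.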

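Proposed Method

At initialization, clusters the unlabeled data to build a certainty-based query list.
Assigns pseudo-labels preferentially, starting with high-confidence samples.
Processes the unlabeled data in sequential batches to update the classifier.
Uses several clustering methods to examine their effect on performance and speed.
Compares iterative and non-iterative backbones to show that IST generalizes.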
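The method steps above can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation. Several details here are hypothetical choices made for illustration:

- the certainty score (the margin between the distances to the nearest and second-nearest k-means centroid);
- the farthest-first k-means initialization;
- a nearest-centroid classifier standing in for the backbone;
- the helper names `kmeans`, `query_list`, `fit_centroids`, `predict`, and `incremental_self_training`.

It shows the IST ordering. The data is clustered once at initialization and the unlabeled samples are ranked by certainty. They are then pseudo-labeled batch by batch with the current model, which is refit after each batch. As a result, the boundary samples are labeled last, by a model that has already absorbed the confident ones.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: dist(p, centers[j]))].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def query_list(unlabeled: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Hypothetical certainty ranking: indices sorted from most to least certain.

    Certainty = distance to 2nd-nearest centroid minus distance to nearest one;
    a small margin means the sample sits near a cluster (decision) boundary.
    """
    def margin(p: Point) -> float:
        d = sorted(dist(p, c) for c in centers)
        return d[1] - d[0]

    return sorted(range(len(unlabeled)), key=lambda i: margin(unlabeled[i]), reverse=True)


def fit_centroids(xs: Sequence[Point], ys: Sequence[int]) -> Dict[int, Point]:
    """Stand-in 'backbone': a nearest-centroid classifier (one mean per class)."""
    sums: Dict[int, List[float]] = {}
    for (x, y), label in zip(xs, ys):
        s = sums.setdefault(label, [0.0, 0.0, 0.0])
        s[0] += x
        s[1] += y
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def predict(model: Dict[int, Point], p: Point) -> int:
    return min(model, key=lambda label: dist(p, model[label]))


def incremental_self_training(labeled_x, labeled_y, unlabeled, k=2, batch_size=2):
    xs, ys = list(labeled_x), list(labeled_y)
    model = fit_centroids(xs, ys)
    order = query_list(unlabeled, kmeans(unlabeled, k))  # built once, at initialization
    pseudo: Dict[int, int] = {}
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        for i in batch:  # pseudo-label the whole batch with the current model
            pseudo[i] = predict(model, unlabeled[i])
        xs += [unlabeled[i] for i in batch]
        ys += [pseudo[i] for i in batch]
        model = fit_centroids(xs, ys)  # update the classifier once per batch
    return model, pseudo


labeled_x = [(0.0, 0.0), (10.0, 10.0)]
labeled_y = [0, 1]
unlabeled = [(1.0, 0.5), (0.5, 1.0), (9.0, 9.5), (9.5, 9.0), (5.2, 5.0), (4.8, 5.1)]
model, pseudo = incremental_self_training(labeled_x, labeled_y, unlabeled)
print(pseudo)
```

With this toy data, the two ambiguous points near (5, 5) come last in the query list. They are labeled only after the confident points have pulled the class centroids toward their clusters. Swapping `kmeans` for another clustering method (for example BIRCH or MeanShift) would change only how the query list is built. That step is where the review locates the accuracy/time trade-off.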

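Experimental Results

Research Questions

RQ1: Can incremental, cluster-based, batch-wise pseudo-labeling improve SSL accuracy and convergence speed over standard self-training?
RQ2: How does the choice of clustering method affect IST's performance and training time?
RQ3: Does IST maintain or improve robustness across iterative and non-iterative backbone settings?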

Key Findings

Method	Clustering & list	Accuracy (%)	Time (s)
Setting 1
ST		89.30	57321.65
IST	w/ K-Means	93.17	44796.71
IST	w/ MiniBMean	93.76	44076.97
IST	w/ MeanShift	94.25	157669.75
Setting 2
ST		86.87	156.63
IST	w/ BIRCH	88.97	99.40
IST	w/ K-Means	90.60	97.85
IST	w/ MeanShift	93.28	91.94
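To make the accuracy/time trade-off in the table concrete, the short script below computes two quantities for each IST row, using only the numbers in the table. The first is the accuracy gain in percentage points. The second is the relative change in training time against the ST baseline of the same setting. The split into two settings is an assumption based on the two separate ST baseline rows; this review does not name the dataset or backbone of either setting. The function name `relative_changes` is hypothetical.

```python
# Rows copied from the table: (method, accuracy in %, time in s).
SETTING_1 = [
    ("ST", 89.30, 57321.65),
    ("IST w/ K-Means", 93.17, 44796.71),
    ("IST w/ MiniBMean", 93.76, 44076.97),
    ("IST w/ MeanShift", 94.25, 157669.75),
]
SETTING_2 = [
    ("ST", 86.87, 156.63),
    ("IST w/ BIRCH", 88.97, 99.40),
    ("IST w/ K-Means", 90.60, 97.85),
    ("IST w/ MeanShift", 93.28, 91.94),
]


def relative_changes(rows):
    """Compare every IST row with the ST row (assumed to be rows[0]).

    Returns {method: (accuracy gain in percentage points, time change in %)}.
    """
    _, st_acc, st_time = rows[0]
    return {
        name: (acc - st_acc, 100.0 * (t - st_time) / st_time)
        for name, acc, t in rows[1:]
    }


for label, rows in (("Setting 1", SETTING_1), ("Setting 2", SETTING_2)):
    for name, (d_acc, d_time) in relative_changes(rows).items():
        print(f"{label} {name}: {d_acc:+.2f} pt accuracy, {d_time:+.1f}% time")
```

For these rows, the accuracy gains range from 2.10 to 6.41 points. The time changes range from about -22% to -41%. The exception is MeanShift in setting 1, at roughly +175%.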

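Across the evaluated settings, IST improves average accuracy over standard self-training by 6.41%.
Achieves a 4% improvement over state-of-the-art methods on challenging image datasets.
Reduces training time by roughly 40-50% compared with standard self-training across clustering methods.
Different clustering methods trade accuracy against time: MeanShift can raise accuracy but may drastically increase clustering time.
IST effectively improves accuracy and convergence, and can avoid some of the accuracy drops that ST shows with certain backbones.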

This review was generated by AI and checked by a human editor.