QUICK REVIEW

[Paper Review] A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions

Sheng Zhou, Hongjia Xu | arXiv (Cornell University) | Jun 15, 2022

Text and Document Classification Technologies | Citations: 56

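One-line summary

This survey proposes a new taxonomy of deep clustering methods, which jointly optimize deep representation learning and clustering. It also catalogs datasets and evaluation metrics and discusses open challenges and future directions.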

ABSTRACT

Clustering is a fundamental machine learning task which has been widely studied in the literature. Classic clustering methods follow the assumption that data are represented as features in a vectorized form through various representation learning techniques. As the data become increasingly complicated and complex, the shallow (traditional) clustering methods can no longer handle the high-dimensional data type. With the huge success of deep learning, especially the deep unsupervised learning, many representation learning techniques with deep architectures have been proposed in the past decade. Recently, the concept of Deep Clustering, i.e., jointly optimizing the representation learning and clustering, has been proposed and hence attracted growing attention in the community. Motivated by the tremendous success of deep learning in clustering, one of the most fundamental machine learning tasks, and the large number of recent advances in this direction, in this paper we conduct a comprehensive survey on deep clustering by proposing a new taxonomy of different state-of-the-art approaches. We summarize the essential components of deep clustering and categorize existing methods by the ways they design interactions between deep representation learning and clustering. Moreover, this survey also provides the popular benchmark datasets, evaluation metrics and open-source implementations to clearly illustrate various experimental settings. Last but not least, we discuss the practical applications of deep clustering and suggest challenging topics deserving further investigations as future directions.

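Motivation and objectives

Summarize the essential components of deep clustering: the representation learning module and the clustering module.
Propose a systematic taxonomy of deep clustering methods based on how the two modules interact.
Catalog benchmark datasets, evaluation metrics, and open-source implementations for deep clustering.
Discuss practical applications, limitations, and future research directions for deep clustering.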

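Proposed approach

Classifies the ways representation learning and clustering interact into four branches.
Reviews representation learning paradigms (autoencoder, generative model, mutual information, contrastive, clustering-friendly, subspace) and their variants for different data types.
Describes the clustering modules (relation matching, pseudo-labeling, self-training) and explains how each guides representation learning.
Summarizes datasets, evaluation metrics, and available implementations to help standardize experimental settings.
Offers insights into future directions and unsolved problems in the field.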
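
To make concrete how a self-training clustering module can guide representation learning, the sketch below follows the soft-assignment and target-sharpening scheme popularized by Deep Embedded Clustering (DEC). This review does not state which formulation the survey emphasizes, so read it as one well-known instance rather than the survey's own method. The function names, the toy embeddings, and the fixed centroids are assumptions made for illustration; in a real model the embeddings would come from an encoder and the KL loss would be backpropagated through it.

```python
import math


def soft_assign(embeddings, centroids, alpha=1.0):
    """Student's t kernel soft assignment q[i][j] of point i to cluster j."""
    q = []
    for z in embeddings:
        row = []
        for mu in centroids:
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q):
    """Sharpened targets: p[i][j] is proportional to q[i][j]**2 / sum_i q[i][j]."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """KL(P || Q), summed over all points and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )


# Toy 2-D embeddings and centroids (hypothetical values, not from the survey).
embeddings = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
centroids = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assign(embeddings, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)  # the quantity a self-training step would minimize
```

Minimizing this loss pulls each soft assignment toward its own sharpened version, which is how the clustering signal feeds back into the representation without any labels.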

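Results

Research questions

RQ1: How can deep representation learning and clustering be integrated to achieve clustering performance beyond that of sequential learning?
RQ2: Which taxonomy best captures the interaction between representation learning and clustering in deep clustering methods?
RQ3: Which datasets, metrics, and implementations are commonly used to evaluate deep clustering methods?
RQ4: What are the main challenges and promising directions for future research on deep clustering?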

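Key findings

A new taxonomy that groups deep clustering methods into four branches according to how representation learning and clustering interact.
Representation learning paradigms suited to clustering (e.g., clustering-friendly, subspace, mutual information, contrastive) are highlighted in terms of their effect on clustering.
Clustering modules such as relation matching, pseudo-labeling, and self-training are analyzed for their ability to connect the representation to the clustering task.
The survey documents widely used benchmark datasets and evaluation metrics, and lists open-source implementations to support reproducibility.
It discusses applications, limitations, and future directions, and identifies unsolved problems for deep clustering on unstructured and high-dimensional data.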
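
This review does not name the evaluation metrics the survey catalogs. As an example of what such a metric can look like, the sketch below computes clustering accuracy, a measure commonly reported in the clustering literature: cluster ids are arbitrary, so predictions are scored under the best one-to-one mapping from clusters to ground-truth classes. Implementations usually find that mapping with the Hungarian algorithm; this version swaps in a brute-force search over permutations to stay dependency-free, which is only practical for a handful of clusters. The function name and the toy labels are assumptions.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one mapping from cluster ids to classes.

    Brute force over permutations; assumes non-empty input and that None
    is not used as a real class label.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # If there are more clusters than classes, pad with None so the extra
    # clusters map to "no class" and always count as errors.
    if len(classes) < len(clusters):
        classes = classes + [None] * (len(clusters) - len(classes))
    best = 0
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        correct = sum(
            1 for t, c in zip(true_labels, cluster_ids) if mapping[c] == t
        )
        best = max(best, correct)
    return best / len(true_labels)


# Toy example: cluster ids are a permutation of the classes, with one mistake.
true_labels = [0, 0, 0, 1, 1, 2]
cluster_ids = [2, 2, 1, 0, 0, 1]
acc = clustering_accuracy(true_labels, cluster_ids)
```

Because the score is invariant to how clusters are numbered, it can be compared across methods that have no notion of class identity.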


This review was generated by AI and checked by a human editor.