QUICK REVIEW

[Paper Review] Incomplete Multi-View Multi-Label Learning via Label-Guided Masked View- and Category-Aware Transformers

Chengliang Liu, Jie Wen|arXiv (Cornell University)|Mar 13, 2023

Text and Document Classification Technologies | Cited by 8

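One-Line Summary

LMVCAT is a Transformer-based framework for multi-view, multi-label data with missing views and missing labels. It combines a masked view-aware encoder, adaptively weighted view fusion, a label-guided graph constraint, and a category-aware Transformer to capture correlations among labels.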

ABSTRACT

As we all know, multi-view data is more expressive than single-view data and multi-label annotation enjoys richer supervision information than single-label, which makes multi-view multi-label learning widely applicable for various pattern recognition tasks. In this complex representation learning problem, three main challenges can be characterized as follows: i) How to learn consistent representations of samples across all views? ii) How to exploit and utilize category correlations of multi-label to guide inference? iii) How to avoid the negative impact resulting from the incompleteness of views or labels? To cope with these problems, we propose a general multi-view multi-label learning framework named label-guided masked view- and category-aware transformers in this paper. First, we design two transformer-style based modules for cross-view features aggregation and multi-label classification, respectively. The former aggregates information from different views in the process of extracting view-specific features, and the latter learns subcategory embedding to improve classification performance. Second, considering the imbalance of expressive power among views, an adaptively weighted view fusion module is proposed to obtain view-consistent embedding features. Third, we impose a label manifold constraint in sample-level representation learning to maximize the utilization of supervised information. Last but not least, all the modules are designed under the premise of incomplete views and labels, which makes our method adaptable to arbitrary multi-view and multi-label data. Extensive experiments on five datasets confirm that our method has clear advantages over other state-of-the-art methods.

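Motivation and Objectives

Learn consistent high-level representations from multi-view data when some views and labels are missing.
Exploit label correlations through a label manifold and a category-aware embedding space.
Develop a Transformer-based architecture with masked view interaction, adaptive fusion, and a label-guided constraint.
Remain robust to missing views and labels while still exploiting cross-view and cross-label information.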

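Proposed Method

Introduce VFormer, a masked view-aware Transformer encoder that aggregates cross-view information using a missing-view mask.
Apply adaptively weighted fusion to combine the per-view embeddings into one consistent sample representation.
Define a label-guided graph constraint (L_gc) that uses a label similarity matrix to guide representation learning.
Introduce CFormer, a category-aware Transformer that models inter-category correlations using class tokens and the fused features.
Train with the joint loss L = L_mc + alpha L_gc + beta L_ac, where L_mc is the multi-label classification loss and L_ac is auxiliary supervision from the category tokens.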
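
The fusion and loss steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (fuse_views, masked_bce, joint_loss), the exponential weighting of per-view scores, and the use of a masked binary cross-entropy for L_mc are assumptions made for this example. L_gc and L_ac are passed in as precomputed numbers because this review does not give their formulas.

```python
import math


def fuse_views(view_embeddings, view_mask, view_scores):
    """Weighted average of the observed view embeddings.

    view_embeddings: list of V vectors (lists of floats), one per view
    view_mask:       list of V ints, 1 if the view is observed, else 0
    view_scores:     list of V floats, hypothetical learnable per-view scores
    """
    # Missing views get weight 0; observed views get exp(score) (an assumption).
    weights = [m * math.exp(s) for m, s in zip(view_mask, view_scores)]
    total = sum(weights)
    if total == 0:
        raise ValueError("sample has no observed view")
    dim = len(view_embeddings[0])
    return [
        sum(w * z[d] for w, z in zip(weights, view_embeddings)) / total
        for d in range(dim)
    ]


def masked_bce(probs, labels, label_mask, eps=1e-12):
    """Binary cross-entropy averaged over the known labels only."""
    known = sum(label_mask)
    if known == 0:
        return 0.0
    loss = 0.0
    for p, y, g in zip(probs, labels, label_mask):
        if g:  # skip labels whose ground truth is missing
            loss -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return loss / known


def joint_loss(l_mc, l_gc, l_ac, alpha, beta):
    """L = L_mc + alpha * L_gc + beta * L_ac, as stated in the review."""
    return l_mc + alpha * l_gc + beta * l_ac


# The second view is missing, so its embedding is ignored in the fusion.
print(fuse_views([[1.0, 0.0], [9.0, 9.0], [0.0, 1.0]], [1, 0, 1], [0.0, 0.0, 0.0]))
# prints [0.5, 0.5]

# Only the first label is known, so only it contributes to the loss.
print(round(masked_bce([0.5, 0.9], [1, 0], [1, 0]), 4))
# prints 0.6931

print(joint_loss(1.0, 2.0, 3.0, alpha=0.5, beta=0.25))
# prints 2.75
```

Zeroing the weight of a missing view, rather than imputing its embedding, keeps unobserved data out of the fused representation. Masking the loss in the same way keeps unknown labels from being treated as negatives.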

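Experimental Results

Research Questions

RQ1: How can consistent, high-quality representations be learned from incomplete multi-view data?
RQ2: Can label correlations guide sample encoding and improve multi-label prediction when labels are missing?
RQ3: Does a Transformer-based architecture with masked view interaction and category-aware modeling outperform existing incomplete multi-view multi-label classification (MvMlC) methods?

Key Findings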

Dataset	Metric	lrMMC	MVL-IV	MvEL	iMSF	C2AE	GLOCAL	iMvWL	NAIML	LMVCAT (ours)
Corel5k	AP	.762(.002)	.756(.001)	.638(.003)	.709(.005)	.804(.010)	.840(.003)	.865(.003)	.878(.002)	.880(.002)
Pascal07	AP	.698(.003)	.433(.002)	.358(.003)	.325(.000)	.485(.008)	.496(.004)	.441(.017)	.488(.003)	.519(.005)
Espgame	AP	.188(.000)	.189(.000)	.132(.000)	.108(.000)	.202(.006)	.221(.002)	.242(.003)	.246(.002)	.294(.004)
Iaprtc12	AP	.197(.000)	.198(.000)	.141(.000)	.101(.000)	.224(.007)	.256(.002)	.235(.004)	.261(.001)	.317(.003)
Mirflickr	AP	.441(.001)	.449(.001)	.375(.000)	.323(.000)	.505(.008)	.537(.002)	.495(.012)	.550(.002)	.594(.005)

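LMVCAT shows a clear advantage on three metrics (AP, 1-RL, AUC) across all five incomplete datasets.
On Corel5k and Espgame, LMVCAT's AP exceeds that of the runner-up, NAIML, by about 7% and 5%, respectively.
On complete-view datasets LMVCAT still performs strongly; for example, its AP on Corel5k is about 14 points higher than that of the next-best method (GLOCAL).
Compared with competing methods, LMVCAT consistently improves 1-RL and AUC across datasets, indicating robustness to missing views and labels.
Ablation studies show that the label-guided graph constraint and the other components benefit performance.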
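
For readers unfamiliar with the reported metrics, the sketch below computes per-sample average precision (AP) and ranking loss (RL, reported in this review as 1-RL) using their common textbook definitions. The paper's exact evaluation protocol, including how ties and samples without positive labels are handled, is not described in this review, so those choices are assumptions here, as are the function names and the example scores.

```python
def average_precision(scores, labels):
    """Per-sample AP: mean of precision-at-rank over the relevant labels."""
    # Rank label indices from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    # Assumption: a sample with no relevant label scores 0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def ranking_loss(scores, labels):
    """Fraction of (relevant, irrelevant) label pairs that are misordered."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    # Assumption: a tie between a relevant and an irrelevant label is an error.
    bad = sum(1 for p in pos for n in neg if p <= n)
    return bad / (len(pos) * len(neg))


scores = [0.9, 0.2, 0.6, 0.4]  # hypothetical predicted scores for 4 labels
labels = [1, 0, 0, 1]          # hypothetical ground truth

print(round(average_precision(scores, labels), 4))
# prints 0.8333
print(1 - ranking_loss(scores, labels))
# prints 0.75
```

Higher is better for both AP and 1-RL, which is why the review reports 1-RL rather than the raw ranking loss.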

This review was generated by AI and checked by a human editor.