QUICK REVIEW

[Paper Review] DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification

Chengliang Liu, Jie Wen|arXiv (Cornell University)|Mar 15, 2023

Text and Document Classification Technologies | Cited by 8

ONE-SENTENCE SUMMARY

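DICNet is a deep neural network for double incomplete multi-view multi-label classification. It learns view-specific high-level representations, enforces cross-view consensus through instance-level contrastive learning, and fuses the available views so that it can operate despite missing data.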

ABSTRACT

In recent years, multi-view multi-label learning has aroused extensive research enthusiasm. However, real-world multi-view multi-label data is commonly incomplete due to uncertain factors in data collection and manual annotation: not only are multi-view features often missing, but label completeness is also difficult to guarantee. To deal with the double incomplete multi-view multi-label classification problem, we propose a deep instance-level contrastive network, namely DICNet. Different from conventional methods, our DICNet focuses on leveraging deep neural networks to exploit the high-level semantic representations of samples rather than shallow-level features. First, we utilize stacked autoencoders to build an end-to-end multi-view feature extraction framework to learn the view-specific representations of samples. Furthermore, in order to improve the consensus representation ability, we introduce an incomplete instance-level contrastive learning scheme to guide the encoders to better extract the consensus information of multiple views, and use a multi-view weighted fusion module to enhance the discrimination of semantic features. Overall, our DICNet is adept at capturing consistent discriminative representations of multi-view multi-label data and avoiding the negative effects of missing views and missing labels. Extensive experiments performed on five datasets validate that our method outperforms other state-of-the-art methods.

MOTIVATION AND OBJECTIVES

Motivate and address the double incomplete multi-view multi-label classification problem, in which both views and labels can be missing.
Develop a deep architecture that learns high-level semantic features through view-specific autoencoders.
Incorporate incomplete instance-level contrastive learning to promote cross-view consensus.
Implement a weighted multi-view fusion module that robustly leverages available views.
Enable end-to-end supervised or semi-supervised training with missing-view and missing-label handling.

PROPOSED METHOD

View-specific representation learning via per-view autoencoders, which extract high-level features and reconstruct the inputs, trained with a missing-view-aware reconstruction loss.
Incomplete instance-level contrastive learning, which pulls together the representations of the same sample from different views and pushes apart those of different samples, using an anchor/positive/negative scheme with missing-view masking.
A weighted fusion module that aggregates the available per-view features into a single sample representation, mitigating the effects of missing views.
A multi-label classifier operating on the fused representation, with a missing-label indicator that suppresses supervision from unknown labels.
The overall training objective combines the multi-label classification loss, the instance-level contrastive loss, and the reconstruction loss: L = L_MC + β L_IC + γ L_FR, where β and γ are trade-off hyperparameters.
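The missing-view-aware reconstruction term can be illustrated with a minimal pure-Python sketch. It makes two assumptions that are not taken from the paper. First, each view's squared error is normalized by that view's feature dimension. Second, the loss is averaged only over observed (sample, view) pairs. The name `view_mask` for the missing-view indicator is hypothetical, and the paper's exact weighting may differ.

```python
from typing import List, Sequence

Vector = List[float]


def masked_reconstruction_loss(
    x: Sequence[Sequence[Vector]],
    x_hat: Sequence[Sequence[Vector]],
    view_mask: Sequence[Sequence[int]],
) -> float:
    """Mean per-dimension squared error over observed (sample, view) pairs only.

    x[v][i]: input features of sample i in view v (missing views may hold placeholders)
    x_hat[v][i]: the view-v autoencoder's reconstruction of x[v][i]
    view_mask[i][v]: 1 if view v of sample i is observed, else 0 (hypothetical name)
    """
    total, count = 0.0, 0
    for v, (views_in, views_out) in enumerate(zip(x, x_hat)):
        for i, (xi, xi_hat) in enumerate(zip(views_in, views_out)):
            if not view_mask[i][v]:
                continue  # missing view: nothing real to reconstruct, so no loss
            sq_err = sum((a - b) ** 2 for a, b in zip(xi, xi_hat))
            total += sq_err / len(xi)  # normalize by this view's dimensionality
            count += 1
    return total / count if count else 0.0


# Two views (2-D and 1-D), two samples; sample 1 is missing view 0.
x = [[[1.0, 2.0], [0.0, 0.0]], [[1.0], [3.0]]]
x_hat = [[[1.0, 1.0], [5.0, 5.0]], [[0.0], [3.0]]]
mask = [[1, 1], [0, 1]]
print(masked_reconstruction_loss(x, x_hat, mask))  # 0.5
```

The large error on sample 1's missing view 0 (placeholder zeros versus a reconstruction of 5.0) is ignored entirely. Masking is meant to achieve exactly this: placeholder inputs should never act as reconstruction targets.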
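The anchor/positive/negative scheme with missing-view masking can be sketched as an InfoNCE (NT-Xent-style) loss. Several details are assumptions for illustration, not the paper's confirmed formulation:

- the use of cosine similarity;
- the temperature `tau`;
- the restriction of negatives to other samples whose view v is observed;
- the averaging over all valid ordered view pairs.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Cosine similarity, guarded against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def incomplete_contrastive_loss(
    z: Sequence[Sequence[Vector]],
    view_mask: Sequence[Sequence[int]],
    tau: float = 0.5,
) -> float:
    """InfoNCE-style loss over every ordered view pair (u, v) with u != v.

    z[v][i]: embedding of sample i produced by the view-v encoder
    view_mask[i][v]: 1 if view v of sample i is observed, else 0
    Anchor z[u][i]; positive z[v][i]; negatives z[v][j] for j != i with view v observed.
    """
    n_views, n = len(z), len(view_mask)
    terms = []
    for u in range(n_views):
        for v in range(n_views):
            if u == v:
                continue
            for i in range(n):
                if not (view_mask[i][u] and view_mask[i][v]):
                    continue  # anchor or positive is missing: skip this pair
                pos = math.exp(cosine(z[u][i], z[v][i]) / tau)
                neg = sum(
                    math.exp(cosine(z[u][i], z[v][j]) / tau)
                    for j in range(n)
                    if j != i and view_mask[j][v]  # missing views never serve as negatives
                )
                terms.append(-math.log(pos / (pos + neg)))
    return sum(terms) / len(terms) if terms else 0.0


mask = [[1, 1], [1, 1]]
aligned = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
swapped = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
print(incomplete_contrastive_loss(aligned, mask))  # log(1 + e^-2), about 0.127
print(incomplete_contrastive_loss(swapped, mask))  # log(1 + e^2), about 2.127
```

When both views agree on each sample, the loss is small. When view 1 swaps the samples' embeddings, the loss is large. That gap is the pressure toward cross-view consensus.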
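The weighted fusion step can be sketched as a weighted average over each sample's observed views, renormalized so that missing views do not shrink the result. The fixed `view_weights` here are hypothetical. The paper's module may learn or compute its weights differently, and this sketch assumes all view embeddings share one dimension.

```python
from typing import List, Sequence

Vector = List[float]


def weighted_fusion(
    z: Sequence[Sequence[Vector]],
    view_mask: Sequence[Sequence[int]],
    view_weights: Sequence[float],
) -> List[Vector]:
    """Fuse per-view embeddings into one vector per sample using only observed views.

    z[v][i]: embedding of sample i from view v (all views share one dimension)
    view_mask[i][v]: 1 if view v of sample i is observed, else 0
    view_weights[v]: non-negative importance of view v (hypothetical fixed values)
    """
    fused = []
    for i, observed in enumerate(view_mask):
        dim = len(z[0][i])
        acc, norm = [0.0] * dim, 0.0
        for v, present in enumerate(observed):
            w = view_weights[v] * present  # a missing view contributes zero weight
            if w == 0:
                continue
            acc = [a + w * b for a, b in zip(acc, z[v][i])]
            norm += w
        # Renormalize over the views this sample actually has.
        fused.append([a / norm for a in acc] if norm > 0 else acc)
    return fused


z = [[[1.0, 1.0], [2.0, 2.0]], [[3.0, 3.0], [9.0, 9.0]]]
mask = [[1, 1], [1, 0]]  # sample 1 is missing view 1
print(weighted_fusion(z, mask, view_weights=[1.0, 3.0]))  # [[2.5, 2.5], [2.0, 2.0]]
```

Renormalizing by the sum of the observed weights means that a sample with a single available view simply receives that view's embedding. The placeholder value of its missing view never leaks into the fused representation.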
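The missing-label indicator and the combined objective L = L_MC + β L_IC + γ L_FR can be sketched as follows. The sketch assumes a binary cross-entropy classification loss averaged over known (sample, label) entries. The `label_mask` name and the `total_objective` helper are hypothetical, and the β and γ values in the check are arbitrary.

```python
import math
from typing import Sequence


def masked_bce(
    probs: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    label_mask: Sequence[Sequence[int]],
    eps: float = 1e-12,
) -> float:
    """Binary cross-entropy averaged over observed (sample, label) entries only.

    probs[i][c]: predicted probability that sample i has label c
    labels[i][c]: 0/1 target (arbitrary where the label is missing)
    label_mask[i][c]: 1 if label c of sample i is known, else 0 (hypothetical name)
    """
    total, count = 0.0, 0
    for p_row, y_row, m_row in zip(probs, labels, label_mask):
        for p, y, m in zip(p_row, y_row, m_row):
            if not m:
                continue  # missing label: give no (possibly wrong) supervision
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0


def total_objective(l_mc: float, l_ic: float, l_fr: float, beta: float, gamma: float) -> float:
    """Hypothetical helper: L = L_MC + beta * L_IC + gamma * L_FR."""
    return l_mc + beta * l_ic + gamma * l_fr


probs = [[0.9, 0.2, 0.7]]
labels = [[1, 0, 0]]
known = [[1, 1, 0]]  # the third label is unknown, so its 0 target is ignored
print(masked_bce(probs, labels, known))  # (-log 0.9 - log 0.8) / 2, about 0.164
```

Without the mask, the unknown third label would be treated as a confident negative. It would then penalize the 0.7 prediction even though that label might actually be positive.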

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: How can double incompleteness (missing views and missing labels) be effectively addressed in multi-view multi-label classification (MVMLC)?
RQ2: Can an end-to-end DNN that leverages instance-level contrastive learning improve cross-view consensus and discriminability under incomplete data?
RQ3: Does a weighted fusion strategy improve robustness to missing views while preserving discriminative semantic information?
RQ4: What is the impact of each proposed loss (classification, contrastive, reconstruction) on performance in double incomplete MVMLC (DIMVMLC) tasks?

KEY FINDINGS

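All metrics are higher-is-better: AP = average precision, 1-HL = 1 − Hamming loss, 1-RL = 1 − ranking loss, AUC = area under the ROC curve.

Dataset	Metric	lrMMC	MVL-IV	MvEL-ILD	iMSF	iMvWL	NAIML	DICNet (ours)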
Corel5k	AP	0.240	0.240	0.204	0.189	0.283	0.309	0.381
Corel5k	1-HL	0.954	0.954	0.946	0.943	0.978	0.987	0.988
Corel5k	1-RL	0.762	0.756	0.638	0.709	0.865	0.878	0.882
Corel5k	AUC	0.763	0.762	0.715	0.663	0.868	0.881	0.884
VOC2007	AP	0.425	0.433	0.358	0.325	0.441	0.488	0.505
VOC2007	1-HL	0.882	0.883	0.837	0.836	0.882	0.928	0.929
VOC2007	1-RL	0.698	0.702	0.643	0.568	0.737	0.783	0.783
VOC2007	AUC	0.728	0.730	0.686	0.620	0.767	0.811	0.809
ESP Game	AP	0.188	0.189	0.132	0.108	0.242	0.246	0.297
ESP Game	1-HL	0.970	0.970	0.967	0.964	0.972	0.983	0.983
ESP Game	1-RL	0.777	0.778	0.683	0.722	0.807	0.818	0.832
ESP Game	AUC	0.783	0.784	0.734	0.674	0.813	0.824	0.836
IAPR TC-12	AP	0.197	0.198	0.141	0.101	0.235	0.261	0.323
IAPR TC-12	1-HL	0.967	0.967	0.963	0.960	0.969	0.981	0.981
IAPR TC-12	1-RL	0.801	0.799	0.725	0.631	0.833	0.848	0.873
IAPR TC-12	AUC	0.805	0.804	0.746	0.665	0.836	0.850	0.874
MIR Flickr	AP	0.441	0.449	0.375	0.323	0.495	0.551	0.589
MIR Flickr	1-HL	0.839	0.839	0.778	0.775	0.840	0.882	0.888
MIR Flickr	1-RL	0.802	0.808	0.771	0.641	0.806	0.844	0.863
MIR Flickr	AUC	0.806	0.807	0.761	0.715	0.794	0.837	0.849
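Three of the table's metrics can be sketched using their standard example-based definitions from the multi-label literature. These definitions are assumptions for illustration. The paper's evaluation code may differ, for example in tie handling, in the decision threshold used for 1-HL (0.5 here, hypothetically), or in how samples with no relevant labels are treated. AUC is omitted for brevity.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]
Labels = Sequence[Sequence[int]]


def one_minus_hamming_loss(pred: Labels, labels: Labels) -> float:
    """1-HL: fraction of (sample, label) entries predicted correctly."""
    entries = [(p, y) for pr, yr in zip(pred, labels) for p, y in zip(pr, yr)]
    return 1.0 - sum(p != y for p, y in entries) / len(entries)


def one_minus_ranking_loss(scores: Matrix, labels: Labels) -> float:
    """1-RL: 1 minus the fraction of (relevant, irrelevant) pairs ranked in the wrong order."""
    losses = []
    for s, y in zip(scores, labels):
        pos = [s[c] for c in range(len(y)) if y[c]]
        neg = [s[c] for c in range(len(y)) if not y[c]]
        if not pos or not neg:
            continue  # ranking loss is undefined for this sample
        wrong = sum(1 for a in pos for b in neg if a <= b)  # ties count as errors
        losses.append(wrong / (len(pos) * len(neg)))
    if not losses:
        raise ValueError("no sample has both relevant and irrelevant labels")
    return 1.0 - sum(losses) / len(losses)


def average_precision(scores: Matrix, labels: Labels) -> float:
    """Example-based AP: mean precision of the ranked list cut at each relevant label."""
    aps = []
    for s, y in zip(scores, labels):
        relevant = [c for c in range(len(y)) if y[c]]
        if not relevant:
            continue
        precisions = []
        for c in relevant:
            rank = sum(1 for k in range(len(s)) if s[k] >= s[c])  # 1-based rank of label c
            hits = sum(1 for r in relevant if s[r] >= s[c])  # relevant labels at or above it
            precisions.append(hits / rank)
        aps.append(sum(precisions) / len(precisions))
    if not aps:
        raise ValueError("no sample has a relevant label")
    return sum(aps) / len(aps)


scores = [[0.3, 0.9, 0.5]]
labels = [[1, 0, 1]]
pred = [[1 if p >= 0.5 else 0 for p in row] for row in scores]  # hypothetical 0.5 threshold
print(one_minus_hamming_loss(pred, labels))  # 1/3 of entries correct
print(one_minus_ranking_loss(scores, labels))  # 0.0: the irrelevant label outranks both relevant ones
print(average_precision(scores, labels))  # (2/3 + 1/2) / 2 = 7/12
```

All three quantities are "higher is better", which is why the table reports 1-HL and 1-RL rather than the raw losses.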

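Under double incomplete conditions, DICNet achieves the best or tied-best result in 19 of the 20 dataset-metric pairs. The only exception is AUC on VOC2007, where NAIML is marginally higher. The largest margins are in AP, e.g., +0.072 over NAIML on Corel5k and +0.062 on IAPR TC-12.
On Corel5k, DICNet achieves 0.381 AP, 0.988 1-HL, 0.882 1-RL, and 0.884 AUC, outperforming all competitors on every metric.
On VOC2007, DICNet achieves 0.505 AP, 0.929 1-HL, 0.783 1-RL, and 0.809 AUC. It leads on AP and 1-HL, ties NAIML on 1-RL, and trails NAIML slightly on AUC (0.809 vs. 0.811).
On ESP Game, DICNet achieves 0.297 AP, 0.983 1-HL, 0.832 1-RL, and 0.836 AUC. It leads on AP, 1-RL, and AUC, and ties NAIML on 1-HL.
On IAPR TC-12, DICNet achieves 0.323 AP, 0.981 1-HL, 0.873 1-RL, and 0.874 AUC. It leads on AP, 1-RL, and AUC, and ties NAIML on 1-HL.
On MIR Flickr, DICNet achieves 0.589 AP, 0.888 1-HL, 0.863 1-RL, and 0.849 AUC, leading on all four metrics.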
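The per-dataset comparison can be checked mechanically. NAIML is the strongest baseline in every row of the results table, so this small script compares DICNet against NAIML only. The `compare` helper is hypothetical and not part of the paper. The script treats all four metrics as higher-is-better and counts differences smaller than `tol` as ties.

```python
# (dataset, metric) -> (NAIML, DICNet), copied from the results table.
RESULTS = {
    ("Corel5k", "AP"): (0.309, 0.381),
    ("Corel5k", "1-HL"): (0.987, 0.988),
    ("Corel5k", "1-RL"): (0.878, 0.882),
    ("Corel5k", "AUC"): (0.881, 0.884),
    ("VOC2007", "AP"): (0.488, 0.505),
    ("VOC2007", "1-HL"): (0.928, 0.929),
    ("VOC2007", "1-RL"): (0.783, 0.783),
    ("VOC2007", "AUC"): (0.811, 0.809),
    ("ESP Game", "AP"): (0.246, 0.297),
    ("ESP Game", "1-HL"): (0.983, 0.983),
    ("ESP Game", "1-RL"): (0.818, 0.832),
    ("ESP Game", "AUC"): (0.824, 0.836),
    ("IAPR TC-12", "AP"): (0.261, 0.323),
    ("IAPR TC-12", "1-HL"): (0.981, 0.981),
    ("IAPR TC-12", "1-RL"): (0.848, 0.873),
    ("IAPR TC-12", "AUC"): (0.850, 0.874),
    ("MIR Flickr", "AP"): (0.551, 0.589),
    ("MIR Flickr", "1-HL"): (0.882, 0.888),
    ("MIR Flickr", "1-RL"): (0.844, 0.863),
    ("MIR Flickr", "AUC"): (0.837, 0.849),
}


def compare(results, tol=1e-9):
    """Split (dataset, metric) keys into DICNet wins, ties, and losses against NAIML."""
    wins, ties, losses = [], [], []
    for key, (baseline, ours) in results.items():
        gap = ours - baseline  # every metric here is higher-is-better
        if gap > tol:
            wins.append(key)
        elif gap < -tol:
            losses.append(key)
        else:
            ties.append(key)
    return wins, ties, losses


wins, ties, losses = compare(RESULTS)
print(len(wins), len(ties), len(losses))  # 16 3 1
print(losses)  # [('VOC2007', 'AUC')]
largest = max(RESULTS, key=lambda k: RESULTS[k][1] - RESULTS[k][0])
print(largest)  # ('Corel5k', 'AP'), a gain of 0.072
```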


This review was generated by AI and checked by human editors.