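[Paper Explained] DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification
DICNet is a deep neural network for double incomplete multi-view multi-label classification. It learns view-specific high-level representations, enforces cross-view consensus through instance-level contrastive learning, and fuses the available views when data are missing.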
In recent years, multi-view multi-label learning has attracted extensive research interest. However, real-world multi-view multi-label data is commonly incomplete because of uncertain factors in data collection and manual annotation: not only are multi-view features often missing, but complete labels are also hard to obtain. To deal with the double incomplete multi-view multi-label classification problem, we propose a deep instance-level contrastive network, namely DICNet. Unlike conventional methods, DICNet leverages deep neural networks to exploit the high-level semantic representations of samples rather than shallow-level features. First, we use stacked autoencoders to build an end-to-end multi-view feature extraction framework that learns view-specific representations of samples. Furthermore, to improve the consensus representation ability, we introduce an incomplete instance-level contrastive learning scheme that guides the encoders to better extract the consensus information of multiple views, and we use a multi-view weighted fusion module to enhance the discrimination of semantic features. Overall, DICNet is adept at capturing consistent discriminative representations of multi-view multi-label data and avoiding the negative effects of missing views and missing labels. Extensive experiments on five datasets validate that our method outperforms other state-of-the-art methods.
Research Motivation and Objectives
- Motivate and address double incomplete multi-view multi-label classification where both views and labels can be missing.
- Develop a deep architecture that learns high-level semantic features through view-specific autoencoders.
- Incorporate incomplete instance-level contrastive learning to promote cross-view consensus.
- Implement a weighted multi-view fusion module that robustly leverages available views.
- Enable end-to-end supervised or semi-supervised training with missing-view and missing-label handling.
Proposed Method
- View-specific representation learning via per-view autoencoders to extract high-level features and reconstruct inputs, with a missing-view aware reconstruction loss.
- Incomplete instance-level contrastive learning that pulls together the representations of the same sample across different views and pushes apart the representations of different samples, using an anchor/positive/negative scheme with missing-view masking.
- A weighted fusion module that aggregates available per-view features into a single sample representation, mitigating effects of missing views.
- A multi-label classifier operating on the fused representation with a missing-label indicator to suppress invalid supervision.
- Overall training objective combines multi-label classification loss, instance-level contrastive loss, and reconstruction loss: L = L_MC + β L_IC + γ L_FR.
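The components listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the mean-squared reconstruction error, the InfoNCE-style form of the contrastive term, the temperature `tau`, and the default values of `beta` and `gamma` are all assumptions made for the example. The encoders, decoders, and classifier are treated as given, and only the way the missing-view mask and missing-label mask enter each term is shown.

```python
import numpy as np

def masked_reconstruction_loss(xs, recons, view_mask):
    """L_FR sketch: squared error averaged over observed (sample, view) pairs only.
    xs, recons: lists of (n, d_v) arrays; view_mask: (n, V), 1 = view observed."""
    total = 0.0
    for v, (x, r) in enumerate(zip(xs, recons)):
        per_sample = ((x - r) ** 2).mean(axis=1)          # (n,)
        total += (per_sample * view_mask[:, v]).sum()     # missing views add 0
    return total / max(view_mask.sum(), 1.0)

def weighted_fusion(zs, view_mask):
    """Fuse view embeddings zs (V, n, d) by averaging over available views."""
    w = view_mask.T[:, :, None]                           # (V, n, 1)
    return (w * zs).sum(axis=0) / np.maximum(w.sum(axis=0), 1.0)

def masked_bce(probs, labels, label_mask, eps=1e-7):
    """L_MC sketch: binary cross-entropy over known labels only.
    label_mask: (n, C), 1 = label annotated, 0 = label unknown."""
    p = np.clip(probs, eps, 1.0 - eps)
    bce = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    return (bce * label_mask).sum() / max(label_mask.sum(), 1.0)

def instance_contrastive_loss(zs, view_mask, tau=0.5):
    """L_IC sketch (assumed InfoNCE-style): for each ordered view pair (u, v),
    sample i in view u is the anchor, sample i in view v is the positive, and
    the other samples in view v are negatives. Samples missing either view
    are excluded from that pair."""
    n_views = zs.shape[0]
    total, count = 0.0, 0
    for u in range(n_views):
        for v in range(n_views):
            if u == v:
                continue
            both = (view_mask[:, u] > 0) & (view_mask[:, v] > 0)
            if both.sum() < 2:
                continue                                  # no negatives available
            a = zs[u][both]
            b = zs[v][both]
            a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
            b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
            logits = a @ b.T / tau                        # cosine similarity / tau
            logits = logits - logits.max(axis=1, keepdims=True)
            log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
            total += -np.diag(log_prob).sum()
            count += int(both.sum())
    return total / max(count, 1)

def total_objective(xs, recons, zs, probs, labels, view_mask, label_mask,
                    beta=1.0, gamma=1.0):
    """L = L_MC + beta * L_IC + gamma * L_FR (beta, gamma are placeholder values)."""
    l_mc = masked_bce(probs, labels, label_mask)
    l_ic = instance_contrastive_loss(zs, view_mask)
    l_fr = masked_reconstruction_loss(xs, recons, view_mask)
    return l_mc + beta * l_ic + gamma * l_fr
```

In this sketch both masks act the same way: an entry of 0 removes the corresponding term from the sum and from the normalizer, so a missing view or an unknown label contributes neither loss nor gradient. In a full model, `probs` would come from a classifier applied to the output of `weighted_fusion`.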
Experimental Results
Research Questions
- RQ1: How can double incompleteness (missing views and missing labels) be effectively addressed in multi-view multi-label classification (MVMLC)?
- RQ2: Can an end-to-end DNN leveraging instance-level contrastive learning improve cross-view consensus and discriminability under incomplete data?
- RQ3: Does a weighted fusion strategy improve robustness to missing views while preserving discriminative semantic information?
- RQ4: What is the impact of the proposed losses (classification, contrastive, reconstruction) on performance in double incomplete multi-view multi-label classification (DIMVMLC) tasks?
Key Findings
| Dataset | Metric | lrMMC | MVL-IV | MvEL-ILD | iMSF | iMvWL | NAIML | ours |
|---|---|---|---|---|---|---|---|---|
| Corel5k | AP | 0.240 | 0.240 | 0.204 | 0.189 | 0.283 | 0.309 | 0.381 |
| Corel5k | 1-HL | 0.954 | 0.954 | 0.946 | 0.943 | 0.978 | 0.987 | 0.988 |
| Corel5k | 1-RL | 0.762 | 0.756 | 0.638 | 0.709 | 0.865 | 0.878 | 0.882 |
| Corel5k | AUC | 0.763 | 0.762 | 0.715 | 0.663 | 0.868 | 0.881 | 0.884 |
| VOC2007 | AP | 0.425 | 0.433 | 0.358 | 0.325 | 0.441 | 0.488 | 0.505 |
| VOC2007 | 1-HL | 0.882 | 0.883 | 0.837 | 0.836 | 0.882 | 0.928 | 0.929 |
| VOC2007 | 1-RL | 0.698 | 0.702 | 0.643 | 0.568 | 0.737 | 0.783 | 0.783 |
| VOC2007 | AUC | 0.728 | 0.730 | 0.686 | 0.620 | 0.767 | 0.811 | 0.809 |
| ESP Game | AP | 0.188 | 0.189 | 0.132 | 0.108 | 0.242 | 0.246 | 0.297 |
| ESP Game | 1-HL | 0.970 | 0.970 | 0.967 | 0.964 | 0.972 | 0.983 | 0.983 |
| ESP Game | 1-RL | 0.777 | 0.778 | 0.683 | 0.722 | 0.807 | 0.818 | 0.832 |
| ESP Game | AUC | 0.783 | 0.784 | 0.734 | 0.674 | 0.813 | 0.824 | 0.836 |
| IAPR TC-12 | AP | 0.197 | 0.198 | 0.141 | 0.101 | 0.235 | 0.261 | 0.323 |
| IAPR TC-12 | 1-HL | 0.967 | 0.967 | 0.963 | 0.960 | 0.969 | 0.981 | 0.981 |
| IAPR TC-12 | 1-RL | 0.801 | 0.799 | 0.725 | 0.631 | 0.833 | 0.848 | 0.873 |
| IAPR TC-12 | AUC | 0.805 | 0.804 | 0.746 | 0.665 | 0.836 | 0.850 | 0.874 |
| MIR Flickr | AP | 0.441 | 0.449 | 0.375 | 0.323 | 0.495 | 0.551 | 0.589 |
| MIR Flickr | 1-HL | 0.839 | 0.839 | 0.778 | 0.775 | 0.840 | 0.882 | 0.888 |
| MIR Flickr | 1-RL | 0.802 | 0.808 | 0.771 | 0.641 | 0.806 | 0.844 | 0.863 |
| MIR Flickr | AUC | 0.806 | 0.807 | 0.761 | 0.715 | 0.794 | 0.837 | 0.849 |
- Under double incomplete conditions, DICNet achieves the best or tied-best score on 19 of the 20 dataset-metric pairs in the table; the only exception is AUC on VOC2007. The gains are largest in AP.
- On Corel5k, DICNet achieves 0.381 AP, 0.988 1-HL, 0.882 1-RL, and 0.884 AUC, the best result on every metric (AP improves from 0.309 for NAIML to 0.381).
- On VOC2007, DICNet achieves 0.505 AP, 0.929 1-HL, 0.783 1-RL, and 0.809 AUC. It leads on AP and 1-HL, ties NAIML on 1-RL (0.783), and is slightly behind NAIML on AUC (0.809 vs. 0.811).
- On ESP Game, DICNet achieves 0.297 AP, 0.983 1-HL, 0.832 1-RL, and 0.836 AUC, leading on AP, 1-RL, and AUC and tying NAIML on 1-HL (0.983).
- On IAPR TC-12, DICNet achieves 0.323 AP, 0.981 1-HL, 0.873 1-RL, and 0.874 AUC, leading on AP, 1-RL, and AUC and tying NAIML on 1-HL (0.981).
- On MIR Flickr, DICNet achieves 0.589 AP, 0.888 1-HL, 0.863 1-RL, and 0.849 AUC, the best result on every metric.
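The 1-HL and 1-RL columns report one minus the Hamming loss and one minus the ranking loss, so higher is better for every metric in the table. The sketch below shows one common way to compute the two quantities. The function names, the 0.5 decision threshold, and the handling of ties are assumptions for illustration; the evaluation protocol actually used in the paper is not described in this summary.

```python
import numpy as np

def one_minus_hamming_loss(scores, labels, threshold=0.5):
    """1 - HL: fraction of (sample, label) entries predicted correctly
    after thresholding the scores (threshold is an assumed value)."""
    pred = (scores >= threshold).astype(int)
    return 1.0 - float((pred != labels).mean())

def one_minus_ranking_loss(scores, labels):
    """1 - RL: per sample, RL is the fraction of (positive, negative) label
    pairs in which the negative label scores at least as high as the
    positive one. Ties are counted as errors here (an assumption)."""
    losses = []
    for s, y in zip(scores, labels):
        pos, neg = s[y == 1], s[y == 0]
        if len(pos) == 0 or len(neg) == 0:
            continue  # undefined without both positive and negative labels
        losses.append((neg[None, :] >= pos[:, None]).mean())
    return 1.0 - float(np.mean(losses)) if losses else float("nan")

# Two samples, three labels each.
scores = np.array([[0.9, 0.2, 0.6],
                   [0.1, 0.8, 0.7]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0]])
print(one_minus_hamming_loss(scores, labels))
print(one_minus_ranking_loss(scores, labels))
```

In this toy example every positive label is ranked above every negative one, so 1-RL is 1.0, while two of the six thresholded entries are wrong, so 1-HL is about 0.667. Note that when positive labels are sparse, a predictor that outputs mostly zeros already scores a high 1-HL, which may be why the 1-HL values in the table sit close together while AP separates the methods more clearly.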
This summary was generated by AI and reviewed by a human editor.