QUICK REVIEW

[Paper Review] DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification

Chengliang Liu, Jie Wen|arXiv (Cornell University)|Mar 15, 2023
Text and Document Classification Technologies | Cited by 8
One-Sentence Summary

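DICNet is a deep neural network for double incomplete multi-view multi-label classification. It learns view-specific high-level representations, enforces cross-view consensus through instance-level contrastive learning, and fuses the views when some data are missing.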

ABSTRACT

In recent years, multi-view multi-label learning has aroused extensive research enthusiasm. However, multi-view multi-label data in the real world is commonly incomplete due to the uncertain factors of data collection and manual annotation, which means that not only multi-view features are often missing, and label completeness is also difficult to be satisfied. To deal with the double incomplete multi-view multi-label classification problem, we propose a deep instance-level contrastive network, namely DICNet. Different from conventional methods, our DICNet focuses on leveraging deep neural network to exploit the high-level semantic representations of samples rather than shallow-level features. First, we utilize the stacked autoencoders to build an end-to-end multi-view feature extraction framework to learn the view-specific representations of samples. Furthermore, in order to improve the consensus representation ability, we introduce an incomplete instance-level contrastive learning scheme to guide the encoders to better extract the consensus information of multiple views and use a multi-view weighted fusion module to enhance the discrimination of semantic features. Overall, our DICNet is adept in capturing consistent discriminative representations of multi-view multi-label data and avoiding the negative effects of missing views and missing labels. Extensive experiments performed on five datasets validate that our method outperforms other state-of-the-art methods.

Research Motivation and Objectives

  • Motivate and address double incomplete multi-view multi-label classification where both views and labels can be missing.
  • Develop a deep architecture that learns high-level semantic features through view-specific autoencoders.
  • Incorporate incomplete instance-level contrastive learning to promote cross-view consensus.
  • Implement a weighted multi-view fusion module that robustly leverages available views.
  • Enable end-to-end supervised or semi-supervised training with missing-view and missing-label handling.

Proposed Method

  • View-specific representation learning via per-view autoencoders to extract high-level features and reconstruct inputs, with a missing-view aware reconstruction loss.
  • Incomplete instance-level contrastive learning that pulls together the representations of the same sample across different views and pushes apart the representations of different samples, using an anchor/positive/negative scheme in which missing views are masked out.
  • A weighted fusion module that aggregates available per-view features into a single sample representation, mitigating effects of missing views.
  • A multi-label classifier operating on the fused representation with a missing-label indicator to suppress invalid supervision.
  • The overall training objective combines the multi-label classification loss, the instance-level contrastive loss, and the reconstruction loss: L = L_MC + β L_IC + γ L_FR, where β and γ weight the contrastive and reconstruction terms.
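
The components listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of the missing-view mask as fusion weights, the InfoNCE-style form of the contrastive loss, the temperature value, and the mean-squared-error reconstruction are all hypothetical choices, because this summary names the loss terms without giving their formulas. Encoders, decoders, and the classifier are omitted; the functions take their outputs as plain arrays.

```python
import numpy as np

def masked_reconstruction_loss(x, x_hat, view_mask):
    """MSE for one view, averaged only over samples where that view exists.
    x, x_hat: (n, d); view_mask: (n,), 1 = view available (assumed convention)."""
    per_sample = ((x - x_hat) ** 2).mean(axis=1)
    return float((per_sample * view_mask).sum() / max(view_mask.sum(), 1.0))

def weighted_fusion(z, view_mask):
    """Fuse per-view features into one vector per sample.
    z: (V, n, d); view_mask: (n, V). Assumption: weights are the 0/1 mask,
    so the result is the mean over the views that are actually present."""
    w = view_mask.T[:, :, None]                 # (V, n, 1)
    denom = np.maximum(w.sum(axis=0), 1.0)      # (n, 1), avoids division by zero
    return (z * w).sum(axis=0) / denom

def masked_bce(probs, labels, label_mask, eps=1e-7):
    """Binary cross-entropy counted only where the label is known.
    probs, labels, label_mask: (n, c); label_mask 1 = label observed."""
    p = np.clip(probs, eps, 1.0 - eps)
    bce = -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
    return float((bce * label_mask).sum() / max(label_mask.sum(), 1.0))

def instance_contrastive_loss(z, view_mask, temperature=0.5):
    """InfoNCE-style loss (an assumed form). Anchor: one available (sample, view).
    Positives: the same sample in other available views.
    Negatives: every other available (sample, view) pair."""
    V, n, d = z.shape
    flat = z.reshape(V * n, d)                  # row index = v * n + i
    flat = flat / np.maximum(np.linalg.norm(flat, axis=1, keepdims=True), 1e-12)
    logits = flat @ flat.T / temperature        # cosine similarity / temperature
    avail = view_mask.T.reshape(V * n).astype(bool)
    sample_id = np.tile(np.arange(n), V)
    pair_ok = avail[:, None] & avail[None, :] & ~np.eye(V * n, dtype=bool)
    positive = pair_ok & (sample_id[:, None] == sample_id[None, :])
    if not positive.any():
        return 0.0
    denom = (np.exp(logits) * pair_ok).sum(axis=1, keepdims=True)
    log_prob = logits - np.log(np.maximum(denom, 1e-12))
    return float(-log_prob[positive].mean())

def total_objective(l_mc, l_ic, l_fr, beta, gamma):
    """L = L_MC + beta * L_IC + gamma * L_FR, as stated in the summary."""
    return l_mc + beta * l_ic + gamma * l_fr
```

In each function the mask removes missing entries from both the numerator and the normalizer, so placeholder values stored for a missing view or an unknown label do not influence the loss. In a real training loop the same computations would be written in an autodiff framework so that gradients reach the encoders.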

Experimental Results

Research Questions

  • RQ1: How can double incompleteness (missing views and missing labels) be effectively addressed in multi-view multi-label classification (MVMLC)?
  • RQ2: Can an end-to-end deep neural network that uses instance-level contrastive learning improve cross-view consensus and discriminability under incomplete data?
  • RQ3: Does a weighted fusion strategy improve robustness to missing views while preserving discriminative semantic information?
  • RQ4: How does each of the proposed losses (classification, contrastive, reconstruction) affect performance on double incomplete MVMLC (DIMVMLC) tasks?

Key Findings

Dataset      Metric  lrMMC  MVL-IV  MvEL-ILD  iMSF   iMvWL  NAIML  DICNet (ours)
Corel5k      AP      0.240  0.240   0.204     0.189  0.283  0.309  0.381
Corel5k      1-HL    0.954  0.954   0.946     0.943  0.978  0.987  0.988
Corel5k      1-RL    0.762  0.756   0.638     0.709  0.865  0.878  0.882
Corel5k      AUC     0.763  0.762   0.715     0.663  0.868  0.881  0.884
VOC2007      AP      0.425  0.433   0.358     0.325  0.441  0.488  0.505
VOC2007      1-HL    0.882  0.883   0.837     0.836  0.882  0.928  0.929
VOC2007      1-RL    0.698  0.702   0.643     0.568  0.737  0.783  0.783
VOC2007      AUC     0.728  0.730   0.686     0.620  0.767  0.811  0.809
ESP Game     AP      0.188  0.189   0.132     0.108  0.242  0.246  0.297
ESP Game     1-HL    0.970  0.970   0.967     0.964  0.972  0.983  0.983
ESP Game     1-RL    0.777  0.778   0.683     0.722  0.807  0.818  0.832
ESP Game     AUC     0.783  0.784   0.734     0.674  0.813  0.824  0.836
IAPR TC-12   AP      0.197  0.198   0.141     0.101  0.235  0.261  0.323
IAPR TC-12   1-HL    0.967  0.967   0.963     0.960  0.969  0.981  0.981
IAPR TC-12   1-RL    0.801  0.799   0.725     0.631  0.833  0.848  0.873
IAPR TC-12   AUC     0.805  0.804   0.746     0.665  0.836  0.850  0.874
MIR Flickr   AP      0.441  0.449   0.375     0.323  0.495  0.551  0.589
MIR Flickr   1-HL    0.839  0.839   0.778     0.775  0.840  0.882  0.888
MIR Flickr   1-RL    0.802  0.808   0.771     0.641  0.806  0.844  0.863
MIR Flickr   AUC     0.806  0.807   0.761     0.715  0.794  0.837  0.849

  • Under double incomplete conditions, DICNet matches or outperforms the six compared methods on every dataset and metric except VOC2007 AUC.
  • On Corel5k, DICNet achieves 0.381 AP, 0.988 1-HL, 0.882 1-RL, and 0.884 AUC, the best result on all four metrics.
  • On VOC2007, DICNet achieves 0.505 AP, 0.929 1-HL, 0.783 1-RL, and 0.809 AUC. It leads on AP and 1-HL, ties NAIML on 1-RL (0.783), and trails NAIML slightly on AUC (0.809 vs. 0.811).
  • On ESP Game, DICNet achieves 0.297 AP, 0.983 1-HL, 0.832 1-RL, and 0.836 AUC. It leads on AP, 1-RL, and AUC and ties NAIML on 1-HL (0.983).
  • On IAPR TC-12, DICNet achieves 0.323 AP, 0.981 1-HL, 0.873 1-RL, and 0.874 AUC. It leads on AP, 1-RL, and AUC and ties NAIML on 1-HL (0.981).
  • On MIR Flickr, DICNet achieves 0.589 AP, 0.888 1-HL, 0.863 1-RL, and 0.849 AUC, the best result on all four metrics.
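
To help read the 1-HL and 1-RL rows, the sketch below computes one minus Hamming loss and one minus ranking loss in plain Python, so that higher is better for every metric in the table. The function names are hypothetical, and the conventions used here (ties counted as ranking errors, samples with no relevant or no irrelevant labels skipped) are assumptions; the summary does not state the paper's exact evaluation protocol or decision threshold.

```python
def one_minus_hamming_loss(y_true, y_pred):
    """y_true, y_pred: lists of equally sized 0/1 label rows.
    Returns the fraction of individual label decisions that are correct."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return 1.0 - wrong / total

def one_minus_ranking_loss(y_true, scores):
    """Per sample, count (relevant, irrelevant) label pairs where the relevant
    label is not scored strictly higher; average that fraction over samples."""
    per_sample = []
    for labels, s in zip(y_true, scores):
        pos = [s[j] for j, t in enumerate(labels) if t == 1]
        neg = [s[j] for j, t in enumerate(labels) if t == 0]
        if not pos or not neg:
            continue  # ranking loss is undefined for this sample (assumed: skip)
        bad = sum(p <= q for p in pos for q in neg)
        per_sample.append(bad / (len(pos) * len(neg)))
    if not per_sample:
        return 1.0
    return 1.0 - sum(per_sample) / len(per_sample)

# One relevant label out of ten, and a prediction of all zeros.
sparse_truth = [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
all_zero = [[0] * 10]
print(one_minus_hamming_loss(sparse_truth, all_zero))
```

The usage example shows that with sparse label vectors an all-zero prediction is already correct on nine of ten label decisions. This is one reason 1-HL values can sit close to 1 while AP on the same dataset stays much lower.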

This review was generated by AI and reviewed by human editors.