QUICK REVIEW

[Paper Review] SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization

Hao Dong, Ismail Nejjar|arXiv (Cornell University)|Oct 30, 2023

Multimodal Machine Learning Applications | Cited by 7

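One-Sentence Summary

SimMMDG splits each modality's features into modality-specific and modality-shared parts and combines supervised contrastive learning with a cross-modal translation module to improve multi-modal domain generalization and robustness to missing modalities. It achieves strong results on the EPIC-Kitchens and HAC datasets.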

ABSTRACT

In real-world scenarios, achieving domain generalization (DG) presents significant challenges as models are required to generalize to unknown target distributions. Generalizing to unseen multi-modal distributions poses even greater difficulties due to the distinct properties exhibited by different modalities. To overcome the challenges of achieving domain generalization in multi-modal scenarios, we propose SimMMDG, a simple yet effective multi-modal DG framework. We argue that mapping features from different modalities into the same embedding space impedes model generalization. To address this, we propose splitting the features within each modality into modality-specific and modality-shared components. We employ supervised contrastive learning on the modality-shared features to ensure they possess joint properties and impose distance constraints on modality-specific features to promote diversity. In addition, we introduce a cross-modal translation module to regularize the learned features, which can also be used for missing-modality generalization. We demonstrate that our framework is theoretically well-supported and achieves strong performance in multi-modal DG on the EPIC-Kitchens dataset and the novel Human-Animal-Cartoon (HAC) dataset introduced in this paper. Our source code and HAC dataset are available at https://github.com/donghao51/SimMMDG.

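Motivation and Goals

Achieve robust generalization to unseen multi-modal distributions.
Avoid naive feature alignment across modalities, which discards modality-specific information.
Promote label-consistent information shared across modalities while preserving modality diversity.
Provide a cross-modal translation mechanism to handle missing modalities at test time.
Introduce the new HAC dataset as a benchmark for multi-modal DG.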

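Proposed Method

Split each modality's embedding into a modality-specific part and a modality-shared part.
Apply supervised contrastive learning to the modality-shared features so that instances with the same label cluster together across modalities.
Within each modality, impose a distance loss (L_dis) between the modality-specific and modality-shared features to keep the two parts as distinct as possible.
Introduce a cross-modal translation module (an MLP) that translates embeddings between modalities and regularizes the learned features (L_trans).
Combine the losses into the final objective: L = L_cls + alpha_con L_con + alpha_dis L_dis + alpha_trans L_trans.
At test time with a missing modality, predict the missing embedding (E_i_t) via translation and substitute it to obtain robust predictions.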
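
The training objective described above can be sketched in a few functions. This is a minimal illustration in plain Python, not the authors' implementation: the half-and-half split point, the temperature, the use of a negative squared Euclidean distance for L_dis, and the default alpha weights are all assumptions made for the sketch, and the function names are hypothetical. L_cls and L_trans are passed in as precomputed numbers. A real implementation would use a tensor library.

```python
import math


def split(embedding):
    """Split one modality's embedding into (specific, shared) parts.

    Assumption: the first half is modality-specific, the second half shared.
    """
    half = len(embedding) // 2
    return embedding[:half], embedding[half:]


def _unit(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def supcon_loss(shared_feats, labels, temperature=0.1):
    """Supervised contrastive loss over shared features pooled from all modalities.

    Features with the same label are positives, regardless of modality.
    """
    feats = [_unit(f) for f in shared_feats]
    total, anchors = 0.0, 0
    for i, anchor in enumerate(feats):
        others = [j for j in range(len(feats)) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {j: sum(a * b for a, b in zip(anchor, feats[j])) / temperature
                for j in others}
        peak = max(sims.values())  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        total -= sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def distance_loss(embeddings):
    """Negative mean squared distance between each modality's two parts.

    Minimizing this value pushes specific and shared features apart.
    """
    dists = []
    for emb in embeddings:
        specific, shared = split(emb)
        dists.append(sum((a - b) ** 2 for a, b in zip(specific, shared)))
    return -sum(dists) / len(dists)


def total_loss(l_cls, l_con, l_dis, l_trans,
               alpha_con=1.0, alpha_dis=1.0, alpha_trans=1.0):
    """L = L_cls + alpha_con L_con + alpha_dis L_dis + alpha_trans L_trans."""
    return l_cls + alpha_con * l_con + alpha_dis * l_dis + alpha_trans * l_trans
```

In use, `supcon_loss` would receive the shared halves from every modality in a batch together with their class labels repeated per modality, so that same-class features from different modalities are pulled together. The alpha defaults here are placeholders, not the values used in the paper.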

Figure 1: (a). Different modalities possess shared information, while simultaneously containing unique information exclusive to each modality. Inspired by this, we propose to split the feature of each modality into modality-specific and modality-shared parts in our framework. (b) Our new multi-modal

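Experimental Results

Research Questions

RQ1: How can multi-modal DG be improved without collapsing the modalities into a single shared embedding space?
RQ2: Can modality-specific information be preserved while cross-modal shared information is exploited for DG?
RQ3: Does the cross-modal translation mechanism improve robustness to missing modalities?
RQ4: How well does the method generalize on standard multi-modal DG benchmarks and on the new HAC dataset?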

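Key Findings

On EPIC-Kitchens, SimMMDG consistently improves over the baselines, by up to 9.58% when all three modalities are used.
With SlowFast and ResNet-18 backbones, SimMMDG's average improvement over the baselines reaches up to 5.73%.
On the HAC dataset, SimMMDG improves over the baselines by up to 7.73%.
In the multi-modal single-source DG setting, SimMMDG achieves an average improvement of up to 5.71% over competing methods.
With a missing modality, replacing zero-filling with cross-modally translated embeddings improves accuracy by up to 10.47% and usually outperforms uni-modal models.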
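
The difference between zero-filling and translated embeddings at test time can be illustrated with a small sketch. Everything here is hypothetical scaffolding: the modality names, the `fill_missing` and `fuse` helpers, the toy translator, and the choice to average predictions when several source modalities are available are assumptions for illustration, not details taken from the paper.

```python
def fill_missing(available, missing, translators, dim, strategy="translate"):
    """Return a copy of `available` with an embedding added for `missing`.

    available:   dict of modality name -> embedding (list of floats)
    translators: dict of (source, target) -> callable mapping an embedding
                 to a predicted target embedding (stand-ins for the MLPs)
    strategy:    "zero" for zero-filling, "translate" for translated embeddings
    """
    filled = dict(available)
    if strategy == "zero":
        filled[missing] = [0.0] * dim
        return filled
    # Assumption: average the predictions from every available modality.
    predictions = [translators[(src, missing)](emb)
                   for src, emb in available.items()]
    filled[missing] = [sum(vals) / len(vals) for vals in zip(*predictions)]
    return filled


def fuse(embeddings, order):
    """Concatenate embeddings in a fixed modality order for the classifier."""
    return [x for name in order for x in embeddings[name]]


# Toy example: audio is missing and is predicted from video.
translators = {("video", "audio"): lambda e: [2.0 * x for x in e]}
available = {"video": [1.0, 2.0]}
translated = fuse(fill_missing(available, "audio", translators, dim=2),
                  order=["video", "audio"])
zero_filled = fuse(fill_missing(available, "audio", translators, dim=2,
                                strategy="zero"),
                   order=["video", "audio"])
```

With zero-filling the classifier sees an all-zero block that it never encountered during training, whereas the translated embedding is at least an estimate in the missing modality's feature space, which is a plausible reading of why the reported accuracy improves.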

Figure 2: Overview of SimMMDG. We split the features of each modality into modality-specific and modality-shared parts. For the modality-shared part, we use supervised contrastive learning to map the features with the same label to be as close as possible. For modality-specific features, we use a d

This review was generated by AI and reviewed by human editors.