QUICK REVIEW

[Paper Review] SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization

Hao Dong, Ismail Nejjar | arXiv (Cornell University) | October 30, 2023

Multimodal Machine Learning Applications | Citations: 7

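One-Line Summary

SimMMDG improves multi-modal domain generalization and robustness to missing modalities by splitting each modality's features into modality-specific and modality-shared components, trained with supervised contrastive learning and a cross-modal translation module. It achieves strong performance on the EPIC-Kitchens and HAC datasets.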

ABSTRACT

In real-world scenarios, achieving domain generalization (DG) presents significant challenges as models are required to generalize to unknown target distributions. Generalizing to unseen multi-modal distributions poses even greater difficulties due to the distinct properties exhibited by different modalities. To overcome the challenges of achieving domain generalization in multi-modal scenarios, we propose SimMMDG, a simple yet effective multi-modal DG framework. We argue that mapping features from different modalities into the same embedding space impedes model generalization. To address this, we propose splitting the features within each modality into modality-specific and modality-shared components. We employ supervised contrastive learning on the modality-shared features to ensure they possess joint properties and impose distance constraints on modality-specific features to promote diversity. In addition, we introduce a cross-modal translation module to regularize the learned features, which can also be used for missing-modality generalization. We demonstrate that our framework is theoretically well-supported and achieves strong performance in multi-modal DG on the EPIC-Kitchens dataset and the novel Human-Animal-Cartoon (HAC) dataset introduced in this paper. Our source code and HAC dataset are available at https://github.com/donghao51/SimMMDG.

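Motivation and Objectives

Promote robust generalization to unseen multi-modal distributions.
Prevent the loss of modality-specific information by avoiding naive feature alignment across modalities.
Encourage cross-modal sharing of label-consistent information while preserving the diversity of each modality.
Provide a cross-modal translation mechanism for handling missing modalities at test time.
Introduce the new HAC dataset as a benchmark for multi-modal DG.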

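Proposed Method

Split each modality's embedding into a modality-specific component and a modality-shared component.
Apply supervised contrastive learning (L_con) to the modality-shared features so that instances with the same label cluster together across modalities.
Apply a distance-based loss (L_dis) that maximizes the distance between the modality-specific and modality-shared features within each modality, keeping the two parts separated.
Introduce a cross-modal translation module (MLP) that translates embeddings between modalities and regularizes the learned features (L_trans).
Combine the losses into the final objective: L = L_cls + alpha_con L_con + alpha_dis L_dis + alpha_trans L_trans.
At test time with a missing modality, predict the missing embedding by translation (E_i_t) and substitute it for the missing input to obtain a robust prediction.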
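
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the embedding is assumed to be split into two equal halves (which half is "specific" is arbitrary here), the contrastive term is a generic supervised contrastive loss, and the distance and translation terms use squared Euclidean distance as one plausible choice.

```python
import math

def split_embedding(e):
    """Split one embedding into (modality_specific, modality_shared) halves.
    Assumption of this sketch: equal halves, specific part first."""
    half = len(e) // 2
    return e[:half], e[half:]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def supcon_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over L2-normalized vectors.
    Samples sharing a label (from any modality) are treated as positives."""
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {
            a: sum(x * y for x, y in zip(features[i], features[a])) / temperature
            for a in range(n) if a != i
        }
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total -= sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def distance_loss(embeddings):
    """Negative mean squared distance between the specific and shared parts.
    Minimizing this value pushes the two parts of each modality apart."""
    total = 0.0
    for e in embeddings:
        specific, shared = split_embedding(e)
        total += sq_dist(specific, shared)
    return -total / len(embeddings)

def translation_loss(embeddings, translators):
    """Mean squared error between translated and actual embeddings,
    averaged over all ordered modality pairs (i, j) with i != j.
    `translators[(i, j)]` is a hypothetical callable standing in for an MLP."""
    m = len(embeddings)
    total = 0.0
    for i in range(m):
        for j in range(m):
            if i != j:
                predicted = translators[(i, j)](embeddings[i])
                total += sq_dist(predicted, embeddings[j])
    return total / (m * (m - 1))

def total_loss(l_cls, l_con, l_dis, l_trans, alpha_con, alpha_dis, alpha_trans):
    """L = L_cls + alpha_con L_con + alpha_dis L_dis + alpha_trans L_trans."""
    return l_cls + alpha_con * l_con + alpha_dis * l_dis + alpha_trans * l_trans
```

In a real training loop these terms would be computed on batched tensors with automatic differentiation; the list-based version only makes the role of each term explicit.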

Figure 1: (a) Different modalities possess shared information, while simultaneously containing unique information exclusive to each modality. Inspired by this, we propose to split the feature of each modality into modality-specific and modality-shared parts in our framework. (b) Our new multi-modal

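Experimental Results

Research Questions

RQ1: How can multi-modal DG be improved without collapsing the modalities into a single shared embedding space?
RQ2: Can modality-specific information be preserved while shared cross-modal information is exploited for DG?
RQ3: Does a cross-modal translation mechanism improve robustness to missing modalities?
RQ4: How well does the proposed approach generalize on a standard multi-modal DG benchmark and on the new HAC dataset?

Key Results

SimMMDG consistently outperforms the baselines on EPIC-Kitchens when all three modalities are used, with improvements of up to 9.58%.
With SlowFast and ResNet-18 backbones, SimMMDG improves over the baselines by up to 5.73% on average.
On the HAC dataset, SimMMDG improves over the baselines by up to 7.73%.
In multi-modal single-source DG, SimMMDG achieves an average gain of up to 5.71% over competing methods.
With a missing modality, replacing the missing embedding with its cross-modal translation yields up to 10.47% higher accuracy than zero-filling, and often outperforms uni-modal models.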
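
The difference between zero-filling and translation-based substitution for a missing modality can be illustrated as follows. This is a hedged sketch rather than the paper's code: `fill_missing` and `translators` are hypothetical names, all modalities are assumed to share one embedding dimension, and averaging the translations from every available modality is an assumption made here for the multi-source case.

```python
def fill_missing(embeddings, translators, strategy="translate"):
    """Return a complete list of embeddings, filling in entries that are None.

    embeddings:  one vector per modality, or None if that modality is missing.
    translators: dict mapping (source_index, target_index) to a callable that
                 predicts the target embedding from the source embedding
                 (a stand-in for a trained translation MLP).
    strategy:    "zero" fills with zeros; "translate" uses the translators.
    """
    available = [i for i, e in enumerate(embeddings) if e is not None]
    if not available:
        raise ValueError("at least one modality must be present")
    dim = len(embeddings[available[0]])  # assumes a common embedding size

    filled = []
    for j, e in enumerate(embeddings):
        if e is not None:
            filled.append(list(e))
        elif strategy == "zero":
            filled.append([0.0] * dim)
        else:
            # Translate from every available modality and average the results.
            predictions = [translators[(i, j)](embeddings[i]) for i in available]
            filled.append([sum(col) / len(predictions) for col in zip(*predictions)])
    return filled


# Usage with a toy translator that doubles every entry (illustrative only).
toy_translators = {(0, 1): lambda e: [2.0 * x for x in e]}
video_only = [[1.0, 2.0], None]
zero_filled = fill_missing(video_only, toy_translators, strategy="zero")
translated = fill_missing(video_only, toy_translators, strategy="translate")
```

The filled list can then be passed to the same fusion and classification head that was trained with all modalities present, which is what makes the substitution a drop-in replacement for the missing input.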

Figure 2: Overview of SimMMDG. We split the features of each modality into modality-specific and modality-shared parts. For the modality-shared part, we use supervised contrastive learning to map the features with the same label to be as close as possible. For modality-specific features, we use a d

This review was generated by AI and reviewed by a human editor.