QUICK REVIEW

[Paper Review] Calibrating Multimodal Learning

Huan Zhang, Changqing Zhang|arXiv (Cornell University)|2023. 06. 02.

Machine Learning and Data Classification | Citations: 10

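One-line summary

The paper shows that existing multimodal classifiers can become overconfident when only some modalities are available, and introduces Calibrating Multimodal Learning (CML) regularization, which corrects mismatches between confidence and the number of modalities to improve calibration, accuracy, and robustness.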

ABSTRACT

Multimodal machine learning has achieved remarkable progress in a wide range of scenarios. However, the reliability of multimodal learning remains largely unexplored. In this paper, through extensive empirical studies, we identify current multimodal classification methods suffer from unreliable predictive confidence that tend to rely on partial modalities when estimating confidence. Specifically, we find that the confidence estimated by current models could even increase when some modalities are corrupted. To address the issue, we introduce an intuitive principle for multimodal learning, i.e., the confidence should not increase when one modality is removed. Accordingly, we propose a novel regularization technique, i.e., Calibrating Multimodal Learning (CML) regularization, to calibrate the predictive confidence of previous methods. This technique could be flexibly equipped by existing models and improve the performance in terms of confidence calibration, classification accuracy, and model robustness.

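Motivation and objectives

Demonstrate that current multimodal classifiers often produce unreliable confidence estimates when only some modalities are observed.
Propose a ranking-based principle: predictive confidence should not increase when a modality is removed.
Introduce CML regularization to enforce confidence-modality consistency across samples.
Show that applying CML improves confidence calibration, classification accuracy, and robustness across a range of multimodal datasets.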

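Proposed method

Defines a regression-free, ranking-based principle: Conf(x(T)) ≤ Conf(x(S)) for T ⊂ S ⊆ M, where M is the full set of modalities and x(T) is the sample restricted to the modalities in T.
Introduces a new regularization term L_CML that uses a hinge loss to penalize cases where Conf(x(T)) > Conf(x(S)): max(0, Conf(x(T)) − Conf(x(S))).
Approximates the regularizer by sampling modality pairs to reduce computational cost.
Combines L_CML with the existing classification loss L_CL as L = L_CL + λ L_CML and updates the model parameters.
Shows that CML is applicable to imputation-independent methods (CPM-Nets), imputation-dependent methods (MIWAE), and modern multimodal classifiers (MMTM).
Uses VRR as the reliability metric and evaluates on datasets including YaleB, Handwritten, CUB, Animal, TUANDROMD, NYUD2, and SUNRGBD.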
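
The hinge penalty, the pair sampling, and the combined loss listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: confidence is assumed to be the largest softmax probability (the review does not say how Conf is computed), and `toy_predict`, `nested_pairs`, `sample_pairs`, `cml_penalty`, and `total_loss` are hypothetical names introduced here. A real training loop would compute the same quantity inside an automatic-differentiation framework so that it can be minimized.

```python
import itertools
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits):
    """Assumed confidence measure: the largest softmax probability."""
    return max(softmax(logits))


def nested_pairs(modalities):
    """All (T, S) where T is a non-empty proper subset of S and S is a subset of M."""
    subsets = [frozenset(c)
               for r in range(1, len(modalities) + 1)
               for c in itertools.combinations(modalities, r)]
    return [(t, s) for t in subsets for s in subsets if t < s]


def sample_pairs(pairs, k, rng):
    """Draw at most k pairs without replacement to keep the penalty cheap."""
    return rng.sample(pairs, min(k, len(pairs)))


def cml_penalty(predict, sample, pairs):
    """Mean of max(0, Conf(x(T)) - Conf(x(S))) over the given (T, S) pairs."""
    terms = []
    for t, s in pairs:
        conf_t = confidence(predict(sample, t))
        conf_s = confidence(predict(sample, s))
        terms.append(max(0.0, conf_t - conf_s))
    return sum(terms) / len(terms)


def total_loss(cls_loss, cml_loss, lam):
    """L = L_CL + lambda * L_CML."""
    return cls_loss + lam * cml_loss


def toy_predict(sample, subset):
    """Hypothetical late-fusion model: sum the per-modality logits in `subset`."""
    n_classes = len(next(iter(sample.values())))
    return [sum(sample[m][k] for m in subset) for k in range(n_classes)]


# Two modalities that disagree: each one alone is more confident than both together,
# so the penalty is positive.
conflicting = {"rgb": [4.0, 0.0], "depth": [-3.0, 0.0]}
pairs = sample_pairs(nested_pairs(["rgb", "depth"]), k=2, rng=random.Random(0))
penalty = cml_penalty(toy_predict, conflicting, pairs)
loss = total_loss(cls_loss=0.3, cml_loss=penalty, lam=0.1)  # 0.3 and 0.1 are made-up values
```

The number of nested pairs grows quickly with the number of modalities (12 pairs for three modalities in this enumeration), which is why drawing a small sample of pairs per step is attractive.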

Figure 1: Motivation of calibrating multimodal learning. The confidence of an ideal multimodal classifier should decrease or at least not increase when one modality is removed (even when the removed modality is noised, or it indicates the model takes noise as semantics and the model is not trustwort

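Experimental results

Research questions

RQ1: Do current multimodal classifiers give unreliable confidence estimates when one modality is removed?
RQ2: Can a confidence-calibration regularizer improve the ranking relationship between confidence and the number of modalities across samples?
RQ3: Does the proposed CML regularization improve confidence calibration, accuracy, and robustness across a range of multimodal datasets?
RQ4: Is CML easy to deploy on different multimodal architectures, and not overly sensitive to hyperparameters?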

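Key findings

Current multimodal methods show high VRR, meaning that confidence increases for many samples when a modality is removed.
CML regularization reduces VRR and yields more reliable confidence estimates across the evaluated models.
Models trained with CML are more accurate and more robust, especially under modality corruption or noise.
CML is tolerant of hyperparameter choices and can be integrated into existing multimodal systems without architectural changes.
CML works particularly well on Type III models and retains its benefits across datasets.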
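
To make the VRR findings above concrete, the following sketch counts how often confidence rises when modalities are removed. The review does not give the exact definition of VRR, so the function assumes it is the fraction of nested (T, S) comparisons in which Conf(x(T)) > Conf(x(S)); the function name and the confidence values are hypothetical.

```python
def violation_rate(conf_by_subset_per_sample):
    """Fraction of nested (T, S) comparisons where Conf(x(T)) > Conf(x(S)).

    Each element maps a frozenset of modality names to the confidence the
    model reported when only those modalities were available.
    """
    violations = 0
    comparisons = 0
    for conf_by_subset in conf_by_subset_per_sample:
        for t, conf_t in conf_by_subset.items():
            for s, conf_s in conf_by_subset.items():
                if t < s:  # T is a proper subset of S
                    comparisons += 1
                    violations += conf_t > conf_s
    return violations / comparisons if comparisons else 0.0


rgb, depth = frozenset({"rgb"}), frozenset({"depth"})
both = rgb | depth

# Made-up confidences for two samples.
confidences = [
    {rgb: 0.70, depth: 0.60, both: 0.90},  # confidence never rises when a modality is dropped
    {rgb: 0.95, depth: 0.55, both: 0.80},  # rgb alone is more confident than rgb + depth
]
rate = violation_rate(confidences)  # one violation out of four comparisons
```

Under this assumed definition, a lower rate means the model's confidence follows the ranking principle more closely.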

This review was generated by AI and reviewed by a human editor.