QUICK REVIEW

[Paper Review] Calibrating Multimodal Learning

Huan Zhang, Changqing Zhang|arXiv (Cornell University)|Jun 2, 2023

Machine Learning and Data Classification | Cited by 10

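One-Sentence Summary

This paper shows that existing multimodal classifiers can produce overconfident predictions when only some modalities are available, and introduces Calibrating Multimodal Learning (CML) regularization, which keeps confidence consistent with the number of modalities and thereby improves calibration, accuracy, and robustness.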

ABSTRACT

Multimodal machine learning has achieved remarkable progress in a wide range of scenarios. However, the reliability of multimodal learning remains largely unexplored. In this paper, through extensive empirical studies, we identify current multimodal classification methods suffer from unreliable predictive confidence that tend to rely on partial modalities when estimating confidence. Specifically, we find that the confidence estimated by current models could even increase when some modalities are corrupted. To address the issue, we introduce an intuitive principle for multimodal learning, i.e., the confidence should not increase when one modality is removed. Accordingly, we propose a novel regularization technique, i.e., Calibrating Multimodal Learning (CML) regularization, to calibrate the predictive confidence of previous methods. This technique could be flexibly equipped by existing models and improve the performance in terms of confidence calibration, classification accuracy, and model robustness.

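Motivation and Objectives

Demonstrate that current multimodal classifiers often produce unreliable confidence estimates when only some modalities are observed.
Propose a ranking-based principle: predictive confidence should not increase when a modality is removed.
Introduce CML regularization to enforce consistency between confidence and the available modalities across samples.
Show that applying CML improves confidence calibration, classification accuracy, and robustness on diverse multimodal datasets.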

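Proposed Method

Define a regression-free, ranking-based principle: Conf(x(T)) ≤ Conf(x(S)) for all T ⊂ S ⊆ M, where M is the full set of modalities and x(S) is a sample observed through the modality subset S.
Introduce a new regularization term L_CML that uses the hinge max(0, Conf(x(T)) − Conf(x(S))) to penalize cases where Conf(x(T)) > Conf(x(S)).
Approximate the regularizer by sampling modality pairs, which reduces the computational cost.
Combine L_CML with the existing classification loss as L = L_CL + λ L_CML and update the model parameters.
Demonstrate that CML applies to imputation-free methods (CPM-Nets), imputation-based methods (MIWAE), and modern multimodal classifiers (MMTM).
Evaluate on datasets including YaleB, Handwritten, CUB, Animal, TUANDROMD, NYUD2, and SUNRGBD, using VRR as the reliability metric.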
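
The hinge penalty and the combined objective described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. Several details are assumptions that the summary does not specify: confidence is taken as the maximum softmax probability, the sampled hinge terms are averaged, and the interface (`logits_by_subset`, `sampled_cml_loss`, `total_loss`) is hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits):
    """Confidence as the maximum softmax probability (an assumption)."""
    return max(softmax(logits))


def cml_penalty(conf_subset, conf_superset):
    """Hinge term max(0, Conf(x(T)) - Conf(x(S))) for T a proper subset of S."""
    return max(0.0, conf_subset - conf_superset)


def sampled_cml_loss(logits_by_subset, num_pairs, rng):
    """Average hinge penalty over randomly sampled (T, S) pairs with T ⊂ S.

    logits_by_subset maps a frozenset of modality names to the logits the
    model produced from that subset (hypothetical interface).
    """
    # frozenset `<` tests for a proper subset.
    pairs = [(t, s) for s in logits_by_subset for t in logits_by_subset if t < s]
    if not pairs:
        return 0.0
    chosen = rng.sample(pairs, min(num_pairs, len(pairs)))
    total = sum(
        cml_penalty(confidence(logits_by_subset[t]), confidence(logits_by_subset[s]))
        for t, s in chosen
    )
    return total / len(chosen)


def total_loss(classification_loss, cml_loss, lam):
    """L = L_CL + lambda * L_CML."""
    return classification_loss + lam * cml_loss


# Toy example: the RGB-only prediction is more confident than the RGB+depth one,
# which violates the principle and is therefore penalized.
example_logits = {
    frozenset({"rgb", "depth"}): [2.0, 1.0, 0.0],
    frozenset({"rgb"}): [4.0, 0.0, 0.0],
    frozenset({"depth"}): [0.5, 0.4, 0.3],
}
example_cml = sampled_cml_loss(example_logits, num_pairs=2, rng=random.Random(0))
```

In this toy case only the (rgb, rgb+depth) pair contributes a positive penalty; the depth-only prediction is already less confident than the full-modality one, so its hinge term is zero.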

Figure 1: Motivation of calibrating multimodal learning. The confidence of an ideal multimodal classifier should decrease or at least not increase when one modality is removed (even when the removed modality is noised, or it indicates the model takes noise as semantics and the model is not trustwort

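Experimental Results

Research Questions

RQ1: Do current multimodal classifiers exhibit unreliable confidence estimates when one modality is removed?
RQ2: Can confidence-calibration regularization improve the ranking relationship between confidence and the number of modalities across samples?
RQ3: Does the proposed CML regularization improve confidence calibration, accuracy, and robustness on diverse multimodal datasets?
RQ4: Is CML easy to deploy with different multimodal architectures, and is it relatively insensitive to hyperparameters?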

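Key Findings

Current multimodal methods show a high VRR, indicating that confidence rises for many samples when one modality is removed.
CML regularization lowers VRR, yielding more trustworthy confidence estimates across the evaluated models.
Models trained with CML gain in accuracy and robustness, particularly when modalities are corrupted or noisy.
CML is robust to hyperparameter choices and can be integrated into existing multimodal systems without architectural changes.
CML performs especially well on Type III models and remains beneficial across diverse datasets.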
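
To make the VRR findings concrete, the sketch below counts how often confidence rises after a modality is removed. It follows only the description given in this review; the exact definition used in the paper may differ, and the function name `violation_rate` and its list-based inputs are hypothetical.

```python
def violation_rate(conf_full, conf_reduced):
    """Fraction of samples whose confidence rises after a modality is removed.

    conf_full[i] is the confidence for sample i with all modalities present;
    conf_reduced[i] is the confidence for the same sample with one modality
    removed. Ties are not counted as violations (an assumption).
    """
    if len(conf_full) != len(conf_reduced):
        raise ValueError("both lists must describe the same samples")
    if not conf_full:
        return 0.0
    violations = sum(
        1 for full, reduced in zip(conf_full, conf_reduced) if reduced > full
    )
    return violations / len(conf_full)


# Two of the four samples become more confident after the removal.
rate = violation_rate([0.9, 0.6, 0.7, 0.8], [0.8, 0.75, 0.7, 0.95])
```

A lower value means fewer samples violate the principle that removing a modality should not raise confidence.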


This review was generated by AI and reviewed by human editors.