QUICK REVIEW

[Paper Review] Calibrating Multimodal Learning

Huan Zhang, Changqing Zhang | arXiv (Cornell University) | Jun 2, 2023

Machine Learning and Data Classification | Citations: 10

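One-Line Summary

This paper shows that existing multimodal classifiers can produce overconfident predictions when only some modalities are available, and proposes Calibrating Multimodal Learning (CML) regularization, which aligns confidence with the number of available modalities, to improve calibration, accuracy, and robustness.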

ABSTRACT

Multimodal machine learning has achieved remarkable progress in a wide range of scenarios. However, the reliability of multimodal learning remains largely unexplored. In this paper, through extensive empirical studies, we identify current multimodal classification methods suffer from unreliable predictive confidence that tend to rely on partial modalities when estimating confidence. Specifically, we find that the confidence estimated by current models could even increase when some modalities are corrupted. To address the issue, we introduce an intuitive principle for multimodal learning, i.e., the confidence should not increase when one modality is removed. Accordingly, we propose a novel regularization technique, i.e., Calibrating Multimodal Learning (CML) regularization, to calibrate the predictive confidence of previous methods. This technique could be flexibly equipped by existing models and improve the performance in terms of confidence calibration, classification accuracy, and model robustness.

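Motivation and Objectives

Show that current multimodal classifiers frequently produce unreliable confidence estimates when modalities are only partially observed.
Propose a ranking-based principle: predictive confidence should not increase when a modality is removed.
Introduce CML regularization, which enforces consistency between confidence and the available modalities across samples.
Show that applying CML improves confidence calibration, classification accuracy, and robustness across diverse multimodal datasets.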

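Proposed Method

Define a regression-free, ranking-based principle: Conf(x(T)) ≤ Conf(x(S)) for T ⊂ S ⊆ M.
Introduce a new regularization term L_CML that penalizes cases where Conf(x(T)) > Conf(x(S)), using a hinge: max(0, Conf(x(T)) − Conf(x(S))).
Approximate the regularizer by sampling modality pairs to reduce computational cost.
Add L_CML to the existing classification loss, L = L_CL + λ L_CML, and update the model parameters with the combined loss.
Demonstrate the applicability of CML to imputation-free methods (CPM-Nets), imputation-based methods (MIWAE), and modern multimodal classifiers (MMTM).
Use VRR as the reliability metric and evaluate on datasets including YaleB, Handwritten, CUB, Animal, TUANDROMD, NYUD2, and SUNRGBD.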
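The hinge penalty, pair sampling, and combined loss listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the `confidence_fn(sample, modalities)` callable, and the choice to build each pair by dropping exactly one modality from a randomly sized superset are all assumptions made here for illustration.

```python
import random

def hinge_penalty(conf_subset, conf_superset):
    """max(0, Conf(x(T)) - Conf(x(S))) for a modality subset T of S."""
    return max(0.0, conf_subset - conf_superset)

def sample_modality_pair(modalities, rng):
    """Sample (T, S) with T a strict subset of S, S a subset of M.

    Assumption: T is obtained by dropping one modality from a random S.
    Requires at least two modalities.
    """
    size = rng.randint(2, len(modalities))
    superset = rng.sample(modalities, size)
    dropped = rng.choice(superset)
    subset = [m for m in superset if m != dropped]
    return tuple(sorted(subset)), tuple(sorted(superset))

def cml_regularizer(confidence_fn, sample, modalities, num_pairs, rng):
    """Average hinge penalty over sampled pairs (approximates L_CML)."""
    total = 0.0
    for _ in range(num_pairs):
        subset, superset = sample_modality_pair(modalities, rng)
        total += hinge_penalty(confidence_fn(sample, subset),
                               confidence_fn(sample, superset))
    return total / num_pairs

def total_loss(classification_loss, cml_loss, lam):
    """L = L_CL + lambda * L_CML."""
    return classification_loss + lam * cml_loss

if __name__ == "__main__":
    rng = random.Random(0)
    modalities = ["rgb", "depth", "text"]  # hypothetical modality names

    # Hypothetical toy confidence that wrongly grows as modalities are removed.
    def overconfident(sample, mods):
        return 1.0 - len(mods) / len(modalities)

    reg = cml_regularizer(overconfident, None, modalities, num_pairs=8, rng=rng)
    print(total_loss(classification_loss=1.0, cml_loss=reg, lam=0.1))
```

In a real training loop the confidence would come from the model's output and the penalty would be computed in a framework with automatic differentiation. A confidence function that never increases when a modality is dropped makes the regularizer exactly zero.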

Figure 1: Motivation of calibrating multimodal learning. The confidence of an ideal multimodal classifier should decrease or at least not increase when one modality is removed (even when the removed modality is noised, or it indicates the model takes noise as semantics and the model is not trustwort

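Experimental Results

Research Questions

RQ1: Do current multimodal classifiers produce unreliable confidence estimates when one modality is removed?
RQ2: Can a confidence-calibration regularizer improve the ranking relationship between confidence and the number of modalities across samples?
RQ3: Does the proposed CML regularization improve confidence calibration, accuracy, and robustness across a variety of multimodal datasets?
RQ4: Is CML easy to adopt across different multimodal architectures, and not overly sensitive to hyperparameters?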

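Key Findings

Current multimodal methods exhibit high VRR, indicating that confidence increases for many samples when a modality is removed.
CML regularization reduces VRR and yields more reliable confidence estimates across all evaluated models.
Models trained with CML gain accuracy and robustness, particularly under modality corruption and noise.
CML is robust to hyperparameter choices and can be incorporated into existing multimodal systems without architectural changes.
CML works notably well with Type III models and is also beneficial across a variety of datasets.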
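This review does not give a formula for VRR. Going only by the description above (a high VRR means many samples whose confidence rises when a modality is removed), a metric of this kind might be computed roughly as follows. The function name `violation_rate` and the exact definition are assumptions for illustration and may differ from the metric used in the paper.

```python
def violation_rate(full_confidences, reduced_confidences):
    """Fraction of samples whose confidence rises after a modality is removed.

    Assumed definition for illustration; not taken from the paper.
    full_confidences[i]    : confidence for sample i with all modalities
    reduced_confidences[i] : confidence for sample i with one modality removed
    """
    if len(full_confidences) != len(reduced_confidences):
        raise ValueError("confidence lists must have the same length")
    if not full_confidences:
        return 0.0
    violations = sum(
        1 for full, reduced in zip(full_confidences, reduced_confidences)
        if reduced > full  # confidence increased despite having less evidence
    )
    return violations / len(full_confidences)

if __name__ == "__main__":
    full = [0.9, 0.6, 0.8, 0.7]
    reduced = [0.8, 0.7, 0.8, 0.9]
    print(violation_rate(full, reduced))  # 2 of 4 samples violate: 0.5
```

Ties are not counted as violations here, matching the principle that confidence should "not increase" when a modality is removed.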


This review was generated by AI and checked by a human editor.