QUICK REVIEW

[Paper Review] Multimodal Federated Learning via Contrastive Representation Ensemble

Qiying Yu, Yang Liu|arXiv (Cornell University)|Feb 17, 2023

Privacy-Preserving Technologies in Data | Citations: 35

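One-sentence summary

CreamFL trains a large server model from heterogeneous clients holding uni-modal or multimodal data by exchanging representations on public data, aggregating them with a global-local contrastive strategy, and mitigating local drift with contrastive regularization.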

ABSTRACT

With the increasing amount of multimedia data on modern mobile systems and IoT infrastructures, harnessing these rich multimodal data without breaching user privacy becomes a critical issue. Federated learning (FL) serves as a privacy-conscious alternative to centralized machine learning. However, existing FL methods extended to multimodal data all rely on model aggregation on single modality level, which restrains the server and clients to have identical model architecture for each modality. This limits the global model in terms of both model complexity and data capacity, not to mention task diversity. In this work, we propose Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL), a multimodal federated learning framework that enables training larger server models from clients with heterogeneous model architectures and data modalities, while only communicating knowledge on public dataset. To achieve better multimodal representation fusion, we design a global-local cross-modal ensemble strategy to aggregate client representations. To mitigate local model drift caused by two unprecedented heterogeneous factors stemming from multimodal discrepancy (modality gap and task gap), we further propose two inter-modal and intra-modal contrasts to regularize local training, which complements information of the absent modality for uni-modal clients and regularizes local clients to head towards global consensus. Thorough evaluations and ablation studies on image-text retrieval and visual question answering tasks showcase the superiority of CreamFL over state-of-the-art FL methods and its practical value.

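Motivation and objectives

Learn from multimodal data spread across heterogeneous clients while preserving privacy.
Train a larger server model using compact client models trained on private data.
Mitigate the model drift caused by the modality gap and the task gap with contrastive regularization.
Improve multimodal representation fusion with a global-local cross-modal aggregation strategy.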

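Proposed method

Only low-dimensional representations of public data are transmitted between the server and the clients.
Local training uses inter-modal and intra-modal contrastive regularization to reduce drift.
Global-local cross-modal contrastive aggregation combines the client representations as a weighted ensemble.
The server model is trained by knowledge distillation from the aggregated representations.
Ensembling at the representation level supports heterogeneous modalities and model architectures.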
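
The weighted ensemble step above can be illustrated with a small sketch. It assumes, for illustration only, that each client's representation of a public sample is scored by how well it matches the server's representation of the paired sample in the other modality relative to the other public samples, and that a softmax over those scores gives the ensemble weights. The function names (`contrastive_score`, `aggregate`) and the toy vectors are hypothetical, and the exact scoring rule in the paper may differ.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_score(local_rep, global_other, k):
    """Assumed score: log-ratio of the match with the paired global
    representation (index k) to the matches with all other public samples.
    Requires at least two public samples so that negatives exist."""
    pos = dot(local_rep, global_other[k])
    negs = [dot(local_rep, g) for j, g in enumerate(global_other) if j != k]
    m = max(negs)
    log_neg = m + math.log(sum(math.exp(n - m) for n in negs))
    return pos - log_neg

def aggregate(client_reps, global_other, k):
    """Softmax-weighted average of the clients' representations of sample k."""
    scores = [contrastive_score(r, global_other, k) for r in client_reps]
    weights = softmax(scores)
    dim = len(client_reps[0])
    agg = [sum(w * r[d] for w, r in zip(weights, client_reps)) for d in range(dim)]
    return agg, weights

# Toy data: server text representations for two public samples, and
# two clients' image representations of public sample 0.
global_text = [[1.0, 0.0], [0.0, 1.0]]
client_image_reps = [[0.9, 0.1], [0.2, 0.8]]
agg, weights = aggregate(client_image_reps, global_text, k=0)
print(weights, agg)
```

In this toy case the first client's representation agrees with the paired text representation, so it receives the larger weight. Because only representations are combined, the clients' model architectures never need to match.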

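Experimental results

Research questions

RQ1: How can multimodal federated learning be achieved across clients with heterogeneous architectures and data modalities while preserving privacy?
RQ2: Can representation-level knowledge distillation on public data train a server model larger than the edge models?
RQ3: Does global-local cross-modal aggregation improve multimodal representation learning and reduce client drift?
RQ4: Which regularization strategy (inter-modal or intra-modal contrast) most effectively mitigates the modality gap and the task gap?
RQ5: How does CreamFL compare with state-of-the-art FL methods on multimodal tasks in terms of performance and communication efficiency?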

Key findings

Model	i2t_R@1	i2t_R@5	i2t_R@10	t2i_R@1	t2i_R@5	t2i_R@10	R@1_sum
FedAvg	45.23	76.74	85.59	34.69	71.68	85.40	114.03
FedIoT	43.31	75.62	86.26	33.94	70.09	84.56	111.19
FedMD	48.40	80.24	89.64	38.23	74.44	86.68	128.33
FedET	48.76	80.39	89.73	38.39	74.68	86.76	129.11
FedGEMS	48.70	80.48	89.62	38.71	74.75	87.01	129.70
CreamFL+Avg	48.85	80.55	89.93	38.13	74.89	86.79	130.02
CreamFL+IoT	49.13	80.61	89.69	38.45	74.83	86.74	130.41
CreamFL (ours)	49.66	80.66	90.13	38.94	75.02	87.14	132.88
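
For readers unfamiliar with the R@K columns, the following is a minimal sketch of how Recall@K can be computed from a query-by-candidate similarity matrix. It makes the simplifying assumption that query q has exactly one correct candidate, at index q; real retrieval benchmarks may pair several captions with each image. The function name `recall_at_k` and the toy matrix are hypothetical.

```python
def recall_at_k(similarity, k):
    """Percentage of queries whose correct candidate is ranked in the top k.

    similarity[q][c] is the score of query q against candidate c;
    the correct candidate for query q is assumed to be index q.
    """
    hits = 0
    for q, row in enumerate(similarity):
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if q in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(similarity)

# Toy example with three queries and three candidates.
sim = [
    [0.9, 0.1, 0.2],  # query 0: correct candidate ranked first
    [0.8, 0.3, 0.1],  # query 1: correct candidate ranked second
    [0.1, 0.2, 0.7],  # query 2: correct candidate ranked first
]
r1 = recall_at_k(sim, 1)
r2 = recall_at_k(sim, 2)
print(r1, r2)
```

Image-to-text (i2t) and text-to-image (t2i) recall use the same computation with the roles of query and candidate swapped.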

On image-text retrieval, CreamFL outperforms the baselines (FedAvg, FedIoT, FedMD, FedET, FedGEMS) on both the 1K and 5K test sets, with a higher R@1 sum and better recall overall.
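CreamFL achieves an R@1_sum of 132.88, higher than every baseline. This sum is larger than i2t_R@1 + t2i_R@1 in the table (49.66 + 38.94 = 88.60), so it appears to also include R@1 scores not listed there, presumably those on the 5K test images.
On the 5K test images, it scores 58.82 on t2i R@10 and 25.34 on i2t R@1, placing among the top results.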
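CreamFL reaches 62.12% accuracy on VQA, 1.89 points above the best baseline.
Ablation studies show that global-local cross-modal aggregation (GCA) and local contrastive regularization (LCR) each improve performance substantially, and that inter-modal regularization yields a larger gain than intra-modal regularization alone.
Qualitative analysis indicates that CreamFL reduces model drift across heterogeneous clients and aligns representations across modalities.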
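
To make the idea of an inter-modal contrast concrete, here is a minimal sketch that uses a standard InfoNCE-style loss as a stand-in; it is not the paper's exact objective. It assumes an image-only client that pulls its representation of a public sample toward the server's text representation of the same sample and pushes it away from the text representations of other samples. The names `info_nce` and `temperature`, and the toy vectors, are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity; assumes neither vector is all zeros.
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def info_nce(anchor, candidates, positive_index, temperature=0.1):
    """InfoNCE loss: negative log-probability of the positive candidate
    under a softmax over temperature-scaled cosine similarities."""
    logits = [cosine(anchor, c) / temperature for c in candidates]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[positive_index]

# Server text representations for two public samples.
global_text = [[1.0, 0.0], [0.0, 1.0]]

# A local image representation aligned with its paired text (sample 0)
# gives a low loss; a drifted one gives a high loss.
loss_aligned = info_nce([1.0, 0.0], global_text, positive_index=0)
loss_drifted = info_nce([0.0, 1.0], global_text, positive_index=0)
print(loss_aligned, loss_drifted)
```

Added to a client's task loss, a term of this kind penalizes local representations that drift away from the server's representation of the missing modality. An intra-modal counterpart would contrast against global representations of the same modality instead.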

This review was generated by AI and checked by a human editor.