QUICK REVIEW

[Paper Review] Federated Learning for Healthcare Informatics

Jie Xu, Benjamin S. Glicksberg|arXiv (Cornell University)|Nov 13, 2019

Privacy-Preserving Technologies in Data | References: 164 | Citations: 65

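One-Line Summary

This paper surveys federated learning in healthcare informatics, detailing the statistical, system, and privacy challenges, the methods that address them, and healthcare applications.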

ABSTRACT

With the rapid development of computer software and hardware technologies, more and more healthcare data are becoming readily available from clinical institutions, patients, insurance companies and pharmaceutical industries, among others. This access provides an unprecedented opportunity for data science technologies to derive data-driven insights and improve the quality of care delivery. Healthcare data, however, are usually fragmented and private making it difficult to generate robust results across populations. For example, different hospitals own the electronic health records (EHR) of different patient populations and these records are difficult to share across hospitals because of their sensitive nature. This creates a big barrier for developing effective analytical approaches that are generalizable, which need diverse, "big data". Federated learning, a mechanism of training a shared global model with a central server while keeping all the sensitive data in local institutions where the data belong, provides great promise to connect the fragmented healthcare data sources with privacy-preservation. The goal of this survey is to provide a review for federated learning technologies, particularly within the biomedical space. In particular, we summarize the general solutions to the statistical challenges, system challenges and privacy issues in federated learning, and point out the implications and potentials in healthcare.

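Motivation and Objectives

Motivates the need to learn from fragmented, private healthcare data without sharing raw data.
Summarizes the fundamentals of federated learning and its relevance to healthcare data such as EHRs and wearables.
Categorizes and reviews statistical, system, and privacy challenges along with proposed solutions.
Presents healthcare-specific applications and representative methods.
Discusses open problems and future directions for FL in healthcare.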

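Proposed Approach

Provides a formal overview of federated learning, whose objective is to train a global model from distributed data.
Categorizes and summarizes the challenges along statistical, communication, and privacy/security dimensions.
Discusses consensus and pluralistic approaches to non-IID data distributions.
Reviews methods for improving communication efficiency (model compression, client selection, update reduction, peer-to-peer learning).
Explains privacy techniques, including secure multi-party computation and differential privacy, and their trade-offs.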
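
The objective of training a global model from distributed data can be illustrated with a small sketch. The code below assumes a federated-averaging style scheme, in which each client trains locally and the server combines the resulting weights in proportion to local dataset size. The function names (`local_update`, `federated_average`, `run_round`), the one-parameter linear model, and the learning rate are hypothetical choices made for illustration; this is not presented as the formulation used in the survey.

```python
from typing import List, Sequence, Tuple

Dataset = Sequence[Tuple[float, float]]  # (x, y) pairs held privately by one client


def local_update(weights: List[float], data: Dataset,
                 lr: float = 0.05, epochs: int = 1) -> List[float]:
    """Client-side step: gradient descent on the toy model y ~ w * x.

    Only the updated weight leaves the client; the raw (x, y) pairs do not.
    """
    w = weights[0]
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return [w]


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Server-side step: average client weights, weighted by dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def run_round(global_weights: List[float],
              client_datasets: Sequence[Dataset]) -> List[float]:
    """One communication round: broadcast, train locally, aggregate."""
    updates = [local_update(list(global_weights), d) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    return federated_average(updates, sizes)
```

In this toy setting, calling `run_round` repeatedly on two simulated hospitals whose data follow y = 2x moves the shared weight toward 2 without either hospital exposing its records. Real deployments would add the communication and privacy measures listed above.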

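Experimental Results

Research Questions

RQ1: What are the main statistical, system, and privacy challenges in federated learning applied to healthcare data?
RQ2: What solutions and methods exist for non-IID data, communication bottlenecks, and privacy protection?
RQ3: How has federated learning been applied to healthcare tasks such as EHR analysis, phenotyping, and mortality/prediction modeling?
RQ4: What are the open problems and future directions for deploying FL in healthcare informatics?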

Key Findings

Problem	ML Method	# Hospitals	Data
Patient Similarity Learning	Hashing	3	MIMIC-III
Patient Similarity Learning	Hashing	20	MIMIC-III
Phenotyping	TF (Tensor Factorization)	1-5	MIMIC-III, UCSD Wah 2011 Caltech
Phenotyping	NLP	10	MIMIC-III

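Federated learning enables training across fragmented healthcare data while keeping the data local, which addresses privacy concerns.
AFL (Agnostic Federated Learning) and q-Fair Federated Learning are proposed approaches that address non-IID distributions and fairness across devices.
Privacy-preserving techniques include secure multi-party computation and differential privacy, with trade-offs between computational cost and accuracy.
Healthcare applications include patient similarity learning, phenotyping, representation learning from multimodal data, and mortality/prediction tasks; Table 1 summarizes representative papers.
Communication-efficiency strategies are grouped into model compression, client selection, update reduction, and peer-to-peer learning, and they address practical implementation challenges.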
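
The accuracy cost of differential privacy mentioned above can be made concrete with a minimal sketch. It assumes the widely used clip-then-add-Gaussian-noise pattern applied to a client's model update before it is sent to the server. The function names and the `clip_norm` and `noise_multiplier` parameters are hypothetical, and the code does not compute a formal privacy budget; it only shows where the noise enters.

```python
import math
import random
from typing import List, Optional


def clip_update(update: List[float], clip_norm: float) -> List[float]:
    """Scale the update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def privatize_update(update: List[float],
                     clip_norm: float = 1.0,
                     noise_multiplier: float = 1.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Clip the update, then add Gaussian noise scaled to the clipping bound.

    A larger noise_multiplier gives stronger protection but a noisier,
    less accurate update. Mapping it to a specific privacy guarantee
    is outside the scope of this sketch.
    """
    rng = rng or random.Random()
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]
```

Passing a seeded `random.Random` instance makes the noise reproducible for testing, and setting `noise_multiplier=0.0` reduces the function to clipping only.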


This review was generated by AI and checked by a human editor.