QUICK REVIEW

[Paper Review] FederBoost: Private Federated Learning for GBDT

Zhihua Tian, Rui Zhang|arXiv (Cornell University)|Nov 5, 2020

Privacy-Preserving Technologies in Data | References: 29 | Citations: 34

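One-Line Summary

FederBoost enables private federated learning of gradient boosting decision trees over both vertically and horizontally partitioned data. The vertical variant relies on binning with differential privacy and needs no cryptographic operations; the horizontal variant relies on distributed bin construction and secure aggregation. Both reach the accuracy of centralized training and run 4-5 orders of magnitude faster than state-of-the-art federated decision tree training.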

ABSTRACT

Federated Learning (FL) has been an emerging trend in machine learning and artificial intelligence. It allows multiple participants to collaboratively train a better global model and offers a privacy-aware paradigm for model training since it does not require participants to release their original training data. However, existing FL solutions for vertically partitioned data or decision trees require heavy cryptographic operations. In this paper, we propose a framework named FederBoost for private federated learning of gradient boosting decision trees (GBDT). It supports running GBDT over both vertically and horizontally partitioned data. Vertical FederBoost does not require any cryptographic operation and horizontal FederBoost only requires lightweight secure aggregation. The key observation is that the whole training process of GBDT relies on the ordering of the data instead of the values. We fully implement FederBoost and evaluate its utility and efficiency through extensive experiments performed on three public datasets. Our experimental results show that both vertical and horizontal FederBoost achieve the same level of accuracy with centralized training where all data are collected in a central server, and they are 4-5 orders of magnitude faster than the state-of-the-art solutions for federated decision tree training; hence offering practical solutions for industrial applications.
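The abstract's key observation, that GBDT training depends on the ordering of the data rather than the values, can be illustrated with a small sketch. It assumes equal-frequency binning by rank; the function name `rank_bins` and the sample values are hypothetical and not taken from the paper. Applying any strictly increasing transformation to a feature leaves the bin assignment unchanged, so a party that shares only bin membership reveals nothing about the scale of its values.

```python
import math


def rank_bins(values, num_bins):
    """Assign each sample to an equal-frequency bin using only its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * num_bins // len(values)
    return bins


# Hypothetical feature column (no ties) and a strictly increasing transform of it.
feature = [3.2, -1.0, 0.4, 7.5, 2.1, 0.9, 5.0, -0.3]
transformed = [math.exp(v) for v in feature]

# Both calls yield the same bin assignment, because only the order matters.
print(rank_bins(feature, 4))
print(rank_bins(transformed, 4))
```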

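Motivation and Objectives

Enable private collaboration without data sharing under regulations such as GDPR.
Develop a GBDT-focused FL framework that supports both vertically and horizontally partitioned data.
Minimize cryptographic overhead while preserving accuracy and privacy during training.
Provide a practical, scalable implementation suitable for industrial deployment.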

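Proposed Method

Vertical FederBoost trains GBDT using the ordering of samples rather than their feature values, and uses binning and differential privacy to mask the ordering information.
Horizontal FederBoost introduces a distributed bin construction method and uses secure aggregation to compute the gradients of each bin without leaking raw data.
GBDT training depends on first- and second-order gradients and on the ordering of samples, so the best split can be found without access to raw feature values.
The vertical setting incorporates differential privacy through a locally private binning mechanism.
The full protocol suite (Protocols 2-5) implements training, aggregation, and quantile computation for both data partitions.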
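The horizontal flow described in this list can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's Protocols 2-5: it uses pairwise additive masking as a generic secure aggregation construction, fixed-point encoding with an arbitrary modulus, and the standard second-order GBDT split gain with an L2 term `lam`. All function names and the sample gradient values are hypothetical.

```python
import random

MOD = 2 ** 61      # modulus for masked arithmetic (illustrative choice)
SCALE = 10 ** 6    # fixed-point scale for encoding floats as integers


def encode(x):
    """Encode a float as a fixed-point residue modulo MOD."""
    return round(x * SCALE) % MOD


def decode(v):
    """Decode a residue to a float, treating large residues as negative."""
    if v >= MOD // 2:
        v -= MOD
    return v / SCALE


def masked_inputs(party_vectors, rng):
    """Add pairwise masks that cancel once all masked vectors are summed."""
    n, m = len(party_vectors), len(party_vectors[0])
    masked = [[encode(x) for x in vec] for vec in party_vectors]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(m):
                r = rng.randrange(MOD)  # assumption: derived from a key shared by i and j
                masked[i][k] = (masked[i][k] + r) % MOD
                masked[j][k] = (masked[j][k] - r) % MOD
    return masked


def secure_sum(masked):
    """Aggregator view: sum the masked vectors; only per-bin totals remain."""
    m = len(masked[0])
    return [decode(sum(vec[k] for vec in masked) % MOD) for k in range(m)]


def best_split(G, H, lam=1.0):
    """Return (bin_index, gain); samples in bins <= bin_index go to the left child."""
    def score(g, h):
        return g * g / (h + lam)

    g_tot, h_tot = sum(G), sum(H)
    best = (None, 0.0)
    g_l = h_l = 0.0
    for b in range(len(G) - 1):
        g_l += G[b]
        h_l += H[b]
        gain = 0.5 * (score(g_l, h_l) + score(g_tot - g_l, h_tot - h_l)
                      - score(g_tot, h_tot))
        if gain > best[1]:
            best = (b, gain)
    return best


# Hypothetical per-bin gradient sums held by three parties (four bins each).
party_G = [[-2.0, -1.0, 0.5, 1.5], [-1.0, -0.5, 1.0, 2.0], [-1.5, 0.0, 0.5, 1.0]]
party_H = [[1.0, 1.0, 1.0, 1.0]] * 3

rng = random.Random(0)
G = secure_sum(masked_inputs(party_G, rng))
H = secure_sum(masked_inputs(party_H, rng))
print(best_split(G, H))
```

The aggregator only ever sees masked vectors and their sum, while split selection needs nothing beyond the per-bin totals of first- and second-order gradients. A production protocol would also have to handle key agreement and dropped parties, which this sketch ignores.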

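Experimental Results

Research Questions

RQ1: Can FederBoost train GBDT models on vertically partitioned data without cryptographic operations while preserving privacy?
RQ2: Can FederBoost train GBDT models on horizontally partitioned data using lightweight secure aggregation and distributed bin construction without leaking sensitive information?
RQ3: Does FederBoost achieve centralized-level accuracy and a substantial efficiency gain compared with state-of-the-art federated decision tree methods?
RQ4: How do the differential privacy parameters in vertical FederBoost affect model utility?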

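Key Findings

Vertical FederBoost achieves accuracy on par with centralized training despite the added differential privacy noise and binning.
Horizontal FederBoost achieves accuracy close to centralized training while using lightweight secure aggregation and distributed bin construction.
Both vertical and horizontal FederBoost run 4-5 orders of magnitude faster than state-of-the-art federated decision tree training methods.
The authors provide a full implementation that can run on a cluster of up to 32 nodes.
The authors propose new variants of local DP and element-level DP to balance the privacy budget against utility in the vertical setting.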
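To make the privacy budget versus utility trade-off concrete, the sketch below perturbs bin indices with k-ary randomized response, a standard local DP mechanism. This is an assumption made for illustration; the review does not spell out the paper's DP variants, and they may differ. The function names and parameters are hypothetical. Each sample reports its true bin with probability p = e^ε / (e^ε + k - 1) and otherwise a uniformly chosen other bin, so a smaller ε scrambles more of the ordering information.

```python
import math
import random


def keep_probability(epsilon, num_bins):
    """Probability of reporting the true bin under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + num_bins - 1)


def randomize_bins(true_bins, num_bins, epsilon, rng):
    """Report each bin truthfully with probability p, else a uniform other bin."""
    p = keep_probability(epsilon, num_bins)
    noisy = []
    for b in true_bins:
        if rng.random() < p:
            noisy.append(b)
        else:
            other = rng.randrange(num_bins - 1)  # index among the remaining bins
            noisy.append(other if other < b else other + 1)
    return noisy


# Hypothetical bin assignment for eight samples across four bins.
true_bins = [0, 0, 1, 1, 2, 2, 3, 3]
rng = random.Random(0)
for eps in (0.5, 2.0, 8.0):
    noisy = randomize_bins(true_bins, 4, eps, rng)
    kept = sum(a == b for a, b in zip(true_bins, noisy))
    print(eps, round(keep_probability(eps, 4), 3), kept)
```

The ratio between the probability of reporting the true bin and the probability of reporting any specific other bin is exactly e^ε, which is what gives the mechanism its ε-local-DP guarantee.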


This review was generated by AI and checked by a human editor.