QUICK REVIEW

[Paper Review] FederBoost: Private Federated Learning for GBDT

Zhihua Tian, Rui Zhang|arXiv (Cornell University)|2020. 11. 05.

Privacy-Preserving Technologies in Data|References 29|Citations 34

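One-Line Summary

FederBoost enables private federated learning of gradient boosting decision trees (GBDT) over both vertically and horizontally partitioned data. The vertical variant needs no cryptographic operations and relies on bucketization combined with differential privacy; the horizontal variant relies on distributed bucket construction and lightweight secure aggregation. Both reach the accuracy of centralized training while running 4-5 orders of magnitude faster than prior federated decision tree methods.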

ABSTRACT

Federated Learning (FL) has been an emerging trend in machine learning and artificial intelligence. It allows multiple participants to collaboratively train a better global model and offers a privacy-aware paradigm for model training since it does not require participants to release their original training data. However, existing FL solutions for vertically partitioned data or decision trees require heavy cryptographic operations. In this paper, we propose a framework named FederBoost for private federated learning of gradient boosting decision trees (GBDT). It supports running GBDT over both vertically and horizontally partitioned data. Vertical FederBoost does not require any cryptographic operation and horizontal FederBoost only requires lightweight secure aggregation. The key observation is that the whole training process of GBDT relies on the ordering of the data instead of the values. We fully implement FederBoost and evaluate its utility and efficiency through extensive experiments performed on three public datasets. Our experimental results show that both vertical and horizontal FederBoost achieve the same level of accuracy with centralized training where all data are collected in a central server, and they are 4-5 orders of magnitude faster than the state-of-the-art solutions for federated decision tree training; hence offering practical solutions for industrial applications.
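The key observation in the abstract, that GBDT training depends on the ordering of the data rather than on the values, can be illustrated with a small sketch. The function names, the toy data, and the regularization constant `lam` below are illustrative assumptions, not taken from the paper; the gain formula is the standard second-order GBDT split gain with constant factors dropped.

```python
import math


def ordering(values):
    """Return sample indices sorted by feature value (values assumed distinct)."""
    return sorted(range(len(values)), key=lambda i: values[i])


def best_split_by_order(order, grads, hess, lam=1.0):
    """Scan every split position along an ordering and return (gain, left_ids).

    Only the ordering and the per-sample gradients are used; the feature
    values themselves never enter the computation.
    """
    total_g, total_h = sum(grads), sum(hess)

    def score(g, h):
        return g * g / (h + lam)

    best_gain, best_left = float("-inf"), None
    left_g = left_h = 0.0
    for pos in range(len(order) - 1):
        i = order[pos]
        left_g += grads[i]
        left_h += hess[i]
        gain = (score(left_g, left_h)
                + score(total_g - left_g, total_h - left_h)
                - score(total_g, total_h))
        if gain > best_gain:
            best_gain, best_left = gain, frozenset(order[:pos + 1])
    return best_gain, best_left


# Toy data (hypothetical): one feature, first-order and second-order gradients.
x = [0.3, 2.5, 1.1, 4.0, 0.7, 3.2]
grads = [-1.0, 0.8, -0.6, 1.2, -0.9, 0.7]
hess = [1.0] * 6

raw = best_split_by_order(ordering(x), grads, hess)
# A strictly increasing transform changes every value but not the ordering.
transformed = best_split_by_order(ordering([math.exp(v) for v in x]), grads, hess)
```

Because `math.exp` preserves the ordering, `raw` and `transformed` select the same partition of samples with the same gain, which is why a party can contribute a feature by sharing order information alone.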

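Motivation and Goals

Enable private collaboration under GDPR-like constraints without sharing raw data.
Develop a GBDT-centric FL framework that works over both vertically and horizontally partitioned data.
Minimize cryptographic overhead while preserving accuracy and privacy during training.
Provide a practical, scalable implementation suitable for industrial deployment.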

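Proposed Method

Vertical FederBoost trains GBDT from the ordering of samples rather than from feature values, and applies bucketization and differential privacy to mask the order information.
Horizontal FederBoost introduces a distributed bucket construction method and uses secure aggregation to compute per-bucket gradients without exposing raw data.
GBDT training uses first-order and second-order gradients together with the sample ordering to find the best split, without accessing raw feature values.
In the vertical setting, differential privacy is integrated through a locally private bucketization mechanism.
A full set of protocols (Protocols 2–5) implements training, aggregation, and quantile computation for both partitioning settings.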
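The vertical flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's Protocols 2–5: the function names are hypothetical, the noise step uses generic k-ary randomized response as a stand-in for the paper's locally private bucketization, and `lam` is an assumed regularization constant.

```python
import math
import random


def bucketize_by_rank(values, num_buckets):
    """Passive party: sort locally and return one bucket index per sample.

    Only bucket memberships leave the party, never the feature values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, i in enumerate(order):
        buckets[i] = rank * num_buckets // len(values)
    return buckets


def randomize_buckets(buckets, num_buckets, epsilon, rng):
    """Assumed noise step: k-ary randomized response over bucket indices.

    Each sample keeps its bucket with probability e^eps / (e^eps + q - 1),
    otherwise it moves to a uniformly chosen different bucket.
    """
    keep = math.exp(epsilon) / (math.exp(epsilon) + num_buckets - 1)
    noisy = []
    for b in buckets:
        if rng.random() < keep:
            noisy.append(b)
        else:
            other = rng.randrange(num_buckets - 1)
            noisy.append(other if other < b else other + 1)
    return noisy


def best_bucket_split(buckets, grads, hess, num_buckets, lam=1.0):
    """Active party: sum gradients per bucket, then scan bucket boundaries."""
    g_sum = [0.0] * num_buckets
    h_sum = [0.0] * num_buckets
    for b, g, h in zip(buckets, grads, hess):
        g_sum[b] += g
        h_sum[b] += h
    total_g, total_h = sum(g_sum), sum(h_sum)

    def score(g, h):
        return g * g / (h + lam)

    best_gain, best_boundary = float("-inf"), None
    left_g = left_h = 0.0
    for b in range(num_buckets - 1):
        left_g += g_sum[b]
        left_h += h_sum[b]
        gain = (score(left_g, left_h)
                + score(total_g - left_g, total_h - left_h)
                - score(total_g, total_h))
        if gain > best_gain:
            best_gain, best_boundary = gain, b
    return best_gain, best_boundary


# Toy data (hypothetical): the passive party holds `feature`,
# the active party holds the gradients.
feature = [5.0, 1.0, 7.0, 3.0, 8.0, 2.0, 6.0, 4.0]
grads = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
hess = [1.0] * 8

buckets = bucketize_by_rank(feature, num_buckets=4)
gain, boundary = best_bucket_split(buckets, grads, hess, num_buckets=4)
noisy = randomize_buckets(buckets, 4, epsilon=1.0, rng=random.Random(0))
```

On this toy data the best boundary falls after bucket 1, separating the four smallest feature values from the four largest. Passing `noisy` instead of `buckets` to `best_bucket_split` shows how a smaller `epsilon` blurs the bucket gradient sums, which is the privacy and utility trade-off the review refers to.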

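Experimental Results

Research Questions

RQ1: Can FederBoost train GBDT models on vertically partitioned data while preserving privacy, without any cryptographic operations?
RQ2: Can FederBoost train GBDT on horizontally partitioned data without leaking raw information, using lightweight secure aggregation and distributed bucket construction?
RQ3: Does FederBoost achieve centralized-level accuracy and a substantial efficiency gain over state-of-the-art federated decision tree methods?
RQ4: How do the differential privacy parameters affect model utility in vertical FederBoost?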

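Key Findings

Vertical FederBoost achieves accuracy comparable to centralized training even with DP noise and bucketization added.
Horizontal FederBoost achieves accuracy close to centralized training while using lightweight secure aggregation and distributed bucket construction.
Both variants are 4-5 orders of magnitude faster than state-of-the-art federated decision tree training methods.
The authors provide a complete implementation that runs on clusters of up to 32 nodes.
For the vertical setting, the paper proposes new variants of local DP and element-level DP to balance the privacy budget against utility.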
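The lightweight secure aggregation credited for the horizontal results can be pictured with a generic pairwise-masking sketch. Everything here is an assumption for illustration: the paper's exact protocol is not reproduced in this review, the names `MODULUS`, `SCALE`, `pairwise_masks`, and `aggregate` are hypothetical, and a single seeded generator stands in for the pairwise shared keys a real deployment would need.

```python
import random

MODULUS = 2 ** 32   # assumed ring size for masked arithmetic
SCALE = 1000        # assumed fixed-point scale for gradient sums


def encode(x):
    """Map a float to a fixed-point residue modulo MODULUS."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Map a residue back to a signed float."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def pairwise_masks(num_parties, length, seed):
    """Give each pair (i, j) a random mask that i adds and j subtracts.

    Simulation only: in practice each pair would derive its mask from a
    shared secret instead of one central generator.
    """
    rng = random.Random(seed)
    masks = [[0] * length for _ in range(num_parties)]
    for i in range(num_parties):
        for j in range(i + 1, num_parties):
            for k in range(length):
                r = rng.randrange(MODULUS)
                masks[i][k] = (masks[i][k] + r) % MODULUS
                masks[j][k] = (masks[j][k] - r) % MODULUS
    return masks


def mask_vector(vec, mask):
    """What a party uploads: its encoded bucket sums plus its masks."""
    return [(encode(x) + m) % MODULUS for x, m in zip(vec, mask)]


def aggregate(masked_vectors):
    """Server side: masks cancel in the sum, leaving only the totals."""
    length = len(masked_vectors[0])
    totals = [sum(v[k] for v in masked_vectors) % MODULUS for k in range(length)]
    return [decode(t) for t in totals]


# Toy per-bucket gradient sums held by three parties (hypothetical values).
local_sums = [
    [0.5, -1.25, 2.0],
    [-0.75, 0.5, 1.0],
    [1.0, 0.25, -3.5],
]
masks = pairwise_masks(num_parties=3, length=3, seed=42)
uploads = [mask_vector(v, m) for v, m in zip(local_sums, masks)]
bucket_totals = aggregate(uploads)
```

Each upload on its own looks like random residues, while `bucket_totals` recovers the per-bucket gradient sums across all parties, which is the only quantity split finding needs.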


This review was generated by AI and checked by a human editor.