QUICK REVIEW

[Paper Review] Starlit: Privacy-Preserving Federated Learning to Enhance Financial Fraud Detection

Aydin Abadi, B. A. Doyle|arXiv (Cornell University)|Jan 19, 2024

Privacy-Preserving Technologies in Data|Cited by 6

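One-Sentence Summary

Starlit is a scalable privacy-preserving federated learning mechanism for vertically and horizontally partitioned data that improves financial fraud detection without relying on fully homomorphic encryption or on the assumption that suspicious accounts have already been frozen.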

ABSTRACT

Federated Learning (FL) is a data-minimization approach enabling collaborative model training across diverse clients with local data, avoiding direct data exchange. However, state-of-the-art FL solutions to identify fraudulent financial transactions exhibit a subset of the following limitations. They (1) lack a formal security definition and proof, (2) assume prior freezing of suspicious customers' accounts by financial institutions (limiting the solutions' adoption), (3) scale poorly, involving either $O(n^2)$ computationally expensive modular exponentiation (where $n$ is the total number of financial institutions) or highly inefficient fully homomorphic encryption, (4) assume the parties have already completed the identity alignment phase, hence excluding it from the implementation, performance evaluation, and security analysis, and (5) struggle to resist clients' dropouts. This work introduces Starlit, a novel scalable privacy-preserving FL mechanism that overcomes these limitations. It has various applications, such as enhancing financial fraud detection, mitigating terrorism, and enhancing digital health. We implemented Starlit and conducted a thorough performance analysis using synthetic data from a key player in global financial transactions. The evaluation indicates Starlit's scalability, efficiency, and accuracy.

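Motivation and Objectives

Motivate the need for collaborative, privacy-preserving analytics across financial institutions to improve fraud detection.
Address the limitations of existing federated learning solutions: missing security proofs, poor scalability, and weak handling of client dropouts and identity alignment.
Propose a formal security definition and a practical, scalable protocol (Starlit) for vertically and horizontally partitioned data.
Demonstrate Starlit's applicability to real-world financial fraud scenarios and to other domains such as terrorism mitigation and digital health.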

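Proposed Method

Introduces Starlit, a two-phase privacy-preserving federated learning mechanism that uses a Feature Collector to reduce training to a two-party vertical federated learning setting.
Uses Private Set Intersection (PSI) for identity alignment and for detecting discrepancies in the features that clients hold for shared users.
Obfuscates flag values before training, adding local differential privacy protection.
Trains with SecureBoost within a vertical federated learning framework, without requiring fully homomorphic encryption.
Provides a formal security definition (Celestial) with simulation-based security against passive adversaries.
Implements Starlit in Flower and evaluates its scalability, efficiency, and accuracy on synthetic data.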
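
To make the identity-alignment step more concrete, the following is a minimal sketch of a Diffie-Hellman-style Private Set Intersection run between two simulated parties. It is not Starlit's actual PSI protocol, which this review does not specify. The toy modulus, the function names (`hash_to_group`, `blind`, `toy_psi`, `align_and_flag`), and the idea of flagging discrepancies by intersecting "ID|attribute" pairs are all assumptions made for illustration, and the parameters are not secure.

```python
import hashlib
import secrets

# Toy modulus (a Mersenne prime). Assumption for illustration only;
# real deployments use vetted elliptic-curve groups.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Hash a string to a square modulo P (illustrative only)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return pow(int.from_bytes(digest, "big") % P, 2, P)


def blind(values, secret):
    """Raise each group element to a party's secret exponent."""
    return [pow(v, secret, P) for v in values]


def toy_psi(items_a, items_b):
    """Return the items of party A that party B also holds.

    Both parties are simulated in one process; in a real protocol only
    the blinded values would cross the network.
    """
    items_a = list(items_a)
    key_a = secrets.randbelow(P - 3) + 2  # A's secret exponent
    key_b = secrets.randbelow(P - 3) + 2  # B's secret exponent
    a_once = blind([hash_to_group(x) for x in items_a], key_a)  # A -> B
    b_once = blind([hash_to_group(y) for y in items_b], key_b)  # B -> A
    a_twice = blind(a_once, key_b)       # B re-blinds A's values, order kept
    b_twice = set(blind(b_once, key_a))  # A re-blinds B's values
    # Exponentiation commutes, so doubly blinded values match on shared items.
    return {x for x, t in zip(items_a, a_twice) if t in b_twice}


def align_and_flag(records_a, records_b):
    """Find shared account IDs, then flag those whose attribute differs."""
    shared_ids = toy_psi(records_a.keys(), records_b.keys())
    pairs_a = [f"{k}|{v}" for k, v in records_a.items()]
    pairs_b = [f"{k}|{v}" for k, v in records_b.items()]
    matching = {p.split("|", 1)[0] for p in toy_psi(pairs_a, pairs_b)}
    return shared_ids, shared_ids - matching


# Hypothetical records: account ID -> registered holder name.
bank_a = {"acc-1": "Alice", "acc-2": "Bob", "acc-3": "Carol"}
bank_b = {"acc-2": "Bob", "acc-3": "Karol", "acc-4": "Dan"}
shared, mismatched = align_and_flag(bank_a, bank_b)
print(sorted(shared), sorted(mismatched))
```

In this sketch neither side sees the other's raw identifiers, only blinded group elements, and a shared account is flagged when the two institutions disagree on the attached attribute.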

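Experimental Results

Research Questions

RQ1: Can Starlit securely enable multi-party collaboration over vertically and horizontally partitioned data for fraud detection without exposing sensitive inputs?
RQ2: How does Starlit achieve scalability that is linear in the number of participants while tolerating client dropouts?
RQ3: What roles do Private Set Intersection and Local Differential Privacy play in training, and what is their impact?
RQ4: How is SecureBoost integrated into the Starlit framework to enable efficient, encrypted model training?
RQ5: What formal security guarantees can be established for Starlit under the simulation-based model (Celestial)?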
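
As background for RQ4, the sketch below illustrates why SecureBoost-style training can avoid fully homomorphic encryption: as SecureBoost is commonly described, the label-holding party encrypts per-sample gradients under an additively homomorphic scheme (Paillier-style), and the feature-holding party only needs to add ciphertexts on each side of a candidate split. This is an assumption about the general technique, not a description of Starlit's implementation. The toy key (two small Mersenne primes), the fixed-point `SCALE`, and all function names are hypothetical and insecure.

```python
import math
import secrets

# Toy key built from two Mersenne primes (assumption for illustration).
# Real Paillier keys use random primes of 1024 bits or more.
P_PRIME, Q_PRIME = 2**31 - 1, 2**61 - 1
N = P_PRIME * Q_PRIME
N_SQ = N * N
LAMBDA = (P_PRIME - 1) * (Q_PRIME - 1) // math.gcd(P_PRIME - 1, Q_PRIME - 1)
MU = pow(LAMBDA, -1, N)  # valid because the generator is fixed to N + 1
SCALE = 10**6  # fixed-point precision for encoding float gradients


def encrypt(value: float) -> int:
    """Encrypt a float using only the public modulus N."""
    m = round(value * SCALE) % N  # negatives wrap around modulo N
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N_SQ) % N_SQ


def decrypt(ciphertext: int) -> float:
    """Decrypt with the private values LAMBDA and MU."""
    u = pow(ciphertext, LAMBDA, N_SQ)
    m = (u - 1) // N * MU % N
    if m > N // 2:  # undo the wrap-around for negatives
        m -= N
    return m / SCALE


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N_SQ


def bucket_sums(encrypted_grads, feature_values, threshold):
    """Feature holder: sum ciphertexts on each side of a candidate split."""
    left, right = encrypt(0.0), encrypt(0.0)
    for c, x in zip(encrypted_grads, feature_values):
        if x <= threshold:
            left = add_encrypted(left, c)
        else:
            right = add_encrypted(right, c)
    return left, right


# Hypothetical data: the label holder owns gradients, the other party a feature.
gradients = [0.25, -0.5, 0.125, -0.75]
feature = [1.0, 5.0, 2.0, 7.0]
enc_left, enc_right = bucket_sums([encrypt(g) for g in gradients], feature, 3.0)
left_sum, right_sum = decrypt(enc_left), decrypt(enc_right)
print(left_sum, right_sum)
```

Only addition of ciphertexts is needed here, which is the property that lets this style of boosting sidestep the cost of fully homomorphic encryption.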

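Key Findings

Starlit is designed to scale linearly with the number of participants and does not rely on fully homomorphic encryption.
The framework securely identifies discrepancies in shared features and aggregates the common features that different clients hold for the same user.
A two-phase process of feature extraction and training, together with a third-party Feature Collector, enables scalable training on vertically and horizontally partitioned data.
Local differential privacy is applied to the flags to reduce inference risk during model training.
Starlit was implemented and evaluated on synthetic data from a major player in global financial transactions; the evaluation indicates scalability, efficiency, and accuracy.
The work provides a formal simulation-based security definition (Celestial) and analyzes information leakage in the (V)FL workflow.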
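
The local differential privacy applied to flags can be illustrated with binary randomized response, a standard ε-LDP mechanism. This review does not state which mechanism Starlit uses, so the choice of randomized response, the parameter `epsilon`, and the function names below are assumptions for illustration only.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under epsilon-LDP."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_flags(flags, epsilon, rng=None):
    """Flip each 0/1 flag independently with probability 1 - keep_probability."""
    rng = rng or random.Random()
    p = keep_probability(epsilon)
    return [f if rng.random() < p else 1 - f for f in flags]


def estimate_true_rate(observed_rate: float, epsilon: float) -> float:
    """Debias the fraction of 1s seen in the noisy flags."""
    p = keep_probability(epsilon)
    return (observed_rate - (1.0 - p)) / (2.0 * p - 1.0)


# Hypothetical flags: 1 marks a discrepancy for a shared account.
true_flags = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
noisy_flags = randomize_flags(true_flags, epsilon=1.0, rng=random.Random(7))
observed = sum(noisy_flags) / len(noisy_flags)
print(noisy_flags, estimate_true_rate(observed, epsilon=1.0))
```

A smaller `epsilon` flips more flags, which strengthens deniability for any single account but adds noise that the downstream model has to tolerate.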


This review was generated by AI and reviewed by human editors.