QUICK REVIEW

[Paper Review] Federated Learning for Big Data: A Survey on Opportunities, Applications, and Future Directions

Thippa Reddy Gadekallu, Quoc‐Viet Pham|arXiv (Cornell University)|Oct 8, 2021
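Privacy-Preserving Technologies in Data | References: 121 | Cited by: 42
One-Sentence Summary

This survey analyzes how federated learning addresses big data challenges by reviewing its use in big data services (acquisition, storage, analytics, privacy) and applications (smart city, healthcare, transportation, grid, social media), along with the open challenges and future directions.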

ABSTRACT

In recent years, the generation of data has escalated to extensive dimensions, and big data has emerged as a propelling force in the development of various machine learning advances and internet-of-things (IoT) devices. In this regard, the analytical and learning tools that transport data from several sources to a central cloud for processing, training, and storage enable realization of the potential of big data. Nevertheless, since the data may contain sensitive information such as bank account information, government information, and personal information, these traditional techniques often raise serious privacy concerns. To overcome such challenges, Federated Learning (FL) has emerged as a sub-field of machine learning that focuses on scenarios where several entities (commonly termed clients) work together to train a model while keeping their data decentralised. Although enormous efforts have been channeled into such studies, there is still a gap in the literature: an extensive review of FL in the realm of big data services is missing. The present paper thus emphasizes the use of FL in handling big data and related services, encompassing a comprehensive review of the potential of FL in big data acquisition, storage, analytics, and privacy preservation. Subsequently, the potential of FL in big data applications such as smart city, smart healthcare, smart transportation, smart grid, and social media is also explored. The paper also highlights various projects pertaining to FL-big data and discusses the challenges associated with such implementations. This serves as a direction for further research, encouraging the development of plausible solutions.

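Research Motivation and Objectives

  • Introduce the fundamentals of federated learning (FL) and big data, and the motivation for integrating the two.
  • Review FL applications in big data services: acquisition, storage, analytics, and privacy preservation.
  • Survey FL-driven big data applications in vertical domains (smart city, smart healthcare, smart transportation, smart grid, social media).
  • Summarize real-world FL-big data projects and discuss key challenges and potential directions.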

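Proposed Approach

  • Categorize FL by data attributes (horizontal, vertical, federated transfer learning) and by network topology (centralized, decentralized).
  • Describe a big data taxonomy (data domains, storage, computing, AI-based processing) and the motivations for integration (privacy, communication, heterogeneity, diversity, analytics, scalability).
  • Review the role of FL in big data services (acquisition, storage, analytics, and privacy preservation), with state-of-the-art examples.
  • Survey FL-driven vertical applications and cite representative projects to illustrate their practicality.
  • Identify challenges (communication bottlenecks, data heterogeneity, statistical heterogeneity, privacy concerns) and propose future research directions.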
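
The data-attribute categories in the first item can be illustrated by how two parties' datasets overlap. Under the commonly used definitions, horizontal FL fits parties that share a feature space but hold different samples, vertical FL fits parties that share samples but hold different features, and federated transfer learning fits parties with little overlap in either. The sketch below is only an illustration of that distinction: the function name, the Jaccard-overlap heuristic, and the 0.5 threshold are assumptions, not something the survey prescribes.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_fl_setting(ids_a, feats_a, ids_b, feats_b, threshold=0.5):
    """Heuristically label the FL setting for two parties (hypothetical helper).

    In a real deployment the overlap would be estimated with a
    privacy-preserving protocol; here the sets are compared directly
    purely for illustration.
    """
    sample_overlap = jaccard(ids_a, ids_b)
    feature_overlap = jaccard(feats_a, feats_b)
    if feature_overlap >= threshold and sample_overlap < threshold:
        return "horizontal"
    if sample_overlap >= threshold and feature_overlap < threshold:
        return "vertical"
    if sample_overlap < threshold and feature_overlap < threshold:
        return "federated transfer"
    return "undetermined"  # both overlaps are high


# Two hospitals: same features, different patients.
hospitals = classify_fl_setting({1, 2, 3}, {"age", "bp"}, {4, 5, 6}, {"age", "bp"})
# A bank and a retailer: same customers, different features.
bank_retail = classify_fl_setting({1, 2, 3}, {"income"}, {1, 2, 3}, {"purchases"})
# Unrelated parties: different users and different features.
unrelated = classify_fl_setting({1, 2}, {"a"}, {3, 4}, {"b"})
```

Real systems rarely fall cleanly into one bucket, so the threshold here should be read as a teaching device rather than a rule.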

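Results

Research Questions

  • RQ1: How does federated learning address security, privacy, and scalability challenges in big data environments?
  • RQ2: Which big data services (acquisition, storage, analytics, privacy preservation) can benefit from FL, and how are they realized?
  • RQ3: Which vertical applications (such as smart city, healthcare, transportation, grid, social media) demonstrate the feasibility of FL-big data?
  • RQ4: What are the key open challenges for FL in the big data domain, and what are feasible directions for overcoming them?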

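Key Findings

  • FL trains models on decentralized data by transmitting model parameters rather than raw data, which reduces privacy risk.
  • Sharing trained model updates instead of raw big data reduces communication cost.
  • FL accommodates data diversity and heterogeneity by processing each data type locally on edge devices in a customized way.
  • FL supports scalable analytics and privacy-preserving data management in the industrial IoT and smart systems.
  • The survey covers big data services (acquisition, storage, analytics, privacy) and applications (smart city, healthcare, transportation, grid, social media), and discusses real-world projects and challenges.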
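
The first two findings (sharing model parameters instead of raw data) can be made concrete with a small federated-averaging loop. Federated averaging (FedAvg) is a widely used aggregation scheme, but this summary does not say which algorithm the surveyed systems use, so treat the choice as an assumption. The one-parameter model, the function names, and the toy data are all hypothetical.

```python
# Minimal FedAvg-style sketch (illustrative only; all names are hypothetical).
# Each client fits y = w * x on its private data and shares only w.

def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one client's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(updates):
    """Average client weights, weighted by each client's sample count."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(clients, rounds=50):
    """Alternate local training and server-side averaging."""
    w_global = 0.0
    for _ in range(rounds):
        # Only (weight, sample_count) pairs leave the clients,
        # never the raw (x, y) samples.
        updates = [(local_update(w_global, data), len(data)) for data in clients]
        w_global = federated_average(updates)
    return w_global


# Three clients hold disjoint private samples of y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
w_final = train(clients)  # approaches 3.0 as rounds increase
```

In each round a client sends one number (its weight) plus a sample count, regardless of how many raw samples it holds, which is the intuition behind the communication-cost finding. Note that sharing parameters lowers privacy risk but does not remove it, which is consistent with privacy appearing among the open challenges.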

This review was generated by AI and reviewed by human editors.