QUICK REVIEW

[Paper Review] DataPerf: Benchmarks for Data-Centric AI Development

Mark Mazumder, Colby Banbury | arXiv (Cornell University) | Jul 20, 2022

Machine Learning and Data Classification | Cited by 51

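One-Sentence Summary

DataPerf introduces a community-driven benchmark suite for evaluating datasets and data-centric algorithms across multiple modalities, hosted on an online platform with extensible benchmarks and long-term maintenance. The first release covers speech and vision data selection, data cleaning, data acquisition, and prompting, and ships with open-source baselines.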

ABSTRACT

Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating ML datasets and data-centric algorithms. We aim to foster innovation in data-centric AI through competition, comparability, and reproducibility. We enable the ML community to iterate on datasets, instead of just architectures, and we provide an open, online platform with multiple rounds of challenges to support this iterative development. The first iteration of DataPerf contains five benchmarks covering a wide spectrum of data-centric techniques, tasks, and modalities in vision, speech, acquisition, debugging, and diffusion prompting, and we support hosting new contributed benchmarks from the community. The benchmarks, online evaluation platform, and baseline implementations are open source, and the MLCommons Association will maintain DataPerf to ensure long-term benefits to academia and industry.

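Motivation and Objectives

Shift machine learning benchmarking from models toward data quality and data-centric development practices.
Provide an extensible, open platform for evaluating data-centric pipelines and datasets.
Foster community contributions through working groups and long-term stewardship.
Demonstrate practical data-centric tasks and real-world use cases across modalities.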

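Proposed Method

Develop an online platform (Dynabench), integrated with MLCommons, to host data-centric benchmarks.
Extend the platform to accept diverse submission artifacts (training subsets, containerized systems, and so on).
Define five initial benchmarks (speech data selection, vision data selection, debugging, data acquisition, Adversarial Nibbler) that hold the model setup fixed so that data-centric approaches can be compared fairly.
Provide baseline implementations and public leaderboards to support reproducibility and progress tracking.
Maintain DataPerf through a dedicated working group under MLCommons for continued benchmark development and sustainability.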
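
The idea of holding the model fixed and scoring only the submitted data can be illustrated with a small sketch. This is not DataPerf's evaluation code: the nearest-centroid "model", the function names (`train_fixed_model`, `evaluate_submission`), and the toy data are all assumptions made for illustration. A submission here is just a list of indices into a candidate pool, capped by a selection budget.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, class label)


def train_fixed_model(train: List[Example]) -> Dict[int, List[float]]:
    """Hypothetical fixed model: one mean vector (centroid) per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for x, y in train:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def predict(centroids: Dict[int, List[float]], x: Sequence[float]) -> int:
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(c: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def evaluate_submission(
    pool: List[Example],
    selected_indices: List[int],
    test: List[Example],
    budget: int,
) -> float:
    """Score a submitted training subset; only the data varies, never the model."""
    chosen = sorted(set(selected_indices))
    if not chosen:
        raise ValueError("submission selects no examples")
    if len(chosen) > budget:
        raise ValueError("submission exceeds the selection budget")
    centroids = train_fixed_model([pool[i] for i in chosen])
    correct = sum(predict(centroids, x) == y for x, y in test)
    return correct / len(test)


# Toy pool: index 4 is an outlier carrying label 0 (a deliberately noisy example).
pool = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 1.0], 1), ([0.9, 1.1], 1), ([5.0, 5.0], 0)]
test = [([0.1, 0.0], 0), ([0.0, 0.2], 0), ([1.0, 0.9], 1), ([1.1, 1.0], 1)]

clean_score = evaluate_submission(pool, [0, 1, 2, 3], test, budget=5)
noisy_score = evaluate_submission(pool, [0, 1, 2, 3, 4], test, budget=5)
print(clean_score, noisy_score)
```

In this toy setting the subset that leaves out the outlier scores higher than the one that includes it, even though both are scored by the same model. That is the kind of comparison a fixed-model benchmark is meant to isolate.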

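Experimental Results

Research Questions

RQ1: How can benchmarks be designed to evaluate data-centric improvements independently of model changes?
RQ2: Under a fixed model architecture and budget, which data-centric techniques yield the largest gains?
RQ3: How can an online platform support diverse data-centric challenges at scale while keeping evaluation reproducible?
RQ4: Which practical use cases best demonstrate the benefits of data-centric AI across modalities?
RQ5: How do data acquisition, cleaning, and selection strategies compare in effectiveness and cost?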

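Key Findings

DataPerf provides an extensible open-source platform (Dynabench) and a long-term governance model through MLCommons for sustainable data-centric benchmarking.
The initial suite covers diverse data-centric tasks (speech and vision data selection, debugging, data acquisition, adversarial prompting), showing the breadth of data-centric development beyond model optimization.
Baseline results and demonstrations show heterogeneity across data marketplaces and tasks, underscoring the value of carefully designed data-centric strategies.
Offline evaluation scripts and containerized submission artifacts reduce online compute requirements and make participation more accessible.
A dedicated DataPerf working group coordinates ongoing benchmark development, community contributions, and platform maintenance, aiming for long-term impact in academia and industry.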
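
One way to picture a debugging-style task such as the one listed among the findings is as a ranking problem: a participant submits the order in which suspect training labels should be fixed, and an offline script counts how many fixes it takes to reach a target accuracy. The sketch below rests on that assumption and does not reproduce the actual DataPerf scoring rules. The 1-nearest-neighbour scorer, the function names, and the toy data are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, class label)


def nearest_neighbor_accuracy(train: List[Example], test: List[Example]) -> float:
    """Hypothetical fixed scorer: 1-nearest-neighbour accuracy on the test set."""
    def predict(x: Sequence[float]) -> int:
        nearest = min(
            train,
            key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)),
        )
        return nearest[1]
    return sum(predict(x) == y for x, y in test) / len(test)


def fixes_needed(
    noisy_train: List[Example],
    true_labels: List[int],
    ranking: List[int],
    test: List[Example],
    target: float,
) -> Optional[int]:
    """Apply label fixes in the submitted order and return how many were
    needed to reach the target accuracy, or None if it is never reached."""
    current = list(noisy_train)
    if nearest_neighbor_accuracy(current, test) >= target:
        return 0
    for n_fixed, idx in enumerate(ranking, start=1):
        features, _ = current[idx]
        current[idx] = (features, true_labels[idx])  # replace with the clean label
        if nearest_neighbor_accuracy(current, test) >= target:
            return n_fixed
    return None


# Toy data: training example 1 carries a wrong label (1 instead of 0).
noisy_train = [([0.0], 0), ([0.1], 1), ([1.0], 1), ([1.1], 1)]
true_labels = [0, 0, 1, 1]
test = [([0.12], 0), ([0.95], 1)]

good_ranking = fixes_needed(noisy_train, true_labels, [1, 0, 2, 3], test, target=1.0)
poor_ranking = fixes_needed(noisy_train, true_labels, [2, 3, 0, 1], test, target=1.0)
print(good_ranking, poor_ranking)
```

A ranking that surfaces the mislabeled example first needs fewer fixes than one that reaches it last. Because the scorer runs locally on small inputs, a script of this shape can be run offline by participants before they submit.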


This review was generated by AI and reviewed by human editors.