QUICK REVIEW

[Paper Digest] AIBench: An Industry Standard AI Benchmark Suite from Internet Services.

Fei Tang, Wanling Gao|arXiv (Cornell University)|Apr 30, 2020

Explainable Artificial Intelligence (XAI)|Cited by 5

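One-Sentence Summary

AIBench is a comprehensive, industry-standard AI benchmark suite built from real-world Internet service workloads; it comprises 17 representative AI tasks to ensure diversity and representativeness, its carefully selected minimal subset cuts benchmarking cost by 41% while preserving the key workload characteristics, and it outperforms MLPerf in the diversity and representativeness of model complexity, computational patterns, and hotspot functions.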

ABSTRACT

The booming successes of machine learning in different domains boost industry-scale deployments of innovative AI algorithms, systems, and architectures, and thus the importance of benchmarking grows. However, the confidential nature of the workloads, the paramount importance of the representativeness and diversity of benchmarks, and the prohibitive cost of training a state-of-the-art model mutually aggravate the AI benchmarking challenges. In this paper, we present a balanced AI benchmarking methodology for meeting the subtly different requirements of different stages in developing a new system/architecture and ranking/purchasing commercial off-the-shelf ones. Performing an exhaustive survey on the most important AI domain, Internet services, with seventeen industry partners, we identify and include seventeen representative AI tasks to guarantee the representativeness and diversity of the benchmarks. Meanwhile, for reducing the benchmarking cost, we select a benchmark subset to a minimum (three tasks) according to the criteria: diversity of model complexity, computational cost, and convergence rate, repeatability, and having widely-accepted metrics or not. We contribute by far the most comprehensive AI benchmark suite, AIBench. The evaluations show AIBench outperforms MLPerf in terms of the diversity and representativeness of model complexity, computational cost, convergent rate, computation and memory access patterns, and hotspot functions. With respect to the AIBench full benchmarks, its subset shortens the benchmarking cost by 41%, while maintaining the primary workload characteristics. The specifications, source code, and performance numbers are publicly available from the web site this http URL.

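Motivation and Objectives

Meet the growing need for representative and diverse AI benchmarks in the development of industry-scale AI systems.
Overcome the challenges posed by confidential workloads, prohibitive training costs, and benchmark repeatability.
Develop a benchmark suite that supports both the development of new systems and the ranking of commercial off-the-shelf ones.
Ensure broad coverage of model complexity, computational cost, convergence rate, and memory access patterns.
Minimize benchmarking cost without sacrificing fidelity to the primary workload characteristics.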

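Proposed Method

Conduct an exhaustive survey of Internet services with 17 industry partners and identify 17 representative AI tasks.
Select a minimal subset of three tasks based on diversity of model complexity, computational cost, and convergence rate, together with repeatability.
Prefer benchmarks that have widely accepted metrics, to ensure consistency and comparability.
Design the suite to capture the key computation and memory access patterns, including hotspot functions.
Make the specifications, source code, and performance numbers publicly available through a web site.
Evaluate AIBench against MLPerf to demonstrate its advantage in the representativeness and diversity of workload characteristics.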
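
The subset-selection step above can be illustrated with a small sketch. This is not the procedure used by the AIBench authors, who also weigh repeatability and the availability of widely accepted metrics; it only shows one plausible way to pick the three most mutually different tasks from numeric characteristics. The task names, the feature values, and the helper functions `normalize` and `select_subset` are all hypothetical placeholders, not data or code from the paper.

```python
import math
from itertools import combinations

# Hypothetical placeholder values, NOT measurements from the paper.
# Each tuple: (model complexity, computational cost, convergence rate),
# in arbitrary units.
TASKS = {
    "task_a": (1.0, 1.0, 1.0),
    "task_b": (1.1, 1.0, 1.2),   # nearly identical to task_a
    "task_c": (10.0, 1.0, 5.0),
    "task_d": (5.0, 10.0, 10.0),
    "task_e": (9.5, 1.5, 5.0),   # nearly identical to task_c
}


def normalize(tasks):
    """Min-max scale every feature to [0, 1] so no single unit dominates."""
    columns = list(zip(*tasks.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return {
        name: tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(values, lows, highs)
        )
        for name, values in tasks.items()
    }


def select_subset(tasks, k=3):
    """Return the k tasks whose smallest pairwise distance is largest.

    Exhaustive search is acceptable here: choosing 3 of 17 tasks
    gives only 680 candidate subsets.
    """
    scaled = normalize(tasks)

    def spread(combo):
        return min(
            math.dist(scaled[a], scaled[b]) for a, b in combinations(combo, 2)
        )

    return max(combinations(sorted(tasks), k), key=spread)


subset = select_subset(TASKS, k=3)
print(subset)
```

Maximizing the smallest pairwise distance is a design choice made for this illustration: it avoids picking two near-duplicate tasks, which mirrors the goal of keeping the subset diverse while small.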

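Experimental Results

Research Questions

RQ1: How can an AI benchmark suite be designed to faithfully reflect the diversity and complexity of real-world Internet service workloads?
RQ2: Which criteria support building a minimal yet representative benchmark subset that lowers cost without sacrificing fidelity?
RQ3: How much better does AIBench capture model complexity and computational patterns than existing benchmarks such as MLPerf?
RQ4: To what extent does the benchmark subset preserve the primary characteristics of the full benchmark suite?
RQ5: What role do widely accepted metrics play in ensuring repeatability and comparability in large-scale AI benchmarking?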

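Key Findings

AIBench contains 17 representative AI tasks derived from real-world Internet service workloads, which ensures a high degree of representativeness and diversity.
Compared with the full AIBench suite, the benchmark subset reduces total benchmarking cost by 41%.
The subset preserves the primary workload characteristics, including model complexity, computational cost, and convergence rate.
AIBench outperforms MLPerf in capturing the diversity of model complexity, computational cost, and memory access patterns.
AIBench gives broader coverage of hotspot functions and of the computation patterns that matter for system evaluation.
The specifications, source code, and performance numbers are publicly available to support reproducibility and community use.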
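
To make the 41% figure concrete, the reduction can be read as one minus the ratio of subset cost to full-suite cost. The sketch below assumes that reading; the function name `cost_reduction` and the cost values are hypothetical, chosen only so the arithmetic lands on 41%, and are not numbers reported in the paper.

```python
def cost_reduction(full_cost, subset_cost):
    """Fraction of benchmarking cost saved by running only the subset."""
    if full_cost <= 0:
        raise ValueError("full_cost must be positive")
    return 1.0 - subset_cost / full_cost


# Hypothetical costs in arbitrary units (for example, training hours).
reduction = cost_reduction(full_cost=100.0, subset_cost=59.0)
print(f"{reduction:.0%}")
```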

This digest was generated by AI and reviewed by human editors.