QUICK REVIEW

[Paper Review] Benchmarking Automatic Machine Learning Frameworks

Adithya Balaji, Alexander G. Allen|arXiv (Cornell University)|Aug 17, 2018

Cloud Computing and Resource Management | References: 2 | Cited by: 60

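One-sentence summary

This paper benchmarks open-source AutoML frameworks (auto-sklearn, TPOT, auto_ml, H2O AutoML) on OpenML datasets and compares their performance patterns on classification and regression.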

ABSTRACT

AutoML serves as the bridge between varying levels of expertise when designing machine learning systems and expedites the data science process. A wide range of techniques is taken to address this, however there does not exist an objective comparison of these techniques. We present a benchmark of current open source AutoML solutions using open source datasets. We test auto-sklearn, TPOT, auto_ml, and H2O's AutoML solution against a compiled set of regression and classification datasets sourced from OpenML and find that auto-sklearn performs the best across classification datasets and TPOT performs the best across regression datasets.

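Motivation and goals

Evaluate the performance of leading open-source AutoML frameworks across diverse datasets.
Provide a fair, reproducible benchmarking method using OpenML data and standardized metrics.
Identify the strengths, weaknesses, and practical considerations of deploying AutoML in time-constrained environments.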

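Proposed method

Benchmark on a fixed set of 57 classification and 30 regression OpenML datasets.
Standardize framework configurations for a fair comparison (time limits, random seeds, evaluation metrics).
Use weighted F1-score as the primary metric for classification and MSE (with the reciprocal taken for comparability) as the primary metric for regression.
Use a distributed computing environment (AWS Batch and bare-metal) to handle the large computational demand.
Document data preprocessing requirements and framework-specific limitations.
Analyze pairwise framework performance and dataset-dependent trends.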
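
The two primary metrics named above can be sketched in plain Python. This is only an illustration of the definitions, not the paper's evaluation code: the function names, the `eps` guard against division by zero, and the choice of `1 / MSE` as the "reciprocal" form are all assumptions made for this example.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is > 0 because n > 0.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n / total) * f1
    return score


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def inverse_mse(y_true, y_pred, eps=1e-12):
    """Reciprocal of MSE so that higher is better (eps is an assumed guard)."""
    return 1.0 / (mse(y_true, y_pred) + eps)


if __name__ == "__main__":
    # Toy labels and targets, invented purely for illustration.
    print(weighted_f1([0, 0, 1, 1], [0, 1, 1, 1]))
    print(inverse_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Taking the reciprocal puts regression on the same "higher is better" footing as F1, which makes per-dataset comparisons between frameworks read the same way for both task types.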

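Experimental results

Research questions

RQ1: Which AutoML framework achieves the highest performance on classification tasks over the OpenML datasets?
RQ2: Which framework performs best on regression tasks?
RQ3: How does framework performance vary with dataset characteristics (such as size, dimensionality, and class distribution)?
RQ4: What practical limitations and failures arise when scaling AutoML frameworks in a distributed environment?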

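Key findings

auto-sklearn performs best across the classification datasets.
TPOT performs best across the regression datasets.
Overall results show high variance across datasets and random seeds, highlighting the influence of codebase and feature set.
Under certain memory and time constraints, the AutoML frameworks showed significant failures, so resources must be managed carefully.
Datasets and random seeds affect results significantly, which calls for a robust benchmark design.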
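
Because results depend on both dataset and seed, one common way to compare frameworks is to average each framework's score over seeds per dataset and then count pairwise wins on the datasets both frameworks completed. The sketch below illustrates that idea only; it is not the paper's analysis code, and the nested-dict layout, the function names, and the placeholder framework and dataset names are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def mean_over_seeds(scores):
    """{framework: {dataset: [score per seed]}} -> {framework: {dataset: mean score}}."""
    return {
        fw: {ds: mean(vals) for ds, vals in per_ds.items()}
        for fw, per_ds in scores.items()
    }


def pairwise_wins(avg):
    """For each framework pair, count datasets won by each side.

    Only datasets that both frameworks produced a score for are compared,
    so a run that failed (no entry) does not count as a loss.
    """
    wins = {}
    for a, b in combinations(sorted(avg), 2):
        shared = avg[a].keys() & avg[b].keys()
        a_wins = sum(1 for ds in shared if avg[a][ds] > avg[b][ds])
        b_wins = sum(1 for ds in shared if avg[b][ds] > avg[a][ds])
        wins[(a, b)] = (a_wins, b_wins)
    return wins


if __name__ == "__main__":
    # Placeholder scores, invented for illustration; not results from the paper.
    toy = {
        "framework_a": {"d1": [0.8, 0.9], "d2": [0.5, 0.5]},
        "framework_b": {"d1": [0.7, 0.7], "d2": [0.6, 0.8], "d3": [0.9]},
    }
    print(pairwise_wins(mean_over_seeds(toy)))
```

Restricting the comparison to shared datasets is a design choice: it keeps framework failures from distorting the win counts, but failures then need to be reported separately.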


This review was generated by AI and reviewed by a human editor.