QUICK REVIEW

[Paper Review] Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions and Collective Knowledge

Grigori Fursin|arXiv (Cornell University)|Sep 14, 2017

Parallel Computing and Optimization Techniques | Cited by 2

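One-Sentence Summary

This paper proposes Collective Mind (cM), a collaborative, open-source framework that uses automated self-tuning and knowledge sharing to enable continuous software engineering driven by performance and cost. By adding lightweight wrappers to software and connecting them to a public repository, the system tracks optimization trade-offs across diverse hardware through Pareto frontier analysis and supports real-time, multi-objective tuning, which improves compiler efficiency and application portability.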

ABSTRACT

The original presentation was shared via SlideShare. Slides from the ARM's Research Summit'17 about the "Community-Driven and Knowledge-Guided Optimization of AI Applications Across the Whole SW/HW Stack": cKnowledge.org cKnowledge.org/repo cKnowledge.org/repo-beta cKnowledge.org/android-apps.html cKnowledge.org/ai developer.arm.com/research/summit Co-designing the whole AI/SW/HW stack in terms of speed, accuracy, energy consumption, size, costs, and other metrics has become extremely complex, long and costly. With no rigorous methodology for analyzing performance and accumulating optimisation knowledge, we are simply destined to drown in the ever growing number of design choices, system features and conflicting optimisation goals. We present our novel community-driven approach to solve the above problems. Originating from natural sciences, this approach is embodied in Collective Knowledge (CK), our open-source cross-platform workflow framework and repository for automatic, collaborative and reproducible experimentation. CK helps organize, unify and share representative workloads, data sets, AI frameworks, libraries, compilers, scripts, models and other artifacts as customizable and reusable components with a common JSON API. CK helps bring academia, industry and end-users together to gradually expose optimisation choices at all levels (e.g. from parameterized models and algorithmic skeletons to compiler flags and hardware configurations) and autotune them across diverse inputs and platforms. Optimization knowledge gets continuously aggregated in public or private repositories such as cKnowledge.org/repo in a reproducible way, and can be then mined and extrapolated to predict better AI algorithm choices, compiler transformations and hardware designs. 
We also demonstrate how we use this approach in practice together with ARM and other companies to adapt to a Cambrian AI/SW/HW explosion by creating an open repository of reusable AI artifacts, and then collaboratively optimising and co-designing the whole deep learning stack (software, hardware and models).

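Research Motivation and Goals

Address the growing challenge of optimizing software on heterogeneous, rapidly evolving hardware platforms.
Overcome the limitation that traditional compilers miss optimization opportunities because the design space is vast and largely unexplored.
Build a scalable, community-driven infrastructure for sharing and reusing optimization knowledge across software and hardware configurations.
Achieve practical, reproducible, and sustainable performance engineering through continuous benchmarking and collective experimentation.
Support self-tuning systems that determine optimal configurations from predictions based on real-world data and machine learning.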

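Proposed Method

Develop lightweight wrappers for software components ("computational species") that expose optimization parameters such as compiler flags, threading models, and algorithm variants.
Integrate the wrappers with the public Collective Mind (cM) self-tuning infrastructure and repository to collect performance and cost metrics on real hardware.
Use crowdsourcing to continuously benchmark and tune software on Android devices and general-purpose hardware, so that results reflect real-world conditions.
Apply Pareto frontier analysis to identify optimal trade-offs among multiple objectives (e.g., execution time, energy consumption, memory, code size).
Support both manual and automatic classification, pruning, and correlation analysis of optimization solutions, relating them to software features, input data, and hardware characteristics.
Use a JSON-based API and big-data analytics to support scalable, reusable, and interoperable performance tracking and knowledge sharing.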
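
As a rough illustration of the Pareto frontier step listed above, the following Python sketch filters a few measurement records down to the non-dominated ones. It is only a sketch under stated assumptions: the record fields (`flags`, `time_s`, `energy_j`, `size_kb`), the helper names `dominates` and `pareto_front`, and all numbers are hypothetical and are not taken from the paper or from the cM/CK code base.

```python
import json

# Hypothetical measurement records, shaped like JSON that a wrapper might report.
# Field names and values are invented for illustration only.
RECORDS = json.loads("""
[
  {"flags": "-O0", "time_s": 4.0, "energy_j": 9.0, "size_kb": 120},
  {"flags": "-O2", "time_s": 2.0, "energy_j": 6.0, "size_kb": 150},
  {"flags": "-O3", "time_s": 1.5, "energy_j": 6.5, "size_kb": 210},
  {"flags": "-Os", "time_s": 2.5, "energy_j": 6.0, "size_kb": 100},
  {"flags": "-O1", "time_s": 3.0, "energy_j": 7.0, "size_kb": 150}
]
""")

# All objectives are treated as "lower is better".
OBJECTIVES = ("time_s", "energy_j", "size_kb")


def dominates(a, b, objectives=OBJECTIVES):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = all(a[k] <= b[k] for k in objectives)
    strictly_better = any(a[k] < b[k] for k in objectives)
    return no_worse and strictly_better


def pareto_front(records, objectives=OBJECTIVES):
    """Return the records that no other record dominates, in input order."""
    return [
        r for r in records
        if not any(dominates(other, r, objectives) for other in records if other is not r)
    ]


front = pareto_front(RECORDS)
print([r["flags"] for r in front])
```

With this made-up data, `-O0` and `-O1` are dropped because another record is at least as good on every objective, while the remaining three each win on a different objective. The pairwise check is quadratic in the number of records, which is acceptable for a sketch but would need a smarter algorithm for a large repository.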

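Experimental Results

Research Questions

RQ1: How can software engineering be turned into a continuous, data-driven process similar to the natural sciences?
RQ2: Can a community-driven, open-source infrastructure effectively track and optimize software performance across diverse hardware and configurations?
RQ3: How can multi-objective self-tuning be scaled and automated to cope with the growing complexity of modern systems?
RQ4: What role do collaborative knowledge sharing and machine learning play in improving compiler optimization and system reliability?
RQ5: Can large-scale, real-world performance data be used to predict optimal configurations and thereby reduce tuning effort?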

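Key Findings

The Collective Mind framework collected more than 15,000 data sets and 300 software species across multiple hardware platforms, demonstrating its scalability and real-world applicability.
Industrial partners reported measurable improvements to GCC optimization heuristics on ARM and Intel processors, including the detection of architectural errors during hardware validation.
A statically compiled image-processing application was converted into a self-tuning system that meets real-time constraints while minimizing energy consumption, memory footprint, and development cost.
The framework supports continuous, reproducible benchmarking, allowing researchers to validate new optimization techniques on a large-scale, realistic, and representative software and hardware infrastructure.
Integration with tools such as Docker, Phoronix, and Eclipse increased automation and adoption, supporting seamless dependency tracking and plugin integration.
By promoting artifact evaluation and reducing reliance on non-representative or synthetic benchmarks, the approach supports the long-term sustainability of software engineering.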
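
The self-tuning image-processing finding above can be pictured as a run-time choice among pre-measured variants of the same code: pick the lowest-energy variant that still meets the frame deadline. The Python sketch below shows only that selection rule. The variant names, the `frame_ms` and `energy_mj` fields, the `pick_variant` helper, and every number are hypothetical; the summary does not describe how the actual system makes this choice.

```python
from typing import Optional

# Hypothetical pre-measured variants of one kernel; all numbers are invented.
VARIANTS = [
    {"name": "scalar", "frame_ms": 45.0, "energy_mj": 30.0},
    {"name": "vectorized", "frame_ms": 20.0, "energy_mj": 42.0},
    {"name": "threaded", "frame_ms": 12.0, "energy_mj": 70.0},
]


def pick_variant(variants, deadline_ms: float) -> Optional[dict]:
    """Return the lowest-energy variant whose frame time meets the deadline, or None."""
    feasible = [v for v in variants if v["frame_ms"] <= deadline_ms]
    if not feasible:
        return None  # no variant is fast enough for this deadline
    return min(feasible, key=lambda v: v["energy_mj"])


choice = pick_variant(VARIANTS, deadline_ms=33.0)
print(choice["name"] if choice else "no feasible variant")
```

Tightening the deadline pushes the choice toward faster but more power-hungry variants, and relaxing it allows the cheapest one, which is the kind of trade-off that real-time, multi-objective tuning has to navigate.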


This review was generated by AI and reviewed by a human editor.