QUICK REVIEW

[Paper Review] Faster Discovery of Faster System Configurations with Spectral Learning

Vivek Nair, Tim Menzies|arXiv (Cornell University)|Jan 27, 2017

Software Engineering Research | References: 37 | Cited by: 25

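One-Sentence Summary

This paper proposes WHAT, a spectral learning method that performs dimensionality reduction using the eigenvalues of the distance matrix between software configurations, yielding accurate and stable performance predictions from only a few dozen samples. Compared with the state of the art, it needs 2 to 10 times fewer samples while keeping prediction error below 10% and standard deviation below 2%.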

ABSTRACT

Despite the huge spread and economical importance of configurable software systems, there is unsatisfactory support in utilizing the full potential of these systems with respect to finding performance-optimal configurations. Prior work on predicting the performance of software configurations suffered from either (a) requiring far too many sample configurations or (b) large variances in their predictions. Both these problems can be avoided using the WHAT spectral learner. WHAT's innovation is the use of the spectrum (eigenvalues) of the distance matrix between the configurations of a configurable software system, to perform dimensionality reduction. Within that reduced configuration space, many closely associated configurations can be studied by executing only a few sample configurations. For the subject systems studied here, a few dozen samples yield accurate and stable predictors - less than 10% prediction error, with a standard deviation of less than 2%. When compared to the state of the art, WHAT (a) requires 2 to 10 times fewer samples to achieve similar prediction accuracies, and (b) its predictions are more stable (i.e., have lower standard deviation). Furthermore, we demonstrate that predictive models generated by WHAT can be used by optimizers to discover system configurations that closely approach the optimal performance.

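Research Motivation and Objectives

Provide a way to find performance-optimal configurations in large, complex software systems with as few samples as possible.
Overcome the limitations of prior methods, which either need too many samples or produce predictions with high variance.
Develop a scalable, stable, and accurate method that predicts system performance from only a small number of representative configurations.
Enable efficient use of surrogate models in optimization by producing reliable, low-variance predictors.
Demonstrate that spectral clustering along the first principal component is an effective way to sample the configuration space.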

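Proposed Method

The method performs spectral dimensionality reduction using the spectrum (eigenvalues) of the distance matrix between configurations.
It recursively partitions the configuration space along an approximation of the first principal component, identifying clusters of similar configurations.
From each cluster it selects a small number of representative configurations, chosen according to the spectral properties of the configurations, to minimize sampling cost.
It exploits the inherently low dimensionality of the configuration space (measured by the correlation dimension) to guide sampling.
It groups configurations with a distance-based similarity measure (Euclidean distance) and identifies informative samples.
The resulting model serves as a surrogate predictor for off-the-shelf optimizers searching for near-optimal configurations.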
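
The partition-and-sample steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the first principal component is approximated by the line between two far-apart configurations (a FastMap-style heuristic), recursion stops once a cluster holds at most sqrt(N) configurations, and one random representative is drawn per leaf cluster. The names split_along_first_component and spectral_sample, the stopping rule, and the toy configuration space are all hypothetical.

```python
import itertools
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric configurations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_along_first_component(configs, rng):
    """Split configs at the median of a FastMap-style projection.

    Assumption: the line between two far-apart configurations ("poles")
    stands in for the first principal component.
    """
    anchor = rng.choice(configs)
    east = max(configs, key=lambda c: euclidean(c, anchor))
    west = max(configs, key=lambda c: euclidean(c, east))
    span = euclidean(east, west)
    if span == 0:  # all configurations identical: split arbitrarily
        mid = len(configs) // 2
        return configs[:mid], configs[mid:]

    def projection(c):
        # Cosine rule: position of c along the east-west line.
        a, b = euclidean(c, east), euclidean(c, west)
        return (a * a + span * span - b * b) / (2 * span)

    ordered = sorted(configs, key=projection)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def spectral_sample(configs, min_size=None, seed=0):
    """Recursively partition configs; return one representative per leaf."""
    rng = random.Random(seed)
    if min_size is None:
        # Assumed stopping rule: leaves hold at most sqrt(N) configurations.
        min_size = max(2, int(math.sqrt(len(configs))))

    def recurse(cluster):
        if len(cluster) <= min_size:
            return [rng.choice(cluster)]
        left, right = split_along_first_component(cluster, rng)
        return recurse(left) + recurse(right)

    return recurse(list(configs))


# Toy configuration space: every on/off combination of 6 binary options.
configs = [tuple(bits) for bits in itertools.product([0, 1], repeat=6)]
samples = spectral_sample(configs)
print(len(configs), "configurations,", len(samples), "selected for measurement")
```

On this toy space the 64 configurations are halved three times into leaves of 8, so only 8 configurations would need to be executed. In a full pipeline, the measured performance of those representatives would then train the surrogate model that the optimizer queries.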

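Experimental Results

Research Questions

RQ1: Can spectral learning reduce the number of samples needed for performance prediction in configurable software systems while maintaining or improving accuracy?
RQ2: How does using the first principal component of the configuration-space distance matrix improve sampling efficiency and prediction stability?
RQ3: To what extent can standard optimization algorithms make effective use of the generated predictive models to discover near-optimal configurations?
RQ4: How does the intrinsic dimensionality of the configuration space affect the performance of the spectral sampling strategy?
RQ5: Can the proposed method outperform state-of-the-art methods in sample efficiency and prediction variance?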

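Key Findings

Across six real-world systems, WHAT achieves prediction error below 10% and standard deviation below 2% using only a few dozen samples.
Compared with state-of-the-art methods, it needs 2 to 10 times fewer samples while maintaining or improving prediction accuracy.
The predictive models generated by WHAT are stable and effective, allowing off-the-shelf optimizers to discover near-optimal configurations in every system tested.
The intrinsic dimensionality of the configuration space (measured by the correlation dimension) is low, which explains why the method performs well.
WHAT outperforms Siegmund et al. and Guo et al. in both accuracy and stability, and matches Sarkar et al. in accuracy while using significantly fewer samples.
The method is robust across a range of systems, including Berkeley DB, Apache, SQLite, LLVM, and x264, which indicates broad applicability.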
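
The correlation dimension mentioned in the findings can be illustrated with a small sketch. It uses the standard two-radius estimate: compute the correlation sum C(r), the fraction of point pairs closer than r, at two radii and take the slope of log C(r) against log r. This review does not describe the exact procedure used in the paper, so the function names, the choice of radii, and the toy data below are assumptions for illustration only.

```python
import math
from itertools import combinations


def correlation_sum(points, radius):
    """Fraction of distinct point pairs whose distance is below radius."""
    pairs = list(combinations(points, 2))
    close = sum(1 for a, b in pairs if math.dist(a, b) < radius)
    return close / len(pairs)


def correlation_dimension(points, r_small, r_large):
    """Slope of log C(r) against log r between two radii."""
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    if c_small == 0 or c_large == 0:
        raise ValueError("radii too small: no pairs fall within them")
    return (math.log(c_large) - math.log(c_small)) / (
        math.log(r_large) - math.log(r_small)
    )


# Toy data: 200 points on a straight line embedded in 3 dimensions.
line = [(i, 2 * i, 3 * i) for i in range(200)]
unit = math.sqrt(14)  # distance between neighbouring points on the line
estimate = correlation_dimension(line, 10.5 * unit, 40.5 * unit)
print(f"estimated dimension: {estimate:.2f}")
```

Although each toy point has three coordinates, the estimate comes out close to 1 because the points vary along a single direction. A low value of this kind is the sort of evidence the finding above relies on: when the configuration space behaves as if it had few dimensions, a handful of well-placed samples can cover it.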

This review was generated by AI and reviewed by a human editor.