QUICK REVIEW

[Paper Review] Optimization of Tree Ensembles

Velibor V. Mišić|arXiv (Cornell University)|May 30, 2017

Advanced Multi-Objective Optimization Algorithms|Cited by 5

One-Sentence Summary

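This paper proposes a mixed-integer optimization (MIO) framework for the tree ensemble optimization problem, in which controllable input variables are set so as to maximize the prediction of a random forest or boosted tree ensemble. The approach combines a compact MIO formulation with Benders decomposition and iterative split constraint generation to find near-optimal solutions efficiently, achieving optimality gaps below 1% in drug design and pricing case studies.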

ABSTRACT

Tree ensemble models such as random forests and boosted trees are among the most widely used and practically successful predictive models in applied machine learning and business analytics. Although such models have been used to make predictions based on exogenous, uncontrollable independent variables, they are increasingly being used to make predictions where the independent variables are controllable and are also decision variables. In this paper, we study the problem of tree ensemble optimization: given a tree ensemble that predicts some dependent variable using controllable independent variables, how should we set these variables so as to maximize the predicted value? We formulate the problem as a mixed-integer optimization problem. We theoretically examine the strength of our formulation, provide a hierarchy of approximate formulations with bounds on approximation quality and exploit the structure of the problem to develop two large-scale solution methods, one based on Benders decomposition and one based on iteratively generating tree split constraints. We test our methodology on real data sets, including two case studies in drug design and customized pricing, and show that our methodology can efficiently solve large-scale instances to near or full optimality, and outperforms solutions obtained by heuristic approaches. In our drug design case, we show how our approach can identify compounds that efficiently trade-off predicted performance and novelty with respect to existing, known compounds. In our customized pricing case, we show how our approach can efficiently determine optimal store-level prices under a random forest model that delivers excellent predictive accuracy.
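The question posed in the abstract can be made concrete on a toy problem. Because each tree only compares variables against finitely many thresholds, the ensemble prediction is constant on every cell of the grid formed by all split points, so a tiny ensemble can be optimized exactly by evaluating one representative point per cell. This sketch assumes unconstrained variables and uses a nested-tuple tree encoding and function names that are hypothetical, not taken from the paper. The number of cells grows exponentially with the number of variables, which is why an MIO formulation is needed at realistic scale.

```python
from itertools import product

# Hypothetical encoding for this sketch: a tree is either a leaf value (float)
# or a tuple (var, threshold, left, right); a point goes left if x[var] <= threshold.


def predict(tree, x):
    """Follow splits from the root down to a leaf and return its value."""
    while isinstance(tree, tuple):
        var, threshold, left, right = tree
        tree = left if x[var] <= threshold else right
    return tree


def ensemble_predict(trees, weights, x):
    """Weighted sum of the individual tree predictions."""
    return sum(w * predict(t, x) for t, w in zip(trees, weights))


def split_points(trees, n_vars):
    """Collect the sorted distinct thresholds used for each variable."""
    points = [set() for _ in range(n_vars)]
    stack = list(trees)
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            var, threshold, left, right = node
            points[var].add(threshold)
            stack.extend([left, right])
    return [sorted(p) for p in points]


def brute_force_optimum(trees, weights, n_vars):
    """Exact maximizer found by enumerating one point per cell of the split grid.

    Each split point a represents the interval ending at a (x <= a); one value
    above the largest split point represents the last, unbounded interval.
    """
    grids = [pts + [pts[-1] + 1.0] if pts else [0.0]
             for pts in split_points(trees, n_vars)]
    best_x, best_val = None, float("-inf")
    for x in product(*grids):
        val = ensemble_predict(trees, weights, x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Toy two-tree ensemble over variables x0 and x1 (made-up numbers).
TREES = [
    (0, 2.0, (1, 1.0, 1.0, 3.0), 2.0),
    (1, 3.0, (0, 1.0, 4.0, 0.0), 1.0),
]
WEIGHTS = [0.5, 0.5]

print(brute_force_optimum(TREES, WEIGHTS, n_vars=2))  # ((1.0, 3.0), 3.5)
```

Here both trees reach their best leaves at the same point, so the optimum 3.5 equals the average of the two largest leaf values; in general the trees' preferred regions conflict, and the optimizer has to trade them off.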

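Research Motivation and Goals

Address the challenge of optimizing tree ensemble models when the input variables are controllable decision variables rather than exogenous predictors.
Develop a rigorous mathematical formulation that captures the piecewise-constant nature of tree predictions in a form amenable to optimization.
Design scalable solution methods (Benders decomposition and iterative split constraint generation) to handle large-scale tree ensembles.
Validate the approach empirically on real-world data, including drug design and customized pricing, and show that it outperforms heuristic approaches.
Establish theoretical bounds on approximation quality through depth-truncated MIO formulations.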
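The depth-truncation idea can be illustrated by brute force. One natural reading, assumed here, is that keeping only the splits up to depth d and letting each tree take its best leaf below the node it reaches gives a relaxation: its optimum is an upper bound on the true optimum, and the true ensemble value at its maximizer is a lower bound. The sketch replaces the MIO solver with enumeration over a precomputed candidate grid (`grids`) and uses hypothetical names throughout; it shows the bounding logic, not the specific bound proved in the paper.

```python
from itertools import product

# Hypothetical tree encoding for this sketch: a leaf is a float, an internal
# node is (var, threshold, left, right), and x goes left if x[var] <= threshold.


def predict(tree, x):
    """Follow splits from the root down to a leaf and return its value."""
    while isinstance(tree, tuple):
        var, threshold, left, right = tree
        tree = left if x[var] <= threshold else right
    return tree


def max_leaf(tree):
    """Largest leaf value in a subtree."""
    if not isinstance(tree, tuple):
        return tree
    return max(max_leaf(tree[2]), max_leaf(tree[3]))


def truncate(tree, depth):
    """Cut the tree at `depth`, replacing each deeper subtree by its best leaf.

    Ignoring every split below `depth` lets a point use any leaf under the node
    it reaches, so the truncated tree is pointwise >= the original tree.
    """
    if not isinstance(tree, tuple):
        return tree
    if depth == 0:
        return max_leaf(tree)
    var, threshold, left, right = tree
    return (var, threshold, truncate(left, depth - 1), truncate(right, depth - 1))


def depth_bounds(trees, weights, grids, depth):
    """Return (lower, upper) bounds on the ensemble optimum from a depth-d relaxation.

    Assumes nonnegative tree weights, so the truncated ensemble overestimates
    the true one everywhere. `grids` lists one candidate value per split cell
    for each variable.
    """
    cut = [truncate(t, depth) for t in trees]

    def value(ts, x):
        return sum(w * predict(t, x) for t, w in zip(ts, weights))

    best_x = max(product(*grids), key=lambda x: value(cut, x))
    upper = value(cut, best_x)    # optimum of the relaxation
    lower = value(trees, best_x)  # true value of a feasible point
    return lower, upper


# Toy ensemble (made-up numbers) whose best leaves need conflicting values of x0.
TREES = [
    (0, 2.0, (1, 1.0, 1.0, 3.0), 2.0),
    (1, 3.0, (0, 2.0, 0.0, 4.0), 1.0),
]
WEIGHTS = [0.5, 0.5]
GRIDS = [[2.0, 3.0], [1.0, 3.0, 4.0]]  # split points per variable, plus one value above

for d in range(3):
    print(d, depth_bounds(TREES, WEIGHTS, GRIDS, d))
# 0 (0.5, 3.5)
# 1 (0.5, 3.5)
# 2 (3.0, 3.0)
```

The first tree's best leaf requires x0 <= 2 while the second tree's requires x0 > 2, so the shallow relaxations overestimate the optimum; at depth 2 the truncation is the full ensemble and both bounds meet at the exact optimum of 3.0.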

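Proposed Method

Formulate tree ensemble optimization as a mixed-integer optimization (MIO) problem, modeling each tree's decision path with binary variables and logical constraints.
Propose a strong MIO formulation (one with a tight linear relaxation) that closely models the mapping from input variables to tree leaves, yielding high solution quality.
Design a hierarchy of approximate formulations based on depth truncation, with provable bounds on the approximation error.
Develop a Benders decomposition algorithm that splits the problem into smaller pieces so that large-scale instances can be solved efficiently.
Design an iterative split constraint generation method that adds the constraints associated with tree splits dynamically, improving convergence.
Use proximity constraints to keep solutions sufficiently different from the training data points, improving their practical novelty.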
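The method list can be made concrete with a sketch of an MIO of the kind described, written in notation assumed for this review (the paper's exact notation and constraint set may differ). Let x_{i,j} = 1 if variable i is at most its j-th sorted split point a_{i,j} (K_i split points in total), let y_{t,ℓ} indicate that tree t's prediction comes from leaf ℓ with value p_{t,ℓ}, and let λ_t ≥ 0 be the tree weights. For a split s, V(s) and C(s) give its variable and split-point index, and left(s), right(s) are the leaves below its two branches.

```latex
\begin{aligned}
\max_{x,\,y}\quad & \sum_{t=1}^{T} \lambda_t \sum_{\ell \in \mathrm{leaves}(t)} p_{t,\ell}\, y_{t,\ell} \\
\text{s.t.}\quad & \sum_{\ell \in \mathrm{leaves}(t)} y_{t,\ell} = 1 && \forall t, \\
& \sum_{\ell \in \mathrm{left}(s)} y_{t,\ell} \le x_{V(s),C(s)} && \forall t,\ \forall s \in \mathrm{splits}(t), \\
& \sum_{\ell \in \mathrm{right}(s)} y_{t,\ell} \le 1 - x_{V(s),C(s)} && \forall t,\ \forall s \in \mathrm{splits}(t), \\
& x_{i,j} \le x_{i,j+1} && \forall i,\ j = 1,\dots,K_i - 1, \\
& x_{i,j} \in \{0,1\},\quad y_{t,\ell} \ge 0 && \forall i, j, t, \ell.
\end{aligned}
```

With binary x, the two inequalities indexed by s (the split constraints) force each tree's y onto the single leaf that the encoded point reaches, and the ordering constraints keep the encoding consistent because a_{i,j} ≤ a_{i,j+1}. Dropping the split constraints deeper than some depth d gives a relaxation in the spirit of the depth-truncation hierarchy, and adding only the violated ones as they appear is one way to read split constraint generation. Once x is fixed, the y variables of different trees no longer interact, which is the kind of structure a Benders decomposition exploits.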

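Experimental Results

Research Questions

RQ1: Can an MIO formulation effectively model the piecewise-constant prediction function of a tree ensemble to support decision optimization?
RQ2: How does the proposed MIO formulation compare with alternative formulations in formulation strength and solution quality?
RQ3: To what extent can the depth-based approximation hierarchy bound the optimality gap while reducing computational complexity?
RQ4: Can the decomposition and constraint generation methods scale to large tree ensembles in real applications?
RQ5: In practice, how does MIO-based optimization compare with heuristic methods in objective value and solution diversity?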

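Key Findings

The proposed MIO formulation consistently obtained near-optimal solutions on all test instances, with a maximum optimality gap of only 0.12%.
In the drug design case study, the MIO approach found molecules with a maximum proximity of 0.01 to the training data while achieving 93% of the best possible objective value.
In the customized pricing case study, the random forest model achieved a significantly higher out-of-sample R² (indicating better predictive accuracy) than a hierarchical Bayesian model, and the MIO approach was used to optimize store-level prices under this random forest.
The MIO prices were less extreme than the heuristic prices: fewer products across the chain were set to the highest or lowest allowed price.
Benders decomposition and split constraint generation substantially improved efficiency on large-scale instances, solving problems within seconds to minutes.
The MIO approach outperformed the heuristic solutions in both objective value and proximity to the training data; the heuristic solutions reached only 90% to 94% of the optimal objective value.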

This review was generated by AI and reviewed by human editors.