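[Paper Explained] Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers
The paper recasts tree ensembles as adaptive smoothers so that their smoothing behavior can be quantified. It shows how randomization self-regularizes predictions and improves performance in ways the bias-variance explanation alone does not capture.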
Despite their remarkable effectiveness and broad application, the drivers of success underlying ensembles of trees are still not fully understood. In this paper, we highlight how interpreting tree ensembles as adaptive and self-regularizing smoothers can provide new intuition and deeper insight to this topic. We use this perspective to show that, when studied as smoothers, randomized tree ensembles not only make predictions that are quantifiably more smooth than the predictions of the individual trees they consist of, but also further regulate their smoothness at test-time based on the dissimilarity between testing and training inputs. First, we use this insight to revisit, refine and reconcile two recent explanations of forest success by providing a new way of quantifying the conjectured behaviors of tree ensembles objectively by measuring the effective degree of smoothing they imply. Then, we move beyond existing explanations for the mechanisms by which tree ensembles improve upon individual trees and challenge the popular wisdom that the superior performance of forests should be understood as a consequence of variance reduction alone. We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles -- because the prevailing definition of bias does not capture differences in the expressivity of the hypothesis classes formed by trees and forests. Instead, we show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled. In particular, we demonstrate that the smoothing effect of ensembling can reduce variance in predictions due to noise in outcome generation, reduce variability in the quality of the learned function given fixed input data and reduce potential bias in learnable functions by enriching the available hypothesis space.
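Research Motivation and Goals
- Give an intuitive explanation of why tree ensembles succeed by viewing them as adaptive smoothers that average training labels.
- Quantify the degree of smoothing (effective degrees of freedom) and compare behavior at training time with behavior at test time.
- Reconcile two recent explanations of forest success (spiked-smooth interpolation and randomization as regularization) within a single smoothing framework.
- Examine how the notions of bias and variance fail to fully capture the expressivity of forests, and identify three distinct mechanisms of improvement.
- Validate the smoothing-based explanation empirically and evaluate mechanisms that go beyond variance reduction.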
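Proposed Approach
- Represent trees and ensembles as adaptive smoothers whose weights depend on the training outcomes, with smoothing weights $\hat{\mathbf{s}}^{\theta}(x_0)$ and ensemble weights $w_b$.
- Quantify smoothing on training inputs versus test inputs with the effective-parameter measure $p^0_{\hat{\mathbf{s}}}$ (Eq. (6)).
- Compare interpolating forests with non-interpolating ensembles by varying the randomness in tree construction and the ensemble size.
- Analyze training-time and test-time behavior to show that ensembles can be smoother on unseen inputs than on the training data.
- Connect spiked-smooth interpolation with randomization as regularization by interpreting both through their smoothing effects.
- Validate empirically through simulations (the MARSadd setting) and replicate the experimental analysis on real datasets (Appendix C).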
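The smoother view in the list above can be made concrete with a small numerical sketch. It is not the authors' code: `marsadd`, `best_split`, `grow_tree`, `tree_smoother`, `forest_smoother` and `effective_params` are hypothetical helpers written only for illustration. Each tree is a minimal CART-style regression tree that considers a random subset of `max_features` features at every split and is grown on the full sample without bootstrapping, so a prediction is exactly the mean of the training labels in the leaf and each smoother weight is 1/(leaf size). We assume that Eq. (6) has the form $p^0_{\hat{\mathbf{s}}} = \frac{n}{|\mathcal{I}_0|}\sum_{j\in\mathcal{I}_0}\|\hat{\mathbf{s}}(x_j)\|_2^2$ over an evaluation set $\mathcal{I}_0$, and that MARSadd is the five-dimensional additive design written in `marsadd`. Both are assumptions, not quotations from the paper.

```python
import numpy as np


def marsadd(n, rng, noise_sd=1.0):
    """Assumed MARSadd design: X ~ U[0,1]^5, additive mean plus Gaussian noise."""
    X = rng.uniform(size=(n, 5))
    mu = (0.1 * np.exp(4 * X[:, 0])
          + 4 / (1 + np.exp(-20 * (X[:, 1] - 0.5)))
          + 3 * X[:, 2] + 2 * X[:, 3] + X[:, 4])
    return X, mu + noise_sd * rng.standard_normal(n)


def best_split(x, y):
    """Best squared-error split on one feature: returns (sse, threshold)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    c1, c2 = np.cumsum(ys), np.cumsum(ys ** 2)
    n = len(ys)
    k = np.arange(1, n)                      # size of the left child
    left = c2[:-1] - c1[:-1] ** 2 / k
    right = (c2[-1] - c2[:-1]) - (c1[-1] - c1[:-1]) ** 2 / (n - k)
    valid = xs[1:] > xs[:-1]                 # only cut between distinct values
    if not valid.any():
        return np.inf, None
    sse = np.where(valid, left + right, np.inf)
    j = int(np.argmin(sse))
    return sse[j], 0.5 * (xs[j] + xs[j + 1])


def grow_tree(X, y, rng, max_features=2, max_leaf_size=1):
    """CART-style tree with random feature subsets and no bootstrapping.
    Nodes are dicts: {"leaf": train_indices} or {"f", "thr", "left", "right"}."""
    nodes = []

    def build(idx):
        node_id = len(nodes)
        nodes.append({"leaf": idx})          # provisional leaf
        if len(idx) > max_leaf_size:
            feats = rng.choice(X.shape[1], size=max_features, replace=False)
            sse, f, thr = np.inf, None, None
            for g in feats:
                s, t = best_split(X[idx, g], y[idx])
                if s < sse:
                    sse, f, thr = s, g, t
            if f is not None:                # split found: replace the leaf
                mask = X[idx, f] <= thr
                left, right = build(idx[mask]), build(idx[~mask])
                nodes[node_id] = {"f": f, "thr": thr, "left": left, "right": right}
        return node_id

    build(np.arange(len(y)))
    return nodes


def tree_smoother(nodes, x, n):
    """Weights over the n training labels: 1/|leaf| on x's leaf, 0 elsewhere."""
    node = nodes[0]
    while "leaf" not in node:
        node = nodes[node["left"] if x[node["f"]] <= node["thr"] else node["right"]]
    s = np.zeros(n)
    s[node["leaf"]] = 1.0 / len(node["leaf"])
    return s


def forest_smoother(trees, Xq, n):
    """Rows are ensemble weights s_ens(x) = sum_b w_b s_b(x) with w_b = 1/B."""
    S = np.zeros((len(Xq), n))
    for nodes in trees:
        S += np.array([tree_smoother(nodes, x, n) for x in Xq])
    return S / len(trees)


def effective_params(S):
    """Assumed form of Eq. (6): (n / |I_0|) * sum_j ||s(x_j)||^2."""
    return S.shape[1] * np.mean(np.sum(S ** 2, axis=1))


rng = np.random.default_rng(0)
X_train, y_train = marsadd(100, rng)
X_test, _ = marsadd(50, rng)
n = len(y_train)
trees = [grow_tree(X_train, y_train, rng) for _ in range(50)]

S_train = forest_smoother(trees, X_train, n)
S_test = forest_smoother(trees, X_test, n)
p_train, p_test = effective_params(S_train), effective_params(S_test)
print(f"p0 on training inputs: {p_train:.1f}, on test inputs: {p_test:.1f}")
```

With `max_leaf_size=1` every tree interpolates, so the training smoother matrix is the identity and the training-input count equals $n$, while on test inputs the trees generally route a point to different training neighbors and the forest's count falls below $n$. Raising `max_leaf_size`, lowering `max_features`, or changing the number of trees are the knobs one would vary to mimic the comparisons described above; no numbers from the paper are reproduced here.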
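Experimental Results
Research Questions
- RQ1: How can tree ensembles be interpreted as adaptive smoothers, and what does this imply for their predictive behavior?
- RQ2: For trees and forests, how does the effective smoothing parameter $p^0_{\hat{\mathbf{s}}}$ differ between training inputs and test inputs?
- RQ3: Can randomization and ensemble size reduce variance through smoothing and improve generalization in ways that go beyond the traditional bias-variance explanation?
- RQ4: Can the spiked-smooth explanation of Wyner et al. be reconciled with Mentch and Zhou's view of randomization as regularization?
- RQ5: Beyond variance reduction, through which distinct mechanisms do forests improve upon single trees?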
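Key Findings
- Interpolating forests use fewer effective parameters on unseen test data than on training data, which is evidence of spiked-smooth behavior.
- Increasing the randomness and the size of the ensemble produces more smoothing on unseen inputs (lower $p^0_{\hat{\mathbf{s}}}$ on test data).
- The smoothing perspective shows that forests can be smoother than single trees on test inputs, particularly where the training data carry little information about the test input, i.e. where test inputs are dissimilar to the training inputs.
- Mentch and Zhou's degrees-of-freedom measure alone is insufficient to explain the advantage of forests; the $p^0_{\hat{\mathbf{s}}}$ metric gives a more complete account.
- Forests improve on trees through three mechanisms: smoothing reduces the variance caused by noise in the outcomes, reduces the variability in the quality of the learned function given fixed input data, and enriches the hypothesis space, which can reduce potential bias.
- The empirical results show that in-sample predictions benefit from reduced outcome-noise variance, whereas generalization to unseen inputs benefits from smoothing behavior that differs across inputs.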
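A short derivation sketches why ensembling cannot raise, and usually lowers, the effective parameter count relative to the average tree. It assumes that $p^0_{\hat{\mathbf{s}}}$ is $n$ times the mean squared norm of the smoother weights over an evaluation set $\mathcal{I}_0$ (training or test inputs); this form of Eq. (6) is an assumption, not a quotation. With ensemble weights $w_b$:

```latex
\begin{aligned}
\hat{\mathbf{s}}^{\text{ens}}(x_0) &= \sum_{b=1}^{B} w_b\,\hat{\mathbf{s}}^{\theta_b}(x_0),
  \qquad w_b \ge 0,\quad \sum_{b=1}^{B} w_b = 1,\\
\big\|\hat{\mathbf{s}}^{\text{ens}}(x_0)\big\|_2^2
  &\le \sum_{b=1}^{B} w_b\,\big\|\hat{\mathbf{s}}^{\theta_b}(x_0)\big\|_2^2
  \qquad \text{(convexity of } \|\cdot\|_2^2\text{)},\\
p^0_{\hat{\mathbf{s}}^{\text{ens}}}
  = \frac{n}{|\mathcal{I}_0|}\sum_{j\in\mathcal{I}_0}\big\|\hat{\mathbf{s}}^{\text{ens}}(x_j)\big\|_2^2
  &\le \sum_{b=1}^{B} w_b\, p^0_{\hat{\mathbf{s}}^{\theta_b}}.
\end{aligned}
```

Because $\|\cdot\|_2^2$ is strictly convex, equality holds at $x_j$ only when every tree with $w_b > 0$ assigns it the same weight vector. If each tree is grown to one training point per leaf on the full sample (no bootstrapping), every tree maps a training input $x_i$ to the unit vector $\mathbf{e}_i$, so the bound is tight on training inputs and $p^0_{\hat{\mathbf{s}}} = n$ there; on test inputs the trees generally disagree, the inequality is strict, and the forest smooths more than its average tree. This is one way to read the spiked-smooth behavior listed among the findings.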
This summary was generated by AI and reviewed by human editors.