QUICK REVIEW

[Paper Review] To tune or not to tune the number of trees in random forest?

Philipp Probst, Anne‐Laure Boulesteix|arXiv (Cornell University)|May 16, 2017

Machine Learning and Data Classification | Cited by 107

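One-Sentence Summary

The paper shows, theoretically and empirically, that for classification the expected error rate of a random forest can be a non-monotonic function of the number of trees T, whereas the Brier score, the logarithmic loss, and (for regression) the mean squared error are monotonic in T; it argues against tuning T and recommends a large, computationally feasible value instead.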

ABSTRACT

The number of trees T in the random forest (RF) algorithm for supervised learning has to be set by the user. It is controversial whether T should simply be set to the largest computationally manageable value or whether a smaller T may in some cases be better. While the principle underlying bagging is that "more trees are better", in practice the classification error rate sometimes reaches a minimum before increasing again for increasing number of trees. The goal of this paper is four-fold: (i) providing theoretical results showing that the expected error rate may be a non-monotonous function of the number of trees and explaining under which circumstances this happens; (ii) providing theoretical results showing that such non-monotonous patterns cannot be observed for other performance measures such as the Brier score and the logarithmic loss (for classification) and the mean squared error (for regression); (iii) illustrating the extent of the problem through an application to a large number (n = 306) of datasets from the public database OpenML; (iv) finally arguing in favor of setting it to a computationally feasible large number, depending on convergence properties of the desired performance measure.

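Research Motivation and Objectives

Settle whether the number of trees T in a random forest should be tuned or simply set to a large, feasible value.
Characterize theoretically how the expected error rate behaves as T grows.
Assess empirically, on a large collection of datasets, how common non-monotonic error-rate patterns are.
Give practical guidance on choosing T and introduce the OOBCurve tool for assessing convergence.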

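Proposed Method

Derive theoretical expressions for the expected performance measures (error rate, Brier score, logarithmic loss) as functions of T, in terms of the prediction difficulty ε_i of each observation.
Show that, for classification, the error rate can be non-monotonic in T, whereas the Brier score and the logarithmic loss decrease strictly monotonically in T; the AUC may be non-monotonic.
Analyze the behavior of the AUC and adapt the model to the out-of-bag (OOB) error setting.
Run a large-scale empirical study on 193 classification tasks and 113 regression tasks from OpenML (306 datasets in total), using 2000 trees and 1000 random seeds, to inspect the OOB curves.
Provide an R package, OOBCurve, that computes OOB curves for several performance measures.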
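The role of ε_i in the first two points can be illustrated with a minimal sketch. It rests on assumptions that this summary does not spell out: each tree misclassifies observation i independently with probability ε_i, the forest predicts by majority vote, and ties (possible for even T) are broken by a fair coin. The function names `binom_pmf` and `expected_error` are hypothetical and are not taken from the paper or from OOBCurve.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial(n, p) probability of exactly k successes, for 0 < p < 1.

    Computed in log space so that large n (e.g. 2000 trees) does not overflow.
    """
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))


def expected_error(eps: float, n_trees: int) -> float:
    """Probability that a majority vote of n_trees trees is wrong.

    Assumption: every tree errs independently with probability eps,
    and an exact tie is resolved by a fair coin flip.
    """
    total = 0.0
    for wrong in range(n_trees + 1):
        prob = binom_pmf(wrong, n_trees, eps)
        if 2 * wrong > n_trees:       # strict majority of trees is wrong
            total += prob
        elif 2 * wrong == n_trees:    # tie: wrong half of the time
            total += 0.5 * prob
    return total


# An "easy" observation (eps below 0.5) and a "hard" one (eps above 0.5).
for eps in (0.4, 0.6):
    curve = [round(expected_error(eps, t), 3) for t in (1, 11, 101)]
    print(eps, curve)
```

Under these assumptions the curve falls with T when ε_i is below 0.5 and rises when ε_i is above 0.5, which is why a dataset that mixes both kinds of observations can produce an averaged error curve that is not monotonic.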

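Experimental Results

Research Questions

RQ1: Under given data conditions, is the expected classification error rate a monotonic function of the number of trees T, or can it be non-monotonic?
RQ2: Are other performance measures (Brier score, logarithmic loss, MSE, AUC) monotonic in T, and under what conditions?
RQ3: How common are non-monotonic error-rate patterns in real data, and which dataset characteristics predict them?
RQ4: Should practitioners tune T, or simply use a large, computationally feasible T chosen on the basis of convergence behavior?
RQ5: Does the OOBCurve tool help to assess convergence and guide the choice of T?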

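Main Findings

For some observations the expected classification error rate is non-monotonic in T, so the error curve obtained by averaging over the observations of a dataset can be non-monotonic as well.
For binary classification, the Brier score and the logarithmic loss decrease strictly with T on average, whereas the AUC can be non-monotonic in some cases.
For regression, the mean squared error decreases with T, whereas some median-based error measures can be non-monotonic in certain ranges of T.
Empirically, about 10% of the OpenML datasets show a non-monotonic OOB error-rate curve, usually driven by observations whose ε_i is close to 0.5.
Non-monotonic patterns are more common on small datasets; the OOB curves are largely converged by 2000 trees.
The study supports using a large, computationally feasible T rather than tuning T, combined with a convergence check on the performance measure of interest.
It introduces an R package, OOBCurve, that computes OOB curves for several performance measures.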
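The contrast between the first two findings can be reproduced on a toy example. The sketch below is an illustration under stated assumptions, not the paper's experiment: trees err independently with probability ε_i, ties are broken by a fair coin, the predicted probability is the fraction of trees voting for each class, and the Brier score is the squared distance between that fraction and the true label. The two ε values and all function names are hypothetical.

```python
from math import exp, lgamma, log


def expected_error(eps: float, n_trees: int) -> float:
    """P(majority vote is wrong) if each tree errs independently with prob eps."""
    total = 0.0
    for wrong in range(n_trees + 1):
        log_coef = lgamma(n_trees + 1) - lgamma(wrong + 1) - lgamma(n_trees - wrong + 1)
        prob = exp(log_coef + wrong * log(eps) + (n_trees - wrong) * log(1 - eps))
        if 2 * wrong > n_trees:
            total += prob
        elif 2 * wrong == n_trees:  # tie, resolved by a fair coin
            total += 0.5 * prob
    return total


def expected_brier(eps: float, n_trees: int) -> float:
    """E[(wrong / T)^2] for wrong ~ Binomial(T, eps): squared mean plus variance."""
    return eps ** 2 + eps * (1 - eps) / n_trees


def dataset_curve(metric, eps_values, tree_counts):
    """Average a per-observation metric over the observations, for each T."""
    return [
        sum(metric(eps, t) for eps in eps_values) / len(eps_values)
        for t in tree_counts
    ]


# Hypothetical dataset: half the observations are easy (eps = 0.3),
# half are slightly harder than a coin flip (eps = 0.52).
EPSILONS = [0.3, 0.52]
TREE_COUNTS = [1, 11, 51, 501]

error_curve = dataset_curve(expected_error, EPSILONS, TREE_COUNTS)
brier_curve = dataset_curve(expected_brier, EPSILONS, TREE_COUNTS)

for t, err, brier in zip(TREE_COUNTS, error_curve, brier_curve):
    print(f"T={t:4d}  error={err:.3f}  brier={brier:.3f}")
```

In this toy setting the averaged error rate first drops, because the easy observations converge quickly, and then climbs again as the observations with ε_i just above 0.5 slowly converge to being misclassified. The averaged Brier score only decreases, since its dependence on T is the term ε_i(1 - ε_i)/T.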

This review was generated by AI and checked by a human editor.