QUICK REVIEW

[Paper Review] Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers

Alicia Curth, Alan Jeffares | arXiv (Cornell University) | 2024. 02. 02.

Neural Networks and Applications · Citations: 6

One-Line Summary

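This paper recasts tree ensembles as adaptive smoothers to quantify their smoothing behavior, and shows that randomization self-regularizes their predictions and improves performance in ways that go beyond a simple bias-variance explanation.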

ABSTRACT

Despite their remarkable effectiveness and broad application, the drivers of success underlying ensembles of trees are still not fully understood. In this paper, we highlight how interpreting tree ensembles as adaptive and self-regularizing smoothers can provide new intuition and deeper insight to this topic. We use this perspective to show that, when studied as smoothers, randomized tree ensembles not only make predictions that are quantifiably more smooth than the predictions of the individual trees they consist of, but also further regulate their smoothness at test-time based on the dissimilarity between testing and training inputs. First, we use this insight to revisit, refine and reconcile two recent explanations of forest success by providing a new way of quantifying the conjectured behaviors of tree ensembles objectively by measuring the effective degree of smoothing they imply. Then, we move beyond existing explanations for the mechanisms by which tree ensembles improve upon individual trees and challenge the popular wisdom that the superior performance of forests should be understood as a consequence of variance reduction alone. We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles -- because the prevailing definition of bias does not capture differences in the expressivity of the hypothesis classes formed by trees and forests. Instead, we show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled. In particular, we demonstrate that the smoothing effect of ensembling can reduce variance in predictions due to noise in outcome generation, reduce variability in the quality of the learned function given fixed input data and reduce potential bias in learnable functions by enriching the available hypothesis space.

Research Motivation and Goals

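Provide intuition for why tree ensembles succeed by viewing them as adaptive smoothers that average training labels.
Quantify the degree of smoothing (effective degrees of freedom) and compare training-time and test-time behavior.
Reconcile two recent explanations of forest success (spiked-smooth interpolation and randomization as regularization) within a unified smoothing framework.
Investigate how the notions of bias and variance fail to fully capture the expressivity of forests, and identify three distinct improvement mechanisms.
Empirically validate the smoothing-based explanation and evaluate mechanisms beyond variance reduction.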
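The claim that bias does not capture differences in expressivity between trees and forests can be illustrated with a toy example of our own (not taken from the paper). Averaging two depth-1 trees with different, hypothetical split points produces a three-level step function, and no single depth-1 tree can represent it because a stump takes at most two values. The helper names `stump` and `forest` are illustrative only.

```python
import numpy as np


def stump(split, left_value, right_value):
    """Depth-1 regression tree on a scalar input: one split, two leaf values."""
    return lambda x: np.where(x < split, left_value, right_value)


# Hypothetical split points and leaf values, chosen only for illustration.
tree_a = stump(0.3, 0.0, 1.0)
tree_b = stump(0.7, 0.0, 1.0)


def forest(x):
    """Equal-weight average (w_b = 1/2) of the two stumps."""
    return 0.5 * (tree_a(x) + tree_b(x))


x = np.linspace(0.0, 1.0, 101)
levels_tree = np.unique(tree_a(x))    # a stump takes at most 2 distinct values
levels_forest = np.unique(forest(x))  # the average takes 3 values: 0, 0.5 and 1
```

The ensemble therefore lies outside the hypothesis class of its own members. This is the sense in which ensembling can enrich the available hypothesis space rather than only reduce variance.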

Proposed Method

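Represent trees and ensembles as adaptive, outcome-dependent smoothers with smoothing weights s^θ(x_0) and ensemble weights w_b.
Use the effective-parameter measure p^0_ŝ (Eq. 6) to quantify smoothing at training and test inputs.
Compare interpolating forests with non-interpolating ensembles while varying the randomness in tree construction and the number of trees in the ensemble.
Analyze training-time versus test-time behavior to show that, on unseen inputs in general, ensembles can be smoother than on the training data.
Relate spiked-smooth interpolation to randomization as regularization, and interpret both through their smoothing effect.
Validate the analysis empirically with simulations (the MARSadd setting) and replicate it on real datasets (Appendix C).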
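To make the smoother view concrete, here is a minimal NumPy sketch built on several assumptions of ours. Each "tree" is a hypothetical toy interpolating partition of a 1-D input (one random split between every pair of neighboring training points), not CART or any library's tree. Bootstrapping is ignored, and the ensemble uses equal weights w_b = 1/B. The effective-parameter measure of Eq. 6 is assumed to take the form p^0_ŝ = (n / |I_0|) Σ_{j∈I_0} ||ŝ(x_j)||², which averages the squared norm of the smoother weights over a set of inputs I_0 (training or test). All function names are illustrative.

```python
import numpy as np


def random_interpolating_tree(x_train, rng):
    """Toy 1-D 'tree' (hypothetical): one random split between each pair of
    consecutive training inputs, so every training point gets its own leaf."""
    xs = np.sort(x_train)
    splits = rng.uniform(xs[:-1], xs[1:])  # increasing, since the intervals are disjoint
    return lambda x: np.searchsorted(splits, x)  # leaf index of each input


def tree_smoother_weights(train_leaves, query_leaves):
    """Row j holds the weights s(x_j) that query j places on each training label:
    1 / (leaf size) for training points sharing its leaf, 0 otherwise."""
    same_leaf = (query_leaves[:, None] == train_leaves[None, :]).astype(float)
    counts = same_leaf.sum(axis=1, keepdims=True)
    return np.divide(same_leaf, counts, out=np.zeros_like(same_leaf), where=counts > 0)


def forest_smoother_weights(trees, x_train, x_query):
    """Ensemble smoother: equal-weight (w_b = 1/B) average of the tree smoothers."""
    return np.mean([tree_smoother_weights(t(x_train), t(x_query)) for t in trees], axis=0)


def effective_parameters(S, n):
    """Assumed form of p^0: n / |I_0| * sum over queries of ||s(x_j)||^2."""
    return n * np.mean(np.sum(S**2, axis=1))


rng = np.random.default_rng(0)
n = 20
x_train = rng.uniform(0.0, 1.0, n)
x_test = rng.uniform(0.0, 1.0, 200)
trees = [random_interpolating_tree(x_train, rng) for _ in range(100)]

p_train_tree = effective_parameters(forest_smoother_weights(trees[:1], x_train, x_train), n)
p_train_forest = effective_parameters(forest_smoother_weights(trees, x_train, x_train), n)
p_test_tree = effective_parameters(forest_smoother_weights(trees[:1], x_train, x_test), n)
p_test_forest = effective_parameters(forest_smoother_weights(trees, x_train, x_test), n)
```

By construction, both the single tree and the forest reproduce every training label exactly, so `p_train_tree` and `p_train_forest` both equal n = 20. The single tree also gives n on test inputs, because each test point inherits exactly one training label. The forest, by contrast, spreads a test point's weight over the two neighboring training points depending on where each tree's split fell, so `p_test_forest` falls below n. This is a toy version of interpolating on the training data while smoothing on unseen inputs.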

Experimental Results

Research Questions

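RQ1: How can tree ensembles be interpreted as adaptive smoothers, and what does this imply about their prediction behavior?
RQ2: How does the effective smoothing parameter p^0_ŝ differ between training and test inputs for trees and forests?
RQ3: Do randomization and ensemble size reduce smoothing-induced variance and improve generalization in ways that go beyond the traditional bias-variance explanation?
RQ4: Can the spiked-smooth explanation of Wyner et al. be reconciled with Mentch and Zhou's view of randomization as regularization?
RQ5: What distinct mechanisms, beyond variance reduction, allow forests to improve upon single trees?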

Key Findings

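Interpolating forest ensembles exhibit spiked-smooth behavior: they use fewer effective parameters on unseen test data than on the training data.
Increasing ensemble randomness and size leads to greater smoothing on unseen inputs (lower p^0_ŝ on test data).
The smoothing perspective captures that forests can be smoother than individual trees at test inputs, especially at test inputs that are poorly covered by the training data.
Mentch and Zhou's degrees-of-freedom measure is not sufficient to explain the benefits of forests; the p^0_ŝ measure provides a more complete account.
Forests improve upon trees through three mechanisms: smoothing reduces variance due to noisy outcomes, reduces variability in the quality of the learned function given fixed input data, and reduces potential bias by enriching the hypothesis space.
The empirical results show that in-sample predictions benefit from reduced outcome-noise variance, whereas generalization to unseen inputs benefits from smoothing behavior that differs across inputs.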
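The first mechanism, reduced variance from outcome noise, can be motivated by a standard fact about linear smoothers. If the weights ŝ(x_0) are held fixed and the labels carry independent noise with variance σ², then Var[f̂(x_0)] = σ² ||ŝ(x_0)||². The Monte Carlo sketch that follows checks this for two hypothetical weight vectors: a one-hot vector, like a single interpolating tree, and a spread-out vector, like a forest average. Holding the weights fixed is a simplification of ours, because real tree weights depend on the labels through the choice of splits.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, n_draws = 10, 1.0, 20000
f_true = np.linspace(0.0, 1.0, n)  # hypothetical noiseless outcome means

# Hypothetical fixed smoother weights at one test input x0.
s_tree = np.zeros(n)
s_tree[4] = 1.0                          # single interpolating tree: one neighbor
s_forest = np.zeros(n)
s_forest[3:6] = [0.25, 0.5, 0.25]        # forest: weight spread over neighbors

# Repeated draws of noisy training labels; each row is one training set.
y = f_true + sigma * rng.standard_normal((n_draws, n))
pred_tree = y @ s_tree                   # f_hat(x0) = sum_i s_i(x0) * y_i
pred_forest = y @ s_forest

# Closed-form variance for fixed weights: sigma^2 * ||s(x0)||^2.
var_tree_theory = sigma**2 * np.sum(s_tree**2)      # 1.0
var_forest_theory = sigma**2 * np.sum(s_forest**2)  # 0.375
```

The empirical variances `pred_tree.var()` and `pred_forest.var()` should land close to 1.0 and 0.375 respectively. Spreading weight across more training labels lowers ||ŝ(x_0)||², which is also why the effective-parameter measure tracks this source of variance.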

This review was generated by AI and reviewed by a human editor.