QUICK REVIEW

[Paper Review] Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data

С. Б. Попов, Stanislav Morozov|arXiv (Cornell University)|Sep 13, 2019

Adversarial Robustness in Machine Learning | References: 29 | Citations: 62

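One-line summary

NODE is a differentiable deep architecture that extends oblivious decision trees into multi-layer ensembles. It achieves state-of-the-art results on tabular data and often outperforms tuned gradient boosting methods.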

ABSTRACT

Nowadays, deep neural networks (DNNs) have become the main instrument for machine learning tasks within a wide range of domains, including vision, NLP, and speech. Meanwhile, in an important case of heterogenous tabular data, the advantage of DNNs over shallow counterparts remains questionable. In particular, there is no sufficient evidence that deep learning machinery allows constructing methods that outperform gradient boosting decision trees (GBDT), which are often the top choice for tabular problems. In this paper, we introduce Neural Oblivious Decision Ensembles (NODE), a new deep learning architecture, designed to work with any tabular data. In a nutshell, the proposed NODE architecture generalizes ensembles of oblivious decision trees, but benefits from both end-to-end gradient-based optimization and the power of multi-layer hierarchical representation learning. With an extensive experimental comparison to the leading GBDT packages on a large number of tabular datasets, we demonstrate the advantage of the proposed NODE architecture, which outperforms the competitors on most of the tasks. We open-source the PyTorch implementation of NODE and believe that it will become a universal framework for machine learning on tabular data.

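Motivation and objectives

Make the case for deep learning on heterogeneous tabular data, where conventional DNNs struggle to match GBDT.
Introduce NODE, a differentiable ensemble of oblivious decision trees that is trained end-to-end.
Show that NODE outperforms the leading GBDT packages on multiple tabular datasets.
Demonstrate efficiency and practicality, including an examination of end-to-end training and inference.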

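Proposed method

Define differentiable oblivious decision trees (ODTs), in which all nodes at the same depth share one splitting feature, with entmax-based soft decisions.
Build a NODE layer as a differentiable ensemble of m ODTs of depth d, parameterized by a feature selection matrix F, thresholds b, and a response tensor R.
Relax the Heaviside splits into differentiable entmax-based decisions, and form the choice tensor C used for leaf routing via outer products.
Stack multiple NODE layers in a DenseNet-like architecture to form a deep NODE model; the output is averaged over all trees in all layers.
Preprocess the data with a quantile transform, initialize parameters in a data-aware way, and train end-to-end with mini-batch SGD (Quasi-Hyperbolic Adam) and checkpoint averaging.
For fast inference, precompute the sparse entmax selectors.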
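
The following is a minimal sketch of the forward pass of one differentiable ODT, written in plain Python rather than the authors' PyTorch code. It assumes 1.5-entmax computed by simple bisection on the threshold, one temperature per tree level, and a response table stored as a dictionary keyed by leaf index; the names `entmax15`, `soft_odt`, `tau`, and the toy inputs are illustrative choices, not identifiers from the paper or its repository.

```python
from itertools import product


def entmax15(scores, iters=60):
    """1.5-entmax by bisection: p_i = max(z_i / 2 - t, 0)^2 with sum(p) = 1."""
    half = [s / 2.0 for s in scores]
    hi = max(half)   # at t = hi every p_i is 0, so the sum is below 1
    lo = hi - 1.0    # at t = lo the largest p_i is 1, so the sum is at least 1
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        total = sum(max(h - mid, 0.0) ** 2 for h in half)
        if total >= 1.0:
            lo = mid
        else:
            hi = mid
    probs = [max(h - lo, 0.0) ** 2 for h in half]
    norm = sum(probs)  # >= 1 by construction, so never zero
    return [p / norm for p in probs]


def soft_odt(x, F, b, tau, R):
    """Forward pass of one soft oblivious tree of depth d = len(F).

    x: n input features; F: d rows of n feature-selection logits;
    b, tau: d thresholds and temperatures (tau is an assumed parameter);
    R: dict mapping each leaf index in {0, 1}^d to a scalar response.
    """
    sides = []
    for logits, b_i, tau_i in zip(F, b, tau):
        weights = entmax15(logits)                         # sparse feature choice
        f_i = sum(w * x_j for w, x_j in zip(weights, x))   # selected feature value
        c_i = entmax15([(f_i - b_i) / tau_i, 0.0])[0]      # soft version of f_i > b_i
        sides.append((c_i, 1.0 - c_i))
    out = 0.0
    # Outer product over levels: each leaf weight is a product of d side weights.
    for leaf in product((0, 1), repeat=len(F)):
        weight = 1.0
        for level, side in enumerate(leaf):
            weight *= sides[level][side]
        out += weight * R[leaf]
    return out


if __name__ == "__main__":
    R = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
    x = [0.2, 3.0, -1.0]
    F = [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
    print(soft_odt(x, F, b=[0.0, 0.0], tau=[0.1, 0.1], R=R))
```

With these toy inputs both feature selectors and both decisions saturate, so the tree behaves like a hard ODT and routes all weight to a single leaf. Larger temperatures or smaller logits give fractional leaf weights that still sum to one.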

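Experimental results

Research questions

RQ1: Can differentiable, end-to-end trainable oblivious decision ensembles outperform tuned GBDT methods on tabular data?
RQ2: Can stacking NODE layers improve representational power on tabular problems without sacrificing inference efficiency?
RQ3: How does entmax help the model learn sparse, effective feature splits?
RQ4: How do NODE-based models compare with CatBoost, XGBoost, and neural baselines across diverse tabular datasets?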
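
RQ2 turns on how NODE layers are stacked. The DenseNet-like wiring from the method description can be sketched as below, assuming each tree is any callable from a feature list to a scalar; `dense_node_forward` and the toy lambdas are hypothetical stand-ins, not the trained ODTs from the paper.

```python
def dense_node_forward(x, layers):
    """Dense stacking: every layer sees the inputs plus all earlier tree outputs.

    x: list of input features.
    layers: list of layers, each a list of callables (features -> scalar).
    Returns the average over the outputs of all trees in all layers.
    """
    features = list(x)
    all_outputs = []
    for trees in layers:
        outs = [tree(features) for tree in trees]
        all_outputs.extend(outs)
        features = features + outs  # dense connection to later layers
    return sum(all_outputs) / len(all_outputs)


if __name__ == "__main__":
    toy_layers = [
        [lambda f: f[0], lambda f: f[1]],  # layer 1 reads the raw inputs
        [lambda f: f[2] + f[3]],           # layer 2 reads layer-1 outputs
    ]
    print(dense_node_forward([1.0, 2.0], toy_layers))
```

Because each layer only appends to the feature list, the input width of layer k grows with the number of trees in the preceding layers, which is the cost that RQ2 weighs against the gain in representational power.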

Main findings

Dataset	CatBoost	XGBoost	NODE	mGBDT	DeepForest
Epsilon	0.1119±2e-4	0.1144	0.1034±3e-4	OOM	0.1179
YearPrediction	80.68±0.04	81.11	77.43±0.09	80.67	—
Higgs	0.2434±2e-4	0.2600	0.2412±5e-4	OOM	0.2391
Microsoft	0.5587±2e-4	0.5637	0.5584±3e-4	OOM	—
Yahoo	0.5781±3e-4	0.5756	0.5666±5e-4	OOM	—
Click	0.3438±1e-3	0.3461	0.3309±3e-4	OOM	0.3333

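With default hyperparameters, NODE consistently outperforms CatBoost and XGBoost across multiple datasets.
With tuned hyperparameters, NODE still outperforms the competitors on most tasks, although tuned XGBoost is sometimes ahead on Yahoo and Microsoft.
Ablations show that entmax (α=1.5) gives better results than softmax, Gumbel-Softmax, and sparsemax across tree depths.
Feature-importance analysis shows that early layers draw far more heavily on the input features, while deeper layers account for a larger share of the final prediction.
NODE achieves competitive training and inference times, with inference performance comparable to optimized GBDT libraries in GPU/CPU settings.
NODE-based methods handle tabular data robustly and offer a scalable, end-to-end differentiable alternative to GBDT.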
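
To put the first finding in numbers, the short script below computes NODE's relative error reduction against the better of the two GBDT baselines in each row of the table. It assumes every row reports an error metric where lower is better and uses only the mean values copied from the table; the variable names are illustrative.

```python
# (CatBoost, XGBoost, NODE) mean values copied from the results table.
results = {
    "Epsilon": (0.1119, 0.1144, 0.1034),
    "YearPrediction": (80.68, 81.11, 77.43),
    "Higgs": (0.2434, 0.2600, 0.2412),
    "Microsoft": (0.5587, 0.5637, 0.5584),
    "Yahoo": (0.5781, 0.5756, 0.5666),
    "Click": (0.3438, 0.3461, 0.3309),
}

gains = {}
for name, (catboost, xgboost, node) in results.items():
    best_gbdt = min(catboost, xgboost)  # strongest GBDT baseline on this row
    gains[name] = 100.0 * (best_gbdt - node) / best_gbdt

for name, gain in gains.items():
    print(f"{name}: {gain:.2f}% lower error than the best GBDT baseline")
```

A positive value means NODE has the lower error on that dataset. The margins differ widely between rows, so the reported standard deviations in the table still matter when judging the smallest gaps.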

This review was generated by AI and checked by a human editor.