QUICK REVIEW

[Paper Review] NeuMiss networks: differentiable programming for supervised learning with missing values

Marine Le Morvan, Julie Josse|arXiv (Cornell University)|Jul 3, 2020

Machine Learning and ELM|References: 32|Citations: 14

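One-line summary

NeuMiss networks introduce a differentiable neural network architecture that explicitly models missing-data patterns through a multiplicative non-linearity: multiplication by the missingness indicator. By approximating the optimal Bayes predictor with a Neumann series, they perform well, particularly under MNAR mechanisms, while keeping computational and sample complexity independent of the number of missing-data patterns.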

ABSTRACT

The presence of missing values makes supervised learning much more challenging. Indeed, previous work has shown that even when the response is a linear function of the complete data, the optimal predictor is a complex function of the observed entries and the missingness indicator. As a result, the computational or sample complexities of consistent approaches depend on the number of missing patterns, which can be exponential in the number of dimensions. In this work, we derive the analytical form of the optimal predictor under a linearity assumption and various missing data mechanisms including Missing at Random (MAR) and self-masking (Missing Not At Random). Based on a Neumann-series approximation of the optimal predictor, we propose a new principled architecture, named NeuMiss networks. Their originality and strength come from the use of a new type of non-linearity: the multiplication by the missingness indicator. We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns. As a result they scale well to problems with many features, and remain statistically efficient for medium-sized samples. Moreover, we show that, contrary to procedures using EM or imputation, they are robust to the missing data mechanism, including difficult MNAR settings such as self-masking.
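
As a reading aid, the chain of reasoning in the abstract can be written out as follows. This is a sketch under stated assumptions, not a quotation of the paper's propositions: it assumes the linear model Y = β₀ + ⟨β, X⟩ + ε, Gaussian features with mean μ and covariance Σ, an M(C)AR mechanism, and a spectral radius of Id − Σ_obs below 1. The subscripts obs and mis (observed and missing coordinates) and the truncation order ℓ are notation chosen here.

```latex
\begin{align*}
% Bayes predictor: linear in the observed entries plus the conditional mean of the missing ones
f^\star(X_{obs}, M) &= \beta_0 + \langle \beta_{obs}, X_{obs} \rangle
  + \langle \beta_{mis}, \mathbb{E}[X_{mis} \mid X_{obs}, M] \rangle \\
% Gaussian conditional expectation (assumed M(C)AR case)
\mathbb{E}[X_{mis} \mid X_{obs}, M] &= \mu_{mis}
  + \Sigma_{mis,obs}\, \Sigma_{obs}^{-1} (X_{obs} - \mu_{obs}) \\
% Neumann series for the inverse, truncated at order \ell
\Sigma_{obs}^{-1} &= \sum_{k=0}^{\infty} (\mathrm{Id} - \Sigma_{obs})^{k}
  \;\approx\; \sum_{k=0}^{\ell} (\mathrm{Id} - \Sigma_{obs})^{k}
\end{align*}
```

The inverse Σ_obs⁻¹ differs for every missingness pattern, which is where the dependence on the number of patterns comes from; the truncated series replaces it with repeated matrix products that a network can share across patterns.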

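Motivation and objectives

Address supervised learning with missing values under complex missing-data mechanisms such as MNAR, in addition to MCAR and MAR.
Overcome the exponential computational and sample complexity faced by existing approaches that explicitly model all 2^d missing-data patterns.
Develop a theoretically grounded neural network architecture that learns a function which implicitly imputes values from the observed data and the missingness pattern.
Remain robust to unknown or complex missing-data mechanisms, including self-masking MNAR, where standard imputation and EM methods fail.
Scale to high-dimensional data and achieve high predictive accuracy with low sample and computational complexity.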
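
To make the 2^d blow-up concrete, here is a small illustrative sketch in plain Python. The helper names (`missing_pattern`, `group_by_pattern`) and the toy rows are hypothetical and not taken from the paper; `None` stands in for a missing entry. It groups rows by missingness pattern, as a one-model-per-pattern approach would have to, and prints how fast the number of possible patterns grows with d.

```python
def missing_pattern(row):
    """Return the missingness pattern of a row as 0/1 flags (1 = missing)."""
    return tuple(1 if value is None else 0 for value in row)


def group_by_pattern(rows):
    """Group row indices by missingness pattern (one group per would-be model)."""
    groups = {}
    for index, row in enumerate(rows):
        groups.setdefault(missing_pattern(row), []).append(index)
    return groups


# Toy data with d = 3 features; None marks a missing value.
rows = [
    [1.0, None, 3.0],
    [0.5, 2.0, None],
    [None, None, 1.0],
    [2.0, None, 0.1],
]
groups = group_by_pattern(rows)
print(len(groups), "patterns observed out of", 2 ** 3, "possible")

# The number of possible patterns doubles with every added feature.
for d in (10, 20, 40):
    print("d =", d, "->", 2 ** d, "possible patterns")
```

Each group only sees its own rows, so a per-pattern method splits the sample into ever smaller pieces as d grows.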

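Proposed method

Derive the analytical form of the Bayes predictor for linear regression under MAR and MNAR mechanisms, including self-masking.
Approximate the optimal predictor with a Neumann series expansion, which makes it amenable to differentiable optimization.
Introduce a new type of non-linearity: element-wise multiplication of the hidden representation by the missingness indicator (⊙M), which enables pattern-dependent learning.
Design a deep architecture in which every layer applies the ⊙M non-linearity, so that complex, data-dependent imputations can be learned.
Train the network end to end by stochastic gradient descent with a standard loss (e.g. MSE), which the fully differentiable architecture allows, with the aim of converging to a consistent predictor.
Use residual connections in deeper versions to stabilize training and improve generalization.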
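
The idea behind the ⊙M layers can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes Gaussian features with known mean `mu` and covariance `sigma` (eigenvalues in (0, 2) so the series converges), and plugs these in directly, whereas NeuMiss learns the corresponding weights by SGD. The function names, the use of `None` for missing entries, and the convention that the mask is 1 where a value is missing are choices made here.

```python
def matvec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def neumann_impute(x, m, mu, sigma, depth):
    """Approximate E[X_mis | X_obs] with `depth` masked Neumann iterations.

    x: inputs; entries at missing positions are ignored (may be None)
    m: mask, 1 where missing, 0 where observed
    """
    d = len(mu)
    obs = [1 - m_i for m_i in m]  # indicator of observed coordinates
    # W = Id - Sigma plays the role of the weight matrix shared by all layers.
    w = [[(1.0 if i == j else 0.0) - sigma[i][j] for j in range(d)]
         for i in range(d)]
    # Centered input, zero at missing positions.
    h0 = [(x[i] - mu[i]) if obs[i] else 0.0 for i in range(d)]
    h = list(h0)
    for _ in range(depth):
        wh = matvec(w, h)
        # Mask multiplication plus skip connection to the input.
        h = [obs[i] * wh[i] + h0[i] for i in range(d)]
    # h now approximates Sigma_obs^{-1} (x_obs - mu_obs) on observed coordinates.
    sh = matvec(sigma, h)
    return [mu[i] + sh[i] if m[i] else x[i] for i in range(d)]


def predict(x, m, mu, sigma, beta0, beta, depth):
    """Linear prediction on the implicitly imputed vector."""
    filled = neumann_impute(x, m, mu, sigma, depth)
    return beta0 + sum(b * v for b, v in zip(beta, filled))


sigma = [[1.0, 0.5, 0.2],
         [0.5, 1.0, 0.3],
         [0.2, 0.3, 1.0]]
mu = [0.0, 0.0, 0.0]
x = [1.0, 2.0, None]   # third feature is missing
m = [0, 0, 1]
for depth in (0, 1, 5, 50):
    print(depth, neumann_impute(x, m, mu, sigma, depth)[2])
```

In this toy case the estimate for the third coordinate starts at 0.8 with depth 0 and approaches the exact conditional mean 0.6 as depth grows. The same weight matrix serves every missingness pattern; only the mask changes from one sample to the next.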

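Experimental results

Research questions

RQ1: What is the analytical form of the optimal predictor for linear regression under MCAR and MAR mechanisms?
RQ2: Can a neural network architecture be designed that implicitly learns the optimal imputation function without explicitly modeling all 2^d missing-data patterns?
RQ3: How does the non-linearity based on multiplication by the missingness indicator (⊙M) improve generalization and robustness to the missing-data mechanism?
RQ4: Can the NeuMiss architecture outperform standard methods such as EM and MICE in predictive performance, particularly in MNAR settings?
RQ5: What are the theoretical and empirical sample complexities of NeuMiss networks compared with methods that require 2^d models?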

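Key findings

In the reported setting (d = 10, n = 10^5), NeuMiss networks achieve R² scores within 1% of the Bayes rate under MCAR and MAR, i.e. near-optimal performance.
In the self-masking MNAR setting, NeuMiss clearly outperforms EM and MICE, and the gap widens as the sample size increases.
The architecture keeps computational complexity at O(d²) and sample complexity at O(d²), both independent of the number of missing-data patterns, 2^d.
NeuMiss networks remain robust under the self-masking MNAR mechanism, where EM and imputation-based methods fail because of model misspecification.
Increasing the capacity of NeuMiss networks improves predictive accuracy; however, unlike conventional MLPs, deeper networks do not yield further gains.
The shallow version of NeuMiss is mathematically equivalent to a standard MLP whose input is concatenated with the mask, which gives a theoretical justification for the widely used practice of concatenating the mask to the input.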
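
The mask-concatenation practice mentioned in the last finding can be illustrated as follows. This is a hedged sketch in plain Python, not code from the paper: `concat_mask` and `mlp_forward` are hypothetical names, the zero fill value and the ReLU hidden layer are assumptions, and the weights are arbitrary numbers chosen for the example.

```python
def concat_mask(row, fill=0.0):
    """Zero-impute a row and append its missingness mask (1.0 = missing)."""
    values = [fill if v is None else v for v in row]
    mask = [1.0 if v is None else 0.0 for v in row]
    return values + mask


def mlp_forward(features, w1, b1, w2, b2):
    """One-hidden-layer MLP with ReLU, applied to the concatenated input."""
    hidden = [
        max(0.0, sum(w * f for w, f in zip(weights, features)) + bias)
        for weights, bias in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


row = [1.0, None, 3.0]          # second feature is missing
features = concat_mask(row)     # 3 imputed values followed by 3 mask flags
print(features)

# Arbitrary illustrative weights: unit 0 reads feature 0, unit 1 reads mask flag 1.
w1 = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [1.0, 2.0]
b2 = 0.5
print(mlp_forward(features, w1, b1, w2, b2))
```

Because the mask enters as ordinary input features, the hidden units can shift their response depending on which entries are missing, which is the behavior the equivalence result refers to.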

This review was generated by AI and checked by a human editor.