QUICK REVIEW

[Paper Review] Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection

Yihe Dong, Samuel B. Hopkins|arXiv (Cornell University)|Jun 26, 2019

Fault Detection and Control Systems|References: 25|Citations: 33

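One-sentence summary

Introduces QUE-scoring, an outlier scoring method based on quantum entropy regularization, and uses it to obtain nearly-linear-time robust mean estimation and high-dimensional outlier detection, backed by theoretical guarantees and empirical validation.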

ABSTRACT

We study two problems in high-dimensional robust statistics: \emph{robust mean estimation} and \emph{outlier detection}. In robust mean estimation the goal is to estimate the mean $μ$ of a distribution on $\mathbb{R}^d$ given $n$ independent samples, an $\varepsilon$-fraction of which have been corrupted by a malicious adversary. In outlier detection the goal is to assign an \emph{outlier score} to each element of a data set such that elements more likely to be outliers are assigned higher scores. Our algorithms for both problems are based on a new outlier scoring method we call QUE-scoring based on \emph{quantum entropy regularization}. For robust mean estimation, this yields the first algorithm with optimal error rates and nearly-linear running time $\widetilde{O}(nd)$ in all parameters, improving on the previous fastest running time $\widetilde{O}(\min(nd/\varepsilon^6, nd^2))$. For outlier detection, we evaluate the performance of QUE-scoring via extensive experiments on synthetic and real data, and demonstrate that it often performs better than previously proposed algorithms. Code for these experiments is available at https://github.com/twistedcubic/que-outlier-detection .

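Motivation and objectives

Motivate robust mean estimation and outlier detection in high dimensions under adversarial corruption.
Develop QUE-scoring, based on quantum entropy regularization, to identify the directions in which outliers are influential.
Achieve a nearly-linear-time algorithm for robust mean estimation in the bounded-covariance and sub-Gaussian regimes.
Provide an empirically validated outlier detection method that uses QUE scores and spectral information.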

Proposed method

Define QUE scores using the density matrix $U = \exp(\alpha \overline{\Sigma}) / \operatorname{tr} \exp(\alpha \overline{\Sigma})$, where $\overline{\Sigma}$ is the empirical covariance and $\alpha$ is a tuning parameter.
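Embed QUE in the matrix multiplicative weights framework to iteratively down-weight suspected outliers.
Use a DecreaseSpectralNorm subroutine that reduces the spectral norm of the weighted covariance in O(log d) rounds.
Provide two variants of the algorithm: QUEScoreFilter for bounded covariance and s.g.-QUEScoreFilter for sub-Gaussian distributions.
Show that QUE-based scoring improves on simple spectral scores by taking multiple large eigendirections into account.
Demonstrate nearly-linear running time by using low-rank sketches to avoid explicit matrix representations.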
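
The score definition at the top of this list can be illustrated with a small NumPy sketch. This is an exact, dense computation for intuition only, not the nearly-linear-time algorithm, and it is not taken from the authors' repository. The function name `que_scores`, the score form $\tau_i = (x_i - \hat{\mu})^\top U (x_i - \hat{\mu})$ with centering at the empirical mean, and the choice to apply $\alpha$ to the covariance divided by its spectral norm are all assumptions made here for illustration.

```python
import numpy as np

def que_scores(X, alpha=4.0):
    """Exact QUE-style scores for the rows of X (n x d); costs O(n d^2 + d^3)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # assumption: center at the empirical mean
    sigma = Xc.T @ Xc / Xc.shape[0]          # empirical covariance (d x d)
    evals, evecs = np.linalg.eigh(sigma)     # ascending eigenvalues of a symmetric matrix
    top = max(evals[-1], 1e-12)              # spectral norm of sigma
    # Assumption: alpha acts on sigma / ||sigma||_2 so the exponential stays well scaled.
    # Subtracting 1 in the exponent is a constant factor that cancels after normalizing.
    w = np.exp(alpha * (evals / top - 1.0))
    w /= w.sum()                             # eigenvalues of the density matrix U (sum to 1)
    proj = Xc @ evecs                        # coordinates of each point in the eigenbasis
    return (proj ** 2) @ w                   # tau_i = (x_i - mean)^T U (x_i - mean)
```

Calling `que_scores(X, alpha=4.0)` on an `n x d` array returns one non-negative score per row, and ranking rows by that score gives an outlier ordering. With `alpha=0.0` every eigendirection receives equal weight, so the scores reduce to squared distances from the mean divided by `d`.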

Experimental results

Research questions

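RQ1: Can QUE-scoring more effectively detect outliers that lie along different directions in high-dimensional data?
RQ2: Can near-optimal error be achieved in nearly-linear time under the bounded-covariance and sub-Gaussian assumptions?
RQ3: In practice, how do QUE scores compare with existing baselines (norm-based, top-eigenvector spectral, and local methods)?
RQ4: How does the entropy regularization parameter $\alpha$ affect performance?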
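
As background for RQ3 and RQ4, the two extremes of $\alpha$ follow directly from the density-matrix definition of the scores. The derivation below is a sketch under assumed notation that does not appear in the review: $\lambda_1 > \lambda_2 \ge \dots \ge \lambda_d$ and $v_1, \dots, v_d$ are the eigenvalues and eigenvectors of the empirical covariance $\overline{\Sigma}$ (with the top eigenvalue assumed simple), $\hat{\mu}$ is the empirical mean, and $\tau_i$ is the score of point $x_i$.

```latex
U_\alpha
  = \frac{\exp(\alpha \overline{\Sigma})}{\operatorname{tr} \exp(\alpha \overline{\Sigma})}
  = \sum_{j=1}^{d} \frac{e^{\alpha \lambda_j}}{\sum_{k=1}^{d} e^{\alpha \lambda_k}} \, v_j v_j^\top,
\qquad
\tau_i = (x_i - \hat{\mu})^\top U_\alpha \, (x_i - \hat{\mu}).

\alpha \to 0:
  \quad U_\alpha \to \tfrac{1}{d} I,
  \quad \tau_i \to \tfrac{1}{d} \, \lVert x_i - \hat{\mu} \rVert_2^2
  \quad \text{(norm-based scoring)}.

\alpha \to \infty:
  \quad U_\alpha \to v_1 v_1^\top,
  \quad \tau_i \to \langle v_1, \, x_i - \hat{\mu} \rangle^2
  \quad \text{(top-eigenvector spectral scoring)}.
```

Intermediate values of $\alpha$ interpolate between these two baselines by spreading weight over several large eigendirections.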

Key findings

Under bounded covariance, QUE-scoring yields robust mean estimation in nearly-linear time with an error bound of $O(\sqrt{\varepsilon}) + \widetilde{O}(\sqrt{d/n})$.
In the sub-Gaussian case, an error bound of $O(\varepsilon \sqrt{\log(1/\varepsilon)} + \sqrt{d/n})$ is achieved with high probability.
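In experiments, QUE-based outlier detection often outperforms PCA-based, distance-based, and other baselines on high-dimensional data sets.
Experiments on synthetic data, word embeddings, and CIFAR-10 data show that QUE improves ROC AUC across a range of settings.
The practical implementation uses Johnson-Lindenstrauss sketches and an approximation of the matrix exponential to compute approximate QUE scores in nearly-linear time.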
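
The sketching idea in the last finding can be illustrated roughly as follows. This is a minimal sketch and not the authors' implementation: the function name `approx_que_scores`, the Gaussian sketch of size `k`, the truncated Taylor series of degree `degree`, the power-iteration estimate of the spectral norm, and the normalization of the covariance by that norm are all assumptions chosen for illustration. The point it shows is that the `d x d` covariance and its exponential are never formed; every step multiplies a thin `k x d` matrix by the data.

```python
import numpy as np

def approx_que_scores(X, alpha=4.0, k=64, degree=20, power_iters=30, seed=0):
    """Approximate QUE-style scores without forming any d x d matrix."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Xc = X - X.mean(axis=0)                  # assumption: center at the empirical mean

    def cov_mul(B):
        # Returns B @ Sigma for Sigma = Xc^T Xc / n, computed as (B Xc^T) Xc / n.
        return (B @ Xc.T) @ Xc / n

    # Estimate ||Sigma||_2 by power iteration (the step count is an arbitrary choice).
    v = rng.standard_normal((1, d))
    for _ in range(power_iters):
        v = cov_mul(v)
        v /= np.linalg.norm(v)
    top = max(float(np.linalg.norm(cov_mul(v))), 1e-12)

    # M approximates G @ exp(alpha * Sigma / (2 * top)) via a truncated Taylor series.
    # The degree should grow with alpha for the truncation error to stay small.
    G = rng.standard_normal((k, d)) / np.sqrt(k)   # Johnson-Lindenstrauss sketch
    term, M = G.copy(), G.copy()
    for j in range(1, degree + 1):
        term = cov_mul(term) * (alpha / (2.0 * top * j))
        M += term

    # ||M x_i||^2 estimates x_i^T exp(alpha Sigma / top) x_i,
    # and ||M||_F^2 estimates tr exp(alpha Sigma / top).
    return ((Xc @ M.T) ** 2).sum(axis=1) / (M ** 2).sum()
```

Each call to `cov_mul` costs `O(n d k)`, so with `k` and `degree` kept small relative to `d` the total cost scales like `n d` up to those factors. Larger `k` reduces the random error of the sketch at the price of more computation.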

This review was generated by AI and checked by a human editor.