QUICK REVIEW

[Paper Review] Being Robust (in High Dimensions) Can Be Practical

Ilias Diakonikolas, Gautam Kamath|arXiv (Cornell University)|Mar 2, 2017

Statistical Methods and Inference | References: 26 | Citations: 72

One-Line Summary

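This paper presents practical, nearly sample-optimal algorithms for robustly estimating the mean and covariance in high dimensions via a filtering approach, and shows strong empirical performance.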

ABSTRACT

Robust estimation is much more challenging in high dimensions than it is in one dimension: Most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility.
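To make the abstract's opening claim concrete, the toy experiment below compares two simple estimators under an assumed contamination model: inliers drawn from N(0, I_d), with 10% of the points shifted by 10 in every coordinate. The helper `contaminated_gaussian` and all parameter values are hypothetical illustrations, not the paper's experimental setup. Under this model the sample mean's error grows roughly like ε·shift·√d, and the coordinate-wise median's bias of about 0.14 per coordinate (at ε = 0.1) adds up to an ℓ2 error of roughly 0.14√d, so neither error is independent of the dimension.

```python
import numpy as np


def contaminated_gaussian(n, d, eps, shift, rng):
    """Hypothetical model: n points from N(0, I_d), the first eps*n moved by `shift` in every coordinate."""
    X = rng.standard_normal((n, d))
    X[: int(eps * n)] += shift  # corrupted points, for illustration only
    return X


rng = np.random.default_rng(0)
errors = {}
for d in (4, 100, 400):
    X = contaminated_gaussian(n=5000, d=d, eps=0.1, shift=10.0, rng=rng)
    mean_err = np.linalg.norm(X.mean(axis=0))  # true mean is the zero vector
    median_err = np.linalg.norm(np.median(X, axis=0))  # coordinate-wise median
    errors[d] = (mean_err, median_err)
    print(f"d={d:4d}  sample mean error={mean_err:.2f}  coordinate-wise median error={median_err:.2f}")
```

The median is far more robust than the mean here, but its error still scales with √d, which is the kind of dimension dependence the abstract says the new algorithms avoid.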

Motivation and Objectives

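Motivates robust statistics in high dimensions and addresses the computational constraints that have kept it from being practical.
Provides nearly optimal sample complexity upper bounds for robust mean and covariance estimation.
Develops practical filtering-based algorithms that tolerate a constant fraction of adversarial corruptions.
Shows robustness guarantees that extend to sub-Gaussian and bounded-moment distributions.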

Proposed Method

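Uses a filtering framework that iteratively removes outliers based on the spectral properties of the empirical covariance.
Applies a univariate tail test along the top eigenvector to identify and remove corrupted points.
Optimizes threshold selection via adaptive tail bounding, balancing the removal of bad points against the loss of good ones.
Extends the filter to robust covariance estimation by monitoring higher moments (e.g., fourth moments).
Improves practical performance by centering with a robust univariate estimate (e.g., the median) rather than the empirical mean.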
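The filtering loop described above can be sketched for the simplest case, mean estimation when the inliers are N(μ, I). This is a minimal sketch under stated assumptions, not the authors' implementation: the spectral stopping rule (`eig_slack`) and the tail test (`tail_factor`, which flags the smallest threshold where the empirical tail exceeds a multiple of the Gaussian tail) are hypothetical simplifications of the paper's thresholds, and the covariance extension is not shown.

```python
import numpy as np
from math import erfc, log, sqrt


def gaussian_two_sided_tail(t):
    """P(|Z| > t) for Z ~ N(0, 1)."""
    return erfc(t / sqrt(2.0))


def filter_mean(X, eps, eig_slack=3.0, tail_factor=3.0, max_iter=50):
    """Simplified filter for robust mean estimation, assuming inliers ~ N(mu, I).

    eig_slack and tail_factor are hypothetical tuning constants chosen for
    illustration; they are not the thresholds from the paper.
    """
    X = np.asarray(X, dtype=float)
    for _ in range(max_iter):
        n = X.shape[0]
        mu = X.mean(axis=0)
        cov = np.atleast_2d(np.cov(X, rowvar=False))
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        lam, v = eigvals[-1], eigvecs[:, -1]
        # Inliers have unit variance in every direction; if no direction is
        # much wider than that, the remaining outliers cannot shift the mean far.
        if lam <= 1.0 + eig_slack * eps * log(1.0 / eps):
            return mu
        # Project onto the top eigenvector and center at the median (robust 1-D center).
        proj = X @ v
        dev = np.abs(proj - np.median(proj))
        ts = np.sort(dev)
        # For distinct sorted values, exactly n - 1 - i points lie strictly beyond ts[i].
        emp_tail = (n - 1 - np.arange(n)) / n
        gauss_tail = np.array([gaussian_two_sided_tail(t) for t in ts])
        violated = emp_tail > tail_factor * gauss_tail
        if not violated.any():
            return mu  # no tail looks heavier than Gaussian; stop filtering
        T = ts[np.argmax(violated)]  # smallest threshold where the tail is too heavy
        X = X[dev <= T]  # drops every point beyond T (at least one, since emp_tail > 0 there)
    return X.mean(axis=0)


rng = np.random.default_rng(0)
d = 50
inliers = rng.standard_normal((2000, d))  # true mean is the zero vector
outliers = rng.standard_normal((222, d))
outliers[:, 0] += 5.0  # about 10% of the points shifted along one axis
X = np.vstack([inliers, outliers])

err_naive = np.linalg.norm(X.mean(axis=0))
err_filter = np.linalg.norm(filter_mean(X, eps=0.1))
print(f"empirical mean error={err_naive:.3f}  filtered mean error={err_filter:.3f}")
```

Because each filtering round removes every point beyond the chosen threshold, the loop makes progress until the spectral check passes or `max_iter` is reached; centering the projections at the median mirrors the practical refinement listed above.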

Experimental Results

Research Questions

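RQ1: Can filtering-based robust estimators achieve nearly optimal sample complexity in high dimensions?
RQ2: How large a fraction of adversarial corruptions can mean and covariance estimation tolerate without losing robustness?
RQ3: Do the proposed algorithms remain effective under weaker distributional assumptions, such as finite moments or sub-Gaussianity?
RQ4: Which practical tuning strategies (e.g., adaptive tail bounds) improve empirical performance in high dimensions?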

Key Findings

The mean estimation algorithm achieves nearly optimal sample complexity, both for known covariance and for unknown covariance under sub-Gaussian assumptions.
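Under a finite second moment assumption, the mean estimator achieves a nearly optimal error bound with fewer samples.
The covariance estimator tolerates adversarial corruptions, with error bounds stated in the affine-invariant Mahalanobis distance.
Adaptive tail bounding and empirical tuning substantially improve practical performance and scalability with dimension.
Empirical results show state-of-the-art performance on both synthetic and real data, and the robustness extends to non-Gaussian settings.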
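The covariance finding above measures error in an affine-invariant Mahalanobis distance. One standard way to write this distance is ‖Σ^{-1/2} Σ̂ Σ^{-1/2} − I‖_F, where Σ is the true covariance (assumed positive definite) and Σ̂ is the estimate. The helper names below are hypothetical; the sketch only computes this metric and checks its affine invariance numerically.

```python
import numpy as np


def inv_sqrt_pd(S):
    """Inverse square root of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T  # V diag(w^{-1/2}) V^T


def mahalanobis_cov_error(Sigma_hat, Sigma):
    """||Sigma^{-1/2} Sigma_hat Sigma^{-1/2} - I||_F, an affine-invariant covariance error."""
    W = inv_sqrt_pd(Sigma)
    d = Sigma.shape[0]
    return np.linalg.norm(W @ Sigma_hat @ W - np.eye(d), ord="fro")


# Overestimating every variance by a factor of 2 gives an error of ||I||_F = sqrt(d).
Sigma = np.diag([1.0, 4.0, 9.0])
print(mahalanobis_cov_error(2.0 * Sigma, Sigma))  # sqrt(3), about 1.732

# Transforming both matrices by the same invertible A leaves the error unchanged.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)
Sigma_hat = Sigma + 0.5 * np.eye(3)
e1 = mahalanobis_cov_error(Sigma_hat, Sigma)
e2 = mahalanobis_cov_error(A @ Sigma_hat @ A.T, A @ Sigma @ A.T)
print(np.isclose(e1, e2))  # True
```

The metric depends only on the eigenvalues of Σ^{-1}Σ̂, and replacing the pair with AΣ̂Aᵀ and AΣAᵀ does not change those eigenvalues, which is why the check holds for any invertible A.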

This review was generated by AI and checked by a human editor.