QUICK REVIEW

[Paper Review] Robust Bayesian Tensor Factorization for Incomplete Multiway Data

Qibin Zhao, Guoxu Zhou|arXiv (Cornell University)|Oct 9, 2014

Tensor decomposition and applications | References: 40 | Citations: 8

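One-Line Summary

This paper proposes a robust Bayesian tensor factorization method that uses hierarchical priors and variational inference to model the low-rank and sparse components of incomplete multiway data jointly. It requires no hyperparameter tuning, determines the rank automatically, identifies outliers, and reports strong tensor completion accuracy and robustness on both synthetic and real-world datasets.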

ABSTRACT

We propose a generative model for robust tensor factorization in the presence of both missing data and outliers. The objective is to explicitly infer the underlying low-CP-rank tensor capturing the global information and a sparse tensor capturing the local information (also considered as outliers), thus providing the robust predictive distribution over missing entries. The low-CP-rank tensor is modeled by multilinear interactions between multiple latent factors on which the column sparsity is enforced by a hierarchical prior, while the sparse tensor is modeled by a hierarchical view of Student-t distribution that associates an individual hyperparameter with each element independently. For model learning, we develop an efficient closed-form variational inference under a fully Bayesian treatment, which can effectively prevent the overfitting problem and scales linearly with data size. In contrast to existing related works, our method can perform model selection automatically and implicitly without need of tuning parameters. More specifically, it can discover the groundtruth of CP rank and automatically adapt the sparsity inducing priors to various types of outliers. In addition, the tradeoff between the low-rank approximation and the sparse representation can be optimized in the sense of maximum model evidence. The extensive experiments and comparisons with many state-of-the-art algorithms on both synthetic and real-world datasets demonstrate the superiorities of our method from several perspectives.

Index Terms: Tensor factorization, tensor completion, robust factorization, rank determination, variational Bayesian inference, video background modeling
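
The generative model described in the abstract can be written out roughly as follows. This is a sketch under stated assumptions, not the paper's exact formulation: the symbols ($\Omega$ for the set of observed indices, $\tau$ for a Gaussian noise precision, $\lambda_r$ and $\gamma_{i_1 \dots i_N}$ for the precision hyperparameters, and the Gamma hyperpriors with parameters $a_0, b_0, c_0, d_0$) are notation chosen here for illustration. The abstract only states that column sparsity comes from a hierarchical prior and that the sparse tensor uses a hierarchical view of the Student-t distribution; a Gaussian with a Gamma-distributed precision is the standard way to obtain a Student-t marginal.

```latex
\begin{align}
\mathcal{Y}_{\Omega} &= \mathcal{X}_{\Omega} + \mathcal{S}_{\Omega} + \varepsilon,
  & \varepsilon_{i_1 \dots i_N} &\sim \mathcal{N}(0, \tau^{-1})
  && \text{(assumed noise model)} \\
x_{i_1 \dots i_N} &= \sum_{r=1}^{R} \prod_{n=1}^{N} a^{(n)}_{i_n r}
  &&&& \text{(CP form, rank } R\text{)} \\
\mathbf{a}^{(n)}_{i_n} &\sim \mathcal{N}\!\left(\mathbf{0}, \boldsymbol{\Lambda}^{-1}\right),
  & \boldsymbol{\Lambda} &= \operatorname{diag}(\lambda_1, \dots, \lambda_R),
  \quad \lambda_r \sim \operatorname{Gamma}(c_0, d_0)
  && \text{(column sparsity)} \\
s_{i_1 \dots i_N} &\sim \mathcal{N}\!\left(0, \gamma_{i_1 \dots i_N}^{-1}\right),
  & \gamma_{i_1 \dots i_N} &\sim \operatorname{Gamma}(a_0, b_0)
  && \text{(element-wise Student-t)}
\end{align}
```

Because each $\lambda_r$ is shared by column $r$ of every factor matrix, a large $\lambda_r$ shrinks that whole component toward zero in all modes at once, which is how a prior of this form can reduce the effective CP rank.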

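Motivation and Objectives

Addresses robust tensor factorization when both missing data and outliers are present.
Explicitly separates the global low-rank structure from local sparse outliers in tensor data.
Enables automatic model selection of the CP rank and the sparsity-inducing hyperparameters without manual tuning.
Provides a fully Bayesian framework that prevents overfitting and scales linearly with data size.
Optimizes the trade-off between the low-rank approximation and the sparse representation in the sense of maximum model evidence.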

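Proposed Method

Models the low-CP-rank tensor through multilinear interactions among latent factors, with column sparsity enforced by a hierarchical prior.
Represents the sparse tensor with a hierarchical Student-t distribution that assigns a separate hyperparameter to each element, giving robust modeling of outliers.
Applies closed-form variational inference to the full generative model for efficient, scalable Bayesian inference.
Discovers the rank automatically by maximizing the marginal likelihood, avoiding manual parameter tuning.
Regularizes learning through a fully Bayesian treatment, improving generalization on incomplete data.
Balances the low-rank and sparse components by maximizing the model evidence.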
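
To make the data setting concrete, the following minimal NumPy sketch builds a tensor of the kind the method targets: a low-CP-rank part, a sparse outlier part, small dense noise, and a mask of missing entries. It only generates data and does not implement the paper's variational inference. The function names, the uniform outlier range, and the Gaussian noise are assumptions made for illustration.

```python
import numpy as np


def cp_to_tensor(factors):
    """Sum of R rank-one terms; factors[n] has shape (I_n, R)."""
    rank = factors[0].shape[1]
    shape = tuple(f.shape[0] for f in factors)
    tensor = np.zeros(shape)
    for r in range(rank):
        # Outer product of the r-th column of every factor matrix.
        component = factors[0][:, r]
        for f in factors[1:]:
            component = np.multiply.outer(component, f[:, r])
        tensor += component
    return tensor


def make_observations(shape, rank, outlier_frac, missing_frac, noise_std, seed=0):
    """Hypothetical generator: low-rank + sparse outliers + noise, with missing entries."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in shape]
    low_rank = cp_to_tensor(factors)

    # Sparse part: a small fraction of entries get large-magnitude values.
    sparse = np.zeros(shape)
    outlier_mask = rng.random(shape) < outlier_frac
    sparse[outlier_mask] = rng.uniform(-10.0, 10.0, size=int(outlier_mask.sum()))

    noise = noise_std * rng.standard_normal(shape)

    # observed is True where an entry is available; missing entries are NaN.
    observed = rng.random(shape) >= missing_frac
    y = np.where(observed, low_rank + sparse + noise, np.nan)
    return y, observed, low_rank, sparse


y, observed, low_rank, sparse = make_observations(
    shape=(6, 5, 4), rank=2, outlier_frac=0.1, missing_frac=0.3, noise_std=0.01
)
```

A robust factorization method would receive only `y` and `observed` and try to recover `low_rank` and `sparse`, including the entries of `low_rank` at the missing positions.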

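Experimental Results

Research Questions

RQ1: Can the Bayesian tensor factorization model identify the true CP rank automatically, without prior knowledge?
RQ2: How effectively can it separate global low-rank structure from local outliers in incomplete multiway data?
RQ3: By how much does the method outperform state-of-the-art methods when data are missing and contaminated?
RQ4: Can automatic sparsity induction adapt to diverse types of outliers?
RQ5: Does the variational inference framework scale efficiently with growing data size while remaining robust?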

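Key Findings

In synthetic experiments, the true CP rank was discovered automatically, with no manual tuning of a rank parameter.
On synthetic and real-world datasets with missing entries, tensor completion accuracy exceeded that of state-of-the-art methods.
Outliers were identified and separated effectively, improving predictive performance even under heavy contamination.
The variational inference framework scaled linearly with data size, enabling efficient learning on large tensors.
Optimizing the trade-off via model evidence yielded better generalization.
The hierarchical Student-t prior enabled adaptive outlier detection that outperformed fixed-penalty and Gaussian-based methods.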
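
The rank finding can be illustrated with a small sketch of how an effective CP rank might be read off once column sparsity has driven unneeded components to (near) zero. This is an assumed post-processing step, not the paper's procedure: the function name `prune_factor_columns` and the relative threshold `rel_tol` are hypothetical choices.

```python
import numpy as np


def prune_factor_columns(factors, rel_tol=1e-6):
    """Drop latent components whose energy is negligible in all modes.

    factors[n] has shape (I_n, R); column r of every factor matrix belongs
    to the same rank-one component, so components are kept or dropped jointly.
    """
    # Energy of each component, summed over all modes: array of length R.
    energy = sum((f ** 2).sum(axis=0) for f in factors)
    keep = energy > rel_tol * energy.max()
    pruned = [f[:, keep] for f in factors]
    return pruned, int(keep.sum())


# Usage: start from an over-estimated rank of 5, with two components zeroed out.
rng = np.random.default_rng(0)
factors = [rng.standard_normal((dim, 5)) for dim in (6, 5, 4)]
for f in factors:
    f[:, [1, 3]] = 0.0

pruned, effective_rank = prune_factor_columns(factors)
```

Here `effective_rank` is the number of components that survive the threshold, and `pruned` holds the factor matrices with the dead columns removed.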

This review was generated by AI and checked by a human editor.