QUICK REVIEW

[Paper Review] Post-Processing of High-Dimensional Data

Alexander Litvinenko, Mike Espig | arXiv (Cornell University) | Jan 1, 2019

Tensor decomposition and applications | References: 56 | Citations: 1

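One-line summary

This paper proposes a framework for efficiently post-processing high-dimensional, compressed data (represented as tensors) by using the algebraic operations of an abstract associative, commutative algebra equipped with an inner product. With this approach, important statistical and extremal properties such as maxima and minima, level sets, frequencies, probabilities, and moments can be computed by fixed-point iteration directly on the low-rank or compressed tensor representation, without fully reconstructing the original data.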

ABSTRACT

Scientific computations or measurements may result in huge volumes of data. Often these can be thought of representing a real-valued function on a high-dimensional domain, and can be conceptually arranged in the format of a tensor of high degree in some truncated or lossy compressed format. We look at some common post-processing tasks which are not obvious in the compressed format, as such huge data sets can not be stored in their entirety, and the value of an element is not readily accessible through simple look-up. The tasks we consider are finding the location of maximum or minimum, or minimum and maximum of a function of the data, or finding the indices of all elements in some interval --- i.e. level sets, the number of elements with a value in such a level set, the probability of an element being in a particular level set, and the mean and variance of the total collection. The algorithms to be described are fixed point iterations of particular functions of the tensor, which will then exhibit the desired result. For this, the data is considered as an element of a high degree tensor space, although in an abstract sense, the algorithms are independent of the representation of the data as a tensor. All that we require is that the data can be considered as an element of an associative, commutative algebra with an inner product. Such an algebra is isomorphic to a commutative sub-algebra of the usual matrix algebra, allowing the use of matrix algorithms to accomplish the mentioned tasks. We allow the actual computational representation to be a lossy compression, and we allow the algebra operations to be performed in an approximate fashion, so as to maintain a high compression level. One such example which we address explicitly is the representation of data as a tensor with compression in the form of a low-rank representation.

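Motivation and objectives

Address the challenge of post-processing large, high-dimensional data stored in compressed or truncated tensor formats.
Overcome the limitation that individual data values are not directly accessible under lossy or low-rank compression.
Enable computation of extrema, level sets, and statistical moments (mean, variance) without full reconstruction.
Build a general computational framework that does not depend on the tensor representation, only on the algebraic structure.
Maintain a high compression ratio while preserving accuracy in the key post-processing tasks.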

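Proposed method

Model the compressed data as an element of a high-dimensional tensor space, treated abstractly as an element of an associative, commutative algebra with an inner product.
Formulate the post-processing tasks as fixed-point iterations of particular functions on the algebraic structure of the data.
Exploit the isomorphism between this algebra and a commutative sub-algebra of the matrix algebra in order to apply existing matrix algorithms.
Allow the algebraic operations to be performed approximately so that a high compression ratio is maintained during processing.
Treat low-rank tensor representations explicitly as the main example compatible with lossy compression.
Use iterative convergence to compute global properties such as maxima and minima, the indices of level sets, and statistical moments.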
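
The fixed-point idea in the steps above can be illustrated on a small uncompressed example. The sketch below is an illustration under stated assumptions, not the authors' implementation: it takes plain Python lists as the algebra, the elementwise (Hadamard) product as multiplication, and the Euclidean inner product. Under these assumptions, multiplying by v acts like the diagonal matrix diag(v), so power iteration drives the iterate toward the indicator of the largest entry, provided the data is positive and the maximum is unique. The names `hadamard`, `norm`, and `argmax_by_fixed_point` and the sample data are hypothetical.

```python
import math

def hadamard(u, v):
    """Product of the algebra (assumed here): elementwise multiplication."""
    return [a * b for a, b in zip(u, v)]

def norm(u):
    """Norm induced by the Euclidean inner product."""
    return math.sqrt(sum(a * a for a in u))

def argmax_by_fixed_point(v, iterations=200):
    """Locate the largest entry of a positive vector v by power iteration.

    The map u -> (v * u) / ||v * u|| is power iteration for diag(v), so u
    approaches the indicator of the largest entry when that entry is unique.
    """
    u = [1.0 / math.sqrt(len(v))] * len(v)  # normalized all-ones start
    for _ in range(iterations):
        w = hadamard(v, u)
        n = norm(w)
        u = [a / n for a in w]
    # Reading the index off the full vector is only possible in this toy
    # setting; a compressed format would inspect its (nearly rank-1) factors.
    index = max(range(len(u)), key=lambda i: u[i])
    # Rayleigh quotient <v * u, u> approximates the maximal value.
    value = sum(a * b for a, b in zip(hadamard(v, u), u))
    return index, value, u

data = [0.3, 1.7, 0.9, 2.5, 2.1]  # made-up positive data
index, value, indicator = argmax_by_fixed_point(data)
print(index, value)
```

In a compressed setting the same loop would use the format's own approximate product and truncation in place of `hadamard`. A minimum can be located the same way by running the iteration on shifted data such as c - v, where c is a constant larger than every entry.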

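Experimental results

Research questions

RQ1: How can maxima and minima be computed in a compressed high-dimensional tensor without full reconstruction?
RQ2: What algebraic framework allows level sets and their cardinalities to be computed efficiently on lossy-compressed data?
RQ3: Can statistical moments such as mean and variance be computed reliably when the algebraic operations are carried out approximately in compressed form?
RQ4: How effectively can fixed-point iteration schemes solve post-processing tasks in the abstract algebraic structure derived from tensors?
RQ5: To what extent does the framework preserve the accuracy of results in the main data-analysis tasks while maintaining high compression?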
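
To make the phrase "without full reconstruction" in these questions concrete, the following sketch computes the mean and variance of a small low-rank tensor from its factors alone. It assumes a canonical polyadic (CP) format stored as a list of rank-1 terms, which is one possible low-rank representation and not necessarily the one used in the paper. All function names and the sample tensor are hypothetical; the full expansion at the end exists only to check the result on this tiny example.

```python
from itertools import product
from math import prod

# Assumed storage: a CP tensor is a list of rank-1 terms, and each term
# is a list holding one factor vector per dimension.

def cp_sum(terms):
    """Sum of all entries, computed from the factors only."""
    return sum(prod(sum(f) for f in term) for term in terms)

def cp_hadamard(a, b):
    """Elementwise product of two CP tensors; the number of terms multiplies."""
    return [
        [[x * y for x, y in zip(fa, fb)] for fa, fb in zip(ta, tb)]
        for ta in a
        for tb in b
    ]

def cp_size(terms):
    """Total number of entries of the represented tensor."""
    return prod(len(f) for f in terms[0])

def cp_mean_and_variance(terms):
    """Mean and variance of all entries without expanding the tensor."""
    n = cp_size(terms)
    mean = cp_sum(terms) / n
    second_moment = cp_sum(cp_hadamard(terms, terms)) / n
    return mean, second_moment - mean ** 2

def cp_entry(terms, index):
    """Single entry; used here only to verify the tiny example."""
    return sum(prod(f[i] for f, i in zip(term, index)) for term in terms)

# Made-up rank-2 tensor of shape 3 x 2 x 2.
tensor = [
    [[1.0, 2.0, 3.0], [1.0, 0.5], [2.0, 1.0]],
    [[0.5, 0.0, 1.0], [1.0, 1.0], [0.0, 3.0]],
]
mean, variance = cp_mean_and_variance(tensor)

# Brute-force check by full expansion (feasible only because it is tiny).
entries = [cp_entry(tensor, idx) for idx in product(range(3), range(2), range(2))]
brute_mean = sum(entries) / len(entries)
brute_variance = sum((e - brute_mean) ** 2 for e in entries) / len(entries)
print(mean, variance, brute_mean, brute_variance)
```

The Hadamard product squares the number of terms here, which is why practical low-rank arithmetic usually follows each product with a rank truncation; that approximation step is omitted from this sketch.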

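Key findings

The framework makes it possible to compute maxima and minima of compressed tensor data by fixed-point iteration on the underlying algebraic structure.
Level sets and their counts can be computed by iterative algebraic operations in the compressed domain, without reconstructing the data.
The probability that an element falls within a given interval can be estimated by iterative evaluation of algebraic functions.
Statistical moments such as mean and variance can reach their exact values through fixed-point iterations that converge under the given algebraic model.
The approach remains valid when the algebraic operations are approximated, keeping the computation feasible while maintaining a high compression ratio.
The approach is general: it does not depend on the tensor representation, only on the existence of an associative, commutative algebra with an inner product.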
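
The level-set, frequency, probability, and moment findings above can be sketched on a small uncompressed vector. This is an illustration under assumptions rather than the paper's algorithm: the sign function is approximated by the Newton-Schulz-type iteration s -> s(3 - s*s)/2, chosen here for simplicity, and the mean and variance are taken directly as inner products with the all-ones element. The function names, the `bound` parameter (an assumed known bound on the shifted data), and the sample data are hypothetical.

```python
def hadamard(u, v):
    """Product of the algebra (assumed here): elementwise multiplication."""
    return [a * b for a, b in zip(u, v)]

def inner(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def sign_by_fixed_point(x, bound, iterations=60):
    """Elementwise sign via the iteration s -> s * (3 - s*s) / 2.

    Assumes every |x_i| is nonzero and at most `bound`, so the scaled
    start lies in [-1, 1], where the iteration converges to +1 or -1.
    """
    s = [a / bound for a in x]
    for _ in range(iterations):
        s2 = hadamard(s, s)
        s = [a * (3.0 - b) / 2.0 for a, b in zip(s, s2)]
    return s

def level_set_indicator(v, low, high, bound):
    """Indicator of {i : low < v_i < high} built from two sign evaluations."""
    above_low = sign_by_fixed_point([a - low for a in v], bound)
    above_high = sign_by_fixed_point([a - high for a in v], bound)
    return [(p - q) / 2.0 for p, q in zip(above_low, above_high)]

data = [0.3, 1.7, 0.9, 2.5, 2.1, 1.2]  # made-up data
ones = [1.0] * len(data)

indicator = level_set_indicator(data, low=1.0, high=2.2, bound=3.0)
count = inner(indicator, ones)        # number of entries in the level set
probability = count / len(data)       # relative frequency of the level set

mean = inner(data, ones) / len(data)
centered = [a - mean for a in data]
variance = inner(hadamard(centered, centered), ones) / len(data)
print(count, probability, mean, variance)
```

Every step uses only the product, linear combinations, and the inner product, which is what lets the same computation be carried over to a compressed representation that supports those operations approximately.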

This review was generated by AI and checked by a human editor.