QUICK REVIEW

[Paper Review] Post-Processing of High-Dimensional Data

Alexander Litvinenko, Mike Espig|arXiv (Cornell University)|Jan 1, 2019
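Tensor decomposition and applications | References: 56 | Cited by: 1
One-Sentence Summary

This paper proposes a framework for efficiently post-processing compressed high-dimensional data represented as tensors: by working with algebraic operations in an abstract associative, commutative algebra equipped with an inner product, it computes key statistical and extremal properties (maximum and minimum values, level sets, element counts, probabilities, and moments) without decompressing the full data. The computations are carried out as fixed-point iterations on low-rank or otherwise compressed tensor representations.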

ABSTRACT

Scientific computations or measurements may result in huge volumes of data. Often these can be thought of representing a real-valued function on a high-dimensional domain, and can be conceptually arranged in the format of a tensor of high degree in some truncated or lossy compressed format. We look at some common post-processing tasks which are not obvious in the compressed format, as such huge data sets can not be stored in their entirety, and the value of an element is not readily accessible through simple look-up. The tasks we consider are finding the location of maximum or minimum, or minimum and maximum of a function of the data, or finding the indices of all elements in some interval --- i.e. level sets, the number of elements with a value in such a level set, the probability of an element being in a particular level set, and the mean and variance of the total collection. The algorithms to be described are fixed point iterations of particular functions of the tensor, which will then exhibit the desired result. For this, the data is considered as an element of a high degree tensor space, although in an abstract sense, the algorithms are independent of the representation of the data as a tensor. All that we require is that the data can be considered as an element of an associative, commutative algebra with an inner product. Such an algebra is isomorphic to a commutative sub-algebra of the usual matrix algebra, allowing the use of matrix algorithms to accomplish the mentioned tasks. We allow the actual computational representation to be a lossy compression, and we allow the algebra operations to be performed in an approximate fashion, so as to maintain a high compression level. One such example which we address explicitly is the representation of data as a tensor with compression in the form of a low-rank representation.
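
The abstract's statement that the data algebra is isomorphic to a commutative sub-algebra of the matrix algebra can be made concrete on a small, uncompressed array. The sketch below assumes the entrywise (Hadamard) product as the algebra multiplication, so that a data vector v corresponds to the diagonal matrix diag(v) and its largest entry is the dominant eigenvalue. Plain power iteration stands in as an illustrative fixed-point iteration; it is not necessarily the iteration used in the paper, and the function name is hypothetical. Strictly positive data with a unique maximum is assumed.

```python
import numpy as np

def argmax_by_power_iteration(v, iters=200):
    """Locate the largest entry of a positive vector by power iteration on diag(v).

    Assumption: entries are strictly positive and the maximum is unique.
    """
    v = np.asarray(v, dtype=float)
    x = np.ones_like(v) / np.sqrt(v.size)   # normalized start vector
    for _ in range(iters):
        x = v * x                           # Hadamard product, same as diag(v) @ x
        x /= np.linalg.norm(x)              # renormalize using the inner product
    value = np.dot(x, v * x)                # Rayleigh quotient estimates the maximum
    index = int(np.argmax(np.abs(x)))       # x approaches the unit vector at the argmax
    return index, value
```

For example, `argmax_by_power_iteration([0.5, 2.0, 3.0, 1.0])` locates the entry 3.0 at index 2. Every step uses only the algebra product and the inner product, which is what lets the same idea carry over to a compressed representation where entries cannot be looked up directly.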

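Motivation and Objectives

  • Address the challenge of running post-processing tasks on large-scale, high-dimensional data stored in compressed or truncated tensor formats.
  • Overcome the fact that lossy or low-rank compression prevents direct access to individual data values.
  • Compute extrema, level sets, and statistical moments (mean, variance) without full decompression.
  • Develop a general computational framework that is independent of the tensor representation and relies only on the algebraic structure.
  • Permit approximate algebra operations so that a high compression level is maintained while the key post-processing tasks remain accurate.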

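Proposed Method

  • Model the compressed data as an element of a high-order tensor space, abstracted as an associative, commutative algebra with an inner product.
  • Formulate each post-processing task as a fixed-point iteration of a particular function defined on the algebraic structure of the data.
  • Exploit the isomorphism between this algebra and a commutative sub-algebra of the matrix algebra in order to apply existing matrix algorithms.
  • Allow the algebra operations to be performed approximately so that a high compression level is maintained during processing.
  • Treat low-rank tensor representations explicitly as the main example of a lossy compression compatible with the framework.
  • Compute global properties such as maximum and minimum values, level-set indices, and statistical moments through convergent iterations.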
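
To illustrate how the algebra operations listed above can act directly on a low-rank representation, here is a minimal sketch that assumes the canonical polyadic (CP) format, in which a tensor is stored as one factor matrix of shape (n_k, R) per mode. The helper names are hypothetical and the paper may use a different low-rank format. The Hadamard product and the inner product are both computed from the factors alone; `cp_to_full` exists only to check results on small examples.

```python
import numpy as np

def cp_to_full(factors):
    """Expand CP factors (list of (n_k, R) arrays) into a full tensor. For checking only."""
    rank = factors[0].shape[1]
    full = np.zeros([f.shape[0] for f in factors])
    for r in range(rank):
        term = factors[0][:, r]
        for f in factors[1:]:
            term = np.multiply.outer(term, f[:, r])
        full += term
    return full

def cp_hadamard(a, b):
    """Entrywise product in CP format; the rank grows to R_a * R_b."""
    out = []
    for fa, fb in zip(a, b):
        n = fa.shape[0]
        # Column (r, s) of the new factor is fa[:, r] * fb[:, s].
        out.append((fa[:, :, None] * fb[:, None, :]).reshape(n, -1))
    return out

def cp_inner(a, b):
    """Inner product <A, B> computed from per-mode Gram matrices of the factors."""
    gram = np.ones((a[0].shape[1], b[0].shape[1]))
    for fa, fb in zip(a, b):
        gram *= fa.T @ fb
    return gram.sum()
```

Because each Hadamard product multiplies the ranks, an iteration built from repeated products needs a rank-truncation step after each operation. That is where the approximate algebra operations mentioned above come in; truncation is omitted from this sketch.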

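Experimental Results

Research Questions

  • RQ1: How can extrema (maximum and minimum) be computed from a compressed high-dimensional tensor without full decompression?
  • RQ2: Which algebraic framework enables efficient computation of level sets and their cardinality in lossy compressed data?
  • RQ3: Can statistical moments such as mean and variance be computed reliably with approximate algebra operations in a compressed setting?
  • RQ4: To what extent can fixed-point iteration methods solve post-processing tasks within the abstract algebraic structure derived from tensors?
  • RQ5: How does the framework keep the results of key data-analysis tasks accurate while maintaining a high compression level?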

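Key Findings

  • The framework computes the maximum and minimum of compressed tensor data through fixed-point iterations on the underlying algebraic structure.
  • Level sets and their element counts can be computed through iterative algebra operations in the compressed domain, without decompressing the data.
  • The probability of an element falling in a specified interval can be estimated by iteratively evaluating algebraic functions.
  • Under the given algebraic model, statistical moments such as mean and variance are obtained through fixed-point iterations that converge to the correct values.
  • The method remains valid when the algebra operations are only approximate, which keeps the computation feasible while maintaining a high compression level.
  • The method is general and independent of the tensor representation; it relies only on the existence of an associative, commutative algebra with an inner product.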
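
The findings on level sets, counts, probabilities, and moments can be sketched on a small full array used as a stand-in for a compressed tensor. The example assumes the Hadamard product as the algebra multiplication and uses the Newton-Schulz iteration x <- x(3 - x^2)/2 as one standard fixed-point iteration for the entrywise sign function; the paper's exact iteration may differ. The function names are hypothetical, and in this sketch the moments come directly from inner products with the all-ones tensor.

```python
import numpy as np

def sign_newton_schulz(x, iters=60):
    """Entrywise sign via the fixed-point iteration x <- x * (3 - x*x) / 2."""
    x = x / np.linalg.norm(x)          # scale so every entry satisfies |x| <= 1
    for _ in range(iters):
        x = 0.5 * x * (3.0 - x * x)    # uses only sums and Hadamard products
    return x

def level_set_indicator(v, lo, hi):
    """Approximately 1 where lo < v < hi and 0 elsewhere (0.5 exactly on a boundary)."""
    return 0.5 * (sign_newton_schulz(v - lo) - sign_newton_schulz(v - hi))

def summary_statistics(v, lo, hi):
    """Mean, variance, level-set count, and probability from inner products only."""
    ones = np.ones_like(v)
    n = np.vdot(ones, ones)                         # total number of elements
    mean = np.vdot(v, ones) / n
    centered = v - mean * ones
    variance = np.vdot(centered, centered) / n      # population variance
    count = np.vdot(level_set_indicator(v, lo, hi), ones)
    return mean, variance, count, count / n
```

Entries very close to an interval boundary need more iterations to converge, so the fixed iteration count here is suitable only for well-separated data such as a small test array.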


This review was generated by AI and reviewed by human editors.