QUICK REVIEW

[Paper Review] InfTucker: t-Process based Infinite Tensor Decomposition

Zenglin Xu, Feng Yan|arXiv (Cornell University)|Aug 31, 2011

Tensor decomposition and applications | Cited by 2

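One-Sentence Summary

InfTucker is a nonparametric Bayesian tensor decomposition framework, built on latent Gaussian or t-processes and an infinite feature space, for modeling complex interactions, mixed data types (continuous and binary), and outliers. Using efficient variational inference, it reduces time and space complexity by several orders of magnitude relative to a classical implementation, and it achieves significantly higher prediction accuracy than state-of-the-art tensor decomposition methods on chemometrics and social network datasets.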

ABSTRACT

Tensor decomposition is a powerful computational tool for multiway data analysis. Many popular tensor decomposition approaches, such as the Tucker decomposition and CANDECOMP/PARAFAC (CP), amount to multi-linear factorization. They are insufficient to model (i) complex interactions between data entities, (ii) various data types (e.g. missing data and binary data), and (iii) noisy observations and outliers. To address these issues, we propose tensor-variate latent nonparametric Bayesian models, coupled with efficient inference methods, for multiway data analysis. We name these models InfTucker. Using InfTucker, we conduct Tucker decomposition in an infinite feature space. Unlike classical tensor decomposition models, our new approaches handle both continuous and binary data in a probabilistic framework. Unlike previous Bayesian models on matrices and tensors, our models are based on latent Gaussian or $t$ processes with nonlinear covariance functions. To efficiently learn InfTucker from data, we develop a variational inference technique on tensors. Compared with a classical implementation, the new technique reduces both time and space complexities by several orders of magnitude. Our experimental results on chemometrics and social network datasets demonstrate that our new models achieve significantly higher prediction accuracy than state-of-the-art tensor decomposition methods.
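
For readers unfamiliar with the term, the "multi-linear factorization" mentioned in the abstract can be written out for the standard Tucker model. The symbols below (core tensor $\mathcal{W}$, factor matrices $U^{(k)}$, number of modes $K$) are notation assumed here for illustration and are not taken from the summary.

$$
y_{i_1 \cdots i_K} \approx \sum_{r_1, \ldots, r_K} w_{r_1 \cdots r_K} \prod_{k=1}^{K} u^{(k)}_{i_k r_k},
\qquad
\mathcal{Y} \approx \mathcal{W} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_K U^{(K)}
$$

Every entry is linear in each factor matrix when the others are held fixed, which is the restriction the abstract argues against. CP is the special case in which the core tensor is superdiagonal.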

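Research Motivation and Goals

Address the limitations of classical tensor decomposition models in handling complex interactions between data entities.
Model multiple data types, including missing data, binary data, and noisy observations, within a unified probabilistic framework.
Develop a nonparametric Bayesian approach that performs Tucker decomposition in an infinite feature space.
Reduce the computational complexity of tensor decomposition through a scalable variational inference technique.
Improve prediction accuracy on heterogeneous, noisy real-world datasets.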

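Proposed Method

Proposes tensor-variate latent nonparametric Bayesian models based on latent Gaussian or t-processes with nonlinear covariance functions.
Introduces InfTucker as a framework that uses these latent processes to perform Tucker decomposition in an infinite feature space.
Develops a variational inference technique designed specifically for tensors, so that the models can be learned efficiently.
Uses nonlinear covariance functions to capture complex nonlinear interactions in multiway data.
Supports both continuous and binary data within a single probabilistic tensor decomposition framework.
Reduces both time and space complexity by several orders of magnitude compared with a classical implementation.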
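
This summary does not explain where the complexity reduction comes from. One standard source of such savings in tensor-variate Gaussian models is a covariance that factorizes as a Kronecker product of one kernel matrix per mode, so that products with the full covariance never require building it. The sketch below illustrates that identity only. It is an assumption that this matches the authors' implementation, and the names `rbf_kernel` and `kron_matvec`, the RBF kernel choice, and the tensor sizes are hypothetical.

```python
import numpy as np

def rbf_kernel(U, lengthscale=1.0):
    """Nonlinear (RBF) covariance between the rows of a latent factor matrix U.

    The RBF kernel is an illustrative choice, not one confirmed by the summary.
    """
    sq_dists = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def kron_matvec(mats, x):
    """Compute (mats[0] ⊗ mats[1] ⊗ ...) @ x without forming the Kronecker product."""
    dims = [m.shape[0] for m in mats]
    X = x.reshape(dims)  # row-major reshape matches the ordering used by np.kron
    for mode, m in enumerate(mats):
        # Mode-wise product: contract the columns of m with axis `mode` of X,
        # then move the resulting axis back to its original position.
        X = np.moveaxis(np.tensordot(m, X, axes=(1, mode)), 0, mode)
    return X.reshape(-1)

rng = np.random.default_rng(0)
dims, rank = (4, 5, 6), 3  # hypothetical tensor shape and latent dimension
factors = [rng.standard_normal((n, rank)) for n in dims]
kernels = [rbf_kernel(U) for U in factors]  # one small kernel matrix per mode

x = rng.standard_normal(np.prod(dims))
fast = kron_matvec(kernels, x)

# Dense reference: builds the full 120 x 120 covariance, feasible only for tiny tensors.
dense = np.kron(np.kron(kernels[0], kernels[1]), kernels[2]) @ x
assert np.allclose(fast, dense)
```

For a tensor with n entries per mode and K modes, the dense covariance has n^(2K) entries, while the mode-wise route stores only K matrices of size n by n.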

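Experimental Results

Research Questions

RQ1: How can tensor decomposition models be extended to handle complex nonlinear interactions between data entities in multiway data?
RQ2: Can a nonparametric Bayesian approach effectively model mixed data types (such as continuous, binary, and missing data) within a unified tensor decomposition framework?
RQ3: How do t-processes and an infinite feature space improve the robustness of tensor decomposition to noise and outliers?
RQ4: How does variational inference affect the scalability and computational efficiency of tensor decomposition?
RQ5: How much does the proposed InfTucker framework improve prediction accuracy over state-of-the-art methods on real-world datasets?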

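Key Findings

On chemometrics and social network datasets, InfTucker achieves significantly higher prediction accuracy than state-of-the-art tensor decomposition methods.
The proposed variational inference technique reduces both time and space complexity by several orders of magnitude compared with a classical implementation.
The model handles mixed data types, including continuous and binary data, within a single probabilistic framework.
Using t-processes lets the tensor decomposition model noisy observations and outliers robustly.
The infinite feature space allows flexible, data-driven discovery of latent structure without fixing the number of components in advance.
Nonlinear covariance functions in the latent process models capture complex nonlinear interactions between data entities in multiway arrays.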
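
The robustness finding above rests on a general property of Student-t distributions: their tails are heavier than Gaussian tails, so a large residual costs far less log-likelihood and pulls the fit less. The sketch below compares the two univariate log-densities as a minimal illustration of that property. It is not the InfTucker model itself, and the degrees of freedom `nu=3` and unit scale are assumed values chosen only for the example.

```python
import math

def gaussian_logpdf(x, sigma=1.0):
    """Log-density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - 0.5 * (x / sigma) ** 2

def student_t_logpdf(x, nu=3.0, sigma=1.0):
    """Log-density of a zero-mean Student-t with nu degrees of freedom and scale sigma."""
    z = x / sigma
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
    )
    return log_norm - 0.5 * (nu + 1) * math.log1p(z * z / nu)

# Compare how each likelihood scores a small, a moderate, and an outlying residual.
for residual in (0.5, 2.0, 6.0):
    print(
        f"residual={residual:4.1f}  "
        f"gaussian={gaussian_logpdf(residual):8.3f}  "
        f"student_t={student_t_logpdf(residual):8.3f}"
    )
```

The Gaussian penalty grows quadratically with the residual, while the Student-t penalty grows only logarithmically, which is why a single outlier distorts a t-based fit much less.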

This review was generated by AI and checked by a human editor.