QUICK REVIEW

[Paper Review] DFacTo: Distributed Factorization of Tensors

Joon Hee Choi, S. V. N. Vishwanathan | arXiv (Cornell University) | Jun 17, 2014
Tensor decomposition and applications | 13 references | 80 citations
TL;DR

DFacTo is a distributed tensor factorization algorithm that accelerates Alternating Least Squares (ALS) and Gradient Descent (GD). The costly step shared by both algorithms, a multiplication of the flattened tensor by a Khatri-Rao product, is computed with two sparse matrix-vector multiplications, which avoids the intermediate data explosion of forming the Khatri-Rao product explicitly. DFacTo is on average 4 to 10 times faster than competing methods. On a 6.5M × 2.5M × 1.5M tensor with 1.2B non-zeros, one ALS iteration takes 480 seconds and one GD iteration takes 1,143 seconds on 4 machines.
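
The identity behind the two products can be written out entry by entry. The notation here is an assumption made for illustration, not taken from this review: X is an I × J × K tensor with entries x_ijk, B (J × R) and C (K × R) are the fixed factor matrices, and N is the result of multiplying the mode-1 flattening of X by the Khatri-Rao product of C and B. A sketch for column r:

```latex
% Entry (i, r) of N = X^{(1)} (C \odot B), with X \in \mathbb{R}^{I \times J \times K}
\begin{align*}
n_{ir} &= \sum_{j=1}^{J} \sum_{k=1}^{K} x_{ijk}\, b_{jr}\, c_{kr} \\
       &= \sum_{k=1}^{K} m^{(r)}_{ik}\, c_{kr},
\qquad \text{where } m^{(r)}_{ik} = \sum_{j=1}^{J} x_{ijk}\, b_{jr}.
\end{align*}
```

The inner sum is one sparse matrix-vector product (a flattening of X times column r of B), and the outer sum is a second one (the sparse matrix M^(r) times column r of C). The dense JK × R Khatri-Rao product is never formed.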

ABSTRACT

We present a technique for significantly speeding up Alternating Least Squares (ALS) and Gradient Descent (GD), two widely used algorithms for tensor factorization. By exploiting properties of the Khatri-Rao product, we show how to efficiently address a computationally challenging sub-step of both algorithms. Our algorithm, DFacTo, only requires two sparse matrix-vector products and is easy to parallelize. DFacTo is not only scalable but also on average 4 to 10 times faster than competing algorithms on a variety of datasets. For instance, DFacTo only takes 480 seconds on 4 machines to perform one iteration of the ALS algorithm and 1,143 seconds to perform one iteration of the GD algorithm on a 6.5 million x 2.5 million x 1.5 million dimensional tensor with 1.2 billion non-zero entries.

Motivation & Objective

  • To address the intermediate data explosion problem in tensor factorization algorithms like ALS and GD.
  • To enable scalable, distributed computation of tensor factorization on massive, sparse tensors.
  • To design a method that avoids expensive intermediate representations while maintaining high performance.
  • To provide a practical, parallelizable solution compatible with standard sparse linear algebra libraries.
  • To outperform existing tools like the Tensor Toolbox and GigaTensor in speed and scalability on real-world datasets.

Proposed method

  • DFacTo reformulates the key computation step shared by ALS and GD, a multiplication involving the Khatri-Rao product, as two sparse matrix-vector products.
  • It leverages properties of the Khatri-Rao product to avoid explicit formation of large intermediate matrices.
  • The algorithm is designed to be naturally distributed across multiple machines, enabling horizontal scaling.
  • It uses standard sparse linear algebra operations, ensuring compatibility with existing high-performance libraries.
  • The method supports both ALS and GD optimization strategies with minimal algorithmic changes.
  • It works directly on sparse data structures and never materializes a dense representation of the tensor.
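
The reformulation described in the list above can be sketched on a single machine with SciPy. This is a minimal illustration under stated assumptions, not the authors' distributed implementation: the function name `mttkrp_mode1`, the coordinate-list input, and the grouping of non-zeros into fibers are all choices made here for the example. It computes the product of the mode-1 flattening with the Khatri-Rao product of the factor matrices `C` and `B`, one column at a time.

```python
import numpy as np
import scipy.sparse as sp


def mttkrp_mode1(coords, vals, shape, B, C):
    """Return N (I x R) with N[i, r] = sum_{j,k} X[i, j, k] * B[j, r] * C[k, r].

    coords: tuple (i, j, k) of integer index arrays, one entry per non-zero.
    vals:   values of the non-zeros.
    shape:  (I, J, K).
    Hypothetical helper for illustration; not an API from the paper.
    """
    I, J, K = shape
    i, j, k = (np.asarray(c, dtype=np.int64) for c in coords)

    # Group the non-zeros into fibers: one fiber per distinct (i, k) pair.
    fiber_keys, fiber_of_nnz = np.unique(i * K + k, return_inverse=True)
    fiber_i, fiber_k = fiber_keys // K, fiber_keys % K

    # Sparse flattening: rows index the non-empty fibers, columns index j.
    X_flat = sp.csr_matrix((vals, (fiber_of_nnz, j)),
                           shape=(fiber_keys.size, J))

    R = B.shape[1]
    N = np.empty((I, R))
    for r in range(R):
        # First sparse matrix-vector product: contract over j.
        v = X_flat @ B[:, r]
        # Place the fiber sums into a sparse I x K matrix.
        M = sp.csr_matrix((v, (fiber_i, fiber_k)), shape=(I, K))
        # Second sparse matrix-vector product: contract over k.
        N[:, r] = M @ C[:, r]
    return N
```

The intermediate vector `v` has one entry per non-empty fiber, so its length never exceeds the number of non-zeros. That is the sense in which this sketch avoids the large intermediate matrix. The loop over `r` is independent across columns, which is one natural place to parallelize.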

Experimental results

Research questions

  • RQ1: Can the Khatri-Rao product computation in tensor factorization be accelerated without forming large intermediate matrices?
  • RQ2: How can tensor factorization algorithms be made both scalable and efficient for massive, sparse tensors?
  • RQ3: What is the performance gain of using sparse matrix-vector operations over traditional dense or intermediate matrix approaches?
  • RQ4: Can a distributed implementation of tensor factorization achieve significant speedups on real-world datasets with billions of non-zero entries?
  • RQ5: How does DFacTo compare in performance and memory usage to existing tools like the Tensor Toolbox and GigaTensor?

Key findings

  • DFacTo performs one ALS iteration on a 6.5M × 2.5M × 1.5M tensor with 1.2B non-zeros in 480 seconds on 4 machines.
  • One GD iteration on the same tensor takes 1,143 seconds on the same 4 machines.
  • DFacTo is on average 5x faster than GigaTensor and 10x faster than the Tensor Toolbox for ALS on various datasets.
  • For GD, DFacTo achieves a 4x speedup over CP-OPT from the Tensor Toolbox.
  • The algorithm uses 3x more memory than the Tensor Toolbox due to storing three flattened matrices, but this is offset by superior computational efficiency.
  • Empirical results show that the joint matrix completion and tensor factorization model using DFacTo achieves lower mean squared error than matrix completion alone across all datasets.

This review was created by AI and reviewed by human editors.