QUICK REVIEW

[Paper Review] Low-rank matrix factorization with attributes

Jacob Abernethy, Francis Bach | ArXiv.org | Nov 24, 2006

Recommender Systems and Techniques · 16 references · 99 citations

TL;DR

This paper proposes a generalized low-rank matrix factorization framework that integrates user and item attributes with collaborative filtering via tensor product kernels. Predictions are regularized jointly by matrix rank and by kernel-induced smoothness over the attributes. Experiments on movie ratings show lower test error than standard matrix completion and attribute-only methods, with the best results obtained when both information sources are combined and the hyperparameters are tuned.

ABSTRACT

We develop a new collaborative filtering (CF) method that combines both previously known users' preferences, i.e. standard CF, as well as product/user attributes, i.e. classical function approximation, to predict a given user's interest in a particular product. Our method is a generalized low rank matrix completion problem, where we learn a function whose inputs are pairs of vectors -- the standard low rank matrix completion problem being a special case where the inputs to the function are the row and column indices of the matrix. We solve this generalized matrix completion problem using tensor product kernels for which we also formally generalize standard kernel properties. Benchmark experiments on movie ratings show the advantages of our generalized matrix completion method over the standard matrix completion one with no information about movies or people, as well as over standard multi-task or single task learning methods.

Motivation & Objective

To address the limitations of standard collaborative filtering, which ignores user and item attributes, by integrating them into the matrix completion process.
To unify low-rank matrix factorization, multi-task learning, and function approximation within a single kernel-based framework.
To develop a generalized matrix completion method that interpolates between pure collaborative filtering and attribute-based learning.
To empirically validate that combining both preference data and attribute information leads to better predictive performance than either approach alone.
To demonstrate the importance of balanced regularization through rank and kernel norm constraints in the proposed framework.

Proposed method

The method models user-item preferences as a function f(x,y) where x and y are user and item attribute vectors, using a tensor product kernel k⊗((x₁,y₁),(x₂,y₂)) = k(x₁,x₂)g(y₁,y₂) to define similarity in the joint space.
It formulates the learning problem as minimizing a regularized empirical risk over a reproducing kernel Hilbert space (RKHS), with regularization combining trace norm (for low-rank structure) and kernel norm (for smoothness in attribute space).
The framework generalizes standard low-rank matrix completion (when only indices are used) and classical function approximation (when only attributes are used), with both as special cases.
A hybrid kernel is introduced, parameterized by η and ζ, to balance the contribution of identity (Dirac) kernel on indices and attribute-based kernels on user and item features.
The optimization is solved via kernel ridge regression with a combined regularization term that includes both the trace norm of the matrix and the RKHS norm of the function.
Hyperparameters (rank d, regularization λ, and η, ζ) are selected via cross-validation on a subset of the MovieLens dataset.
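The kernel construction described above can be illustrated with a minimal sketch. It assumes a convex mixing convention in which a weight of 0 gives the pure Dirac (ID-only) kernel and a weight of 1 gives the pure attribute kernel; the review does not state which end of the range corresponds to which kernel. The Gaussian attribute kernel, the dictionary layout, and all function names are illustrative choices, not taken from the paper.

```python
import math

def dirac_kernel(i, j):
    """Identity kernel on indices: 1 if both refer to the same entity, else 0."""
    return 1.0 if i == j else 0.0

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel on attribute vectors (an assumed, illustrative choice)."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def hybrid_kernel(id1, attrs1, id2, attrs2, mix):
    """Convex mix of kernels. Assumed convention: mix=0 is IDs only, mix=1 is attributes only."""
    return mix * gaussian_kernel(attrs1, attrs2) + (1.0 - mix) * dirac_kernel(id1, id2)

def tensor_product_kernel(pair1, pair2, eta, zeta):
    """k_tensor((x1, y1), (x2, y2)) = k(x1, x2) * g(y1, y2)."""
    (user1, item1), (user2, item2) = pair1, pair2
    k_users = hybrid_kernel(user1["id"], user1["attrs"], user2["id"], user2["attrs"], eta)
    g_items = hybrid_kernel(item1["id"], item1["attrs"], item2["id"], item2["attrs"], zeta)
    return k_users * g_items

# Two distinct users with identical attributes, compared on the same film.
alice = {"id": 0, "attrs": [1.0, 0.0]}
bob = {"id": 1, "attrs": [1.0, 0.0]}
film = {"id": 0, "attrs": [0.0, 1.0]}

ids_only = tensor_product_kernel((alice, film), (bob, film), eta=0.0, zeta=0.0)
attrs_only = tensor_product_kernel((alice, film), (bob, film), eta=1.0, zeta=1.0)
blended = tensor_product_kernel((alice, film), (bob, film), eta=0.5, zeta=0.5)
```

With IDs only, the two users share nothing, so the similarity is 0. With attributes only, they are indistinguishable, so it is 1. The blended setting falls in between at 0.5, which is the interpolation between collaborative filtering and attribute-based learning that the framework aims for.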

Experimental results

Research questions

RQ1: Can combining user and item attributes with collaborative filtering improve prediction accuracy beyond standard matrix completion?
RQ2: How does the interaction between low-rank structure and attribute-based smoothness affect model generalization?
RQ3: What is the optimal trade-off between using user/item IDs versus their attributes in the prediction function?
RQ4: Does the proposed method generalize better to new users or items than pure collaborative filtering?
RQ5: How do different regularization strategies (rank vs. kernel norm) interact and influence performance?

Key findings

The proposed method achieved a test mean squared error (MSE) of 1.0351 with the best reported hyperparameters (d=130, η=ζ=0.15, λ=0.2), outperforming standard matrix completion (MSE ≈ 1.13–1.54) and attribute-only baselines.
Performance degraded sharply at the corners of the η,ζ parameter space (e.g., (0,0) or (1,1)), indicating that relying solely on IDs or attributes harms generalization.
The best performance was consistently observed in the interior of the η,ζ range, such as (0.15,0.15) or (0.5,0.5), confirming the benefit of combining both information sources.
The rank d had a large effect: increasing d from 50 to 200 improved MSE from 1.5391 to 1.0818 when η=ζ=0, showing that performance is sensitive to the choice of rank constraint.
The choice of regularization parameter λ had a strong impact on performance: for d=130, η=ζ=0.15, MSE increased from 1.0351 (λ=0.2) to 1.1401 (λ=0.04), demonstrating the need for careful tuning.
Cross-validation consistently selected intermediate values of η and ζ, confirming that the model benefits most from a balanced fusion of ID and attribute information.
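The selection of η and ζ described above amounts to a grid search over validation error. The sketch below shows that procedure in its simplest form. The `validation_mse` callable is hypothetical: it stands for whatever routine trains the model with given mixing weights and returns held-out MSE. The `toy_mse` function is a stand-in for demonstration only and does not reproduce the paper's error surface.

```python
from itertools import product

def select_mixing_weights(validation_mse, grid):
    """Return the (eta, zeta) pair with the lowest validation error, and that error.

    validation_mse is a hypothetical callable (eta, zeta) -> float, for example
    the held-out MSE of a model trained with those mixing weights.
    """
    best_pair, best_score = None, float("inf")
    for eta, zeta in product(grid, repeat=2):
        score = validation_mse(eta, zeta)
        if score < best_score:
            best_pair, best_score = (eta, zeta), score
    return best_pair, best_score

# Stand-in objective for demonstration only; NOT the paper's results.
def toy_mse(eta, zeta):
    return 1.0 + (eta - 0.5) ** 2 + (zeta - 0.5) ** 2

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
best_pair, best_score = select_mixing_weights(toy_mse, grid)
```

In practice the rank d and the regularization parameter λ would be searched in the same loop, at the cost of a larger grid.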

This review was created by AI and reviewed by human editors.