QUICK REVIEW

[Paper Review] The Impact of Popularity Bias on Fairness and Calibration in Recommendation

Himan Abdollahpouri, Masoud Mansoury | arXiv (Cornell University) | Oct 13, 2019

Recommender Systems and Techniques | 20 references | 19 citations

TL;DR

This paper investigates how popularity bias in recommendation algorithms leads to miscalibration, where recommendations deviate from users' true preferences, and how this particularly affects users who are less interested in popular items. It reports a strong correlation between algorithmic popularity bias (measured as popularity lift) and miscalibration, with neighborhood-based algorithms showing higher bias and miscalibration than factorization-based methods.

ABSTRACT

Recently there has been a growing interest in fairness-aware recommender systems, including fairness in providing consistent performance across different users or groups of users. A recommender system could be considered unfair if the recommendations do not fairly represent the tastes of a certain group of users while other groups receive recommendations that are consistent with their preferences. In this paper, we use a metric called miscalibration for measuring how a recommendation algorithm is responsive to users' true preferences and we consider how various algorithms may result in different degrees of miscalibration. A well-known type of bias in recommendation is popularity bias where few popular items are over-represented in recommendations, while the majority of other items do not get significant exposure. We conjecture that popularity bias is one important factor leading to miscalibration in recommendation. Our experimental results using two real-world datasets show that there is a strong correlation between how different user groups are affected by algorithmic popularity bias and their level of interest in popular items. Moreover, we show algorithms with greater popularity bias amplification tend to have greater miscalibration.

Motivation & Objective

To investigate the impact of popularity bias on fairness and calibration in recommender systems.
To examine whether users with lower interest in popular items are disproportionately affected by algorithmic popularity bias.
To analyze the relationship between popularity lift and miscalibration across different recommendation algorithms.
To compare the performance of various algorithms in terms of fairness and calibration, particularly focusing on user group differences.
To explore whether popularity bias is a root cause of miscalibration in recommendation systems.

Proposed method

Measured popularity bias using the popularity lift metric, defined as the ratio of the average popularity of recommended items to the average popularity of the items in users' profiles (the input popularity).
Quantified miscalibration as the deviation between the distribution of genres in a user's rating history and the distribution in their recommendations.
Evaluated multiple recommendation algorithms (e.g., ItemKNN, UserKNN, SVD++, BMF, Most-Popular) on two real-world datasets, including MovieLens.
Compared user groups that differ in their level of interest in popular items (e.g., men vs. women in MovieLens) to assess the differential impact of popularity bias.
Used statistical significance testing (p < 0.05) to compare miscalibration and popularity lift across user groups and algorithms.
Analyzed the correlation between popularity lift and miscalibration across different algorithms and user segments.
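
The two metrics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: popularity lift follows the ratio wording used in this review, and miscalibration uses the Hellinger distance between genre distributions as one plausible choice of deviation measure. All function names and the toy data (item_popularity, item_genres, profile, recs) are hypothetical.

```python
import math
from collections import Counter


def popularity_lift(profile_items, recommended_items, item_popularity):
    """Ratio of mean popularity in the recommendations to mean popularity in the profile."""
    rec_pop = sum(item_popularity[i] for i in recommended_items) / len(recommended_items)
    profile_pop = sum(item_popularity[i] for i in profile_items) / len(profile_items)
    return rec_pop / profile_pop


def genre_distribution(items, item_genres):
    """Normalized genre distribution; an item with k genres adds 1/k to each of them."""
    counts = Counter()
    for item in items:
        genres = item_genres[item]
        for genre in genres:
            counts[genre] += 1 / len(genres)
    total = sum(counts.values())
    return {genre: c / total for genre, c in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    genres = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(g, 0.0)) - math.sqrt(q.get(g, 0.0))) ** 2 for g in genres
    )
    return math.sqrt(squared) / math.sqrt(2)


def miscalibration(profile_items, recommended_items, item_genres):
    """Deviation between the genre distribution of the profile and of the recommendations."""
    return hellinger(
        genre_distribution(profile_items, item_genres),
        genre_distribution(recommended_items, item_genres),
    )


# Hypothetical toy data: popularity is the fraction of users who rated each item.
item_popularity = {"a": 0.8, "b": 0.6, "c": 0.1, "d": 0.1}
item_genres = {"a": ["Action"], "b": ["Action", "Comedy"], "c": ["Drama"], "d": ["Drama"]}
profile = ["c", "d"]  # a niche-leaning user
recs = ["a", "b"]     # popular items recommended instead

print(popularity_lift(profile, recs, item_popularity))
print(miscalibration(profile, recs, item_genres))
```

In this toy case the recommendations share no genre with the profile, so the Hellinger distance reaches its maximum, while the lift is well above 1 because the recommended items are far more popular than the rated ones. Averaging either metric over the users in a group gives a group-level value.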

Experimental results

Research questions

RQ1: How does algorithmic popularity bias affect miscalibration in recommendations for different user groups?
RQ2: Are users with lower interest in popular items more affected by popularity bias than those with higher interest?
RQ3: Is there a significant correlation between popularity lift and overall miscalibration across different recommendation algorithms?
RQ4: Do certain types of algorithms (e.g., neighborhood-based vs. factorization-based) amplify popularity bias and miscalibration more than others?
RQ5: To what extent does popularity bias contribute to unfair treatment in recommendation systems, as measured by miscalibration differences across groups?

Key findings

Users with lower interest in popular items experience significantly higher popularity lift, indicating greater exposure to algorithmic popularity bias.
The group with lower interest in popular items (women in the MovieLens dataset) had a higher popularity lift (1.91) than men (1.76), showing greater bias amplification.
This same group also exhibited higher miscalibration (0.48 vs. 0.42 for men), indicating recommendations were less aligned with their true preferences.
There is a strong positive correlation between popularity lift and miscalibration: algorithms with higher popularity lift (e.g., Most-Popular with 1.91) showed higher miscalibration (0.48).
Neighborhood-based algorithms (e.g., ItemKNN, UserKNN) had higher popularity lift and miscalibration compared to factorization-based methods like SVD++ and BMF.
SVD++ and BMF showed the lowest popularity lift (0.33 and 0.87, respectively) and were the most calibrated, indicating greater resistance to popularity bias.
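
The correlation finding above could be checked with a Pearson coefficient computed over per-algorithm (popularity lift, miscalibration) pairs. The sketch below is only illustrative: the choice of Pearson correlation is an assumption, the pearson helper is hypothetical, and the toy_lift and toy_miscalibration values are made-up placeholders, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values, one entry per hypothetical algorithm (NOT taken from the paper).
toy_lift = [0.2, 0.5, 1.0, 1.5]
toy_miscalibration = [0.30, 0.35, 0.40, 0.50]

r = pearson(toy_lift, toy_miscalibration)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 would match the reported pattern that algorithms amplifying popularity more also drift further from users' genre preferences. With only a handful of algorithms the estimate is noisy, so it is best read alongside the per-group comparisons.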

This review was created by AI and reviewed by human editors.