QUICK REVIEW

[Paper Review] Uncertainty and Fairness Awareness in LLM-Based Recommendation Systems

Chandan Kumar Sah, Xiaoli Lian | arXiv (Cornell University) | 2026-01-31

Ethics and Social Impacts of AI | Citations: 0

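One-line summary

The paper analyzes how predictive uncertainty and demographic and personality-related biases affect LLM-based recommendation, introduces an uncertainty-aware fairness benchmark evaluated on Gemini 1.5 Flash, and proposes a personality-aware fairness framework.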

ABSTRACT

Large language models (LLMs) enable powerful zero-shot recommendations by leveraging broad contextual knowledge, yet predictive uncertainty and embedded biases threaten reliability and fairness. This paper studies how uncertainty and fairness evaluations affect the accuracy, consistency, and trustworthiness of LLM-generated recommendations. We introduce a benchmark of curated metrics and a dataset annotated for eight demographic attributes (31 categorical values) across two domains: movies and music. Through in-depth case studies, we quantify predictive uncertainty (via entropy) and demonstrate that Google DeepMind's Gemini 1.5 Flash exhibits systematic unfairness for certain sensitive attributes; measured similarity-based gaps are SNSR at 0.1363 and SNSV at 0.0507. These disparities persist under prompt perturbations such as typographical errors and multilingual inputs. We further integrate personality-aware fairness into the RecLLM evaluation pipeline to reveal personality-linked bias patterns and expose trade-offs between personalization and group fairness. We propose a novel uncertainty-aware evaluation methodology for RecLLMs, present empirical insights from deep uncertainty case studies, and introduce a personality profile-informed fairness benchmark that advances explainability and equity in LLM recommendations. Together, these contributions establish a foundation for safer, more interpretable RecLLMs and motivate future work on multi-model benchmarks and adaptive calibration for trustworthy deployment.

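Research motivation and objectives

Motivate uncertainty quantification as a complement to reliability and fairness evaluation of RecLLMs.
Investigate how prompt variations and demographic attributes affect the fairness of LLM recommendations.
Develop and apply an uncertainty-aware evaluation framework for RecLLMs.
Introduce personality-conditioned prompts to study bias patterns.
Propose benchmarks and methods for improving the explainability and equity of LLM-based recommendations.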
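
Method

Quantify predictive uncertainty using the entropy of the LLM's ranked outputs.
Build a curated dataset annotated with eight demographic attributes (31 categorical values) across the movie and music domains.
Design fairness prompts that include demographic and personality signals to measure output variability.
Evaluate the fairness and uncertainty of Gemini 1.5 Flash under neutral and sensitive prompts.
Compute the similarity-based unfairness metrics SNSR and SNSV, together with a PA fairness score (PAWS) for personality prompts.
Analyze robustness to prompt perturbations such as typos and multilingual prompts, and report domain-specific biases.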
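
This review does not reproduce the formulas for SNSR and SNSV, so the following is only a minimal sketch under stated assumptions. It assumes the definitions commonly used in earlier RecLLM fairness work: for one sensitive attribute, each attribute value's recommendation list is compared to the list returned for a neutral prompt, SNSR is the range of those similarities, and SNSV is their population standard deviation. Jaccard similarity and the names `jaccard` and `snsr_snsv` are illustrative choices, not taken from the paper.

```python
from statistics import pstdev


def jaccard(a, b):
    """Jaccard similarity of two recommendation lists, ignoring rank order."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    # Two empty lists are treated as identical (an assumption of this sketch).
    return len(set_a & set_b) / len(union) if union else 1.0


def snsr_snsv(neutral_recs, sensitive_recs):
    """Return (SNSR, SNSV) for one sensitive attribute.

    neutral_recs:   items recommended for the neutral prompt.
    sensitive_recs: dict mapping each attribute value to the items
                    recommended when the prompt mentions that value.
    """
    if not sensitive_recs:
        raise ValueError("need at least one attribute value")
    sims = [jaccard(neutral_recs, recs) for recs in sensitive_recs.values()]
    snsr = max(sims) - min(sims)  # range of similarity to the neutral list
    snsv = pstdev(sims)           # population standard deviation (assumed)
    return snsr, snsv


# Hypothetical toy data: item titles and attribute values are placeholders.
neutral = ["A", "B", "C", "D"]
by_value = {
    "value_1": ["A", "B", "C", "D"],
    "value_2": ["A", "B", "E", "F"],
    "value_3": ["A", "G", "H", "I"],
}
snsr, snsv = snsr_snsv(neutral, by_value)
```

Under these assumptions, both scores are zero when every attribute value receives the same list as the neutral prompt, and they grow as some groups drift further from the neutral list than others. A rank-aware similarity could be swapped in for `jaccard` without changing the rest of the sketch.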
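
Research questions

RQ1: How does predictive uncertainty (entropy) affect the reliability of LLM-based recommendations?
RQ2: How robust are the fairness gaps in LLM recommendations across multi-attribute demographics and prompt perturbations?
RQ3: How does personality-aware prompting reveal bias patterns and the trade-off between personalization and group fairness?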
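
Key findings

Higher predictive entropy is associated with less reliable recommendations.
Gemini 1.5 Flash shows systematic unfairness for several sensitive attributes in the music and movie domains, with gaps quantified by SNSR and SNSV (e.g., the SNSR/SNSV values reported in Table 3).
Unfairness patterns persist under prompt perturbations such as typos and multilingual prompts.
Personality-aware prompts reveal bias patterns and highlight the trade-off between personalization and group fairness.
The proposed uncertainty-aware evaluation framework yields more robust and interpretable fairness assessments.
Gaps are domain- and attribute-specific; religion, continent, occupation, and country are among the most frequently affected attributes.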
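
The entropy-based uncertainty measure is not spelled out in this review. One plausible reading, sketched here as an assumption, is to query the model several times with the same prompt and take the Shannon entropy of the empirical distribution of what it returns (for example, the top-ranked item of each response). The function name `predictive_entropy` and the sample titles are hypothetical.

```python
import math
from collections import Counter


def predictive_entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`.

    `samples` holds one hashable output per repeated query, e.g. the
    top-1 recommended title from each response (an assumption here).
    """
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    total = len(samples)
    # H = sum over outcomes of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical repeated outputs for one prompt.
stable = ["Inception"] * 4                 # always the same answer
mixed = ["Inception", "Inception", "Heat", "Heat"]
scattered = ["Inception", "Heat", "Alien", "Fargo"]

h_stable = predictive_entropy(stable)        # 0.0 bits
h_mixed = predictive_entropy(mixed)          # 1.0 bit
h_scattered = predictive_entropy(scattered)  # 2.0 bits
```

Read this way, a prompt whose repeated outputs keep changing has high entropy, which matches the reported association between higher entropy and less reliable recommendations.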

Proposed method

The method includes a robustness analysis against prompt perturbations, such as typos in test prompts and multilingual prompts.
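
How the typographical perturbations were generated is not described in this review. As a rough illustration only, the sketch below injects typos by swapping adjacent letters with a fixed probability, using a seeded generator so the perturbed prompt is reproducible. The function `add_typos`, its parameters, and the example prompt are all hypothetical.

```python
import random


def add_typos(prompt, rate=0.1, seed=0):
    """Swap adjacent letters with probability `rate`, reproducibly via `seed`."""
    rng = random.Random(seed)
    chars = list(prompt)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each character moves at most once
        else:
            i += 1
    return "".join(chars)


# Hypothetical prompt template; the paper's actual prompts are not shown here.
clean_prompt = "Recommend ten movies for a viewer who enjoys science fiction."
noisy_prompt = add_typos(clean_prompt, rate=0.15, seed=42)
```

A robustness check in this spirit would send both `clean_prompt` and `noisy_prompt` to the recommender and compare the fairness scores obtained from each, keeping every other setting fixed.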

Figure 1: Illustrates how uncertainty in deep learning models affects recommendation reliability, using probability estimates and explanations to highlight challenges in recognizing unfamiliar inputs.

This review was generated by AI and reviewed by a human editor.