QUICK REVIEW

[Paper Review] CARD: Cluster-level Adaptation with Reward-guided Decoding for Personalized Text Generation

Yutong Song, Jiang Wu | arXiv (Cornell University) | Jan 9, 2026
Topic Modeling | Citations: 0
One-sentence summary

CARD introduces a hierarchical personalization framework combining cluster-level LoRA adapters with decoding-time user-specific logit modulation, achieving strong personalization with improved efficiency and scalability.

ABSTRACT

Adapting large language models to individual users remains challenging due to the tension between fine-grained personalization and scalable deployment. We present CARD, a hierarchical framework that achieves effective personalization through progressive refinement. CARD first clusters users according to shared stylistic patterns and learns cluster-specific LoRA adapters, enabling robust generalization and strong low-resource performance. To capture individual differences within each cluster, we propose an implicit preference learning mechanism that contrasts user-authored text with cluster-level generations, allowing the model to infer user-specific style preferences without manual annotation. At inference time, CARD injects personalization exclusively at decoding via lightweight user preference vectors and low-rank logit corrections, while keeping the base model frozen. Experiments on the LaMP and LongLaMP benchmarks show that CARD achieves competitive or superior generation quality compared to state-of-the-art baselines, while significantly improving efficiency and scalability for practical personalized text generation.

Motivation and objectives

  • Motivate efficient, scalable personalization of large language models (LLMs) without per-user fine-tuning or heavy retrieval.
  • Propose a two-stage framework that leverages cluster-level adaptation for group priors and decoding-time user-specific modulation for individual differences.
  • Develop an implicit preference learning mechanism to capture user-specific styles using input-aligned preference pairs (user vs. cluster baseline).
  • Enable inference-time personalization by freezing the backbone and cluster adapters while injecting lightweight user preference signals at decoding via reward-guided logit editing.

Proposed method

  • Cluster-level adaptation with LoRA: learn cluster-specific LoRA adapters for K user clusters to capture shared stylistic patterns.
  • Preference-pair construction: create input-aligned pairs by contrasting a user’s output with the cluster-generated baseline to obtain stable supervision.
  • Decoding-time personalization: learn a compact user preference vector and a vocabulary projection to perturb logits at inference without updating the backbone or cluster adapters.
  • Reward-guided logit modification: apply a low-rank, Top-k constrained logit adjustment driven by user-specific signals to steer generation toward personalized style.
  • Training objective: use a Bradley–Terry pairwise loss on constructed pairs to optimize decoding-time personalization parameters.
  • New-user adaptation: assign new users to clusters and estimate their preference vectors from limited history, keeping heavy model components frozen.
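The decoding-time pieces listed above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the review does not give the exact formulas, so the shapes and names here (`user_vec`, `vocab_proj`, `strength`, `top_k`, and the nearest-centroid rule in `assign_cluster`) are assumptions chosen for illustration. The sketch assumes the logit correction is the product of a vocabulary projection matrix and a small user vector, applied only to the top-k tokens of the frozen model, and that training uses the standard Bradley-Terry pairwise loss with the user-authored text as the preferred item.

```python
import math


def assign_cluster(user_embedding, centroids):
    """Assumed rule: pick the nearest centroid by squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda k: sq_dist(user_embedding, centroids[k]))


def low_rank_delta(user_vec, vocab_proj):
    """Per-token correction: delta[v] = dot(vocab_proj[v], user_vec)."""
    return [sum(p * u for p, u in zip(row, user_vec)) for row in vocab_proj]


def adjust_logits(base_logits, user_vec, vocab_proj, top_k=2, strength=1.0):
    """Add the user-specific correction to the top_k highest base logits only."""
    delta = low_rank_delta(user_vec, vocab_proj)
    # Token indices ranked by the frozen model's own logits.
    top = sorted(range(len(base_logits)), key=lambda i: base_logits[i], reverse=True)[:top_k]
    adjusted = list(base_logits)
    for i in top:
        adjusted[i] += strength * delta[i]
    return adjusted


def bradley_terry_loss(reward_user, reward_cluster):
    """-log sigmoid(reward_user - reward_cluster), computed stably."""
    margin = reward_user - reward_cluster
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy example: 4-token vocabulary, 2-dimensional user vector (hypothetical values).
base_logits = [2.0, 1.5, 0.1, -1.0]
user_vec = [1.0, -1.0]
vocab_proj = [[0.0, 0.5], [1.0, 0.0], [3.0, 0.0], [0.0, 0.0]]

adjusted = adjust_logits(base_logits, user_vec, vocab_proj, top_k=2)
print(adjusted)  # [1.5, 2.5, 0.1, -1.0]
```

In this toy example the preferred token moves from index 0 to index 1, while token 2 is left untouched despite its large correction because it falls outside the top-k set. That is the sense in which a Top-k constraint keeps the adjustment from promoting tokens the frozen model considers unlikely.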

Experimental results

Research questions

  • RQ1: How does CARD perform compared to existing personalization baselines under standard evaluation settings?
  • RQ2: How do group LoRA and user vectors contribute to personalization performance?
  • RQ3: How effective is CARD for low-resource users with limited historical data?
  • RQ4: How do design choices in user vector construction and clustering affect performance?
  • RQ5: Do LLM judgments align with human judgments in assessing personalization?
  • RQ6: Is CARD efficient and scalable for large-scale user deployment?

Key findings

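| Task | Metric | Non-pers. | RAG | PAG | PAD | PPLUG | OPPU | CARD |
|---|---|---|---|---|---|---|---|---|
| LaMP4: News Headline Gen. | R-1 | 0.146 | 0.166 | 0.164 | 0.158 | 0.157 | 0.152 | 0.218 |
| LaMP4: News Headline Gen. | R-L | 0.128 | 0.148 | 0.146 | 0.139 | 0.138 | 0.128 | 0.195 |
| LaMP5: Scholarly Title Gen. | R-1 | 0.425 | 0.456 | 0.415 | 0.442 | 0.464 | 0.426 | 0.459 |
| LaMP5: Scholarly Title Gen. | R-L | 0.342 | 0.372 | 0.352 | 0.360 | 0.386 | 0.342 | 0.387 |
| LaMP7: Tweet Paraphrasing | R-1 | 0.497 | 0.500 | 0.507 | 0.502 | 0.511 | 0.498 | 0.521 |
| LaMP7: Tweet Paraphrasing | R-L | 0.439 | 0.431 | 0.435 | 0.437 | 0.433 | 0.422 | 0.448 |
| LongLaMP1: Abstract Gen. | R-1 | 0.331 | 0.372 | 0.381 | 0.355 | 0.391 | 0.382 | 0.411 |
| LongLaMP1: Abstract Gen. | R-L | 0.184 | 0.203 | 0.201 | 0.194 | 0.217 | 0.202 | 0.316 |
| LongLaMP2: Topic Writing | R-1 | 0.247 | 0.244 | 0.255 | 0.248 | 0.243 | 0.245 | 0.252 |
| LongLaMP2: Topic Writing | R-L | 0.119 | 0.118 | 0.125 | 0.121 | 0.122 | 0.112 | 0.127 |
| LongLaMP3: Product Review Writing | R-1 | 0.292 | 0.382 | 0.322 | 0.308 | 0.396 | 0.295 | 0.405 |
| LongLaMP3: Product Review Writing | R-L | 0.130 | 0.152 | 0.141 | 0.136 | 0.149 | 0.132 | 0.156 |

R-1 = ROUGE-1, R-L = ROUGE-L.
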
  • CARD achieves best or near-best generation quality across multiple tasks and settings, ranking 1st in 10 of 12 settings and near-best in the remaining two.
  • Ablation studies show user vectors are the primary driver of personalization, with group-level LoRA providing important but smaller gains.
  • CARD maintains strong performance in low-resource scenarios, outperforming non-personalized baselines with limited history.
  • Moderate user-vector strength and intermediate dimensionality give the best personalization balance; overly strong or overly high-dimensional vectors harm performance.
  • LLM-based judgments favor CARD, and human judgments often prefer CARD over references in stylistic and user-fit dimensions.
  • CARD demonstrates favorable efficiency, with training and inference costs largely independent of per-user model updates and showing scalable deployment characteristics.
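
The "1st in 10 of 12 settings" count can be checked directly against the reported ROUGE scores. The short script below is only a tally over the numbers in the results table, with each row ordered as Non-pers., RAG, PAG, PAD, PPLUG, OPPU, CARD; the variable and function names are illustrative.

```python
METHODS = ["Non-pers.", "RAG", "PAG", "PAD", "PPLUG", "OPPU", "CARD"]

# (task, metric) -> scores in METHODS order, copied from the results table.
SCORES = {
    ("LaMP4", "R-1"): [0.146, 0.166, 0.164, 0.158, 0.157, 0.152, 0.218],
    ("LaMP4", "R-L"): [0.128, 0.148, 0.146, 0.139, 0.138, 0.128, 0.195],
    ("LaMP5", "R-1"): [0.425, 0.456, 0.415, 0.442, 0.464, 0.426, 0.459],
    ("LaMP5", "R-L"): [0.342, 0.372, 0.352, 0.360, 0.386, 0.342, 0.387],
    ("LaMP7", "R-1"): [0.497, 0.500, 0.507, 0.502, 0.511, 0.498, 0.521],
    ("LaMP7", "R-L"): [0.439, 0.431, 0.435, 0.437, 0.433, 0.422, 0.448],
    ("LongLaMP1", "R-1"): [0.331, 0.372, 0.381, 0.355, 0.391, 0.382, 0.411],
    ("LongLaMP1", "R-L"): [0.184, 0.203, 0.201, 0.194, 0.217, 0.202, 0.316],
    ("LongLaMP2", "R-1"): [0.247, 0.244, 0.255, 0.248, 0.243, 0.245, 0.252],
    ("LongLaMP2", "R-L"): [0.119, 0.118, 0.125, 0.121, 0.122, 0.112, 0.127],
    ("LongLaMP3", "R-1"): [0.292, 0.382, 0.322, 0.308, 0.396, 0.295, 0.405],
    ("LongLaMP3", "R-L"): [0.130, 0.152, 0.141, 0.136, 0.149, 0.132, 0.156],
}


def best_method(row):
    """Return the method with the highest score in one table row."""
    return METHODS[row.index(max(row))]


winners = {setting: best_method(row) for setting, row in SCORES.items()}
card_wins = [s for s, m in winners.items() if m == "CARD"]
others = {s: m for s, m in winners.items() if m != "CARD"}

print(f"CARD ranks first in {len(card_wins)} of {len(SCORES)} settings")
for setting, method in others.items():
    # Gap between the best score in the row and CARD's score (last column).
    gap = max(SCORES[setting]) - SCORES[setting][-1]
    print(setting, "best:", method, "gap to CARD:", round(gap, 3))
```

The two settings where CARD is not first are LaMP5 R-1 (PPLUG leads) and LongLaMP2 R-1 (PAG leads), both by a margin of at most 0.005.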

This review was generated by AI and checked by a human editor.