[Paper Review] CARD: Cluster-level Adaptation with Reward-guided Decoding for Personalized Text Generation
CARD introduces a hierarchical personalization framework combining cluster-level LoRA adapters with decoding-time user-specific logit modulation, achieving strong personalization with improved efficiency and scalability.
Adapting large language models to individual users remains challenging due to the tension between fine-grained personalization and scalable deployment. We present CARD, a hierarchical framework that achieves effective personalization through progressive refinement. CARD first clusters users according to shared stylistic patterns and learns cluster-specific LoRA adapters, enabling robust generalization and strong low-resource performance. To capture individual differences within each cluster, we propose an implicit preference learning mechanism that contrasts user-authored text with cluster-level generations, allowing the model to infer user-specific style preferences without manual annotation. At inference time, CARD injects personalization exclusively at decoding via lightweight user preference vectors and low-rank logit corrections, while keeping the base model frozen. Experiments on the LaMP and LongLaMP benchmarks show that CARD achieves competitive or superior generation quality compared to state-of-the-art baselines, while significantly improving efficiency and scalability for practical personalized text generation.
Research Motivation and Objectives
- Motivate efficient, scalable personalization of large language models (LLMs) without per-user fine-tuning or heavy retrieval.
- Propose a two-stage framework that leverages cluster-level adaptation for group priors and decoding-time user-specific modulation for individual differences.
- Develop an implicit preference learning mechanism to capture user-specific styles using input-aligned preference pairs (user vs. cluster baseline).
- Enable inference-time personalization by freezing the backbone and the cluster adapters while injecting lightweight user preference signals at decoding via reward-guided logit editing.
Proposed Method
- Cluster-level adaptation with LoRA: learn cluster-specific LoRA adapters for K user clusters to capture shared stylistic patterns.
- Preference-pair construction: create input-aligned pairs by contrasting a user’s output with the cluster-generated baseline to obtain stable supervision.
- Decoding-time personalization: learn a compact user preference vector and a vocabulary projection to perturb logits at inference without updating the backbone or cluster adapters.
- Reward-guided logit modification: apply a low-rank, Top-k constrained logit adjustment driven by user-specific signals to steer generation toward personalized style.
- Training objective: use a Bradley–Terry pairwise loss on constructed pairs to optimize decoding-time personalization parameters.
- New-user adaptation: assign new users to clusters and estimate their preference vectors from limited history, keeping heavy model components frozen.
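The decoding-time steps above can be illustrated with a minimal sketch. This is not the authors' implementation: the review does not give the exact formulas, so the function names, the `alpha` strength parameter, the choice to restrict the correction to the base model's top-k tokens, and the use of a plain scalar score in the pairwise loss are all assumptions made for illustration. It shows (1) a low-rank logit correction computed from a user preference vector and a vocabulary projection, and (2) a Bradley–Terry pairwise loss that favors the user-authored text over the cluster baseline.

```python
import math


def top_k_indices(logits, k):
    """Return the indices of the k largest logits."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def personalized_logits(base_logits, user_vec, vocab_proj, top_k=3, alpha=1.0):
    """Add a user-specific correction to the top-k candidate tokens only.

    base_logits: logits from the frozen backbone plus cluster adapter (length V)
    user_vec:    compact user preference vector (length d)
    vocab_proj:  V x d projection from preference space to the vocabulary;
                 with d much smaller than V the correction is low-rank
    alpha:       correction strength (hypothetical knob, not from the paper)
    """
    adjusted = list(base_logits)
    for i in top_k_indices(base_logits, top_k):
        correction = sum(w * u for w, u in zip(vocab_proj[i], user_vec))
        adjusted[i] += alpha * correction
    return adjusted


def bradley_terry_loss(score_preferred, score_rejected):
    """Negative log-likelihood that the preferred text beats the rejected one.

    Equals -log(sigmoid(score_preferred - score_rejected)), written in a
    numerically stable form.
    """
    margin = score_preferred - score_rejected
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    base = [2.0, 1.0, 0.5, -1.0]           # toy vocabulary of 4 tokens
    user = [1.0, -1.0]                     # toy 2-dimensional preference vector
    proj = [[0.0, 0.0], [0.5, -0.5], [0.0, 1.0], [5.0, 0.0]]
    # Only the two highest-scoring tokens (indices 0 and 1) are adjusted.
    print(personalized_logits(base, user, proj, top_k=2, alpha=1.0))
    # The user-authored text is "preferred", the cluster baseline "rejected".
    print(bradley_terry_loss(score_preferred=1.2, score_rejected=0.4))
```

In this sketch the backbone and cluster adapter are never touched: only `user_vec` and `vocab_proj` would be trained, which matches the frozen-model design described in the list above. Restricting the correction to top-k candidates keeps a large correction on an implausible token (index 3 in the toy example) from pulling it into the output.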
Experimental Results
Research Questions
- RQ1: How does CARD perform compared to existing personalization baselines under standard evaluation settings?
- RQ2: How do group LoRA and user vectors contribute to personalization performance?
- RQ3: How effective is CARD for low-resource users with limited historical data?
- RQ4: How do design choices in user vector construction and clustering affect performance?
- RQ5: Do LLM judgments align with human judgments in assessing personalization?
- RQ6: Is CARD efficient and scalable for large-scale user deployment?
Key Findings
| Task | Metric | Non-pers. | RAG | PAG | PAD | PPLUG | OPPU | CARD |
|---|---|---|---|---|---|---|---|---|
| LaMP4: News Headline Gen. | R-1 | 0.146 | 0.166 | 0.164 | 0.158 | 0.157 | 0.152 | 0.218 |
| LaMP4: News Headline Gen. | R-L | 0.128 | 0.148 | 0.146 | 0.139 | 0.138 | 0.128 | 0.195 |
| LaMP5: Scholarly Title Gen. | R-1 | 0.425 | 0.456 | 0.415 | 0.442 | 0.464 | 0.426 | 0.459 |
| LaMP5: Scholarly Title Gen. | R-L | 0.342 | 0.372 | 0.352 | 0.360 | 0.386 | 0.342 | 0.387 |
| LaMP7: Tweet Paraphrasing | R-1 | 0.497 | 0.500 | 0.507 | 0.502 | 0.511 | 0.498 | 0.521 |
| LaMP7: Tweet Paraphrasing | R-L | 0.439 | 0.431 | 0.435 | 0.437 | 0.433 | 0.422 | 0.448 |
| LongLaMP1: Abstract Gen. | R-1 | 0.331 | 0.372 | 0.381 | 0.355 | 0.391 | 0.382 | 0.411 |
| LongLaMP1: Abstract Gen. | R-L | 0.184 | 0.203 | 0.201 | 0.194 | 0.217 | 0.202 | 0.316 |
| LongLaMP2: Topic Writing | R-1 | 0.247 | 0.244 | 0.255 | 0.248 | 0.243 | 0.245 | 0.252 |
| LongLaMP2: Topic Writing | R-L | 0.119 | 0.118 | 0.125 | 0.121 | 0.122 | 0.112 | 0.127 |
| LongLaMP3: Product Review Writing | R-1 | 0.292 | 0.382 | 0.322 | 0.308 | 0.396 | 0.295 | 0.405 |
| LongLaMP3: Product Review Writing | R-L | 0.130 | 0.152 | 0.141 | 0.136 | 0.149 | 0.132 | 0.156 |
- CARD achieves the best score in 10 of the 12 task-metric settings in the table above and is close to the best in the other two: LaMP5 R-1 (0.459 vs. 0.464 for PPLUG) and LongLaMP2 R-1 (0.252 vs. 0.255 for PAG).
- Ablation studies show user vectors are the primary driver of personalization, with group-level LoRA providing important but smaller gains.
- CARD maintains strong performance in low-resource scenarios, outperforming non-personalized baselines with limited history.
- Moderate user-vector strength and intermediate dimensionality give the best personalization balance, while overly strong or overly high-dimensional vectors harm performance.
- LLM-based judgments favor CARD, and human judgments often prefer CARD over references in stylistic and user-fit dimensions.
- CARD demonstrates favorable efficiency, with training and inference costs largely independent of per-user model updates and showing scalable deployment characteristics.
This review was generated by AI and checked by a human editor.