QUICK REVIEW

[Paper Review] CARD: Cluster-level Adaptation with Reward-guided Decoding for Personalized Text Generation

Yutong Song, Jiang Wu | arXiv (Cornell University) | Jan 9, 2026

Topic Modeling | Citations: 0

One-line summary

CARD introduces a hierarchical personalization framework combining cluster-level LoRA adapters with decoding-time user-specific logit modulation, achieving strong personalization with improved efficiency and scalability.

ABSTRACT

Adapting large language models to individual users remains challenging due to the tension between fine-grained personalization and scalable deployment. We present CARD, a hierarchical framework that achieves effective personalization through progressive refinement. CARD first clusters users according to shared stylistic patterns and learns cluster-specific LoRA adapters, enabling robust generalization and strong low-resource performance. To capture individual differences within each cluster, we propose an implicit preference learning mechanism that contrasts user-authored text with cluster-level generations, allowing the model to infer user-specific style preferences without manual annotation. At inference time, CARD injects personalization exclusively at decoding via lightweight user preference vectors and low-rank logit corrections, while keeping the base model frozen. Experiments on the LaMP and LongLaMP benchmarks show that CARD achieves competitive or superior generation quality compared to state-of-the-art baselines, while significantly improving efficiency and scalability for practical personalized text generation.

Motivation and objectives

Motivate efficient, scalable personalization of large language models (LLMs) without per-user fine-tuning or heavy retrieval.
Propose a two-stage framework that leverages cluster-level adaptation for group priors and decoding-time user-specific modulation for individual differences.
Develop an implicit preference learning mechanism to capture user-specific styles using input-aligned preference pairs (user vs. cluster baseline).
Enable inference-time personalization by freezing the backbone and the cluster adapters while injecting lightweight user preference signals at decoding via reward-guided logit editing.

Proposed method

Cluster-level adaptation with LoRA: learn cluster-specific LoRA adapters for K user clusters to capture shared stylistic patterns.
Preference-pair construction: create input-aligned pairs by contrasting a user’s output with the cluster-generated baseline to obtain stable supervision.
Decoding-time personalization: learn a compact user preference vector and a vocabulary projection to perturb logits at inference without updating the backbone or cluster adapters.
Reward-guided logit modification: apply a low-rank, Top-k constrained logit adjustment driven by user-specific signals to steer generation toward personalized style.
Training objective: use a Bradley–Terry pairwise loss on constructed pairs to optimize decoding-time personalization parameters.
New-user adaptation: assign new users to clusters and estimate their preference vectors from limited history, keeping heavy model components frozen.
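
The decoding-time pieces listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (adjust_logits, bradley_terry_loss, assign_cluster), the `strength` parameter, the scalar scores passed to the loss, and the nearest-centroid assignment rule are all hypothetical. The summary only states that a user preference vector and a vocabulary projection perturb Top-k logits, that a Bradley–Terry pairwise loss is used, and that new users are assigned to clusters.

```python
import math


def adjust_logits(logits, user_vec, vocab_proj, top_k=2, strength=1.0):
    """Add a user-specific correction to the top-k base logits only.

    logits     : list of V floats from the frozen, cluster-adapted model
    user_vec   : list of d floats (hypothetical user preference vector)
    vocab_proj : V x d nested list (hypothetical vocabulary projection);
                 with d much smaller than V the correction is low-rank
    """
    # Indices of the k highest base logits; all other tokens stay untouched.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    adjusted = list(logits)
    for i in top:
        correction = sum(w * u for w, u in zip(vocab_proj[i], user_vec))
        adjusted[i] += strength * correction
    return adjusted


def bradley_terry_loss(score_user, score_cluster):
    """-log sigmoid(score_user - score_cluster), computed stably.

    Assumption: each score is a scalar reward for one text of an
    input-aligned pair (user-authored text vs. cluster-level generation).
    """
    diff = score_user - score_cluster
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def assign_cluster(user_embedding, centroids):
    """Assumed rule: pick the cluster whose centroid is closest (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda k: sq_dist(user_embedding, centroids[k]))


# Toy usage with made-up numbers (V = 4 tokens, d = 2).
logits = [2.0, 1.0, 0.5, -1.0]
user_vec = [1.0, -1.0]
vocab_proj = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0], [2.0, 0.0]]
adjusted = adjust_logits(logits, user_vec, vocab_proj, top_k=2)
# adjusted == [2.5, 0.5, 0.5, -1.0]: only the two highest-scoring tokens move.
```

In this sketch only `user_vec` and `vocab_proj` would be trained, by minimizing `bradley_terry_loss` over the constructed pairs, which matches the summary's point that the backbone and the cluster adapters stay frozen.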

Experimental results

Research questions

RQ1: How does CARD perform compared to existing personalization baselines under standard evaluation settings?
RQ2: How do group LoRA and user vectors contribute to personalization performance?
RQ3: How effective is CARD for low-resource users with limited historical data?
RQ4: How do design choices in user vector construction and clustering affect performance?
RQ5: Do LLM judgments align with human judgments in assessing personalization?
RQ6: Is CARD efficient and scalable for large-scale user deployment?

Key findings

Task	Metric	Non-pers.	RAG	PAG	PAD	PPLUG	OPPU	CARD
LaMP4: News Headline Gen.	R-1	0.146	0.166	0.164	0.158	0.157	0.152	0.218
LaMP4: News Headline Gen.	R-L	0.128	0.148	0.146	0.139	0.138	0.128	0.195
LaMP5: Scholarly Title Gen.	R-1	0.425	0.456	0.415	0.442	0.464	0.426	0.459
LaMP5: Scholarly Title Gen.	R-L	0.342	0.372	0.352	0.360	0.386	0.342	0.387
LaMP7: Tweet Paraphrasing	R-1	0.497	0.500	0.507	0.502	0.511	0.498	0.521
LaMP7: Tweet Paraphrasing	R-L	0.439	0.431	0.435	0.437	0.433	0.422	0.448
LongLaMP1: Abstract Gen.	R-1	0.331	0.372	0.381	0.355	0.391	0.382	0.411
LongLaMP1: Abstract Gen.	R-L	0.184	0.203	0.201	0.194	0.217	0.202	0.316
LongLaMP2: Topic Writing	R-1	0.247	0.244	0.255	0.248	0.243	0.245	0.252
LongLaMP2: Topic Writing	R-L	0.119	0.118	0.125	0.121	0.122	0.112	0.127
LongLaMP3: Product Review Writing	R-1	0.292	0.382	0.322	0.308	0.396	0.295	0.405
LongLaMP3: Product Review Writing	R-L	0.130	0.152	0.141	0.136	0.149	0.132	0.156

CARD achieves the best or second-best generation quality in every setting: it ranks first in 10 of the 12 task-metric settings and second in the remaining two (R-1 on LaMP5, behind PPLUG, and R-1 on LongLaMP2, behind PAG).
Ablation studies show user vectors are the primary driver of personalization, with group-level LoRA providing important but smaller gains.
CARD maintains strong performance in low-resource scenarios, outperforming non-personalized baselines with limited history.
Moderate user-vector strength and intermediate dimensionality give the best personalization balance, while overly strong or overly high-dimensional vectors harm performance.
LLM-based judgments favor CARD, and human judgments often prefer CARD over references in stylistic and user-fit dimensions.
CARD demonstrates favorable efficiency, with training and inference costs largely independent of per-user model updates and showing scalable deployment characteristics.
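
The "10 of 12" tally can be checked directly against the results table. The snippet below is a small sketch that copies the ROUGE scores from the table as printed; the names METHODS, SCORES, and best_methods are introduced here for illustration only.

```python
# ROUGE scores copied from the results table; column order follows its header.
METHODS = ["Non-pers.", "RAG", "PAG", "PAD", "PPLUG", "OPPU", "CARD"]
SCORES = {
    ("LaMP4", "R-1"): [0.146, 0.166, 0.164, 0.158, 0.157, 0.152, 0.218],
    ("LaMP4", "R-L"): [0.128, 0.148, 0.146, 0.139, 0.138, 0.128, 0.195],
    ("LaMP5", "R-1"): [0.425, 0.456, 0.415, 0.442, 0.464, 0.426, 0.459],
    ("LaMP5", "R-L"): [0.342, 0.372, 0.352, 0.360, 0.386, 0.342, 0.387],
    ("LaMP7", "R-1"): [0.497, 0.500, 0.507, 0.502, 0.511, 0.498, 0.521],
    ("LaMP7", "R-L"): [0.439, 0.431, 0.435, 0.437, 0.433, 0.422, 0.448],
    ("LongLaMP1", "R-1"): [0.331, 0.372, 0.381, 0.355, 0.391, 0.382, 0.411],
    ("LongLaMP1", "R-L"): [0.184, 0.203, 0.201, 0.194, 0.217, 0.202, 0.316],
    ("LongLaMP2", "R-1"): [0.247, 0.244, 0.255, 0.248, 0.243, 0.245, 0.252],
    ("LongLaMP2", "R-L"): [0.119, 0.118, 0.125, 0.121, 0.122, 0.112, 0.127],
    ("LongLaMP3", "R-1"): [0.292, 0.382, 0.322, 0.308, 0.396, 0.295, 0.405],
    ("LongLaMP3", "R-L"): [0.130, 0.152, 0.141, 0.136, 0.149, 0.132, 0.156],
}


def best_methods(row):
    """Return every method that attains the highest score in a table row."""
    top = max(row)
    return [m for m, s in zip(METHODS, row) if s == top]


# Settings where CARD is (one of) the top-scoring methods.
card_wins = [key for key, row in SCORES.items() if "CARD" in best_methods(row)]
# Settings where another method scores higher, with the winner(s).
card_losses = {
    key: best_methods(row)
    for key, row in SCORES.items()
    if "CARD" not in best_methods(row)
}

print(len(card_wins), "of", len(SCORES))  # prints: 10 of 12
print(card_losses)
```

Both settings where CARD is not first are R-1 rows, and in each the gap to the leading method is at most 0.005.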

This review was generated by AI and checked by a human editor.