QUICK REVIEW

[Paper Review] LIBRA: Language Model Informed Bandit Recourse Algorithm for Personalized Treatment Planning

Junyu Cao, Ruijiang Gao|arXiv (Cornell University)|Jan 17, 2026

Advanced Bandit Algorithms Research|Cited by 0

One-Sentence Summary

tldr: LIBRA integrates large language models with recourse-aware bandits to enable personalized treatment planning with minimal, actionable feature changes and theoretical guarantees.

ABSTRACT

We introduce a unified framework that seamlessly integrates algorithmic recourse, contextual bandits, and large language models (LLMs) to support sequential decision-making in high-stakes settings such as personalized medicine. We first introduce the recourse bandit problem, where a decision-maker must select both a treatment action and a feasible, minimal modification to mutable patient features. To address this problem, we develop the Generalized Linear Recourse Bandit (GLRB) algorithm. Building on this foundation, we propose LIBRA, a Language Model-Informed Bandit Recourse Algorithm that strategically combines domain knowledge from LLMs with the statistical rigor of bandit learning. LIBRA offers three key guarantees: (i) a warm-start guarantee, showing that LIBRA significantly reduces initial regret when LLM recommendations are near-optimal; (ii) an LLM-effort guarantee, proving that the algorithm consults the LLM only $O(\log^2 T)$ times, where $T$ is the time horizon, ensuring long-term autonomy; and (iii) a robustness guarantee, showing that LIBRA never performs worse than a pure bandit algorithm even when the LLM is unreliable. We further establish matching lower bounds that characterize the fundamental difficulty of the recourse bandit problem and demonstrate the near-optimality of our algorithms. Experiments on synthetic environments and a real hypertension-management case study confirm that GLRB and LIBRA improve regret, treatment quality, and sample efficiency compared with standard contextual bandits and LLM-only benchmarks. Our results highlight the promise of recourse-aware, LLM-assisted bandit algorithms for trustworthy LLM-bandits collaboration in personalized high-stakes decision-making.

Research Motivation and Objectives

Motivate recourse-aware sequential decision-making in high-stakes settings like personalized medicine.
Formulate the recourse bandit problem and develop GLRB to learn treatments plus minimal feasible feature changes.
Introduce LIBRA to combine LLM guidance with online bandit learning for improved early performance and autonomous learning over time.
Provide theoretical guarantees and lower bounds for recourse regret and algorithmic optimality.
Validate via synthetic experiments and a hypertension management case study.

Proposed Method

Define a recourse bandit problem with immutable features $x_I$, mutable features $x_M$, and actions $A$.
Develop Generalized Linear Recourse Bandit (GLRB) to learn parameters and provide recourse under a GLM with sub-Gaussian noise.
Formulate optimistic recourse optimization (ORO-Arm) to select recourse and action within uncertainty sets, solving with a two-block coordinate descent when necessary.
Establish high-probability uncertainty sets for $\theta_a^*$, and prove convergence of the optimization procedure via Kurdyka-Łojasiewicz (KL) property arguments.
Present LIBRA as a collaboration between LLMs and bandits, with warm-start benefits, only $O(\log^2 T)$ LLM queries, and robustness when the LLM is unreliable.
Provide lower bounds on recourse regret and show near-optimality of the proposed algorithms.
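
The optimistic recourse step described above can be illustrated with a minimal sketch. It is not the paper's ORO-Arm procedure. It assumes, purely for illustration, a Euclidean ball of radius `beta` around each estimate as the uncertainty set, a box of half-width `radius` around the current mutable features as the feasible recourse set, and a logistic link. All function and parameter names are hypothetical. Because the link is increasing, maximizing the predicted reward is the same as maximizing the linear score, and each block of the two-block ascent has a closed form.

```python
import math

def sigmoid(z):
    """Logistic link (an assumed choice of GLM link for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))

def optimistic_recourse(theta_hat, beta, x_immutable, x_mutable, radius, iters=20):
    """Two-block coordinate ascent on the linear score theta . x.

    Block 1: theta ranges over a Euclidean ball of radius beta around theta_hat
             (assumed uncertainty set).
    Block 2: each mutable feature may move by at most `radius`
             (assumed feasible recourse set).
    Returns the chosen mutable features and the optimistic linear score.
    The objective is bilinear, so this may stop at a local optimum.
    """
    d_i = len(x_immutable)
    x_m = list(x_mutable)
    theta = list(theta_hat)
    for _ in range(iters):
        x = list(x_immutable) + x_m
        norm = math.sqrt(sum(v * v for v in x))
        # Block 1: for fixed x, the best theta in the ball points along x.
        if norm > 0:
            theta = [t + beta * v / norm for t, v in zip(theta_hat, x)]
        else:
            theta = list(theta_hat)
        # Block 2: for fixed theta, the score is linear in x_m, so each
        # coordinate moves to the box edge given by the sign of its weight.
        new_x_m = []
        for j, x0 in enumerate(x_mutable):
            w = theta[d_i + j]
            new_x_m.append(x0 + radius * ((w > 0) - (w < 0)))
        if new_x_m == x_m:
            break  # neither block can improve the score any further
        x_m = new_x_m
    x = list(x_immutable) + x_m
    score = sum(t * v for t, v in zip(theta, x))
    return x_m, score

def select_arm(theta_hats, beta, x_immutable, x_mutable, radius):
    """Pick the (arm, recourse) pair with the highest optimistic mean reward."""
    best = None
    for arm, theta_hat in enumerate(theta_hats):
        x_m, score = optimistic_recourse(theta_hat, beta, x_immutable, x_mutable, radius)
        reward = sigmoid(score)
        if best is None or reward > best[2]:
            best = (arm, x_m, reward)
    return best

if __name__ == "__main__":
    # Two hypothetical arms, one immutable and two mutable features.
    estimates = [[1.0, -0.5, 0.2], [0.2, 0.1, 0.1]]
    print(select_arm(estimates, beta=0.1, x_immutable=[1.0], x_mutable=[0.0, 0.0], radius=1.0))
```

The box constraint only caps how far each mutable feature can move. A recourse cost that prefers smaller changes would need an extra penalty term, which this sketch omits.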

Experimental Results

Research Questions

RQ1: How can a sequential decision framework couple treatment choices with minimal feasible recourse adjustments?
RQ2: Can an LLM provide useful warm-start guidance for online bandit learning while preserving sublinear regret?
RQ3: What guarantees (warm-start, LLM-effort, robustness) does LIBRA provide in recourse-aware bandit settings?
RQ4: What are the fundamental lower bounds for recourse bandits, and do GLRB and LIBRA achieve near-optimal regret?
RQ5: Do GLRB and LIBRA improve regret, treatment quality, and sample efficiency compared with standard linear contextual bandits and LLM-only baselines on synthetic and real data?

Key Findings

GLRB achieves a recourse regret bound of roughly Õ(d√KT) under a generalized linear model.
LIBRA offers warm-start, LLM-effort, and robustness guarantees, plus matching lower bounds showing near-optimality.
LIBRA reduces initial regret when LLM recommendations are near-optimal and consults the LLM only $O(\log^2 T)$ times.
Experiments on synthetic environments and a hypertension management case study show that GLRB and LIBRA improve regret, treatment quality, and sample efficiency over standard contextual bandits (LinUCB) and LLM-only baselines.
LIBRA enables recourse-aware, trustworthy LLM-bandit collaboration for personalized high-stakes decision-making.
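
To give a feel for the LLM-effort guarantee listed above, the short calculation below compares a $\log^2 T$ query budget with the horizon $T$. It is only an order-of-magnitude sketch: the constant hidden in the $O(\cdot)$ bound is not given here, so the hypothetical `constant` parameter defaults to 1, and the natural logarithm is an assumed choice.

```python
import math

def llm_query_budget(horizon, constant=1.0):
    """Illustrative budget constant * (ln T)^2; the true constant is unknown."""
    return constant * math.log(horizon) ** 2

if __name__ == "__main__":
    for horizon in (10**3, 10**4, 10**6):
        budget = llm_query_budget(horizon)
        share = budget / horizon
        print(f"T={horizon:>8}: about {budget:6.1f} LLM queries ({share:.4%} of rounds)")
```

Under these assumptions the share of rounds that involve the LLM shrinks quickly as the horizon grows, which is the sense in which the algorithm becomes autonomous over time.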


This review was generated by AI and reviewed by human editors.