QUICK REVIEW

[Paper Review] LIBRA: Language Model Informed Bandit Recourse Algorithm for Personalized Treatment Planning

Junyu Cao, Ruijiang Gao | arXiv (Cornell University) | Jan 17, 2026
Advanced Bandit Algorithms Research | Cited by: 0
One-Sentence Summary

tldr: LIBRA integrates large language models with recourse-aware bandits to enable personalized treatment planning with minimal, actionable feature changes and theoretical guarantees.

ABSTRACT

We introduce a unified framework that seamlessly integrates algorithmic recourse, contextual bandits, and large language models (LLMs) to support sequential decision-making in high-stakes settings such as personalized medicine. We first introduce the recourse bandit problem, where a decision-maker must select both a treatment action and a feasible, minimal modification to mutable patient features. To address this problem, we develop the Generalized Linear Recourse Bandit (GLRB) algorithm. Building on this foundation, we propose LIBRA, a Language Model-Informed Bandit Recourse Algorithm that strategically combines domain knowledge from LLMs with the statistical rigor of bandit learning. LIBRA offers three key guarantees: (i) a warm-start guarantee, showing that LIBRA significantly reduces initial regret when LLM recommendations are near-optimal; (ii) an LLM-effort guarantee, proving that the algorithm consults the LLM only $O(\log^2 T)$ times, where $T$ is the time horizon, ensuring long-term autonomy; and (iii) a robustness guarantee, showing that LIBRA never performs worse than a pure bandit algorithm even when the LLM is unreliable. We further establish matching lower bounds that characterize the fundamental difficulty of the recourse bandit problem and demonstrate the near-optimality of our algorithms. Experiments on synthetic environments and a real hypertension-management case study confirm that GLRB and LIBRA improve regret, treatment quality, and sample efficiency compared with standard contextual bandits and LLM-only benchmarks. Our results highlight the promise of recourse-aware, LLM-assisted bandit algorithms for trustworthy LLM-bandits collaboration in personalized high-stakes decision-making.

Research Motivation and Objectives

  • Motivate recourse-aware sequential decision-making in high-stakes settings like personalized medicine.
  • Formulate the recourse bandit problem and develop GLRB to learn treatments plus minimal feasible feature changes.
  • Introduce LIBRA to combine LLM guidance with online bandit learning for improved early performance and autonomous learning over time.
  • Provide theoretical guarantees and lower bounds for recourse regret and algorithmic optimality.
  • Validate via synthetic experiments and a hypertension management case study.

Proposed Method

  • Define a recourse bandit problem with immutable features $x_I$, mutable features $x_M$, and actions $A$.
  • Develop Generalized Linear Recourse Bandit (GLRB) to learn parameters and provide recourse under a GLM with sub-Gaussian noise.
  • Formulate optimistic recourse optimization (ORO-Arm) to select the recourse and the action within uncertainty sets, solved by two-block coordinate descent when necessary.
  • Prove high-probability uncertainty sets for $\theta_a^*$, and establish convergence of the optimization procedure via Kurdyka-Łojasiewicz (KL) property arguments.
  • Present LIBRA as a collaboration between LLMs and bandits, with warm-start benefits, only $O(\log^2 T)$ LLM queries, and robustness when the LLM is unreliable.
  • Provide lower bounds on recourse regret and show near-optimality of the proposed algorithms.
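The optimistic selection step described above can be illustrated with a small sketch. This is not the paper's ORO-Arm procedure or its coordinate-descent solver. It assumes a logistic link, a Euclidean ball of radius `beta` around each arm's estimate as the uncertainty set, a finite grid of candidate moves for the mutable features, and an L1 budget on the total change. All function names, parameters, and numbers are hypothetical.

```python
import math
from itertools import product


def sigmoid(z):
    """Logistic link function (an assumed choice of GLM link)."""
    return 1.0 / (1.0 + math.exp(-z))


def optimistic_value(theta_hat, x, beta):
    """Largest mean reward over the ball {theta : ||theta - theta_hat||_2 <= beta}.

    For a monotone link, the maximum is reached at theta_hat . x + beta * ||x||_2.
    """
    dot = sum(t * v for t, v in zip(theta_hat, x))
    norm = math.sqrt(sum(v * v for v in x))
    return sigmoid(dot + beta * norm)


def select_action_and_recourse(x_immutable, x_mutable, theta_hats, beta, step, budget):
    """Enumerate small changes to the mutable features and return the
    (optimistic value, arm index, change) triple with the highest value."""
    best = None
    # Hypothetical feasible set: each mutable coordinate moves by -step, 0, or +step.
    for delta in product((-step, 0.0, step), repeat=len(x_mutable)):
        cost = sum(abs(d) for d in delta)
        if cost > budget:
            continue  # change is too large to count as feasible recourse
        x_new = list(x_immutable) + [m + d for m, d in zip(x_mutable, delta)]
        for arm, theta_hat in enumerate(theta_hats):
            value = optimistic_value(theta_hat, x_new, beta)
            if best is None or value > best[0]:
                best = (value, arm, delta)
    return best


# Toy instance: one immutable feature, two mutable features, two arms.
theta_hats = [[0.0, 1.0, 0.0], [0.0, 0.0, -2.0]]
value, arm, delta = select_action_and_recourse(
    x_immutable=[1.0], x_mutable=[0.5, 0.5],
    theta_hats=theta_hats, beta=0.1, step=0.5, budget=0.5,
)
print(arm, delta, round(value, 3))
```

Exhaustive enumeration is only practical for a handful of mutable features. It is used here to keep the sketch short, and it picks the highest optimistic value within the budget rather than explicitly minimizing the size of the change.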

Experimental Results

Research Questions

  • RQ1: How to design a sequential decision framework that couples treatment choices with minimal feasible recourse adjustments?
  • RQ2: Can an LLM provide useful warm-start guidance for online bandit learning while preserving sublinear regret?
  • RQ3: What are the guarantees (warm-start, LLM-effort, robustness) of LIBRA in recourse-aware bandit settings?
  • RQ4: What are the fundamental lower bounds for recourse bandits, and do GLRB and LIBRA achieve near-optimal regret?
  • RQ5: Do GLRB and LIBRA demonstrate improved regret, treatment quality, and sample efficiency compared with standard linear contextual bandits and LLM-only baselines in synthetic and real data?

Key Findings

  • GLRB achieves a recourse regret bound of roughly $\tilde{O}(d\sqrt{KT})$ under a generalized linear model.
  • LIBRA offers warm-start, LLM-effort, and robustness guarantees, plus matching lower bounds showing near-optimality.
  • LIBRA reduces initial regret when LLM recommendations are near-optimal and consults the LLM only $O(\log^2 T)$ times.
  • Experiments on synthetic environments and a hypertension case study show that GLRB and LIBRA improve regret, treatment quality, and sample efficiency over standard contextual bandits such as LinUCB and over LLM-only benchmarks.
  • LIBRA enables recourse-aware, trustworthy LLM-bandit collaboration for personalized high-stakes decision-making.
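To get a feel for the LLM-effort guarantee, the short calculation below compares $\log^2 T$ with the horizon $T$. It is only an order-of-magnitude illustration: the constant hidden by the $O(\cdot)$ notation is not given in this summary, so it is assumed to be 1 here, and the natural logarithm is an arbitrary choice. The function name is hypothetical.

```python
import math


def llm_query_scale(horizon):
    """Return (ln T)^2, ignoring the unknown constant hidden by the O(.) notation."""
    return math.log(horizon) ** 2


for horizon in (10**3, 10**4, 10**6):
    scale = llm_query_scale(horizon)
    # The share of rounds that could involve the LLM shrinks as T grows.
    print(f"T={horizon:>8}: log^2 T = {scale:7.1f}, share of rounds = {scale / horizon:.5f}")
```

Under these assumptions, the number of LLM consultations grows far more slowly than the number of decision rounds, which is the sense in which the algorithm becomes autonomous over time.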

This review was generated by AI and reviewed by human editors.