QUICK REVIEW

[Paper Review] LIBRA: Language Model Informed Bandit Recourse Algorithm for Personalized Treatment Planning

Junyu Cao, Ruijiang Gao|arXiv (Cornell University)|2026. 01. 17.

Advanced Bandit Algorithms Research | Citations: 0

One-line summary

LIBRA integrates recourse-aware bandits with large language models (LLMs) to enable personalized treatment planning with minimal, feasible feature changes and theoretical guarantees.

ABSTRACT

We introduce a unified framework that seamlessly integrates algorithmic recourse, contextual bandits, and large language models (LLMs) to support sequential decision-making in high-stakes settings such as personalized medicine. We first introduce the recourse bandit problem, where a decision-maker must select both a treatment action and a feasible, minimal modification to mutable patient features. To address this problem, we develop the Generalized Linear Recourse Bandit (GLRB) algorithm. Building on this foundation, we propose LIBRA, a Language Model-Informed Bandit Recourse Algorithm that strategically combines domain knowledge from LLMs with the statistical rigor of bandit learning. LIBRA offers three key guarantees: (i) a warm-start guarantee, showing that LIBRA significantly reduces initial regret when LLM recommendations are near-optimal; (ii) an LLM-effort guarantee, proving that the algorithm consults the LLM only $O(\log^2 T)$ times, where $T$ is the time horizon, ensuring long-term autonomy; and (iii) a robustness guarantee, showing that LIBRA never performs worse than a pure bandit algorithm even when the LLM is unreliable. We further establish matching lower bounds that characterize the fundamental difficulty of the recourse bandit problem and demonstrate the near-optimality of our algorithms. Experiments on synthetic environments and a real hypertension-management case study confirm that GLRB and LIBRA improve regret, treatment quality, and sample efficiency compared with standard contextual bandits and LLM-only benchmarks. Our results highlight the promise of recourse-aware, LLM-assisted bandit algorithms for trustworthy LLM-bandits collaboration in personalized high-stakes decision-making.

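Motivation and objectives

Support recourse-aware sequential decision-making in high-stakes settings such as personalized medicine.
Formalize the recourse bandit problem and develop GLRB, which learns treatments together with minimal, feasible feature changes.
Introduce LIBRA, which combines LLM guidance with online bandit learning to improve early performance and enable autonomous learning over time.
Provide theoretical guarantees on recourse regret, together with lower bounds that establish the near-optimality of the algorithms.
Validate the approach through synthetic experiments and a hypertension-management case study.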

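Proposed method

Define the recourse bandit problem with immutable features $x_I$, mutable features $x_M$, and actions $A$.
Develop the Generalized Linear Recourse Bandit (GLRB), which learns the parameters of a generalized linear model (GLM) with sub-Gaussian noise and provides recourse.
Formulate optimistic recourse optimization (ORO-Arm) to select the recourse and action within the uncertainty set, solving it with two-block coordinate descent when needed.
Establish a high-probability uncertainty set for $\theta_a^*$ and prove convergence of the optimization procedure via a KL-property argument.
Present LIBRA as a collaboration between LLMs and bandits, offering a warm-start benefit, a bounded number of LLM queries of $O(\log^2 T)$, and robustness when the LLM is unreliable.
Provide lower bounds on recourse regret and show the near-optimality of the proposed algorithms.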
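
The optimistic selection step can be illustrated with a minimal sketch. This is not the paper's ORO-Arm procedure: it assumes a logistic link, an ellipsoidal confidence set per arm, an L1 cost budget on feature changes, and a finite list of candidate recourses that is simply enumerated instead of being optimized by two-block coordinate descent. The function name `select_arm_and_recourse` and all of its parameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic link, used here as one example of a GLM mean function."""
    return 1.0 / (1.0 + np.exp(-z))

def select_arm_and_recourse(x_i, x_m, candidates, theta_hat, v_inv, beta, max_cost):
    """Return (arm, new mutable features, optimistic reward).

    x_i, x_m   : immutable / current mutable feature vectors
    candidates : hypothetical finite list of alternative mutable vectors
    theta_hat  : per-arm parameter estimates, each of length len(x_i) + len(x_m)
    v_inv      : per-arm inverse design matrices defining the confidence ellipsoid
    beta       : confidence radius (assumed given)
    max_cost   : budget on the L1 size of the feature change (assumed cost model)
    """
    best_value, best_arm, best_x_m = -np.inf, None, None
    # "No change" is always feasible, so include it explicitly.
    options = [np.asarray(x_m, dtype=float)]
    options += [np.asarray(c, dtype=float) for c in candidates]
    for arm in range(len(theta_hat)):
        for option in options:
            if np.abs(option - x_m).sum() > max_cost:
                continue  # recourse exceeds the budget
            x = np.concatenate([x_i, option])
            # Max of x @ theta over the ellipsoid = estimate + beta * ||x||_{V^-1};
            # the link is monotone, so optimism carries through it.
            width = beta * np.sqrt(x @ v_inv[arm] @ x)
            value = sigmoid(x @ theta_hat[arm] + width)
            if value > best_value:
                best_value, best_arm, best_x_m = value, arm, option
    return best_arm, best_x_m, best_value

# Toy usage: one immutable and one mutable feature, two arms.
x_i, x_m = np.array([1.0]), np.array([0.0])
theta_hat = [np.array([0.0, 1.0]), np.array([0.0, -1.0])]
v_inv = [np.eye(2), np.eye(2)]
arm, new_x_m, value = select_arm_and_recourse(
    x_i, x_m, [[0.5], [2.0]], theta_hat, v_inv, beta=0.1, max_cost=1.0
)
print(arm, new_x_m, value)
```

In the toy call, the candidate `[2.0]` is skipped because it exceeds the cost budget, so the choice is made among the remaining feasible pairs of arm and recourse. Enumeration only works for small candidate sets; a continuous recourse space would need an actual optimizer.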

Experimental results

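Research questions

RQ1: How can a sequential decision-making framework be designed that links treatment selection with minimal, feasible recourse adjustments?
RQ2: Can an LLM provide useful warm-start guidance for online bandit learning while sublinear regret is maintained?
RQ3: What guarantees (warm-start, LLM-effort, robustness) does LIBRA offer in the recourse-aware bandit setting?
RQ4: What are the fundamental lower bounds for recourse bandits, and do GLRB and LIBRA achieve near-optimal regret?
RQ5: Do GLRB and LIBRA improve regret, treatment quality, and sample efficiency over standard linear contextual bandits and LLM-only baselines on synthetic and real data?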

Key results

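GLRB achieves a recourse regret bound of roughly $\tilde{O}(d\sqrt{KT})$ under a generalized linear model.
LIBRA provides warm-start, LLM-effort, and robustness guarantees, and is presented with matching lower bounds that indicate near-optimality.
LIBRA reduces initial regret when LLM recommendations are near-optimal and queries the LLM only $O(\log^2 T)$ times.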
In synthetic environments and the hypertension case study, GLRB and LIBRA improve regret, treatment quality, and sample efficiency over standard contextual bandits (LinUCB) and LLM-only benchmarks.
LIBRA enables trustworthy, recourse-aware LLM-bandit collaboration for personalized high-stakes decision-making.
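
To give a feel for the scaling in the stated bounds, the short sketch below evaluates $d\sqrt{KT}$ and $\log^2 T$ for a few horizons. It is illustrative arithmetic only: constants and logarithmic factors hidden by the $\tilde{O}$ and $O$ notation are ignored, the natural logarithm is an assumption, and the values of `d` and `K` are made up for the example.

```python
import math

def regret_scale(d, K, T):
    """Leading term d * sqrt(K * T) of the regret bound, constants and logs dropped."""
    return d * math.sqrt(K * T)

def llm_query_scale(T):
    """log(T)^2 with the natural log; the true count carries an unknown constant."""
    return math.log(T) ** 2

d, K = 5, 3  # hypothetical feature dimension and number of treatments
for T in (10**3, 10**4, 10**6):
    per_round = regret_scale(d, K, T) / T   # average regret per round shrinks with T
    query_share = llm_query_scale(T) / T    # share of rounds that consult the LLM
    print(f"T={T:>7}  regret/T~{per_round:.4f}  log^2(T)~{llm_query_scale(T):.1f}  "
          f"query share~{query_share:.2e}")
```

Both ratios shrink as $T$ grows. That is the sense in which regret is sublinear and the LLM is consulted in a vanishing fraction of rounds.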

This review was generated by AI and reviewed by a human editor.