QUICK REVIEW

[Paper Review] Continuous-time multi-armed bandits under random intervention times

Kei Noba, José Luis Pérez | arXiv (Cornell University) | March 4, 2026

Advanced Bandit Algorithms Research | Citations: 0

One-Line Summary

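Derives explicit Gittins index expressions for continuous-time multi-armed bandits with random renewal (intervention) times, covering Lévy-driven arms and the case of exponential inter-arrival times, and establishes the optimality of the Gittins strategy.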

ABSTRACT

This paper examines multi-armed bandits in which actions are taken at random discrete times. The model consists of $J$ independent arms. When an arm is operated, it must remain active for a random duration, modeled by the inter-arrival time of a (possibly arm-dependent) renewal process. For arms evolving as a Lévy process, we provide an explicit characterization of the Gittins index, which is known to yield an optimal strategy. Furthermore, when the inter-arrival times are exponential and the arms evolve as either a spectrally negative Lévy process, a reflected spectrally negative Lévy process, or a diffusion process, the Gittins index is explicitly characterized in terms of the scale function or diffusion characteristics, respectively. Numerical experiments are performed to support the theoretical results.
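The model in the abstract can be made concrete with a small simulation of one arm observed only at its intervention epochs. This is a minimal sketch under our own assumptions, not the authors' code. The arm is a hypothetical spectrally negative Lévy process: Brownian motion with drift minus a compound Poisson process with exponential jump sizes. The renewal clock has exponential inter-arrival times with rate `lam`. The name `sample_at_interventions` and all parameter values are illustrative only.

```python
import numpy as np


def sample_at_interventions(x0, mu, sigma, jump_rate, jump_mean, lam, n_periods, rng):
    """Hypothetical arm sampled at the first n_periods epochs of an Exp(lam) renewal clock.

    The arm is X_t = x0 + mu*t + sigma*B_t - S_t, where S_t is compound Poisson with
    rate jump_rate and Exp(mean jump_mean) jump sizes, so X only jumps downward.
    """
    # Activity durations: i.i.d. exponential inter-arrival times with rate lam.
    gaps = rng.exponential(1.0 / lam, size=n_periods)
    # Number of downward jumps in each period, then their sizes summed per period.
    n_jumps = rng.poisson(jump_rate * gaps)
    owner = np.repeat(np.arange(n_periods), n_jumps)
    sizes = rng.exponential(jump_mean, size=n_jumps.sum())
    jumps = np.bincount(owner, weights=sizes, minlength=n_periods)
    # Given the period length g, the increment is mu*g plus N(0, sigma^2 g) noise minus the jump total.
    incr = mu * gaps + sigma * np.sqrt(gaps) * rng.standard_normal(n_periods) - jumps
    return np.cumsum(gaps), x0 + np.cumsum(incr)


rng = np.random.default_rng(0)
times, states = sample_at_interventions(1.0, 0.5, 1.0, 1.0, 0.2, 2.0, 5, rng)
print(times, states)
```

The renewal clock is independent of the arm. As a result, each increment of X over an activity period has the Lévy increment law for that period's length, so the path never has to be simulated between epochs.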

Motivation and Objectives

Motivation: actions are taken only at random times, and an operated arm must stay active for a random renewal interval.
Provide an explicit Gittins index characterization for arms evolving as Lévy processes.
Derive explicit Gittins index expressions under exponential renewal times for spectrally negative Lévy, reflected spectrally negative Lévy, and diffusion arms.
Show asymptotic and convergence results linking exponential-renewal indices to classical continuous-time indices.

Proposed Method

Define a multi-armed bandit with J independent arms and arm-specific renewal times.
Formulate the discounted reward and the Gittins index as an optimal stopping problem for each arm.
Derive a general Gittins index expression for Lévy-driven arms using fluctuation theory.
Obtain explicit index formulas under exponential inter-arrival times for spectrally negative Lévy, reflected spectrally negative Lévy, and diffusion processes via scale functions or diffusion characteristics.
Prove asymptotic behavior and convergence results, including weak convergence of the measures $\mu^{\lambda}$ to $\mu^{\infty}$ and convergence of the index.
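The paper solves the stopping problem in continuous time with fluctuation theory. As a rough numerical analogue only, the sketch below computes a discrete-time version of the index by retirement calibration, under these stated assumptions:

- Time is discrete, with discount factor `beta`.
- Once pulled, the arm stays active for a Geometric(`p`) number of steps, which stands in for exponential inter-arrival times.
- The arm is a hypothetical reflected random walk.

For a retirement lump sum $M$, the value of continuing from state $x$ through one activity period satisfies $W = r + \beta P (pV + (1-p)W)$, where $V = \max(M, W)$. The index is $(1-\beta)M^*(x)$, with $M^*(x)$ the smallest $M$ at which retiring in $x$ is optimal. This equals the supremum of expected discounted reward per unit of expected discounted time over stopping times restricted to renewal epochs. The names `reflected_walk` and `gittins_random_periods` are illustrative.

```python
import numpy as np


def reflected_walk(n):
    """Hypothetical arm: symmetric random walk on {0, ..., n-1} that holds at the two ends."""
    P = np.zeros((n, n))
    for x in range(n):
        P[x, max(x - 1, 0)] += 0.5
        P[x, min(x + 1, n - 1)] += 0.5
    return P


def gittins_random_periods(P, r, beta, p, n_bisect=60, tol=1e-10, max_iter=10_000):
    """Discrete-time Gittins index when each pull lasts a Geometric(p) number of steps."""
    n = len(r)
    # Solving W = r + beta*P(p*V + (1-p)*W) for W needs (I - beta*(1-p)*P)^{-1}.
    A_inv = np.linalg.inv(np.eye(n) - beta * (1.0 - p) * P)
    # The index lies in [min r, max r], so M* lies in that interval divided by (1 - beta).
    lo = np.full(n, r.min() / (1.0 - beta))
    hi = np.full(n, r.max() / (1.0 - beta))
    for _ in range(n_bisect):
        M = 0.5 * (lo + hi)        # trial lump sum for each state j (column j)
        V = np.tile(M, (n, 1))     # V[:, j]: value function under lump sum M[j]
        for _ in range(max_iter):  # value iteration, a contraction since beta < 1
            W = A_inv @ (r[:, None] + beta * p * (P @ V))
            V_new = np.maximum(M[None, :], W)
            done = np.max(np.abs(V_new - V)) < tol
            V = V_new
            if done:
                break
        # If continuing from j strictly beats retiring, M[j] is below M*(j).
        keep = np.diag(W) > M + tol
        lo = np.where(keep, M, lo)
        hi = np.where(keep, hi, M)
    return (1.0 - beta) * 0.5 * (lo + hi)


r = np.linspace(0.0, 1.0, 21)
print(gittins_random_periods(reflected_walk(21), r, beta=0.9, p=0.3).round(3))
```

Each state gets its own bisection, and all states are handled at once by keeping one column of `V` per trial lump sum. Setting `p=1.0` recovers the classical discrete-time index, which is never below the immediate reward $r(x)$.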

Experimental Results

Research Questions

RQ1: How can the Gittins index be explicitly characterized when arms follow general Lévy processes with random renewal times?
RQ2: What are the explicit forms of the Gittins index under exponential renewal times for spectrally negative Lévy, reflected spectrally negative Lévy, and diffusion arms?
RQ3: How does the Gittins index behave as the renewal rate grows large, and does it converge to the classical continuous-time index?
RQ4: Does the Gittins index policy remain optimal under arm-dependent renewal times?
RQ5: How can fluctuation theory and scale functions be leveraged to compute the index in these settings?

Key Results

The Gittins index strategy is optimal for the continuous-time bandit with random intervention times.
An explicit Gittins index characterization is obtained for general Lévy-process arms.
With exponential renewal times, closed-form index expressions are derived for spectrally negative, reflected spectrally negative, and diffusion arms in terms of scale functions or diffusion data.
The index converges to the continuous-time limit as the exponential renewal rate $\lambda$ increases, via the weak convergence $\mu^{\lambda} \Rightarrow \mu^{\infty}$.
Asymptotic behavior shows the index tends to the reward function as the renewal rate tends to zero.
Numerical experiments are provided to support the theoretical results.
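The explicit formulas are stated in terms of scale functions. The simplest spectrally negative Lévy process, $X_t = \mu t + \sigma B_t$, has a closed-form $q$-scale function $W^{(q)}$. The sketch below illustrates only that ingredient and does not reproduce the paper's index formulas. It evaluates $W^{(q)}$ and checks the defining Laplace-transform identity $\int_0^\infty e^{-\beta x} W^{(q)}(x)\,dx = 1/(\psi(\beta) - q)$ for $\beta > \Phi(q)$, where $\psi$ is the Laplace exponent. The function name and parameter values are our own.

```python
import numpy as np


def bm_scale_function(x, q, mu, sigma):
    """q-scale function of X_t = mu*t + sigma*B_t (assumes sigma > 0 and q >= 0).

    psi(theta) = mu*theta + sigma**2*theta**2/2, and psi(theta) = q has roots
    theta_pm = (-mu +/- d)/sigma**2 with d = sqrt(mu**2 + 2*q*sigma**2), giving
    W(x) = (exp(theta_plus*x) - exp(theta_minus*x))/d for x >= 0 and W(x) = 0 for x < 0.
    """
    d = np.sqrt(mu**2 + 2.0 * q * sigma**2)
    theta_plus, theta_minus = (-mu + d) / sigma**2, (-mu - d) / sigma**2
    xp = np.maximum(np.asarray(x, dtype=float), 0.0)  # W vanishes on (-inf, 0]
    return (np.exp(theta_plus * xp) - np.exp(theta_minus * xp)) / d


# Check int_0^inf e^{-b x} W(x) dx = 1/(psi(b) - q); here b = 2 exceeds Phi(q) = theta_plus.
q, mu, sigma, b = 0.5, 0.3, 1.0, 2.0
x = np.linspace(0.0, 40.0, 200_001)
f = np.exp(-b * x) * bm_scale_function(x, q, mu, sigma)
integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))  # trapezoid rule
psi_b = mu * b + 0.5 * sigma**2 * b**2
print(integral, 1.0 / (psi_b - q))
```

For this process, $W^{(q)\prime}(0+) = 2/\sigma^2$, which gives a second quick sanity check. When the process has jumps, $W^{(q)}$ usually has no closed form and is computed by numerically inverting the same Laplace transform.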


This review was generated by AI and checked by a human editor.