QUICK REVIEW

[Paper Review] Continuous-time multi-armed bandits under random intervention times

Kei Noba, José Luis Pérez|arXiv (Cornell University)|2026. 03. 04.
Advanced Bandit Algorithms Research|Citations: 0
One-line summary

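For continuous-time multi-armed bandits with random renewal times, the paper derives an explicit expression for the Gittins index, covering Lévy-driven arms and the case of exponential inter-arrival times, and establishes the optimality of the Gittins strategy.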

ABSTRACT

This paper examines multi-armed bandits in which actions are taken at random discrete times. The model consists of $J$ independent arms. When an arm is operated, it must remain active for a random duration, modeled by the inter-arrival time of a (possibly arm-dependent) renewal process. For arms evolving as a Lévy process, we provide an explicit characterization of the Gittins index, which is known to yield an optimal strategy. Furthermore, when the inter-arrival times are exponential and the arms evolve as either a spectrally negative Lévy process, a reflected spectrally negative Lévy process, or a diffusion process, the Gittins index is explicitly characterized in terms of the scale function or diffusion characteristics, respectively. Numerical experiments are performed to support the theoretical results.

Motivation and objectives

  • Motivate a bandit model in which actions are taken at random times and an operated arm must stay active for a random renewal period.
  • Provide an explicit Gittins index characterization for arms evolving as Lévy processes.
  • Derive explicit Gittins index expressions under exponential renewal times for spectrally negative Lévy, reflected spectrally negative Lévy, and diffusion arms.
  • Show asymptotic and convergence results linking exponential-renewal indices to classical continuous-time indices.

Proposed method

  • Define a multi-armed bandit with $J$ independent arms and arm-specific renewal times.
  • Formulate the discounted reward and the Gittins index as an optimal stopping problem for each arm.
  • Derive a general Gittins index expression for Lévy-driven arms using fluctuation theory.
  • Obtain explicit index formulas under exponential inter-arrival times for spectrally negative Lévy, reflected spectrally negative Lévy, and diffusion processes via scale functions or diffusion characteristics.
  • Establish asymptotic and convergence results, including weak convergence of the measures $\mu^{\lambda}$ to $\mu^{\infty}$ and convergence of the index.
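
As a rough orientation for the second step above, the Gittins index is usually written as a ratio of expected discounted reward to expected discounted time, maximized over stopping times. The display below is a sketch of that standard form, adapted under the assumption that an arm can only be stopped at its renewal epochs. The symbols $X$ (arm state), $r$ (reward function), $q > 0$ (discount rate), and $T_1 < T_2 < \dots$ (renewal epochs) are illustrative assumptions, and the paper's own definition and notation may differ.

$$
\Gamma(x) \;=\; \sup_{\tau \in \mathcal{T}} \frac{\mathbb{E}_x\!\left[\int_0^{\tau} e^{-qt}\, r(X_t)\, dt\right]}{\mathbb{E}_x\!\left[\int_0^{\tau} e^{-qt}\, dt\right]},
\qquad
\mathcal{T} = \{\text{stopping times } \tau \text{ with values in } \{T_1, T_2, \dots\}\}.
$$

Under this reading, the index policy operates, at each decision epoch, an arm whose current index is largest.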

Experimental results

Research questions

  • RQ1: How can the Gittins index be explicitly characterized when arms follow general Lévy processes with random renewal times?
  • RQ2: What are the explicit forms of the Gittins index under exponential renewal times for spectrally negative Lévy, reflected spectrally negative Lévy, and diffusion arms?
  • RQ3: How does the Gittins index behave as the renewal rate grows large, and does it converge to the classical continuous-time index?
  • RQ4: Does the Gittins index policy remain optimal under arm-dependent renewal times?
  • RQ5: How can fluctuation theory and scale functions be leveraged to compute the index in these settings?

Key results

  • The Gittins index strategy is optimal for the continuous-time bandit with random intervention times.
  • An explicit Gittins index characterization is obtained for general Lévy-process arms.
  • With exponential renewal times, closed-form index expressions are derived for spectrally negative, reflected spectrally negative, and diffusion arms in terms of scale functions or diffusion data.
  • The index converges to the continuous-time limit as the exponential rate increases, via a weak convergence result $\mu^{\lambda} \Rightarrow \mu^{\infty}$.
  • Asymptotic behavior shows the index tends to the reward function as the renewal rate tends to zero.
  • Numerical experiments are provided to support the theoretical results.
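
To get a feel for the two rate regimes mentioned above, the following minimal Python sketch looks only at the activation period itself, not at the index. It assumes the period $T$ is exponential with rate `lam` and that rewards are discounted at rate `q` (both names are illustrative, not taken from the paper). It checks two elementary identities by Monte Carlo: $\mathbb{E}[e^{-qT}] = \lambda/(\lambda+q)$ and $\mathbb{E}[\int_0^T e^{-qt}\,dt] = 1/(\lambda+q)$. This is not the paper's convergence result, only an illustration of how the commitment period shrinks as the rate grows.

```python
import math
import random


def discount_factor_exact(q, lam):
    """E[exp(-q*T)] for T ~ Exponential(rate=lam)."""
    return lam / (lam + q)


def discounted_time_exact(q, lam):
    """E[integral_0^T exp(-q*t) dt] for T ~ Exponential(rate=lam)."""
    return 1.0 / (lam + q)


def monte_carlo(q, lam, n, seed=0):
    """Estimate both expectations by sampling n activation periods."""
    rng = random.Random(seed)
    total_factor = 0.0
    total_time = 0.0
    for _ in range(n):
        t = rng.expovariate(lam)        # one random activation period
        factor = math.exp(-q * t)       # discount applied at the next decision epoch
        total_factor += factor
        total_time += (1.0 - factor) / q  # closed form of integral_0^t exp(-q*s) ds
    return total_factor / n, total_time / n


if __name__ == "__main__":
    q = 0.1  # assumed discount rate
    for lam in (0.1, 1.0, 10.0, 100.0):
        mc_factor, mc_time = monte_carlo(q, lam, n=100_000)
        print(
            f"lam={lam:>6}: factor exact={discount_factor_exact(q, lam):.4f} "
            f"mc={mc_factor:.4f} | time exact={discounted_time_exact(q, lam):.4f} "
            f"mc={mc_time:.4f}"
        )
```

As `lam` grows, the per-period discount factor approaches 1 and the expected discounted length of a period approaches 0, so decisions are revisited almost continuously. As `lam` shrinks, a single activation period dominates the discounted horizon.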

This review was generated by AI and reviewed by a human editor.