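[Paper Review] Continuous-time multi-armed bandits under random intervention times
Derives an explicit Gittins index representation for continuous-time multi-armed bandits with random renewal times, covering Lévy-driven arms and the case of exponential inter-arrival times, and proves that the Gittins strategy is optimal.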
This paper examines multi-armed bandits in which actions are taken at random discrete times. The model consists of $J$ independent arms. When an arm is operated, it must remain active for a random duration, modeled by the inter-arrival time of a (possibly arm-dependent) renewal process. For arms evolving as a Lévy process, we provide an explicit characterization of the Gittins index, which is known to yield an optimal strategy. Furthermore, when the inter-arrival times are exponential and the arms evolve as either a spectrally negative Lévy process, a reflected spectrally negative Lévy process, or a diffusion process, the Gittins index is explicitly characterized in terms of the scale function or diffusion characteristics, respectively. Numerical experiments are performed to support the theoretical results.
Research Motivation and Objectives
- Motivate a bandit model in which actions are taken at random times and an operated arm stays active for a random renewal period.
- Provide an explicit Gittins index characterization for arms evolving as Lévy processes.
- Derive explicit Gittins index expressions under exponential renewal times for spectrally negative Lévy, reflected spectrally negative Lévy, and diffusion arms.
- Show asymptotic and convergence results linking exponential-renewal indices to classical continuous-time indices.
Proposed Method
- Define a multi-armed bandit with $J$ independent arms and arm-specific renewal times.
- Formulate the discounted reward and the Gittins index as an optimal stopping problem for each arm.
- Derive a general Gittins index expression for Lévy-driven arms using fluctuation theory.
- Obtain explicit index formulas under exponential inter-arrival times for spectrally negative Lévy, reflected spectrally negative Lévy, and diffusion processes via scale functions or diffusion characteristics.
- Prove asymptotic behavior and convergence results, including weak convergence of the measures $\mu^{\lambda} \Rightarrow \mu^{\infty}$ and convergence of the index.
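As a rough sketch of the optimal stopping formulation in the second step, the Gittins index of a single arm is commonly written in ratio form. The notation here is assumed for illustration and may differ from the paper's: $X$ is the arm's state process, $r$ the reward function, $q > 0$ the discount rate, and $T_1 < T_2 < \dots$ the arm's renewal epochs. Rewards are taken to accrue continuously while the arm is operated, which is also an assumption.

$$
\Gamma(x) = \sup_{\tau \in \mathcal{T}} \frac{\mathbb{E}_x\left[\int_0^{\tau} e^{-qt}\, r(X_t)\,dt\right]}{\mathbb{E}_x\left[\int_0^{\tau} e^{-qt}\,dt\right]}, \qquad \mathcal{T} = \{\tau \text{ a stopping time with } \tau \in \{T_1, T_2, \dots\}\}
$$

The restriction $\tau \geq T_1$ encodes that an arm, once operated, must stay active at least until its next renewal epoch.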
Experimental Results
Research Questions
- RQ1: How can the Gittins index be explicitly characterized when arms follow general Lévy processes with random renewal times?
- RQ2: What are the explicit forms of the Gittins index under exponential renewal times for spectrally negative Lévy, reflected spectrally negative Lévy, and diffusion arms?
- RQ3: How does the Gittins index behave as the renewal rate grows large, and does it converge to the classical continuous-time index?
- RQ4: Does the Gittins index policy remain optimal under arm-dependent renewal times?
- RQ5: How can fluctuation theory and scale functions be leveraged to compute the index in these settings?
Key Results
- The Gittins index strategy is optimal for the continuous-time bandit with random intervention times.
- An explicit Gittins index characterization is obtained for general Lévy-process arms.
- With exponential renewal times, closed-form index expressions are derived for spectrally negative, reflected spectrally negative, and diffusion arms in terms of scale functions or diffusion data.
- The index converges to the continuous-time limit as the exponential rate increases, via the weak convergence result $\mu^{\lambda} \Rightarrow \mu^{\infty}$.
- Asymptotic behavior shows the index tends to the reward function as the renewal rate tends to zero.
- Numerical experiments are provided to support the theoretical results.
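To make the exponential-renewal setting concrete, the following is a minimal Monte Carlo sketch and not the paper's numerical method. It assumes the arm is a Brownian motion with drift (a spectrally negative Lévy process without jumps), a linear reward $r(x) = x$, and a hypothetical threshold rule that stops at the first renewal epoch where the state falls below `b`. It estimates the discounted reward-to-time ratio of that rule, which, under the usual ratio-form definition of the index, is a lower bound on the Gittins index at `x0`. The function name and all parameters are illustrative.

```python
import numpy as np

def threshold_ratio(x0, b, mu, sigma, lam, q,
                    n_paths=20000, max_epochs=2000, seed=0):
    """Estimate E[int_0^tau e^{-qt} X_t dt] / E[int_0^tau e^{-qt} dt], where
    tau is the first renewal epoch T_n (n >= 1) with X_{T_n} < b.

    Assumptions (illustrative only): X is Brownian motion with drift mu and
    volatility sigma, the reward is r(x) = x, and the inter-arrival times are
    Exp(lam). Paths are truncated after max_epochs epochs.
    """
    rng = np.random.default_rng(seed)
    num = np.zeros(n_paths)            # discounted reward accumulated per path
    den = np.zeros(n_paths)            # discounted time accumulated per path
    x = np.full(n_paths, float(x0))    # state at the current epoch
    disc = np.ones(n_paths)            # e^{-q T_n} at the current epoch
    active = np.ones(n_paths, dtype=bool)

    for _ in range(max_epochs):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        dt = rng.exponential(1.0 / lam, size=idx.size)
        decay = np.exp(-q * dt)
        a = (1.0 - decay) / q                          # int_0^dt e^{-qs} ds
        c = (1.0 - decay * (1.0 + q * dt)) / q ** 2    # int_0^dt s e^{-qs} ds
        # Expected reward over the period given the state at its start:
        # E[int_0^dt e^{-qs} X_s ds | X_0 = x] = x * a + mu * c.
        num[idx] += disc[idx] * (x[idx] * a + mu * c)
        den[idx] += disc[idx] * a
        # Advance the state exactly to the next renewal epoch.
        x[idx] = x[idx] + mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(idx.size)
        disc[idx] = disc[idx] * decay
        # Continue operating only while the state is at or above the threshold.
        active[idx] = x[idx] >= b

    return num.mean() / den.mean()
```

With `b = np.inf` the rule always stops at the first epoch, and the ratio has the closed form `x0 + mu / (q + lam)` for this linear reward, which gives a quick sanity check on the estimator. Maximizing the estimate over `b` would tighten the lower bound, though only within the class of threshold rules.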
This review was generated by AI and reviewed by a human editor.