QUICK REVIEW

[Paper Review] Batched Kernelized Bandits: Refinements and Extensions

Chenkai Ma, Keqin Chen|arXiv (Cornell University)|2026. 03. 13.

Advanced Bandit Algorithms Research | Citations: 0

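One-Line Summary

This paper refines batched kernelized bandits by pinning down the optimal number of batches, establishes lower bounds for adaptively chosen batches, and introduces a robust variant whose cumulative regret matches the non-robust bound while improving on prior simple-regret guarantees.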

ABSTRACT

In this paper, we consider the problem of black-box optimization with noisy feedback revealed in batches, where the unknown function to optimize has a bounded norm in some Reproducing Kernel Hilbert Space (RKHS). We refer to this as the Batched Kernelized Bandits problem, and refine and extend existing results on regret bounds. For algorithmic upper bounds, (Li and Scarlett, 2022) shows that $B=O(\log\log T)$ batches suffice to attain near-optimal regret, where $T$ is the time horizon and $B$ is the number of batches. We further refine this by (i) finding the optimal number of batches including constant factors (to within $1+o(1)$), and (ii) removing a factor of $B$ in the regret bound. For algorithm-independent lower bounds, noticing that existing results only apply when the batch sizes are fixed in advance, we present novel lower bounds when the batch sizes are chosen adaptively, and show that adaptive batches have essentially same minimax regret scaling as fixed batches. Furthermore, we consider a robust setting where the goal is to choose points for which the function value remains high even after an adversarial perturbation. We present the robust-BPE algorithm, and show that a suitably-defined cumulative regret notion incurs the same bound as the non-robust setting, and derive a simple regret bound significantly below that of previous work.

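Motivation and Objectives

Investigates batched black-box optimization of noisy functions with bounded RKHS norm, focusing on reducing regret under batched feedback.
Determines the optimal number of batches with precise constants and removes an unnecessary factor of $B$ from the regret bound.
Develops lower bounds for adaptively chosen batches in order to characterize the minimax limits.
Extends the problem to a robust setting with bounded adversarial perturbations while preserving regret performance.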

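Proposed Method

Analyzes and refines the Batched Pure Exploration (BPE) algorithm so that it works with generalized batch sizes.
Using the modified batch-size schedule $N_i = \min\{\lceil T^{1-a^i}\rceil,\ T - \sum_{j<i} N_j\}$, derives upper bounds for a growing number of batches and proves near-optimal regret $O^*(\sqrt{T\gamma_T})$.
Establishes lower bounds for adaptive batches through a change-of-measure argument tailored to infinite-armed kernelized bandits.
Introduces robust-BPE by extending exploration to a perturbation-robust candidate set and proving a cumulative regret bound.
Compares and extends existing results on the information gain $\gamma_T$ for the SE and Matérn kernels, and handles adaptive batches.
Provides a high-level discussion of the simple-regret implications and of robustness to adversarial perturbations.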
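The batch-size schedule above can be made concrete with a short sketch. This is an illustration of the formula as stated in this review, not the authors' code: the function name `batch_sizes` and the values `T = 10_000`, `a = 0.5` are assumptions chosen for the example.

```python
import math


def batch_sizes(T: int, a: float) -> list[int]:
    """Batch sizes N_i = min(ceil(T^(1 - a^i)), T - sum_{j<i} N_j), i = 1, 2, ...

    Hypothetical helper: stops once the batches cover the whole horizon T.
    """
    if T < 1 or not (0.0 < a < 1.0):
        raise ValueError("need T >= 1 and 0 < a < 1")
    sizes = []
    remaining = T  # rounds not yet assigned to any batch
    i = 0
    while remaining > 0:
        i += 1
        target = math.ceil(T ** (1.0 - a ** i))  # nominal size of batch i
        n_i = min(target, remaining)             # never exceed the horizon
        sizes.append(n_i)
        remaining -= n_i
    return sizes


if __name__ == "__main__":
    T, a = 10_000, 0.5  # illustrative values, not taken from the paper
    sizes = batch_sizes(T, a)
    print("number of batches:", len(sizes))
    print("batch sizes:", sizes)
    # For comparison: the leading-order batch count log_{1/a} log T
    print("log_{1/a} ln T:", math.log(math.log(T), 1.0 / a))
```

Because the exponent $1 - a^i$ approaches 1 geometrically, the nominal batch sizes grow quickly toward $T$, so the loop terminates after only a handful of batches even for large horizons.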

Figure 1: Illustration of a class of hard-to-distinguish functions $\mathcal{F}$, where any $x\in\mathcal{X}$ can be $\epsilon$-optimal for at most one bump function. This is an “idealized” illustration, with the actual functions used having infinite support but steady decay to zero.

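Experimental Results

Research Questions

RQ1: What is the optimal number of batches (to within a $1+o(1)$ factor) needed to achieve near-optimal regret in batched kernelized bandits?
RQ2: What minimax advantage, if any, do adaptive batches offer over fixed batches in this setting?
RQ3: Can the batched framework be extended to an adversarially robust objective without harming regret performance?
RQ4: How does the refined batch-size schedule affect the regret bounds for the SE and Matérn kernels?
RQ5: What are the algorithm-independent lower bounds when batch sizes are chosen adaptively?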

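Key Findings

With generalized batch sizes, Batched Pure Exploration achieves near-optimal $O^*(\sqrt{T\gamma_T})$ regret using $B = O(\log\log T)$ batches.
The refined batch-size schedule yields a tighter batch count and pins down the constant: $B \approx (\log_{1/a}\log T)(1+o(1))$.
The regret bound improves when $a \in (1/2, 1)$; for the Matérn kernel, the bound still holds for $a \in (\nu/(2\nu+d),\ 1/2]$ provided $\bar{\gamma}_t$ is well behaved.
With adaptive batches, the lower bound weakens only by an inverse-polynomial factor in $B$, which suggests that adaptivity does not substantially improve the minimax regret.
The robust-BPE algorithm attains cumulative regret matching the non-robust setting and improves on the simple regret of previous robust results.
The lower bound for adaptive batches shows that $B$ must grow at least as $\Omega(\log_{1/\eta}\log T)$ to achieve near-optimal regret, where $\eta$ depends on the kernel parameters.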
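The $\log_{1/a}\log T$ batch count can be motivated by a back-of-the-envelope calculation from the schedule $N_i \approx T^{1-a^i}$. The following is a heuristic sketch that ignores the ceiling and the truncation at $T$; it is not the paper's proof.

```latex
\begin{align*}
N_i \approx T^{1-a^i} &= T \cdot \exp\!\bigl(-a^i \ln T\bigr), \\
a^i \ln T \le 1
  &\iff i \ge \frac{\ln\ln T}{\ln(1/a)} = \log_{1/a}\ln T, \\
\text{so for such } i:\quad N_i &\ge T/e .
\end{align*}
% Once i reaches log_{1/a} ln T, each batch covers a constant fraction of
% the horizon, so only O(1) further batches are needed:
\[
B = \log_{1/a}\ln T + O(1) = (\log_{1/a}\log T)\,(1+o(1)).
\]
```

Under this heuristic, choosing $a$ closer to 1 increases the number of batches, since $\ln(1/a)$ in the denominator shrinks.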

This review was generated by AI and checked by a human editor.