QUICK REVIEW

[Paper Review] Optimal Model Selection in Contextual Bandits with Many Classes via Offline Oracles

Sanath Kumar Krishnamurthy, Susan Athey | arXiv (Cornell University) | 2021-01-01

Advanced Bandit Algorithms Research | References: 37 | Citations: 1

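One-Line Summary

This paper proposes a new reduction of model selection in stochastic contextual bandits to offline model selection oracles, enabling flexible and efficient algorithms whose computational requirements are no worse than those of model selection for regression. When one of the candidate classes is realizable, the algorithm attains optimal realizability-based regret bounds for that class, up to a logarithmic dependence on the number of classes, provided that either the time horizon is large enough or a misspecification-detection assumption holds; it thereby adapts to the complexity of the unknown realizable class.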

ABSTRACT

We study the problem of model selection for contextual bandits, in which the algorithm must balance the bias-variance trade-off for model estimation while also balancing the exploration-exploitation trade-off. In this paper, we propose the first reduction of model selection in contextual bandits to offline model selection oracles, allowing for flexible general purpose algorithms with computational requirements no worse than those for model selection for regression. Our main result is a new model selection guarantee for stochastic contextual bandits. When one of the classes in our set is realizable, up to a logarithmic dependency on the number of classes, our algorithm attains optimal realizability-based regret bounds for that class under one of two conditions: if the time-horizon is large enough, or if an assumption that helps with detecting misspecification holds. Hence our algorithm adapts to the complexity of this unknown class. Even when this realizable class is known, we prove improved regret guarantees in early rounds by relying on simpler model classes for those rounds and hence further establish the importance of model selection in contextual bandits.

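Motivation and Goals

To address model selection in contextual bandits, where the algorithm must balance the bias-variance trade-off and the exploration-exploitation trade-off at the same time.
To reduce model selection in contextual bandits to offline model selection oracles, enabling general-purpose algorithms.
To attain optimal regret bounds for the realizable class when one of the candidate classes is realizable.
To improve early-round performance by relying on simpler model classes before full adaptation takes place.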

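Proposed Method

The method reduces online model selection in contextual bandits to offline model selection oracles, so existing regression-style model selection techniques can be reused.
It introduces an algorithmic framework that selects among multiple model classes dynamically, based on performance feedback.
Its computational requirements are no worse than those of offline model selection for regression, so the reduction adds no extra overhead.
It incorporates a misspecification-detection mechanism to support adaptation under mild assumptions.
It uses a confidence-based selection strategy to balance exploration and exploitation while preserving regret optimality.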
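
The reduction described above can be illustrated with a small toy simulation. The sketch below is not the paper's algorithm: it assumes nested linear model classes (the first `dim` features of the context), a hypothetical offline oracle implemented as holdout validation (`offline_model_selection`), a doubling epoch schedule, and inverse gap weighting for action selection (`igw_probs`), a technique borrowed from the oracle-based contextual bandit literature. All function names, the heuristic learning rate `gamma`, and the synthetic reward model are assumptions made for illustration only.

```python
import numpy as np

def fit_per_arm(X, A, R, n_arms, dim):
    """Per-arm least squares using only the first `dim` context features."""
    W = np.zeros((n_arms, dim))
    for a in range(n_arms):
        mask = A == a
        if mask.sum() >= dim:  # skip arms with too little data (weights stay 0)
            W[a] = np.linalg.lstsq(X[mask, :dim], R[mask], rcond=None)[0]
    return W

def offline_model_selection(X, A, R, n_arms, dims):
    """Hypothetical offline oracle: pick the class with the lowest holdout error."""
    split = len(R) // 2
    best_err, best_dim = np.inf, dims[0]
    for dim in dims:
        W = fit_per_arm(X[:split], A[:split], R[:split], n_arms, dim)
        preds = np.einsum("ij,ij->i", X[split:, :dim], W[A[split:]])
        err = np.mean((preds - R[split:]) ** 2)
        if err < best_err:
            best_err, best_dim = err, dim
    # Refit the selected class on all logged data.
    return fit_per_arm(X, A, R, n_arms, best_dim)

def igw_probs(scores, gamma):
    """Inverse gap weighting: arms with larger predicted gaps get less probability."""
    k = len(scores)
    best = int(np.argmax(scores))
    p = 1.0 / (k + gamma * (scores[best] - scores))
    p[best] = 0.0
    p[best] = 1.0 - p.sum()  # remaining mass goes to the greedy arm
    return p

def run(T=2000, n_arms=3, dims=(1, 2, 4, 8), true_dim=2, seed=0):
    rng = np.random.default_rng(seed)
    d = max(dims)
    W_true = np.zeros((n_arms, d))
    W_true[:, :true_dim] = rng.normal(size=(n_arms, true_dim))  # realizable class
    X, A, R = np.zeros((T, d)), np.zeros(T, dtype=int), np.zeros(T)
    W_hat = np.zeros((n_arms, dims[0]))  # all-zero model: uniform play at first
    regret, next_update = 0.0, 16
    for t in range(T):
        x = rng.normal(size=d)
        if t == next_update:  # epoch boundary: call the offline oracle on the log
            W_hat = offline_model_selection(X[:t], A[:t], R[:t], n_arms, dims)
            next_update *= 2
        # Heuristic learning rate (assumption): explore less as data accumulates,
        # and explore more when the selected class is more complex.
        gamma = np.sqrt(n_arms * (t + 1) / W_hat.shape[1])
        p = igw_probs(W_hat @ x[: W_hat.shape[1]], gamma)
        a = rng.choice(n_arms, p=p)
        means = W_true @ x
        X[t], A[t], R[t] = x, a, means[a] + 0.1 * rng.normal()
        regret += means.max() - means[a]
    return regret, W_hat.shape[1]

if __name__ == "__main__":
    total_regret, selected_dim = run()
    print("cumulative regret:", total_regret, "selected class dimension:", selected_dim)
```

The design point this sketch tries to convey is that the bandit loop never searches over classes itself: all model selection happens inside the offline call, so the per-epoch cost is that of model selection for regression on the logged data.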

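Results

Research Questions

RQ1: Can model selection in contextual bandits be reduced to offline model selection oracles without sacrificing realizability-based regret optimality?
RQ2: Under what conditions can the algorithm adapt to the complexity of the best class among the candidate model classes?
RQ3: How can relying on simpler model classes before full adaptation improve early-round performance?
RQ4: When a realizable class exists, how does misspecification detection affect the regret guarantees?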

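Key Findings

When a realizable model class exists, the algorithm attains optimal realizability-based regret bounds for that class, up to a logarithmic dependence on the number of classes.
This optimality holds under either of two conditions: the time horizon is large enough, or the misspecification-detection assumption holds.
The algorithm adapts to the complexity of the unknown realizable class, so it needs no prior knowledge of which class is best.
Even when the realizable class is known in advance, relying on simpler model classes in early rounds yields improved regret guarantees for those rounds.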
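
For orientation, the shape of a realizability-based regret bound can be written out for the special case of a finite class. The notation below ($K$ actions, $T$ rounds, candidate classes $\mathcal{F}_1, \dots, \mathcal{F}_M$, unknown realizable index $m^\ast$) is an assumption introduced here and does not come from the review; the rate shown is the commonly cited benchmark from the oracle-based contextual bandit literature, not a statement of the paper's theorem.

```latex
% Benchmark rate for a single known, finite, realizable class \mathcal{F}
% (K actions, T rounds; \tilde{O} hides logarithmic factors):
\[
  \mathrm{Reg}(T) \;=\; \tilde{O}\!\left(\sqrt{K\,T\,\log\lvert\mathcal{F}\rvert}\right).
\]
% Model selection target: match this rate with \mathcal{F} = \mathcal{F}_{m^\ast},
% without knowing m^\ast, at the price of a dependence that is only
% logarithmic in the number of classes M.
```

Read this way, adapting to the unknown class means the bound scales with the complexity of $\mathcal{F}_{m^\ast}$ rather than with that of the largest candidate class; the precise statement and conditions are in the paper itself.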


This review was generated by AI and reviewed by a human editor.