QUICK REVIEW

[Paper Review] A Modern Introduction to Online Learning

Francesco Orabona | arXiv (Cornell University) | 2019-12-31

Advanced Bandit Algorithms Research | References: 92 | Citations: 68

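One-Line Summary

A modern, comprehensive text on online convex optimization that covers online learning algorithms (FTL, OGD, OMD, FTRL), regret analysis, adaptive methods, and foundational connections to broader topics.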

ABSTRACT

In this monograph, I introduce the basic concepts of Online Learning through a modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiation of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to the issue of tuning the parameters of the algorithms and learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are dealt through convex surrogate losses and through randomization. The bandit setting is also briefly discussed, touching on the problem of adversarial and stochastic multi-armed bandits. These notes do not require prior knowledge of convex analysis and all the required mathematical tools are rigorously explained. Moreover, all the included proofs have been carefully chosen to be as simple and as short as possible.

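Motivation and Objectives

Introduces the online learning framework, with regret minimization as its central objective.
Presents the main online algorithms (e.g., Online Gradient Descent, Subgradient Descent, Mirror Descent, FTRL) together with their regret analyses.
Explores extensions such as adaptive and parameter-free methods, strong convexity, bandit problems, and non-stationary environments.
Covers connections to broader topics such as convex analysis, stochastic optimization, and learning theory.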
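
For reference, the regret objective named above is usually written as follows. This is a sketch in standard notation; the symbols ($x_t$ for the learner's prediction, $\ell_t$ for the loss revealed at round $t$, $u$ for a fixed comparator in the feasible set $V$) are assumptions introduced here, not taken from the review.

```latex
\[
  \mathrm{Regret}_T(u) \;=\; \sum_{t=1}^{T} \ell_t(x_t) \;-\; \sum_{t=1}^{T} \ell_t(u),
  \qquad u \in V .
\]
% "Sublinear regret" means Regret_T(u) = o(T), so the average
% per-round gap to the comparator vanishes as T grows:
\[
  \lim_{T \to \infty} \frac{\mathrm{Regret}_T(u)}{T} \;=\; 0 .
\]
```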

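Proposed Methods

Defines regret with respect to an arbitrary comparator and an adversarial loss sequence.
Develops and analyzes the core algorithms: Online Subgradient Descent, Online Gradient Descent with projection, and variants of Follow-the-Regularized-Leader.
Introduces Online Mirror Descent and explains its connection to subgradients and Bregman divergences.
Provides proofs and lemmas (e.g., Be-the-Leader) that establish sublinear regret under convexity and bounded gradients.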
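
To make the projected update concrete, here is a minimal sketch of Online (Sub)gradient Descent with projection. It is illustrative only: the function names are hypothetical, the feasible set is assumed to be an L2 ball, the losses are assumed linear so the best fixed comparator has a closed form, and the step size `radius / sqrt(T)` is the standard tuning when gradient norms are at most 1.

```python
import numpy as np


def project_l2_ball(x, radius):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def projected_ogd(grad_fns, dim, radius, lr):
    """Run projected online (sub)gradient descent starting from the origin.

    grad_fns[t] maps the current point to a (sub)gradient of the t-th loss
    at that point. Returns the list of points played, x_1, ..., x_T.
    """
    x = np.zeros(dim)
    played = []
    for grad in grad_fns:
        played.append(x.copy())          # commit to x_t before seeing the loss
        x = project_l2_ball(x - lr * grad(x), radius)
    return played


def regret_linear(played, gs, radius):
    """Regret against the best fixed point in the ball for losses <g_t, x>."""
    alg_loss = sum(float(g @ x) for g, x in zip(gs, played))
    # For linear losses on an L2 ball, the best fixed comparator is
    # u = -radius * G / ||G|| with G = sum_t g_t, whose loss is -radius * ||G||.
    best_loss = -radius * np.linalg.norm(np.sum(gs, axis=0))
    return alg_loss - best_loss


# Usage: random loss vectors rescaled so that ||g_t|| <= 1 (assumed setup).
rng = np.random.default_rng(0)
T, dim, radius = 1000, 5, 1.0
gs = [g / max(1.0, np.linalg.norm(g)) for g in rng.normal(size=(T, dim))]
played = projected_ogd([lambda x, g=g: g for g in gs], dim, radius,
                       lr=radius / np.sqrt(T))
print("regret:", regret_linear(played, gs, radius))
print("bound :", radius * np.sqrt(T))
```

Under these assumptions the standard analysis gives regret of at most `radius * sqrt(T)`, so the first printed number should not exceed the second for any loss sequence with gradient norms bounded by 1.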

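Results

Research Questions

RQ1: What regret upper bounds can be guaranteed in online learning with convex losses, whether differentiable or non-differentiable?
RQ2: How do Online Gradient Descent, Subgradient Descent, and Mirror Descent compare under different convexity and domain conditions?
RQ3: What role do adaptive and parameter-free approaches play in achieving sublinear regret on unbounded or complex domains?
RQ4: How can online learning concepts be extended to advanced settings such as bandits, saddle-point problems, and sequential investment?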

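Key Findings

FTL can be suboptimal in adversarial settings, which motivates Online Gradient Descent with projection.
Projected Online Gradient Descent attains a sublinear regret bound for convex differentiable losses on a bounded domain.
The Be-the-Leader lemma relates the cumulative loss of the adaptively chosen leaders to that of a fixed comparator, which supports the sublinear regret results.
Online Mirror Descent offers a unifying view through Bregman divergences and can handle non-Euclidean geometries.
Adaptive and parameter-free variants, together with strong convexity assumptions, improve performance and broaden applicability.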
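
The first finding can be illustrated with a well-known one-dimensional construction. The sketch below is an assumption-laden toy, not code from the monograph: the domain is taken to be [-1, 1], the losses are linear with alternating slopes, FTL plays 0 in the first round, and all function names are hypothetical.

```python
import math


def adversarial_slopes(T):
    """Slopes z_t of the linear losses z_t * x: -0.5, then +1, -1, +1, ..."""
    return [-0.5] + [1.0 if t % 2 == 0 else -1.0 for t in range(2, T + 1)]


def clip(x):
    """Projection onto the interval [-1, 1]."""
    return max(-1.0, min(1.0, x))


def run_ftl(zs):
    """Follow-the-Leader: play a minimizer of the cumulative past loss."""
    total, cum = 0.0, 0.0
    for z in zs:
        # argmin over [-1, 1] of cum * x is -sign(cum); play 0 when cum == 0.
        x = 0.0 if cum == 0 else -math.copysign(1.0, cum)
        total += z * x
        cum += z
    return total


def run_ogd(zs, lr):
    """Projected online gradient descent started at 0."""
    total, x = 0.0, 0.0
    for z in zs:
        total += z * x
        x = clip(x - lr * z)
    return total


def regret(alg_loss, zs):
    """Regret against the best fixed point in [-1, 1], whose loss is -|sum z_t|."""
    return alg_loss - (-abs(sum(zs)))


T = 1000
zs = adversarial_slopes(T)
print("FTL regret:", regret(run_ftl(zs), zs))   # prints "FTL regret: 999.5"
print("OGD regret:", regret(run_ogd(zs, 1 / math.sqrt(T)), zs))
```

On this sequence the leader flips sign every round, so FTL pays a loss of 1 in every round after the first and its regret grows linearly (T - 0.5). Projected OGD with step size `1 / sqrt(T)` stays within the `sqrt(T)` bound on the same sequence.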

This review was generated by AI and reviewed by a human editor.