QUICK REVIEW

[Paper Review] Scalable Generalized Linear Bandits: Online Computation and Hashing

Kwang-Sung Jun, Aniruddha Bhargava | arXiv (Cornell University) | Jun 1, 2017

Advanced Bandit Algorithms Research | References: 25 | Cited by: 31

One-Sentence Summary

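This paper introduces the Generalized Linear Online-to-confidence-set Conversion (GLOC) framework to build scalable algorithms for Generalized Linear Bandits (GLBs). Because it performs its computations online, its per-round space and time complexity stays constant in the number of rounds. The paper also proposes hash-amenable algorithms whose time complexity is sublinear in the number of arms. One of these has a regret bound that scales as $O(d^{5/4})$ in the dimension $d$, improving on the $O(d^{3/2})$ scaling of the hash-amenable Thompson sampling extension of GLOC. Finally, the paper gives a fast approximate hash-key (inner-product) computation that is more accurate than the state of the art.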

ABSTRACT

Generalized Linear Bandits (GLBs), a natural extension of the stochastic linear bandits, have been popular and successful in recent years. However, existing GLBs scale poorly with the number of rounds and the number of arms, limiting their utility in practice. This paper proposes new, scalable solutions to the GLB problem in two respects. First, unlike existing GLBs, whose per-time-step space and time complexity grow at least linearly with time $t$, we propose a new algorithm that performs online computations to enjoy a constant space and time complexity. At its heart is a novel Generalized Linear extension of the Online-to-confidence-set Conversion (GLOC method) that takes any online learning algorithm and turns it into a GLB algorithm. As a special case, we apply GLOC to the online Newton step algorithm, which results in a low-regret GLB algorithm with much lower time and memory complexity than prior work. Second, for the case where the number $N$ of arms is very large, we propose new algorithms in which each next arm is selected via an inner product search. Such methods can be implemented via hashing algorithms (i.e., "hash-amenable") and result in a time complexity sublinear in $N$. While a Thompson sampling extension of GLOC is hash-amenable, its regret bound for $d$-dimensional arm sets scales with $d^{3/2}$, whereas GLOC's regret bound scales with $d$. Towards closing this gap, we propose a new hash-amenable algorithm whose regret bound scales with $d^{5/4}$. Finally, we propose a fast approximate hash-key computation (inner product) with better accuracy than the state of the art, which can be of independent interest. We conclude the paper with preliminary experimental results confirming the merits of our methods.
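
The abstract describes GLOC as a wrapper around an online learner, instantiated with the online Newton step. The block below sketches that idea for a logistic link. It is a minimal sketch under our own assumptions and not the authors' implementation:
- The online learner is a textbook online Newton step on the logistic loss, with the projection step omitted and hypothetical constants `eps` and `gamma`.
- The confidence set is centered at the ridge regression of the learner's own predictions $z_s = x_s^\top\theta_s$ onto the played arms. This is our reading of the online-to-confidence-set conversion.
- The radius `beta` is a fixed hypothetical constant, not the bound derived in the paper.
- The class name `GLOCSketch` is hypothetical.

What it shows is that the whole state is a few $d \times d$ and $d$-dimensional arrays, so memory and per-round work do not grow with $t$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class GLOCSketch:
    """Hypothetical GLOC-style learner for a logistic bandit (illustration only).

    Online learner: textbook ONS on the logistic loss (projection omitted).
    Conversion (assumed): ridge regression of the learner's predictions z_s on x_s.
    State size is independent of the round t.
    """

    def __init__(self, d, lam=1.0, eps=1.0, gamma=0.5, beta=1.0):
        self.theta = np.zeros(d)   # ONS iterate theta_t
        self.A = eps * np.eye(d)   # ONS matrix: eps*I + sum_s g_s g_s^T
        self.gamma = gamma         # ONS step parameter (assumed constant)
        self.V = lam * np.eye(d)   # lam*I + sum_s x_s x_s^T
        self.b = np.zeros(d)       # sum_s z_s x_s
        self.beta = beta           # hypothetical fixed confidence radius

    def select(self, arms):
        """Optimistic choice: argmax_x  x^T center + beta * ||x||_{V^{-1}}."""
        V_inv = np.linalg.inv(self.V)
        center = V_inv @ self.b    # center of the confidence ellipsoid
        width = np.sqrt(np.einsum("ij,jk,ik->i", arms, V_inv, arms))
        return int(np.argmax(arms @ center + self.beta * width))

    def update(self, x, y):
        z = float(x @ self.theta)  # learner's prediction before seeing y
        self.V += np.outer(x, x)
        self.b += z * x
        g = (sigmoid(z) - y) * x   # logistic-loss gradient at theta_t
        self.A += np.outer(g, g)
        self.theta = self.theta - np.linalg.solve(self.A, g) / self.gamma


# Usage on a synthetic logistic bandit (all values illustrative).
rng = np.random.default_rng(0)
d, n_arms = 5, 50
theta_star = rng.standard_normal(d) / np.sqrt(d)
arms = rng.standard_normal((n_arms, d)) / np.sqrt(d)
agent = GLOCSketch(d)
for t in range(300):
    i = agent.select(arms)
    y = float(rng.random() < sigmoid(arms[i] @ theta_star))
    agent.update(arms[i], y)
```

In this sketch the per-round cost is dominated by the $O(d^3)$ inverse and solve. Maintaining the inverses with rank-one (Sherman-Morrison) updates would bring it to $O(d^2)$, which is still independent of $t$.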

Research Motivation and Objectives

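Address the scalability of Generalized Linear Bandits (GLBs) in both the time horizon and the number of arms, which currently limits their practical deployment.
Overcome the per-round space and time complexity of existing GLB algorithms, which grows at least linearly with the round $t$.
Design a method that uses hashing to achieve time complexity sublinear in the number of arms $N$.
Improve the regret bound of hash-amenable GLB algorithms from $O(d^{3/2})$ to $O(d^{5/4})$ in its dependence on the dimension $d$, while preserving hash-amenability.
Design a fast and accurate approximate inner-product computation for hashing-based GLB algorithms.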
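
Hashing gives sublinear cost in $N$ once arm selection reduces to a single maximum inner product search (MIPS) over the arms. The block below sketches this for the Thompson sampling variant: draw $\tilde\theta$ from a Gaussian around a center, then return $\arg\max_x x^\top\tilde\theta$.

For the hashing step it swaps in a generic scheme of our choosing, not necessarily the one used in the paper. Neyshabur and Srebro's simple-LSH transform maps every arm to a unit vector, so inner-product ranking becomes cosine ranking. Sign-random-projection (SimHash) buckets then index those unit vectors. All names (`mips_transform_arms`, `mips_transform_query`, `SimHashIndex`, `thompson_query`) and parameters (`n_bits`, `seed`) are hypothetical.

```python
import numpy as np


def mips_transform_arms(X):
    """Simple-LSH transform: scale arms into the unit ball, then append
    sqrt(1 - ||x||^2) so every transformed arm has unit norm."""
    Xs = X / np.linalg.norm(X, axis=1).max()
    extra = np.sqrt(np.clip(1.0 - np.sum(Xs ** 2, axis=1), 0.0, None))
    return np.hstack([Xs, extra[:, None]])


def mips_transform_query(q):
    """Normalize the query and append 0; inner-product order is preserved."""
    return np.append(q / np.linalg.norm(q), 0.0)


class SimHashIndex:
    """One hash table keyed by the signs of n_bits random projections."""

    def __init__(self, X, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.X = X
        Z = mips_transform_arms(X)
        self.planes = rng.standard_normal((n_bits, Z.shape[1]))
        self.buckets = {}
        for i, key in enumerate(self._keys(Z)):
            self.buckets.setdefault(key, []).append(i)

    def _keys(self, Z):
        return [tuple(row) for row in (Z @ self.planes.T) > 0]

    def query(self, q):
        key = self._keys(mips_transform_query(q)[None, :])[0]
        cand = self.buckets.get(key)
        if not cand:  # empty bucket: fall back to exhaustive search
            cand = range(len(self.X))
        cand = np.fromiter(cand, dtype=int)
        # Score only the candidates with exact inner products.
        return int(cand[np.argmax(self.X[cand] @ q)])


def thompson_query(center, V, beta, rng):
    """Sample theta_tilde ~ N(center, beta^2 V^{-1}) (assumed TS form)."""
    cov = beta ** 2 * np.linalg.inv(V)
    cov = (cov + cov.T) / 2.0  # symmetrize against round-off
    return rng.multivariate_normal(center, cov)


rng = np.random.default_rng(0)
arms = rng.standard_normal((2000, 10))
index = SimHashIndex(arms, n_bits=8)
theta_tilde = thompson_query(np.zeros(10), np.eye(10), beta=1.0, rng=rng)
chosen = index.query(theta_tilde)             # approximate argmax via hashing
best = int(np.argmax(arms @ theta_tilde))     # exact argmax for comparison
print(chosen, best, arms[chosen] @ theta_tilde, arms[best] @ theta_tilde)
```

Only the arms that share the query's bucket are scored, which is where the sublinear-in-$N$ cost comes from. With `n_bits=0` every arm lands in one bucket and the search is exact, which is a convenient sanity check. A practical index would use several tables or multi-probe lookups to keep recall high.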

Proposed Method

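Propose the Generalized Linear Online-to-confidence-set Conversion (GLOC) framework, which turns any online learning algorithm into a low-regret GLB algorithm.
Apply GLOC to the online Newton step (ONS) algorithm, giving per-round space and time complexity that does not depend on $t$.
Extend GLOC with Thompson sampling to obtain a hash-amenable GLB algorithm whose arm selection is an inner-product search, so that locality-sensitive hashing yields time complexity sublinear in $N$.
Propose a new hash-key computation method that estimates inner products more accurately than the state-of-the-art approach, using optimized projection vectors.
Use multi-probe hashing to search for candidate arms efficiently in high-dimensional spaces without enumerating every arm.
Assuming normally distributed projection vectors, prove theoretically that the L1-based scheme has lower variance than the L2-based scheme in high dimensions, which supports its use in the proposed framework.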
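
The variance comparison in the last item can be reproduced with a short calculation under assumptions that the summary does not spell out:
- The L1 and L2 schemes estimate an inner product $q^\top a$ by importance-sampling $m$ coordinates i.i.d., with probability $p_i \propto |q_i|$ (L1) or $p_i \propto q_i^2$ (L2).
- The projection vector is $a \sim \mathcal{N}(0, I_d)$.
- Every $q_i \neq 0$.

This is a sketch of why such a result holds, not the paper's proof.

```latex
\begin{aligned}
\hat{s} &= \frac{1}{m}\sum_{j=1}^{m} \frac{q_{i_j}\, a_{i_j}}{p_{i_j}}, \quad i_j \overset{\text{i.i.d.}}{\sim} p, \quad \mathbb{E}[\hat{s} \mid a] = q^\top a \\
\operatorname{Var}(\hat{s} \mid a) &= \frac{1}{m}\Big(\sum_{i=1}^{d} \frac{q_i^2 a_i^2}{p_i} - (q^\top a)^2\Big) \\
\text{L2: } p_i = \tfrac{q_i^2}{\|q\|_2^2} &\;\Rightarrow\; \sum_{i} \tfrac{q_i^2 a_i^2}{p_i} = \|q\|_2^2\, \|a\|_2^2 \\
\text{L1: } p_i = \tfrac{|q_i|}{\|q\|_1} &\;\Rightarrow\; \sum_{i} \tfrac{q_i^2 a_i^2}{p_i} = \|q\|_1 \sum_{i} |q_i|\, a_i^2 \\
a \sim \mathcal{N}(0, I_d) &\;\Rightarrow\; \mathbb{E}_a[a_i^2] = 1, \quad \mathbb{E}_a[(q^\top a)^2] = \|q\|_2^2 \\
\mathbb{E}_a[\operatorname{Var}_{\mathrm{L2}}] &= \frac{(d-1)\,\|q\|_2^2}{m}, \qquad \mathbb{E}_a[\operatorname{Var}_{\mathrm{L1}}] = \frac{\|q\|_1^2 - \|q\|_2^2}{m} \\
\|q\|_1^2 \le d\,\|q\|_2^2 \ \text{(Cauchy-Schwarz)} &\;\Rightarrow\; \mathbb{E}_a[\operatorname{Var}_{\mathrm{L1}}] \le \mathbb{E}_a[\operatorname{Var}_{\mathrm{L2}}]
\end{aligned}
```

Equality holds only when all $|q_i|$ are equal, so under these assumptions the advantage of L1 sampling grows as $q$ becomes less uniform.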

Experimental Results

Research Questions

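RQ1: Can we design a GLB algorithm whose per-round space and time complexity is constant, independent of the time horizon $t$?
RQ2: Can we use hashing to achieve time complexity sublinear in the number of arms $N$ while keeping regret low?
RQ3: Can we reduce the regret bound of hash-amenable GLB algorithms from $O(d^{3/2})$ to $O(d^{5/4})$ in its dependence on $d$?
RQ4: Can we design a faster and more accurate approximate inner-product computation for hashing-based GLB algorithms?
RQ5: In high-dimensional GLB settings, how does the choice of hashing scheme (L1 vs. L2) affect variance and performance?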

Key Findings

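The proposed GLOC framework achieves per-round space and time complexity that is constant in $t$, eliminating the at-least-linear growth with $t$ of prior GLB methods.
GLOC applied to the online Newton step achieves a regret bound that scales with $d$, with much lower time and memory complexity than prior work. By contrast, the hash-amenable Thompson sampling extension of GLOC has a regret bound that scales with $d^{3/2}$.
The newly proposed hash-amenable algorithm achieves a regret bound that scales with $d^{5/4}$. This narrows, though does not fully close, the gap between the $d^{3/2}$ of the Thompson sampling extension and the $d$ of GLOC.
The proposed approximate inner-product computation is more accurate than state-of-the-art techniques, especially in high-dimensional settings.
Preliminary experimental results confirm the practical advantages of the proposed methods, including faster inference and lower regret in large-scale bandit settings.
Theoretical analysis shows that, with normally distributed projection vectors in high dimensions, L1-based hashing has lower variance than L2-based hashing, which supports its use in the proposed framework.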
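
The L1 versus L2 variance finding can be checked numerically. The block below rests on two assumptions:
- "L1" and "L2" are read as importance-sampling estimators of $q^\top a$ that draw $m$ coordinates with probability proportional to $|q_i|$ or $q_i^2$.
- The projection vectors are Gaussian.

This reading is our assumption, not confirmed by the summary. The function names and the sizes `d`, `m`, `trials` are illustrative. For each Gaussian vector the code computes the exact conditional variance of each estimator, then averages over many draws.

```python
import numpy as np


def sampling_probs(q, scheme):
    """Coordinate probabilities: |q_i| for "l1", q_i^2 for "l2" (assumed)."""
    w = np.abs(q) if scheme == "l1" else q ** 2
    return w / w.sum()


def estimate_inner_product(q, a, m, scheme, rng):
    """Unbiased importance-sampling estimate of q @ a from m sampled coordinates."""
    p = sampling_probs(q, scheme)
    idx = rng.choice(len(q), size=m, p=p)
    return float(np.mean(q[idx] * a[idx] / p[idx]))


def exact_variance(q, a, m, scheme):
    """Variance of estimate_inner_product for a fixed a (sum over q_i != 0)."""
    p = sampling_probs(q, scheme)
    nz = p > 0
    second_moment = np.sum(q[nz] ** 2 * a[nz] ** 2 / p[nz])
    return (second_moment - float(q @ a) ** 2) / m


rng = np.random.default_rng(0)
d, m, trials = 256, 16, 2000
q = rng.standard_normal(d)             # vector whose inner products are needed
A = rng.standard_normal((trials, d))   # Gaussian projection vectors a ~ N(0, I)
v1 = np.mean([exact_variance(q, a, m, "l1") for a in A])
v2 = np.mean([exact_variance(q, a, m, "l2") for a in A])
print(f"mean variance  L1: {v1:.3f}  L2: {v2:.3f}")
```

Under this reading, the averaged L1 variance should land near $(\|q\|_1^2 - \|q\|_2^2)/m$ and the L2 variance near $(d-1)\|q\|_2^2/m$. For a Gaussian $q$ and large $d$ their ratio is roughly $2/\pi$.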

This review was generated by AI and reviewed by human editors.