QUICK REVIEW

[Paper Digest] Optimal Odd Arm Identification with Fixed Confidence

Gayathri R Prabhu, Srikrishna Bhashyam|arXiv (Cornell University)|Dec 11, 2017

Advanced Bandit Algorithms Research | Cited by 4

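ONE-SENTENCE SUMMARY

This paper proposes a sequential policy for identifying the odd arm in a multi-armed bandit whose arms follow vector exponential family distributions, minimising the total cost (time cost plus switching cost) under a fixed-confidence constraint. By using conjugate priors and the generalised likelihood ratio statistic, the policy keeps the probability of false detection under control while achieving asymptotically optimal total cost.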

ABSTRACT

The problem of detecting an odd arm from a set of K arms of a multi-armed bandit, with fixed confidence, is studied in a sequential decision-making scenario. Each arm's signal follows a distribution from a vector exponential family. All arms have the same parameters except the odd arm. The actual parameters of the odd and non-odd arms are unknown to the decision maker. Further, the decision maker incurs a cost whenever the decision maker switches from one arm to another. This is a sequential decision-making problem where the decision maker gets only a limited view of the true state of nature at each stage, but can control his view by choosing the arm to observe at each stage. Of interest are policies that satisfy a given constraint on the probability of false detection. An information-theoretic lower bound on the total cost (expected time for a reliable decision plus total switching cost) is first identified, and a variation on a sequential policy based on the generalised likelihood ratio statistic is then studied. Thanks to the vector exponential family assumption, the signal processing in this policy at each stage turns out to be very simple, in that the associated conjugate prior enables easy updates of the posterior distribution of the model parameters. The policy, with a suitable threshold, is shown to satisfy the given constraint on the probability of false detection. Further, the proposed policy is asymptotically optimal in terms of the total cost among all policies that satisfy the constraint on the probability of false detection.

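RESEARCH MOTIVATION AND OBJECTIVES

Address the sequential detection of an odd arm in a multi-armed bandit where every arm follows a vector exponential family distribution and all arms except the odd one share the same parameters.
Minimise the total cost, i.e. the expected decision time plus the accumulated switching cost, under a fixed-confidence constraint on false detection.
Design a policy that keeps the probability of false detection below a preset level while achieving asymptotically optimal total cost.
Exploit the structure of the vector exponential family to obtain efficient Bayesian updates through conjugate priors.
Establish an information-theoretic lower bound on the total cost and prove that the proposed policy attains it asymptotically.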
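
The objective described above can be written compactly as a constrained optimisation problem. The notation here is assumed for illustration and is not taken from the paper: \tau is the stopping time, A_t the arm observed at stage t, c the cost per switch, \hat{\imath} the declared arm, i^\ast the true odd arm, and \alpha the allowed probability of false detection. Reading "total switching cost" as c times the expected number of switches is likewise an assumption.

```latex
\begin{aligned}
&\min_{\pi}\;\; \mathbb{E}^{\pi}[\tau]
  \;+\; c\,\mathbb{E}^{\pi}\!\left[\sum_{t=1}^{\tau-1}
  \mathbf{1}\{A_{t+1}\neq A_{t}\}\right] \\
&\text{subject to}\quad
  \mathbb{P}^{\pi}\big(\hat{\imath}\neq i^{\ast}\big)\le \alpha
  \quad\text{for every configuration of the unknown parameters.}
\end{aligned}
```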

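PROPOSED METHOD

At each stage, the policy uses the generalised likelihood ratio statistic to guide arm selection, balancing exploration against decision accuracy.
Conjugate priors give efficient, closed-form posterior updates of each arm's model parameters.
A threshold on the generalised likelihood ratio controls the probability of false detection, so that the fixed-confidence constraint is met.
The decision rule stops once the likelihood ratio for an odd-arm hypothesis exceeds the threshold, indicating that the identification has reached sufficient confidence.
The policy chooses the arm to observe dynamically, based on posterior variance and likelihood-ratio gain, to keep unnecessary switches to a minimum.
The theoretical analysis uses information-theoretic tools to derive a lower bound on the total cost and shows that the policy attains this bound asymptotically.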
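
The threshold-based stopping rule described above can be illustrated with a minimal sketch. It assumes unit-variance Gaussian arms with unknown means (one simple member of the exponential family) and uses a plain generalised likelihood ratio with maximum-likelihood estimates, which is a simplification and not the paper's exact statistic. The names `ArmStats`, `max_log_lik`, `glr_statistic`, the sample values, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Sufficient statistics of one arm's observations."""
    n: int = 0       # number of samples
    s: float = 0.0   # sum of samples
    ss: float = 0.0  # sum of squared samples

    def add(self, x):
        self.n += 1
        self.s += x
        self.ss += x * x


def residual(n, s, ss):
    """Sum of squared deviations from the sample mean (0 if no samples)."""
    return ss - s * s / n if n > 0 else 0.0


def max_log_lik(stats, odd):
    """Maximised log-likelihood, up to a constant shared by all hypotheses,
    under 'arm `odd` is the odd one': the odd arm has its own mean and
    all other arms share a single pooled mean."""
    rest = [a for k, a in enumerate(stats) if k != odd]
    pooled = residual(sum(a.n for a in rest),
                      sum(a.s for a in rest),
                      sum(a.ss for a in rest))
    own = residual(stats[odd].n, stats[odd].s, stats[odd].ss)
    return -0.5 * (own + pooled)


def glr_statistic(stats):
    """Return the most likely odd arm and its log-GLR against the
    best competing odd-arm hypothesis."""
    scores = [max_log_lik(stats, i) for i in range(len(stats))]
    best = max(range(len(scores)), key=scores.__getitem__)
    runner_up = max(sc for k, sc in enumerate(scores) if k != best)
    return best, scores[best] - runner_up


# Hypothetical observations: arm 2 has a visibly different mean.
samples = [[0.0, 0.2, -0.2], [0.1, -0.1, 0.0], [3.0, 3.1, 2.9]]
stats = [ArmStats() for _ in samples]
for arm, xs in zip(stats, samples):
    for x in xs:
        arm.add(x)

threshold = 3.0  # assumed value; the paper ties the threshold to the error constraint
best, stat = glr_statistic(stats)
stop = stat >= threshold  # declare `best` as the odd arm once this holds
```

Only three numbers per arm are stored, which reflects the point that sufficient statistics are all the policy needs at each stage.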

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

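RQ1: Under a fixed-confidence constraint, what is the information-theoretic lower bound on the total cost (time cost plus switching cost) of identifying the odd arm?
RQ2: How can a sequential policy be designed to attain this lower bound while keeping the probability of false detection under control?
RQ3: What role does the vector exponential family structure play in making posterior updates efficient and tractable during sequential learning?
RQ4: How does the use of conjugate priors simplify the implementation and analysis of the detection policy?
RQ5: Under what conditions is the policy asymptotically optimal in total cost among all policies that satisfy the false-detection constraint?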

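KEY FINDINGS

An information-theoretic lower bound is derived on the total cost (expected decision time plus total switching cost) of odd arm identification with fixed confidence.
The proposed policy, built on the generalised likelihood ratio statistic and conjugate priors, satisfies the given constraint on the probability of false detection.
The policy is asymptotically optimal in total cost: its expected cost approaches the information-theoretic lower bound as the confidence requirement tightens.
Conjugate priors give efficient, closed-form Bayesian updates, so the policy remains computationally tractable under sequential observations and switching costs.
The vector exponential family assumption means that sufficient statistics are enough for parameter estimation, which simplifies the signal processing at each stage.
The policy is robust to the unknown parameters of the odd and non-odd arms, relying only on the exponential family structure and prior conjugacy.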
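
To make the closed-form Bayesian update concrete, here is a minimal sketch for one illustrative case: a Gaussian observation model with known noise variance and a Gaussian (conjugate) prior on the unknown mean. This family is chosen only as an example and is not claimed to be the paper's setting; the class name `NormalPosterior` and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalPosterior:
    """Gaussian belief about an arm's unknown mean."""
    mean: float
    precision: float  # inverse variance of the belief

    def update(self, x, noise_var=1.0):
        """Conjugate update after one observation x with known noise variance:
        precisions add, and the new mean is the precision-weighted average."""
        obs_precision = 1.0 / noise_var
        new_precision = self.precision + obs_precision
        new_mean = (self.precision * self.mean + obs_precision * x) / new_precision
        return NormalPosterior(new_mean, new_precision)


# Assumed prior: mean 0, precision 1. Observations are illustrative only.
posterior = NormalPosterior(mean=0.0, precision=1.0)
for x in (2.0, 4.0):
    posterior = posterior.update(x)
```

Each update costs a constant number of arithmetic operations and needs no stored history, which is the practical benefit the findings attribute to conjugacy.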

This digest was generated by AI and reviewed by a human editor.