QUICK REVIEW

[Paper Review] The on-line shortest path problem under partial monitoring

András György, Tamás Linder|ArXiv.org|Apr 8, 2007

Advanced Bandit Algorithms Research | References: 28 | Cited by: 158

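One-sentence summary

This paper presents an efficient on-line algorithm for the shortest path problem under partial monitoring, where the decision maker observes only the weights of the edges on the path it chose. Against the best fixed path, the algorithm's normalized regret is O(1/√n), with only polynomial dependence on the number of edges. The approach extends to the label efficient setting and to competing against time-varying paths, and a variant for the case where only the total loss of the chosen path is revealed attains O(n^{-1/3}). The paper reports improvements over earlier methods in both analysis and simulation.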

ABSTRACT

The on-line shortest path problem is considered under various models of partial monitoring. Given a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, a decision maker has to choose in each round of a game a path between two distinguished vertices such that the loss of the chosen path (defined as the sum of the weights of its composing edges) be as small as possible. In a setting generalizing the multi-armed bandit problem, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this problem, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched off-line to the entire sequence of the edge weights, by a quantity that is proportional to 1/\sqrt{n} and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with linear complexity in the number of rounds n and in the number of edges. An extension to the so-called label efficient setting is also given, in which the decision maker is informed about the weights of the edges corresponding to the chosen path at a total of m << n time instances. Another extension is shown where the decision maker competes against a time-varying path, a generalization of the problem of tracking the best expert. A version of the multi-armed bandit setting for shortest path is also discussed where the decision maker learns only the total weight of the chosen path but not the weights of the individual edges on the path. Applications to routing in packet switched networks along with simulation results are also presented.
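
The main guarantee in the abstract can be written schematically as follows. The notation is assumed here for illustration, since the abstract fixes none: \mathcal{P} is the set of paths between the two distinguished vertices, I_t is the path chosen in round t, \ell_{e,t} is the weight of edge e in round t, and C(|E|) stands for some polynomial in the number of edges. Confidence and logarithmic terms, which the abstract does not spell out, are omitted.

\[
\frac{1}{n}\Bigl(\sum_{t=1}^{n}\ell_{I_t,t}-\min_{i\in\mathcal{P}}\sum_{t=1}^{n}\ell_{i,t}\Bigr)\le\frac{C(|E|)}{\sqrt{n}},
\qquad
\ell_{i,t}=\sum_{e\in i}\ell_{e,t}.
\]

The left-hand side is the normalized regret: the per-round gap between the decision maker's cumulative loss and that of the best single path chosen in hindsight.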

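Motivation and objectives

Address the on-line shortest path problem in which only partial feedback about the chosen path is revealed after each decision.
Design an algorithm that, observing only the weights of the edges on the chosen path, achieves sublinear regret that depends only polynomially on the number of edges.
Extend the framework to the label efficient setting, where feedback is available at only m << n time instances.
Handle the case where the comparator path changes over time, with a number of changes that grows sublinearly in n.
Provide a practical algorithm with linear time complexity and strong theoretical guarantees against adversarial edge weights.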

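Proposed method

The algorithm represents the path space through a path basis and draws on online convex optimization, which supports efficient computation and the regret analysis.
It uses a modified exponentially weighted strategy combined with carefully designed loss estimates to cope with partial feedback.
The regret analysis relies on Bernstein's inequality for martingale differences to bound the deviation of the cumulative loss from its expectation.
In the label efficient setting, the algorithm updates its estimates only at the m feedback instances, and its regret then scales as O(√(ln N / m)), where N is the number of paths.
In the restricted feedback model, where only the total loss of the chosen path is revealed, the algorithm uses a path-level bandit approach and attains O(n^{-1/3}) regret with a simpler design than earlier methods.
The algorithm runs in time linear in both n and the number of edges, so it scales to large graphs.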
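
The combination of exponential weighting and importance-weighted loss estimates described above can be sketched roughly as follows. This is a generic Exp3-style illustration, not the authors' algorithm: it enumerates paths explicitly (which is exponential in general, unlike the linear-time implementation the paper describes), it explores uniformly over all paths, and the names `enumerate_paths`, `run_bandit_paths`, `eta`, and `gamma` are hypothetical.

```python
import math
import random


def enumerate_paths(edges, source, sink):
    """Return every source->sink path of a DAG as a tuple of edge indices."""
    out = {}
    for idx, (u, v) in enumerate(edges):
        out.setdefault(u, []).append((idx, v))
    paths = []

    def walk(node, prefix):
        if node == sink:
            paths.append(tuple(prefix))
            return
        for idx, nxt in out.get(node, []):
            walk(nxt, prefix + [idx])

    walk(source, [])
    return paths


def run_bandit_paths(edges, source, sink, loss_fn, n_rounds,
                     eta=0.1, gamma=0.1, seed=0):
    """Play n_rounds; loss_fn(t) returns one loss in [0, 1] per edge.

    The learner only reads the losses of edges on the path it chose.
    Returns (cumulative loss suffered, per-edge cumulative loss estimates).
    """
    rng = random.Random(seed)
    paths = enumerate_paths(edges, source, sink)
    est = [0.0] * len(edges)  # importance-weighted cumulative edge losses
    total = 0.0
    for t in range(n_rounds):
        # A path's score is the sum of the estimates of its edges.
        scores = [sum(est[e] for e in p) for p in paths]
        low = min(scores)  # shift for numerical stability
        weights = [math.exp(-eta * (s - low)) for s in scores]
        z = sum(weights)
        # Mix in uniform exploration so every path keeps positive probability.
        probs = [(1 - gamma) * w / z + gamma / len(paths) for w in weights]
        chosen = rng.choices(range(len(paths)), weights=probs)[0]
        # Probability that each edge lies on the sampled path.
        q = [0.0] * len(edges)
        for p, pr in zip(paths, probs):
            for e in p:
                q[e] += pr
        losses = loss_fn(t)
        for e in paths[chosen]:  # bandit feedback: chosen path's edges only
            est[e] += losses[e] / q[e]  # unbiased estimate of the edge loss
            total += losses[e]
    return total, est
```

For example, on the diamond graph `[(0, 1), (1, 3), (0, 2), (2, 3)]` with a cheap upper route and an expensive lower route, `run_bandit_paths(edges, 0, 3, lambda t: [0.1, 0.1, 0.9, 0.9], 500)` should concentrate its choices on the upper route after a short exploration phase.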

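Experimental results

Research questions

RQ1: Can an on-line shortest path algorithm attain O(1/√n) regret when only the weights of the edges on the chosen path are observed?
RQ2: In the label efficient setting, where feedback is limited to m << n time instances, can sublinear regret be maintained, and how does it scale with m?
RQ3: Can the algorithm compete effectively against a time-varying comparator path whose number of changes is sublinear in n?
RQ4: How does the algorithm compare with existing methods in regret rate and computational complexity?
RQ5: Can the algorithm be made robust to parameter tuning, without relying on off-line optimization?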

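Main findings

Against the best fixed path, the algorithm attains O(1/√n) regret, and the bound grows polynomially rather than exponentially with the number of edges.
In the label efficient setting, the regret scales as O(√(ln N / m)), where N is the number of paths and m the number of feedback instances, in line with known bounds for label efficient prediction.
Simulations show that the algorithm outperforms the method of Awerbuch and Kleinberg even without off-line parameter tuning, which indicates robustness.
In the restricted feedback model, where only the total loss of the chosen path is revealed, the algorithm attains O(n^{-1/3}) regret, matching the best previously known result with a simpler design.
Simulations confirm that the normalized regret converges to zero at the predicted rate and that the algorithm consistently beats a fixed-path baseline.
The algorithm keeps linear time complexity in both the number of rounds and the number of edges, which makes it practical for large-scale applications such as dynamic network routing.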
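
The label efficient idea mentioned above, asking for feedback in only about m of the n rounds and rescaling what is observed, can be illustrated with a minimal sketch. It assumes a simple scheme in which each round is queried independently with probability m / n until the budget of m queries is spent; the paper's exact query rule may differ, and `label_efficient_estimates` is a hypothetical name.

```python
import random


def label_efficient_estimates(losses, m, seed=0):
    """Turn a loss sequence into sparse, rescaled loss estimates.

    losses: the loss the learner would observe in each round if it asked.
    m: query budget (m much smaller than len(losses)).
    Returns (estimates, number of queries actually made).
    """
    n = len(losses)
    eps = m / n  # per-round query probability (an assumption of this sketch)
    rng = random.Random(seed)
    estimates, queries = [], 0
    for loss in losses:
        if queries < m and rng.random() < eps:
            # Dividing by eps keeps the estimate unbiased while budget remains.
            estimates.append(loss / eps)
            queries += 1
        else:
            estimates.append(0.0)  # no feedback requested in this round
    return estimates, queries
```

These estimates would replace the true losses in the exponential weighting update, so the weights change only in rounds where feedback was requested.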


This review was generated by AI and checked by a human editor.