QUICK REVIEW

[Paper Review] The on-line shortest path problem under partial monitoring

András György, Tamás Linder|ArXiv.org|Apr 8, 2007

Advanced Bandit Algorithms Research|References: 28|Citations: 158

One-line summary

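This paper proposes efficient online algorithms for the shortest path problem under partial monitoring, where the decision maker observes only the weights of the edges on the chosen path. The average regret against the best fixed path is O(1/√n), with only polynomial dependence on the size of the graph. The approach also extends to the label efficient setting and to competing against time-varying paths, and it is reported to outperform prior methods both in theory and in simulation.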

ABSTRACT

The on-line shortest path problem is considered under various models of partial monitoring. Given a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, a decision maker has to choose in each round of a game a path between two distinguished vertices such that the loss of the chosen path (defined as the sum of the weights of its composing edges) be as small as possible. In a setting generalizing the multi-armed bandit problem, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this problem, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched off-line to the entire sequence of the edge weights, by a quantity that is proportional to 1/\sqrt{n} and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with linear complexity in the number of rounds n and in the number of edges. An extension to the so-called label efficient setting is also given, in which the decision maker is informed about the weights of the edges corresponding to the chosen path at a total of m << n time instances. Another extension is shown where the decision maker competes against a time-varying path, a generalization of the problem of tracking the best expert. A version of the multi-armed bandit setting for shortest path is also discussed where the decision maker learns only the total weight of the chosen path but not the weights of the individual edges on the path. Applications to routing in packet switched networks along with simulation results are also presented.
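
For reference, the main guarantee described in the abstract can be written as a bound on the normalized regret. The notation is an assumption introduced here and does not appear in the review: \ell_{e,t} is the weight of edge e in round t, P_t is the path chosen in round t, \mathcal{P} is the set of paths between the two distinguished vertices, and C is a factor that, per the abstract, depends only polynomially on the number of edges. Whether the bound holds in expectation or with high probability is left unspecified in this sketch.

```latex
\frac{1}{n}\left( \sum_{t=1}^{n} \sum_{e \in P_t} \ell_{e,t}
  \;-\; \min_{P \in \mathcal{P}} \sum_{t=1}^{n} \sum_{e \in P} \ell_{e,t} \right)
\;\le\; \frac{C}{\sqrt{n}}
```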

Motivation and objectives

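Address the on-line shortest path problem under limited feedback, where after each decision only the losses incurred on the chosen path are revealed.
Develop an algorithm that achieves sublinear regret with only polynomial dependence on the number of edges, even though the weights of edges off the chosen path are never observed.
Extend the approach to the label efficient setting, in which feedback is available at only m << n time instances.
Handle the case where the best path changes over time, adapting when the number of changes grows sublinearly.
Provide a practical algorithm with linear time complexity and strong theoretical guarantees in adversarial environments.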

Proposed method

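A basis of paths is used to represent the path space, which enables efficient computation and a regret analysis in the style of online convex optimization.
To cope with partial feedback, a modified exponential weighting strategy is combined with a carefully designed loss estimation scheme.
The regret analysis uses Bernstein's inequality for martingale differences to bound the deviation of the cumulative loss from its expectation.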
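In the label efficient setting, feedback is received at only m time instances. The regret then scales as O(√(ln N / m)), so the number of queries m takes the place of n in the O(1/√n) rate.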
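In the restricted feedback model, where only the total loss of the chosen path is revealed, a path-bandit approach achieves O(n^{-1/3}) regret. It is simpler than prior methods.
The algorithm runs in time linear in n and in the number of edges, so it scales to large graphs.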
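
The exponential weighting idea with estimated losses can be illustrated on a toy graph. The following is a minimal sketch, not the paper's algorithm: it assumes a standard importance-weighted estimate of each observed edge loss, mixes in uniform exploration over paths, and enumerates all paths explicitly, which is only viable for tiny graphs (the review states that the actual algorithm runs in time linear in the number of edges). The graph `EDGES`, the helper names, and the parameters `eta` and `gamma` are all hypothetical.

```python
import math
import random

# Hypothetical toy DAG: edges are (tail, head) pairs, source "s", sink "t".
EDGES = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")]

def enumerate_paths(edges, source, sink):
    """List every source-to-sink path as a tuple of edge indices (toy graphs only)."""
    out = {}
    for i, (u, v) in enumerate(edges):
        out.setdefault(u, []).append((i, v))
    paths = []

    def walk(node, prefix):
        if node == sink:
            paths.append(tuple(prefix))
            return
        for i, v in out.get(node, []):
            walk(v, prefix + [i])

    walk(source, [])
    return paths

def run(edges, paths, loss_fn, n_rounds, eta, gamma, rng):
    """Exponential weights over paths with importance-weighted edge-loss estimates."""
    est = [0.0] * len(edges)  # cumulative estimated loss per edge
    total = 0.0               # cumulative true loss of the chosen paths
    for t in range(n_rounds):
        # Score of a path = sum of the estimated cumulative losses of its edges.
        scores = [sum(est[e] for e in p) for p in paths]
        base = min(scores)    # shift scores for numerical stability
        w = [math.exp(-eta * (s - base)) for s in scores]
        z = sum(w)
        # Mix with uniform exploration so every path keeps positive probability.
        probs = [(1 - gamma) * wi / z + gamma / len(paths) for wi in w]
        chosen = rng.choices(range(len(paths)), weights=probs)[0]
        losses = loss_fn(t)   # simulator's edge losses in [0, 1]
        total += sum(losses[e] for e in paths[chosen])
        # The learner only reads the losses of edges on the chosen path.
        for e in paths[chosen]:
            # Probability that edge e was on the sampled path.
            q = sum(pr for p, pr in zip(paths, probs) if e in p)
            est[e] += losses[e] / q
    return total

paths = enumerate_paths(EDGES, "s", "t")
fixed_losses = [0.9, 0.1, 0.9, 0.1, 0.5]  # path s-b-t is best in this toy run
n = 2000
total = run(EDGES, paths, lambda t: fixed_losses, n,
            eta=0.05, gamma=0.05, rng=random.Random(0))
print("average loss per round:", total / n)
```

Dividing each observed loss by the probability that its edge was sampled keeps the per-edge estimates unbiased, which is what lets the path weights be formed from edge-level quantities rather than from one statistic per path.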

Experimental results

Research questions

RQ1: Is there an online shortest path algorithm that achieves O(1/√n) regret when feedback is limited to the chosen path, so that the weights of all other edges remain unobserved?
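RQ2: Can a comparable regret guarantee be maintained in the label efficient setting, where feedback is limited to m < n time instances?
RQ3: Can the algorithm compete effectively against a time-varying best path when the number of changes grows sublinearly?
RQ4: How do the regret rate and computational complexity of the proposed method compare with those of existing methods?
RQ5: Can the algorithm be strengthened so that it does not rely on offline parameter tuning?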

Key findings

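The regret against the best fixed path is O(1/√n) and grows polynomially, not exponentially, in the number of edges.
In the label efficient setting, the regret is O(√(ln N / m)), which matches known theoretical bounds and allows efficient use of feedback.
In simulations, the method outperforms that of Awerbuch and Kleinberg even without offline parameter tuning, which indicates its robustness.
In the restricted feedback model (only the total path loss is revealed), the method achieves O(n^{-1/3}) regret, matching the best existing result with a simpler design.
Simulation results confirm that the normalized regret converges to zero at the theoretically predicted rate, and the method consistently outperforms the fixed-path baseline.
The algorithm keeps time complexity linear in the number of rounds and in the number of edges, which makes it practical for large-scale applications such as dynamic network routing.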
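
To get a feel for how the reported rates compare, the short sketch below evaluates them numerically with all constants and graph-dependent factors dropped. The function names are hypothetical, and since the review does not define N, the sketch treats it as a generic positive count (assumed here to be the number of paths).

```python
import math

def rate_edge_feedback(n):
    """n^(-1/2): rate reported when the chosen path's edge weights are observed."""
    return 1.0 / math.sqrt(n)

def rate_path_feedback(n):
    """n^(-1/3): rate reported when only the total path loss is observed."""
    return n ** (-1.0 / 3.0)

def rate_label_efficient(N, m):
    """sqrt(ln N / m): rate reported for m feedback queries (N is an assumption)."""
    return math.sqrt(math.log(N) / m)

for n in (10**3, 10**6):
    print(n, rate_edge_feedback(n), rate_path_feedback(n))
print(rate_label_efficient(100, 400))
```

Under these simplifications, the edge-feedback rate is smaller than the path-feedback rate for every n > 1, which reflects the cost of observing only the total loss.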


This review was generated by AI and checked by a human editor.