QUICK REVIEW

[Paper Review] Noisy Sorting Without Resampling

Mark Braverman, Elchanan Mossel | arXiv.org | Jul 6, 2007

Game Theory and Voting Systems | References: 7 | Citations: 151

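One-Sentence Summary

This paper presents a polynomial-time algorithm that, given noisy pairwise comparisons that cannot be resampled, recovers with high probability an order close to the true one. The algorithm runs in time $ n^{O(\gamma^{-4})} $ with sampling complexity $ O_{\gamma}(n\log n) $. The paper also shows that the optimal order lies within $ \Theta(n) $ of the true order in total displacement and within $ \Theta(\log n) $ in maximum displacement.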

ABSTRACT

In this paper we study noisy sorting without re-sampling. In this problem there is an unknown order $a_{\pi(1)} < \dots < a_{\pi(n)}$ where $\pi$ is a permutation on $n$ elements. The input is the status of $\binom{n}{2}$ queries of the form $q(a_i,a_j)$, where $q(a_i,a_j) = +$ with probability at least $1/2+\gamma$ if $\pi(i) > \pi(j)$ for all pairs $i \neq j$, where $\gamma > 0$ is a constant and $q(a_i,a_j) = -q(a_j,a_i)$ for all $i$ and $j$. It is assumed that the errors are independent. Given the status of the queries the goal is to find the maximum likelihood order. In other words, the goal is to find a permutation $\sigma$ that minimizes the number of pairs $\sigma(i) > \sigma(j)$ where $q(\sigma(i),\sigma(j)) = -$. The problem so defined is the feedback arc set problem on distributions of inputs, each of which is a tournament obtained as a noisy perturbation of a linear order. Note that when $\gamma < 1/2$ and $n$ is large, it is impossible to recover the original order $\pi$. It is known that the weighted feedback arc set problem on tournaments is NP-hard in general. Here we present an algorithm of running time $n^{O(\gamma^{-4})}$ and sampling complexity $O_{\gamma}(n \log n)$ that with high probability solves the noisy sorting without re-sampling problem. We also show that if $a_{\sigma(1)},a_{\sigma(2)},\dots,a_{\sigma(n)}$ is an optimal solution of the problem then it is "close" to the original order. More formally, with high probability it holds that $\sum_i |\sigma(i) - \pi(i)| = \Theta(n)$ and $\max_i |\sigma(i) - \pi(i)| = \Theta(\log n)$. Our results are of interest in applications to ranking, such as ranking in sports, or ranking of search items based on comparisons by experts.
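
The objective described in the abstract can be made concrete with a small simulation. The sketch below is illustrative only. It assumes the true order is 0 < 1 < ... < n-1 and that each query is correct with probability exactly 1/2 + gamma (the abstract only requires "at least"). The names `sample_queries`, `reports_less`, and `disagreements` are hypothetical and do not come from the paper.

```python
import random
from itertools import combinations


def sample_queries(n, gamma, rng):
    """Draw one noisy answer per unordered pair (no resampling).

    Assumption: the true order is 0 < 1 < ... < n-1.
    says_less[(i, j)] with i < j is True when the query reports
    a_i < a_j, which is the correct answer.
    """
    return {(i, j): rng.random() < 0.5 + gamma
            for i, j in combinations(range(n), 2)}


def reports_less(says_less, a, b):
    """True if the stored (antisymmetric) answer says a < b."""
    if a < b:
        return says_less[(a, b)]
    return not says_less[(b, a)]


def disagreements(order, says_less):
    """Count pairs that `order` ranks as a-before-b while the query says a > b.

    `order` lists the elements from smallest to largest.
    """
    n = len(order)
    return sum(
        1
        for x in range(n)
        for y in range(x + 1, n)
        if not reports_less(says_less, order[x], order[y])
    )


if __name__ == "__main__":
    rng = random.Random(0)
    queries = sample_queries(8, 0.3, rng)
    print(disagreements(list(range(8)), queries))
```

Under these assumptions, minimizing `disagreements` over all permutations is the maximum-likelihood problem the abstract describes. Brute force over permutations is only feasible for very small n, which is why a polynomial-time algorithm is of interest.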

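Motivation and Objectives

Address the ordering problem in which pairwise comparisons are noisy and cannot be repeated, a situation common in real-world ranking applications.
Design an algorithm that efficiently finds the maximum-likelihood order under noisy comparisons, that is, the order contradicted by the fewest comparison outcomes.
Establish theoretical guarantees on how close the computed order is to the true underlying order.
Analyze the trade-offs among the noise level $ \gamma $, sampling complexity, and approximation accuracy when resampling is not possible.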

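Proposed Method

The algorithm uses a binary search tree over intervals of size $ \Theta(\log n) $, partitioning the current sorted set into overlapping subsets to localize the insertion position of each new element.
A majority-vote test over $ k = O(\gamma^{-2}) $ comparisons with elements from adjacent intervals identifies the correct insertion interval with high probability.
Insertion proceeds recursively: each step moves to the correct interval with probability at least 0.99, and the search converges to a leaf interval of size 2 within $ c_2 \log n $ steps.
Concentration inequalities for binomial tails guarantee that the final insertion position lies within $ O(\gamma^{-4} \log n) $ of the true position.
Maintaining a tree of intervals with overlapping regions keeps comparison-based localization robust even though individual comparisons are noisy.
The analysis combines probabilistic arguments with combinatorial optimization, using Chernoff bounds to control the error probability at each stage.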
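
The majority-vote idea in the steps above can be sketched as follows. This is a deliberately simplified illustration, not the paper's procedure: it uses a plain binary search with no overlapping intervals and no backtracking. The helper names, the `compare(x, b)` callback (assumed to return one fixed answer per pair, so nothing is resampled), and the use of Hoeffding's inequality to size `k` are all assumptions made for this example.

```python
import math


def votes_needed(gamma, delta):
    """Smallest k with exp(-2 * k * gamma**2) <= delta.

    By Hoeffding's inequality, a majority over k independent votes,
    each correct with probability at least 1/2 + gamma, is wrong with
    probability at most exp(-2 * k * gamma**2). Hence k = O(gamma**-2).
    """
    return math.ceil(math.log(1.0 / delta) / (2.0 * gamma ** 2))


def majority_above(x, block, compare):
    """True if more than half of the single comparisons report x > b."""
    votes = sum(1 for b in block if compare(x, b))
    return 2 * votes > len(block)


def insert_position(x, sorted_items, compare, k):
    """Binary search where each step votes against up to k neighbours of mid.

    Simplification (assumption): no overlapping intervals, no backtracking,
    so a wrong vote early on is never corrected.
    """
    lo, hi = 0, len(sorted_items)
    while lo < hi:
        mid = (lo + hi) // 2
        start = max(lo, mid - k // 2)
        end = min(hi, start + k)
        block = sorted_items[start:end]
        if majority_above(x, block, compare):
            lo = mid + 1  # x appears to sit above the block around mid
        else:
            hi = mid      # x appears to sit at or below mid
    return lo


if __name__ == "__main__":
    items = [10, 20, 30, 40, 50, 60, 70]
    exact = lambda x, b: x > b  # noiseless comparator for illustration
    print(insert_position(35, items, exact, 3))
    print(votes_needed(0.1, 0.01))
```

With a noiseless comparator the search matches ordinary binary insertion. With noisy answers, a single wrong vote is outweighed as long as most of the `k` comparisons in a block are correct, which is the intuition behind the 0.99 per-step success probability quoted above.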

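Results

Research Questions

RQ1: Can an order close to the true order be recovered when pairwise comparisons are noisy and cannot be repeated?
RQ2: What is the minimum number of comparisons needed to recover the true order with high probability under noisy comparisons?
RQ3: In terms of displacement metrics, how close is the optimal solution to the true order?
RQ4: Does a polynomial-time algorithm exist that achieves non-trivial approximation guarantees for noisy sorting without resampling?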

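Key Findings

The algorithm runs in time $ n^{O(\gamma^{-4})} $, uses $ O_{\gamma}(n\log n) $ comparisons, and recovers the optimal order with high probability.
With high probability, the total displacement between the optimal order and the true order is $ \Theta(n) $, meaning the average position error per element is constant.
The maximum displacement of any single item between the optimal order and the true order is $ \Theta(\log n) $, a logarithmic worst-case error.
At each insertion step the probability of moving to the correct interval is at least 0.99, and the search completes in $ c_2 \log n $ steps, where $ c_2 = O(\beta + 1) $.
Correctness rests on concentration inequalities: the probability of having deviated from the correct interval after $ c_2 \log n $ steps is at most $ n^{-\beta-1} $.
The approach is robust to noise: even for small $ \gamma $, majority-vote tests over $ k = O(\gamma^{-2}) $ comparisons each maintain high accuracy.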
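
The two distance measures in the findings above, $\sum_i |\sigma(i) - \pi(i)|$ and $\max_i |\sigma(i) - \pi(i)|$, can be computed directly. The sketch below assumes both orders are given as position arrays, where `sigma[i]` and `pi[i]` are the positions of element `i`. The function name `displacement` is hypothetical.

```python
def displacement(sigma, pi):
    """Return (total, maximum) displacement between two orders.

    sigma[i] and pi[i] are the positions assigned to element i by the
    candidate order and the true order, respectively (an assumed encoding).
    """
    diffs = [abs(s - p) for s, p in zip(sigma, pi)]
    return sum(diffs), max(diffs)


if __name__ == "__main__":
    true_positions = [0, 1, 2, 3]
    swapped = [1, 0, 2, 3]  # elements 0 and 1 exchanged
    print(displacement(swapped, true_positions))
```

A total of $\Theta(n)$ corresponds to each element being off by a constant number of positions on average, while a maximum of $\Theta(\log n)$ bounds how far the worst-placed element can drift.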

This review was generated by AI and checked by a human editor.