QUICK REVIEW

[Paper Review] Partial order similarity based on mutual information

Gergely Tibély, Péter Pollner|arXiv (Cornell University)|Jan 22, 2016

Advanced Algebra and Logic|Citations: 1

One-Line Summary

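This paper proposes a new similarity measure for partial orders based on adjusted mutual information. The measure weights each disagreement by where it occurs in the ranking: disagreements among highly ranked candidates lower the similarity more than disagreements near the bottom. For tree-shaped partial orders with a constant branching number, the computation takes O(|C|² ln |C|) time (O(|C|³) in the worst case), where |C| is the number of candidates. Identical orders have similarity 1, and independent orders have similarity of about 0.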
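The 1 and 0 endpoints follow from the usual chance-adjusted form of adjusted mutual information, as used for comparing clusterings. The review does not state the paper's null model or normalizing term, so in the sketch below $\mathbb{E}[I]$ (the mutual information expected for independent orders) and $I_{\max}$ (any upper bound satisfying $I_{\max}(\kappa,\kappa) = I(\kappa;\kappa)$) are generic placeholders, not the paper's exact definitions:

```latex
\[ S(\kappa,\mu) = \frac{I(\kappa;\mu) - \mathbb{E}[I]}{I_{\max}(\kappa,\mu) - \mathbb{E}[I]} \]
\[ \kappa = \mu \;\Rightarrow\; I(\kappa;\mu) = I_{\max}(\kappa,\mu) \;\Rightarrow\; S = 1 \]
\[ \kappa,\mu \text{ independent} \;\Rightarrow\; \mathbb{E}[I(\kappa;\mu)] = \mathbb{E}[I] \;\Rightarrow\; \mathbb{E}[S] = 0 \]
```

The last step assumes $I_{\max}$ does not change under the null model. For any single independent pair, S fluctuates around 0 and can be slightly negative.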

ABSTRACT

Comparing the ranking of candidates by different voters is an important topic in social and information science with a high relevance from the point of view of practical applications. In general, ties and pairs of incomparable candidates may occur, thus, the alternative rankings are described by partial orders. Various distance measures between partial orders have already been introduced, where zero distance corresponds to a perfect match between a pair of partial orders, and larger values signal greater differences. Here we take a different approach and propose a similarity measure based on adjusted mutual information. In general, a similarity value of unity corresponds to exactly matching partial orders, while a low similarity is associated with a pair of independent partial orders. The time complexity of the computation of this similarity measure is $\mathcal{O}(|\mathcal{C}|^3)$ in the worst case, and $\mathcal{O}(|\mathcal{C}|^2 \ln |\mathcal{C}|)$ in the typical case of partial orders corresponding to trees with constant branching number, where $|\mathcal{C}|$ denotes the number of candidates. An interesting feature of our approach is that the similarity measure is sensitive to the position of the disagreements in the ranking: differences at the highly ranked candidates induce a larger similarity drop than disagreements at the bottom candidates.

Research Motivation and Objectives

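Develop a similarity measure for partial orders in which the impact of a disagreement reflects the rank at which it occurs.
Move beyond the limitations of conventional distance measures by introducing a similarity score, where 1 indicates a perfect match and 0 indicates independence.
Build a method that is computationally efficient on hierarchical structures, especially those with bounded branching.
Enable more accurate comparison of orders in applications such as voting systems, hierarchy extraction, and biological network analysis.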

Proposed Method

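Define two random variables with indicator functions that mark each candidate's position in the two partial orders being compared.
Compute the mutual information between these indicator variables to quantify the order structure the two partial orders share.
Normalize with adjusted mutual information (AMI) so that identical orders score 1 and independent orders score about 0.
For each candidate, extract the dominated sets Dκ(i) (in order κ) and Dµ(j) (in order µ) from the Hasse diagram representation.
Compute the joint and marginal probabilities from the intersection sizes of the dominated sets in the two partial orders.
The time complexity is O(|C|³) in the worst case and O(|C|² ln |C|) for tree-shaped partial orders with a constant branching number.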
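The steps above can be sketched in Python under a few stated assumptions. The Hasse diagram is given as a list of (upper, lower) edge pairs, an input format chosen here for illustration. Dκ(i) is taken to be the set of candidates ranked strictly below i. The two random variables are read as binary indicators X_i = [c ∈ Dκ(i)] and Y_j = [c ∈ Dµ(j)] for a candidate c drawn uniformly at random. Under that reading, the joint and marginal probabilities follow directly from |Dκ(i)|, |Dµ(j)| and |Dκ(i) ∩ Dµ(j)|. The function names are hypothetical. The paper's way of combining the pairwise values into a single S, and its chance adjustment, are not reproduced here.

```python
import math
from collections import defaultdict


def dominated_sets(candidates, hasse_edges):
    """Map each candidate i to the set of candidates strictly below it.

    hasse_edges: (upper, lower) covering pairs of the Hasse diagram
    (assumed input format). The result is the transitive closure.
    """
    children = defaultdict(list)
    for upper, lower in hasse_edges:
        children[upper].append(lower)
    below = {}

    def visit(i):
        # Memoized recursive DFS; terminates because a Hasse diagram is acyclic.
        # Very long chains would need an iterative traversal (recursion limit).
        if i not in below:
            s = set()
            for c in children[i]:
                s.add(c)
                s |= visit(c)
            below[i] = s
        return below[i]

    for i in candidates:
        visit(i)
    return below


def indicator_mi(n, size_i, size_j, overlap):
    """Mutual information (nats) between X = [c in D_kappa(i)] and
    Y = [c in D_mu(j)] for c drawn uniformly from n candidates."""
    cells = {
        (1, 1): overlap,
        (1, 0): size_i - overlap,
        (0, 1): size_j - overlap,
        (0, 0): n - size_i - size_j + overlap,
    }
    mi = 0.0
    for (x, y), count in cells.items():
        if count == 0:
            continue  # empty cell: 0 * log 0 is taken as 0
        pxy = count / n
        px = (size_i if x else n - size_i) / n
        py = (size_j if y else n - size_j) / n
        mi += pxy * math.log(pxy / (px * py))
    return mi


def pairwise_indicator_mi(candidates, D_kappa, D_mu):
    """MI for every pair (i, j). This |C|^2 loop of set intersections is the
    step that the stated complexity bounds are about."""
    n = len(candidates)
    return {
        (i, j): indicator_mi(n, len(D_kappa[i]), len(D_mu[j]),
                             len(D_kappa[i] & D_mu[j]))
        for i in candidates
        for j in candidates
    }
```

For example, `D = dominated_sets(["a", "b", "c"], [("a", "b"), ("b", "c")])` builds a chain a > b > c, and `pairwise_indicator_mi(["a", "b", "c"], D, D)` compares it with itself. Each diagonal entry then equals the entropy of the corresponding indicator, which is the most information that indicator can share with any other variable.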

Experimental Results

Research Questions

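RQ1: How can the similarity between partial orders be measured so that disagreements higher in the ranking have a larger effect on the similarity?
RQ2: Can an information-theoretic similarity measure be constructed that is normalized and interpretable as a true similarity (from 0 to 1)?
RQ3: Is the proposed similarity measure more sensitive and more accurate than the Kendall tau distance?
RQ4: How computationally efficient is the method on common hierarchical structures such as trees?
RQ5: Can it estimate the fraction of randomized elements in a partial order (the f value) more accurately than distance-based methods?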

Key Findings

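The proposed similarity measure S equals 1 for identical partial orders and about 0 for independent ones, giving a normalized, interpretable scale.
Disagreements at top ranks lower the similarity markedly more than disagreements at lower ranks, mirroring how sensitive real-world preferences are to the top of a ranking.
For tree-shaped partial orders with a constant branching number, the time complexity is O(|C|² ln |C|), which makes the method efficient on hierarchical data.
Compared with Kendall's tau distance, S yields a narrower range of compatible f values (the fraction of randomized candidates) and therefore estimates the level of randomization more precisely.
The overlap integral of the similarity distributions, L(S), falls off quickly as the difference in f grows, whereas for Kendall's tau, L(KH) stays high even for large f gaps. This indicates that S discriminates better between randomization levels.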
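Using adjusted rather than raw mutual information avoids the normalization problems of raw mutual information, which has no fixed maximum and tends to be positive even for unrelated orders, so the resulting similarity scores are consistent and meaningful.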
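The complexity finding can be checked with a rough operation count. This sketch assumes, as a simplification that may differ from the paper's own analysis, that the cost is dominated by intersecting each Dκ(i) with every Dµ(j), which takes about |C| · Σ_i |Dκ(i)| element checks. In a total order (a chain), Σ_i |D(i)| grows as |C|²/2, giving O(|C|³). In a complete tree with constant branching b, it equals the sum of node depths, about |C| log_b |C|, giving O(|C|² ln |C|). The function names are hypothetical.

```python
def chain_pair_cost(n):
    """Total order c1 > ... > cn: the k-th candidate dominates n - k others.
    Returns n * sum_i |D(i)|, the assumed number of element checks."""
    sum_d = sum(n - k for k in range(1, n + 1))
    return n * sum_d


def tree_pair_cost(n, branching):
    """Complete tree in heap layout (root 0, parent of u is (u - 1) // branching).
    Each node is counted once per strict ancestor, so sum_i |D(i)| equals
    the sum of node depths. Returns n * sum_i |D(i)|."""
    sum_d = 0
    for v in range(n):
        depth, u = 0, v
        while u > 0:
            u = (u - 1) // branching
            depth += 1
        sum_d += depth
    return n * sum_d
```

For 1023 candidates, `chain_pair_cost(1023)` gives 534,776,319 element checks, while `tree_pair_cost(1023, 2)` gives 8,382,462, which illustrates the gap between the worst case and the tree case.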


This review was generated by AI and checked by a human editor.