QUICK REVIEW

[Paper Review] Efficient and trustworthy methods for knowledge discovery

Edoardo Galimberti, Martino Ciaperoni|arXiv (Cornell University)|Oct 6, 2019

Advanced Graph Neural Networks | References: 77 | Citations: 18

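Summary in brief

This work introduces span-core decomposition for temporal networks, a method for identifying groups of densely connected vertices together with the temporal span over which they persist (span-cores). By exploiting containment properties and maximal-core detection, the authors develop efficient algorithms for computing span-cores and apply them to temporal community search, which is solved in polynomial time by dynamic programming and sped up markedly by using maximal span-cores. The approach is validated on real-world face-to-face contact networks, where it shows scalability and practical utility for analysing social dynamics and improving graph-embedding performance.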

ABSTRACT

Data are building blocks to information and, subsequently, they are vital input to knowledge. Today, in the midst of the digital era, vast quantities of highly-complex data are being collected and processed at an unprecedented scale. This abundance of data has highlighted the importance of efficient and effective knowledge-discovery algorithms to identify patterns hidden in the data with the ultimate aim of uncovering valuable knowledge and shaping our understanding of the world around us. To capitalize on the opportunities offered by massive amounts of data as well as modern computing power, for many years, research in knowledge discovery and related areas has introduced algorithms that are increasingly efficient and effective, but also more and more opaque and unpredictable. Recently, growing interest in the ethical dimensions of algorithms has drawn attention to the limitations of opaque algorithms and has emphasized a need for trustworthy algorithms particularly when such algorithms are used to support high-stakes decision making. In order to be trustworthy, algorithms should solve a clearly defined problem via a clear sequence of instructions, they should not be utterly unsuccessful in any particular case and they should be easy to understand and interpret for humans so that no harmful biases can be hidden. In this thesis, we pursue the goal of developing novel knowledge-discovery algorithmic methods that are not only highly efficient to face the challenges and opportunities posed by modern data, but also trustworthy. In particular, we propose efficient and trustworthy methods for a collection of popular knowledge-discovery tasks. First, we consider tasks of exact inference in Bayesian networks and hidden Markov models. Trustworthy approaches for such tasks exist. However, their applicability may be severely limited by time or memory requirements. Therefore, we propose novel methods to reduce the time or memory resources that are needed by existing approaches for the considered exact inference tasks. Besides exact inference tasks, we also consider two different knowledge-discovery tasks that arise naturally in modern data: multi-label classification and community search in temporal graphs. Regarding multi-label classification, we propose an efficient and accurate rule-based multi-label classifier that drastically improves upon the interpretability of existing solutions. For community search in temporal graphs, we formalise the task for the first time, and we propose a solution that guarantees high efficiency and interpretability. In designing knowledge-discovery methods, we often rely on existing database-management and probabilistic methods. Methods for database management are valuable to address the large dimension and high complexity of modern data, while probabilistic methods are essential to methodologically handle uncertainty in the data.

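Research motivation and objectives

Address the problem of identifying dense, temporally coherent subgraphs in temporal networks, which is essential for analysing social dynamics and detecting anomalies.
Formalize the span-core, a new notion of temporal core decomposition in which each core is characterized by its coreness (density) and its temporal span (the period over which it exists).
Design an efficient algorithm that computes all span-cores by exploiting theoretical containment properties, and a more efficient one that computes only the maximal span-cores (those not dominated in both coreness and span).
Formulate and solve the temporal community search problem as a polynomial-time dynamic program, and use maximal span-cores to improve its performance.
Demonstrate the practical relevance of span-cores in real-world applications: anomaly detection, data-quality assessment, and improved graph-embedding classification.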
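
For reference, the notions of span-core and maximality named in these objectives can be written compactly. The notation below (E_t for the edge set at timestamp t, C_{k,Δ} for a span-core) is introduced here purely for illustration and does not come from the review; the reading that an edge counts only if it is present at every timestamp of Δ is likewise an assumption.

```latex
\begin{align*}
E_\Delta &= \{\, (u,v) : (u,v) \in E_t \ \text{for all } t \in \Delta \,\}, \\
C_{k,\Delta} &= \text{the maximal } S \subseteq V \ \text{such that}\
  \min_{u \in S} \deg_{E_\Delta[S]}(u) \ge k, \\
k \le k' \ \text{and}\ \Delta \subseteq \Delta'
  &\;\Longrightarrow\; C_{k',\Delta'} \subseteq C_{k,\Delta}
  \quad \text{(containment)}.
\end{align*}
```

Under this reading, a non-empty span-core C_{k,Δ} is maximal when no other non-empty span-core C_{k',Δ'} satisfies both k ≤ k' and Δ ⊆ Δ'.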

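Proposed method

Propose span-core decomposition as a temporal extension of core decomposition, defining each core as a set of vertices whose minimum degree is ≥ k throughout a contiguous time interval Δ.
Develop an algorithm that computes all span-cores efficiently by exploiting the containment hierarchy among cores, mitigating the cost of the potentially quadratic number of time intervals.
Design a dedicated algorithm that extracts only the maximal span-cores by checking the maximality condition directly, avoiding full enumeration.
Establish a theoretical relationship between temporal community search and maximal span-cores, enabling a dynamic-programming formulation that guarantees full temporal coverage.
Introduce a speed-up technique for temporal community search that uses maximal span-cores as building blocks and markedly reduces running time compared with the naive DP.
Use node2vec and DeepWalk, with hyperparameters tuned by grid search; assess classification performance by applying penalized logistic regression to the scaled embeddings.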
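
To make the span-core definition above concrete, here is a minimal Python sketch. It is deliberately the naive approach (every interval, every k), not the authors' efficient algorithms, and it assumes that an edge counts within an interval only if it appears in every snapshot of that interval. All function names and the toy data are hypothetical.

```python
def k_core(vertices, edges, k):
    """Return the k-core (maximal vertex set with min degree >= k) by peeling."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    queue = [v for v in adj if len(adj[v]) < k]
    removed = set()
    while queue:
        v = queue.pop()
        if v in removed:
            continue
        removed.add(v)
        # Removing v lowers the degree of its neighbours, which may fall below k.
        for u in adj[v]:
            adj[u].discard(v)
            if u not in removed and len(adj[u]) < k:
                queue.append(u)
        adj[v].clear()
    return set(adj) - removed


def span_core(snapshots, start, end, k):
    """(k, [start, end])-core: k-core of the edges present in every snapshot of the interval."""
    common = set.intersection(*snapshots[start:end + 1])
    vertices = {v for edge in common for v in edge}
    return k_core(vertices, common, k)


def all_span_cores(snapshots):
    """Naively enumerate every non-empty span-core, keyed by (k, start, end)."""
    cores = {}
    n = len(snapshots)
    for start in range(n):
        for end in range(start, n):
            k = 1
            while True:
                core = span_core(snapshots, start, end, k)
                if not core:
                    break
                cores[(k, start, end)] = core
                k += 1
    return cores


def maximal_span_cores(cores):
    """Keep span-cores not dominated by another one with k' >= k and a containing interval."""
    def dominated(key):
        k, s, e = key
        return any(
            other != key and other[0] >= k and other[1] <= s and e <= other[2]
            for other in cores
        )
    return {key: core for key, core in cores.items() if not dominated(key)}


# Toy temporal graph: one set of undirected edges (as sorted tuples) per timestamp.
snapshots = [
    {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")},
    {("a", "b"), ("a", "c"), ("b", "c")},
    {("a", "b"), ("c", "d")},
]
maximal = maximal_span_cores(all_span_cores(snapshots))
print(sorted(maximal))  # → [(1, 0, 2), (2, 0, 1)]
```

In the toy data the triangle a, b, c persists over timestamps 0 to 1 with coreness 2, while only the edge a-b survives the full span 0 to 2, so exactly those two span-cores are undominated. The post-hoc filter enumerates everything first, which is precisely the cost a direct maximality check is meant to avoid.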

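Experimental results

Research questions

RQ1: How can dense, temporally coherent subgraphs (span-cores) be discovered efficiently in temporal networks at minimal computational cost?
RQ2: What is the theoretical structure of span-cores, and how can the maximal span-cores, those not dominated in both coreness and span, be computed without enumerating every possible core?
RQ3: Can the temporal community search problem (identifying communities that cover the entire time domain) be solved efficiently, and how can span-cores improve its performance?
RQ4: To what extent do span-cores improve the quality of graph embeddings on real-world temporal networks, particularly for vertex role classification and anomaly detection?
RQ5: How do maximal span-cores contribute to practical applications on dynamic contact networks, such as anomaly detection, data validation, and network visualization?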

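Key findings

The proposed method for computing all span-cores exploits containment properties to avoid the quadratic blow-up in the number of time intervals, which is the source of its efficiency.
The algorithm that extracts only the maximal span-cores, rather than computing every core, is markedly faster than full enumeration because its direct maximality check avoids redundant computation.
With dynamic programming, temporal community search is solvable in polynomial time. Incorporating maximal span-cores markedly reduced running time relative to the baseline method.
On the PrimarySchool dataset, with embedding dimension h ≥ 200 the TCS (temporal community search) embeddings reach a macro-F1 score close to 1, outperforming the baselines at high dimension h and matching them at h = |T|.
On the HighSchool dataset, TCS becomes competitive with the best method for h ≥ 200, demonstrating scalability and effectiveness as the temporal resolution increases.
Using span-cores improved graph-embedding classification performance, supported anomaly detection in contact networks, and enabled a new way of visualizing large time-varying graphs.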
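
The polynomial-time claim for temporal community search can be illustrated with a generic segmentation dynamic program. This sketch assumes the task is to split the time domain into h contiguous intervals that together cover every timestamp while maximizing a sum of per-interval scores. That objective, the function name `best_segmentation`, and the `score` callable (for example, the highest k whose span-core over the interval contains the query vertices) are assumptions made for illustration, not the authors' exact formulation.

```python
def best_segmentation(n, h, score):
    """Split timestamps 0..n-1 into h contiguous intervals maximizing the summed score.

    score(i, j) is a hypothetical callable returning the quality of the best
    community on the closed interval [i, j]. Makes O(h * n^2) score calls.
    """
    if not 1 <= h <= n:
        raise ValueError("need 1 <= h <= n")
    neg_inf = float("-inf")
    # best[m][j]: best total over the first j timestamps using exactly m intervals.
    best = [[neg_inf] * (n + 1) for _ in range(h + 1)]
    # cut[m][j]: start index of the last interval in that best solution.
    cut = [[None] * (n + 1) for _ in range(h + 1)]
    best[0][0] = 0
    for m in range(1, h + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                if best[m - 1][i] == neg_inf:
                    continue
                total = best[m - 1][i] + score(i, j - 1)
                if total > best[m][j]:
                    best[m][j] = total
                    cut[m][j] = i
    # Backtrack from the full time domain to recover the chosen intervals.
    segments = []
    j = n
    for m in range(h, 0, -1):
        i = cut[m][j]
        segments.append((i, j - 1))
        j = i
    segments.reverse()
    return best[h][n], segments


# Toy scores for 4 timestamps; longer intervals never score higher than their sub-intervals.
table = {
    (0, 0): 3, (1, 1): 3, (2, 2): 1, (3, 3): 2,
    (0, 1): 3, (1, 2): 1, (2, 3): 1,
    (0, 2): 1, (1, 3): 0, (0, 3): 0,
}
print(best_segmentation(4, 2, lambda i, j: table[(i, j)]))  # → (4, [(0, 1), (2, 3)])
```

Because the intervals returned always tile 0..n-1 without gaps, full temporal coverage holds by construction. In a speed-up along the lines the review describes, `score` would be answered from precomputed maximal span-cores instead of being recomputed per interval; that lookup is not shown here.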

This review was generated by AI and checked by a human editor.