QUICK REVIEW

[Paper Review] Do Less, Get More: Streaming Submodular Maximization with Subsampling

Moran Feldman, Amin Karbasi|arXiv (Cornell University)|Jan 1, 2018

Complexity and Algorithms in Graphs|Cited by 28

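One-Line Summary

This paper introduces a new one-pass streaming algorithm that sharply reduces the number of function evaluations and the memory footprint while still achieving tight approximation guarantees. For a monotone submodular function under a $p$-matchoid constraint, it achieves a $4p$ approximation (in expectation) with $O(k)$ memory and $O(km/p)$ queries per element. On video summarization it runs up to 50 times faster than the state-of-the-art method, and it scales efficiently to large datasets.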

ABSTRACT

In this paper, we develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling each element of the data stream, our algorithm enjoys the tightest approximation guarantees in various settings while having the smallest memory footprint and requiring the lowest number of function evaluations. More specifically, for a monotone submodular function and a $p$-matchoid constraint, our randomized algorithm achieves a $4p$ approximation ratio (in expectation) with $O(k)$ memory and $O(km/p)$ queries per element ($k$ is the size of the largest feasible solution and $m$ is the number of matroids used to define the constraint). For the non-monotone case, our approximation ratio increases only slightly to $4p+2-o(1)$. To the best of our knowledge, our algorithm is the first that combines the benefits of streaming and subsampling in a novel way in order to truly scale submodular maximization to massive machine learning problems. To showcase its practicality, we empirically evaluated the performance of our algorithm on a video summarization application and observed that it outperforms the state-of-the-art algorithm by up to fifty-fold while maintaining practically the same utility. We also evaluated the scalability of our algorithm on a large dataset of Uber pick up locations.
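
For readers new to the terminology, the standard definitions behind the abstract can be stated compactly. These are textbook definitions rather than quotations from the paper, and the symbols $\mathcal{N}$ (ground set), $\mathcal{I}$ (feasible sets), $\mathcal{N}_i$ and $\mathcal{I}_i$ (the individual matroids) are notation assumed here.

$$
\begin{aligned}
&\text{Submodular:} && f(A \cup \{u\}) - f(A) \ge f(B \cup \{u\}) - f(B) && \text{for all } A \subseteq B \subseteq \mathcal{N},\ u \in \mathcal{N} \setminus B, \\
&\text{Monotone:} && f(A) \le f(B) && \text{for all } A \subseteq B \subseteq \mathcal{N}, \\
&p\text{-matchoid:} && \mathcal{I} = \{\, S \subseteq \mathcal{N} : S \cap \mathcal{N}_i \in \mathcal{I}_i \text{ for } i = 1, \dots, m \,\} && \text{with each } u \in \mathcal{N} \text{ in at most } p \text{ of the sets } \mathcal{N}_i.
\end{aligned}
$$

Here each $(\mathcal{N}_i, \mathcal{I}_i)$ is a matroid, so a single matroid constraint is the special case $p = m = 1$.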

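Motivation and Objectives

Address the scalability bottleneck of submodular maximization in massive machine learning applications.
Reduce the number of function evaluations and the memory footprint of streaming submodular optimization without sacrificing approximation quality.
Develop a practical one-pass algorithm that processes the stream in a single pass while evaluating as little of it as possible.
Achieve tight approximation ratios under a $p$-matchoid constraint for both monotone and non-monotone submodular functions.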

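Proposed Method

The algorithm adopts a new subsampling strategy that evaluates only a fraction of the stream elements, reducing the computational load.
It uses a randomized selection process to maintain a coreset of candidate elements that are likely to contribute to the optimal solution.
It incorporates the $p$-matchoid constraint model so that the solution stays feasible while the approximation guarantee is preserved.
It dynamically adjusts the sampling rate according to the state of the current solution, balancing accuracy and efficiency.
The algorithm runs in one pass and stores only $O(k)$ elements in memory, where $k$ is the size of the largest feasible solution.
It achieves an approximation ratio of $4p$ in expectation for monotone functions and $4p+2-o(1)$ for non-monotone functions.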
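
The idea of skipping most stream elements before ever querying the objective can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified version restricted to a cardinality bound (one matroid, so $p = m = 1$), with a fixed sampling probability `q` and a swap threshold factor `c` that are hypothetical parameters chosen here for illustration, and it approximates each kept element's weight by the gain it had when it was added. The names `coverage`, `subsampled_stream`, and `toy_sets` are likewise invented for this example.

```python
import random


def coverage(sets, chosen):
    """Toy monotone submodular objective: how many items the chosen sets cover."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def subsampled_stream(stream, f, k, q, c=1.0, seed=0):
    """Illustrative one-pass selection under a cardinality bound k (k >= 1).

    q (sampling probability) and c (swap threshold factor) are
    hypothetical parameters chosen for this sketch.
    """
    rng = random.Random(seed)
    solution = []   # at most k elements are ever stored
    weights = {}    # gain each kept element had when it was added (a simplification)
    evaluated = 0   # number of stream elements on which f was queried
    for u in stream:
        if rng.random() >= q:
            continue  # skipped element: f is never called for it
        evaluated += 1
        gain = f(solution + [u]) - f(solution)
        # If the solution is full, the lightest kept element is the swap candidate.
        victims = [min(solution, key=weights.get)] if len(solution) >= k else []
        threshold = (1 + c) * sum(weights[x] for x in victims)
        if gain >= threshold:
            for x in victims:
                solution.remove(x)
                del weights[x]
            solution.append(u)
            weights[u] = gain
    return solution, evaluated


# Small demo: each stream element is the key of a set of covered items.
toy_sets = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 2, 3, 4, 5}}
objective = lambda chosen: coverage(toy_sets, chosen)
picked, calls = subsampled_stream([0, 1, 2, 3], objective, k=2, q=0.5)
print(picked, calls, objective(picked))
```

With `q=1.0` every element is evaluated and the sketch reduces to a plain swap-based streaming rule; lowering `q` reduces the number of elements on which the objective is queried, which is the trade-off the method exploits.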

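Experimental Results

Research Questions

RQ1: Can a one-pass streaming algorithm for submodular maximization be designed that does not need to evaluate the entire stream?
RQ2: What is the best approximation ratio achievable in the streaming setting with minimal memory and function evaluations?
RQ3: How can subsampling be used to cut the computational cost of submodular optimization while preserving solution quality?
RQ4: Does the algorithm scale to large datasets such as Uber pickup locations?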

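Key Findings

For monotone submodular functions under a $p$-matchoid constraint, the algorithm achieves a $4p$ approximation ratio in expectation.
For non-monotone functions the approximation ratio is $4p+2-o(1)$, only slightly worse than in the monotone case.
The algorithm uses $O(k)$ memory and $O(km/p)$ function queries per element, where $m$ is the number of matroids defining the constraint, significantly reducing computational cost.
On video summarization, the algorithm runs up to 50 times faster than the state-of-the-art method while maintaining practically the same utility.
The algorithm scales effectively to a large dataset of Uber pickup locations, demonstrating practical applicability to real-world machine learning workloads.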
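
Written as inequalities, the approximation ratios above read as follows. This is a restatement under the usual convention that an $\alpha$ approximation means the output is worth at least a $1/\alpha$ fraction of the optimum; the symbols $S$ (the algorithm's output) and $OPT$ (a feasible set of maximum value) are notation introduced here.

$$
\mathbb{E}[f(S)] \ge \frac{f(OPT)}{4p} \quad \text{(monotone)}, \qquad
\mathbb{E}[f(S)] \ge \frac{f(OPT)}{4p + 2 - o(1)} \quad \text{(non-monotone)}.
$$

For example, a single matroid constraint such as a cardinality bound has $p = m = 1$, so the bounds become $f(OPT)/4$ and $f(OPT)/(6 - o(1))$, with $O(k)$ queries per element.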


This review was generated by AI and checked by a human editor.