QUICK REVIEW

[Paper Review] q-means: A quantum algorithm for unsupervised machine learning

Iordanis Kerenidis, Jonas Landman|arXiv (Cornell University)|Dec 10, 2018

Quantum Computing Algorithms and Architecture | References: 21 | Citations: 116

One-Line Summary

q-means is a quantum clustering algorithm modeled on delta-k-means: it outputs the cluster centroids with high probability, and its per-iteration runtime is polylogarithmic (hence sublinear) in N, linear in d, and polynomial in k.

ABSTRACT

Quantum machine learning is one of the most promising applications of a full-scale quantum computer. Over the past few years, many quantum machine learning algorithms have been proposed that can potentially offer considerable speedups over the corresponding classical algorithms. In this paper, we introduce q-means, a new quantum algorithm for clustering which is a canonical problem in unsupervised machine learning. The $q$-means algorithm has convergence and precision guarantees similar to $k$-means, and it outputs with high probability a good approximation of the $k$ cluster centroids like the classical algorithm. Given a dataset of $N$ $d$-dimensional vectors $v_i$ (seen as a matrix $V \in \mathbb{R}^{N \times d}$) stored in QRAM, the running time of q-means is $\widetilde{O}\left( k d \frac{\eta}{\delta^2} \kappa(V) \left( \mu(V) + k \frac{\eta}{\delta} \right) + k^2 \frac{\eta^{1.5}}{\delta^2} \kappa(V) \mu(V) \right)$ per iteration, where $\kappa(V)$ is the condition number, $\mu(V)$ is a parameter that appears in quantum linear algebra procedures and $\eta = \max_{i} \|v_{i}\|^{2}$. For a natural notion of well-clusterable datasets, the running time becomes $\widetilde{O}\left( k^2 d \frac{\eta^{2.5}}{\delta^3} + k^{2.5} \frac{\eta^2}{\delta^3} \right)$ per iteration, which is linear in the number of features $d$, and polynomial in the rank $k$, the maximum square norm $\eta$ and the error parameter $\delta$. Both running times are only polylogarithmic in the number of datapoints $N$. Our algorithm provides substantial savings compared to the classical $k$-means algorithm that runs in time $O(kdN)$ per iteration, particularly for the case of large datasets.

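Motivation and Objectives

Motivate clustering as a canonical unsupervised learning problem and address scalability to large datasets.
Develop a quantum analogue of k-means that preserves the convergence and approximation guarantees of delta-k-means, a noise-tolerant classical variant.
Provide a runtime analysis showing polylogarithmic dependence on the number of data points N and linear dependence on the feature dimension d.
Guarantee that the algorithm outputs classical centroids that can be used in subsequent classical or quantum tasks.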

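Proposed Method

Define q-means as the quantum counterpart of delta-k-means for clustering N vectors in R^d stored in QRAM.
Update the centroids using quantum subroutines for distance estimation, minimum finding, matrix multiplication, and tomography.
Provide a per-iteration runtime bound that depends on k, d, η (the maximum squared row norm), δ (the robustness parameter), κ(V) (the condition number), and μ(V) (a parameter arising from quantum linear algebra procedures).
Use amplitude estimation and median-based amplification to obtain reliable distance estimates.
Use vector state tomography to recover classical centroid vectors from the quantum states created during the algorithm.
Show that q-means outputs, with high probability, centroids consistent with those of delta-k-means.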
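The classical delta-k-means iteration that q-means is described as mimicking can be sketched as below. This is a minimal, purely classical illustration and not the quantum algorithm: it assumes the common formulation in which a point may be assigned to any centroid whose squared distance is within δ of the closest one, and each updated centroid is perturbed by noise of norm at most δ/2. The function names `sq_dist` and `delta_kmeans_step` are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def delta_kmeans_step(points, centroids, delta, rng):
    """One noisy assignment + update step (classical sketch, assumed formulation)."""
    k, d = len(centroids), len(centroids[0])
    clusters = [[] for _ in range(k)]
    for v in points:
        dists = [sq_dist(v, c) for c in centroids]
        best = min(dists)
        # Any centroid within delta of the closest one is an acceptable label.
        candidates = [j for j, dist in enumerate(dists) if dist - best <= delta]
        clusters[rng.choice(candidates)].append(v)

    new_centroids = []
    for j, members in enumerate(clusters):
        if not members:
            new_centroids.append(list(centroids[j]))  # leave empty clusters in place
            continue
        mean = [sum(col) / len(members) for col in zip(*members)]
        # Perturb the mean by a random vector of norm at most delta / 2.
        noise = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in noise)) or 1.0
        scale = rng.uniform(0.0, delta / 2) / norm
        new_centroids.append([m + scale * n for m, n in zip(mean, noise)])
    return new_centroids
```

With `delta = 0` the step reduces to an ordinary k-means iteration; a larger `delta` models the estimation error that the quantum distance-estimation and tomography subroutines are allowed to introduce.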

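Experimental Results

Research Questions

RQ1: Can q-means reproduce the behavior and guarantees of classical delta-k-means in the quantum setting?
RQ2: What are the per-iteration and overall runtime requirements in terms of the dataset parameters (N, d, k, η, δ, κ(V), μ(V))?
RQ3: How does the well-clusterable data model affect the theoretical guarantees and the runtime?
RQ4: Can the resulting centroids be used as classical objects in downstream tasks, and how does their accuracy compare with classical k-means?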

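Key Findings

Per-iteration runtime for general data: $\widetilde{O}\left( k d \frac{\eta}{\delta^2} \kappa(V) \left( \mu(V) + k \frac{\eta}{\delta} \right) + k^2 \frac{\eta^{1.5}}{\delta^2} \kappa(V) \mu(V) \right)$.
Per-iteration runtime for well-clusterable data: $\widetilde{O}\left( k^2 d \frac{\eta^{2.5}}{\delta^3} + k^{2.5} \frac{\eta^2}{\delta^3} \right)$.
The runtime is polylogarithmic in N, linear in d, and polynomial in k, η, and 1/δ.
The algorithm outputs, with high probability, classical centroids corresponding to the delta-k-means solution.
Using the QRAM data structure and quantum linear algebra subroutines, it achieves a quantum speedup over the classical O(kdN) per-iteration bound.
Simulations suggest that q-means can reach accuracy comparable to k-means and may offer a better runtime on large datasets.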
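To get a rough feel for how the well-clusterable bound scales against the classical O(kdN) cost, the leading terms can be evaluated directly. The sketch below drops all constants and the polylogarithmic factors hidden by the tilde, which may be large in practice, so the numbers illustrate scaling only and are not runtime predictions. The function names are hypothetical.

```python
def classical_cost(n, d, k):
    """Per-iteration cost of classical k-means, O(k d N), constants dropped."""
    return k * d * n


def qmeans_cost_well_clusterable(d, k, eta, delta):
    """Leading terms of the well-clusterable bound; constants and polylog factors dropped."""
    return (k**2 * d * eta**2.5 + k**2.5 * eta**2) / delta**3


def crossover_n(d, k, eta, delta):
    """Value of N at which the two simplified expressions are equal."""
    return qmeans_cost_well_clusterable(d, k, eta, delta) / (k * d)
```

Because the simplified q-means expression does not depend on N, `crossover_n` shows how quickly the break-even dataset size grows as δ shrinks (as 1/δ^3) or as η grows.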

This review was generated by AI and checked by a human editor.