QUICK REVIEW

[Paper Review] Fast determinantal point processes via distortion-free intermediate sampling

Michał Dereziński|arXiv (Cornell University)|2018. 11. 08.

Random Matrices and Applications | Citations: 23

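One-line summary

This paper proposes a new algorithm for exact sampling from determinantal point processes (DPPs) with input-sparsity preprocessing time and poly(d) sampling time, independent of n. It uses a distortion-free regularized DPP (R-DPP) as the intermediate distribution, which a Poisson-based size-control mechanism makes possible, so that neither preprocessing nor sampling cost grows superlinearly in n. The result is the first exact DPP sampler with these guarantees and a substantial improvement over prior methods.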

ABSTRACT

Given a fixed $n \times d$ matrix $\mathbf{X}$, where $n \gg d$, we study the complexity of sampling from a distribution over all subsets of rows where the probability of a subset is proportional to the squared volume of the parallelepiped spanned by the rows (a.k.a. a determinantal point process). In this task, it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). To that end, we propose a new determinantal point process algorithm which has the following two properties, both of which are novel: (1) a preprocessing step which runs in time $O(\text{number-of-non-zeros}(\mathbf{X})\cdot\log n)+\text{poly}(d)$, and (2) a sampling step which runs in $\text{poly}(d)$ time, independent of the number of rows $n$. We achieve this by introducing a new regularized determinantal point process (R-DPP), which serves as an intermediate distribution in the sampling procedure by reducing the number of rows from $n$ to $\text{poly}(d)$. Crucially, this intermediate distribution does not distort the probabilities of the target sample. Our key novelty in defining the R-DPP is the use of a Poisson random variable for controlling the probabilities of different subset sizes, leading to new determinantal formulas such as the normalization constant for this distribution. Our algorithm has applications in many diverse areas where determinantal point processes have been used, such as machine learning, stochastic optimization, data summarization and low-rank matrix reconstruction.
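To make the target distribution in the abstract concrete, here is a minimal brute-force sketch in Python. It assumes the standard reading in which a subset S of rows gets weight det(X_S X_S^T), the squared volume spanned by those rows, and it checks the well-known identity that these weights sum to det(I + X^T X). The function name `dpp_probabilities` and the toy matrix are illustrative assumptions, not taken from the paper. Enumeration costs 2^n determinant evaluations, so this is only for building intuition on tiny inputs.

```python
import itertools
import numpy as np

def dpp_probabilities(X):
    """Brute-force DPP over the rows of X (hypothetical helper, tiny n only).

    Returns (probs, total): probs maps each subset (a tuple of row indices)
    to its probability, and total is the sum of the unnormalized weights.
    """
    n = X.shape[0]
    weights = {}
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            if not S:
                # The empty set has weight 1 (determinant of a 0x0 matrix).
                weights[S] = 1.0
                continue
            XS = X[list(S), :]
            # Squared volume of the parallelepiped spanned by the chosen rows.
            # Clip at 0: for |S| > d the exact value is 0, but rounding can
            # produce a tiny negative number.
            weights[S] = max(np.linalg.det(XS @ XS.T), 0.0)
    total = sum(weights.values())
    probs = {S: w / total for S, w in weights.items()}
    return probs, total

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))  # toy matrix with n = 6 rows and d = 2 columns
probs, Z = dpp_probabilities(X)

# The normalization constant matches a d x d determinant.
Z_closed_form = np.linalg.det(np.eye(X.shape[1]) + X.T @ X)
print(Z, Z_closed_form)
```

Subsets with more than d rows get probability zero here, because more than d vectors in d dimensions span no volume.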

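Motivation and goals

Address the high preprocessing and sampling cost of DPPs when n is large.
Overcome the limitation of earlier DPP algorithms, which need Ω(nd²) time for preprocessing or Ω(n|S|) time for sampling.
Develop a method for exact DPP sampling whose sampling time is independent of n while keeping the preprocessing cost low.
Introduce a regularized DPP (R-DPP) as an intermediate distribution that leaves the target DPP probabilities undistorted.
Enable efficient DPP sampling in areas such as data summarization, low-rank matrix reconstruction, and stochastic optimization.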

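Proposed method

Proposes a regularized DPP (R-DPP) as an intermediate distribution that reduces the number of rows from n to poly(d) without distorting the target DPP probabilities.
Uses a Poisson random variable to control the subset size in the R-DPP, which yields new determinantal formulas, including a closed-form normalization constant.
Designs a two-stage sampling procedure: first sample a set of size poly(d) from the R-DPP, then downsample it to the final set using the target DPP distribution.
Guarantees that the intermediate sampling step is distortion-free by preserving the relative probabilities of all subsets under the original DPP.
Obtains input-sparsity preprocessing time, at a cost of O(nnz(X) log n + poly(d)), by exploiting sparse matrix operations and low-rank structure.
Guarantees poly(d) sampling time by exploiting the low-dimensional structure of the intermediate R-DPP and efficient determinant computation.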
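The role of the Poisson size variable can be illustrated with a small numerical check. This is a sketch under assumptions: the review does not reproduce the paper's exact R-DPP definition, so the code only verifies a determinantal expectation identity of the kind Poisson size control makes available. Draw K ~ Poisson(r), draw K row indices i.i.d. from a distribution p, rescale each drawn row's outer product by 1/(r p_i), and the expected value of det(A + rescaled sum) equals det(A + X^T X). The names `poisson_det_expectation`, `A`, `p`, and `r`, and all numeric values, are illustrative choices.

```python
import numpy as np

def poisson_det_expectation(X, A, p, r, num_draws, seed=0):
    """Monte Carlo estimate of E[det(A + sum_k x_k x_k^T / (r * p_k))].

    Hypothetical helper for illustration; not the paper's sampler.
    """
    rng = np.random.default_rng(seed)
    # Drawing K ~ Poisson(r) and then K i.i.d. indices from p is equivalent
    # to drawing independent counts N_i ~ Poisson(r * p_i) for every row i.
    counts = rng.poisson(r * p, size=(num_draws, X.shape[0]))
    scale = counts / (r * p)  # each entry has mean 1
    # Stack of d x d matrices: A + sum_i scale[s, i] * x_i x_i^T.
    M = A + np.einsum("si,ij,ik->sjk", scale, X, X)
    return np.linalg.det(M).mean()

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2))   # toy matrix, n = 5, d = 2
A = np.eye(2)                     # assumed regularizer
p = (X ** 2).sum(axis=1)
p = p / p.sum()                   # assumed sampling distribution over rows
r = 20.0                          # assumed Poisson mean

estimate = poisson_det_expectation(X, A, p, r, num_draws=200_000)
exact = np.linalg.det(A + X.T @ X)
print(estimate, exact)
```

The identity holds because the determinant is affine in each row's weight and the Poisson counts are independent across rows with rescaled mean 1. With a fixed sample size the counts are multinomial and no longer independent, so the same expectation does not reduce to det(A + X^T X) exactly.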

Results

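Research questions

RQ1: Can a DPP sampling algorithm be designed whose preprocessing time is nearly linear in the input size, specifically O(nnz(X) log n + poly(d))?
RQ2: Can exact DPP sampling be achieved in time independent of n, that is, in poly(d) sampling time?
RQ3: Can an intermediate distribution be constructed that reduces the number of rows to poly(d) while preserving the target DPP probabilities?
RQ4: Can the normalization constant of a regularized DPP with Poisson-based size control be derived?
RQ5: Does distortion-free intermediate sampling have theoretical and practical implications for DPP algorithms?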

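Key results

The proposed algorithm achieves input-sparsity preprocessing time, O(nnz(X) log n + poly(d)), a first for exact DPP sampling.
Sampling time is reduced to poly(d), independent of n, making this the first exact DPP algorithm with that property.
The intermediate R-DPP distribution is distortion-free: it preserves the exact probability of every subset under the original DPP.
Using a Poisson random variable to control subset size yields new analytic formulas, including a closed-form expression for the R-DPP normalization constant.
The method enables efficient DPP sampling in large-scale applications with n ≫ d, such as data summarization and low-rank matrix reconstruction.
It improves on the previous best methods, which require Ω(nd²) time for preprocessing or Ω(n|S|) time for sampling.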


This review was generated by AI and checked by a human editor.