QUICK REVIEW

[Paper Review] Learning the Structure of Generative Models without Labeled Data

Stephen H. Bach, Bryan He | arXiv (Cornell University) | Mar 2, 2017

Music and Audio Processing | References: 35 | Citations: 70

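One-Sentence Summary

tldr: The paper proposes an l1-regularized marginal pseudolikelihood approach that automatically learns the dependency structure of a generative model without labeled data, achieving sublinear sample complexity and large speedups over full-structure methods.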

ABSTRACT

Curating labeled training data has become the primary bottleneck in machine learning. Recent frameworks address this bottleneck with generative models to synthesize labels at scale from weak supervision sources. The generative model's dependency structure directly affects the quality of the estimated labels, but selecting a structure automatically without any labeled data is a distinct challenge. We propose a structure estimation method that maximizes the $\ell_1$-regularized marginal pseudolikelihood of the observed data. Our analysis shows that the amount of unlabeled data required to identify the true structure scales sublinearly in the number of possible dependencies for a broad class of models. Simulations show that our method is 100$\times$ faster than a maximum likelihood approach and selects $1/4$ as many extraneous dependencies. We also show that our method provides an average of 1.5 F1 points of improvement over existing, user-developed information extraction applications on real-world data such as PubMed journal abstracts.

Motivation and Objectives

Establish labeled data as the primary bottleneck in supervised learning and motivate inferring true labels from weak supervision sources.
Introduce a structure learning method that automatically identifies dependencies among labeling functions without using true labels.
Develop a scalable optimization framework using l1-regularized marginal pseudolikelihood to select dependencies.
Provide theoretical guarantees on sample complexity for recovering the true structure.
Empirically validate speed and accuracy improvements on synthetic data and real-world weak supervision tasks.

Proposed Method

Model weak supervision with a data programming framework where labeling functions generate noisy outputs for unlabeled data.
Generalize to a dependency-augmented factor graph including pairwise and higher-order dependencies among labeling functions and the latent true label.
Replace full structure learning with an efficient l1-regularized marginal pseudolikelihood objective that conditions on all other labeling functions to compute exact gradients (no Gibbs sampling).
Optimize the objective by sequentially learning dependencies for each labeling function using SGD with an online truncated gradient to enforce sparsity (Algorithm 1).
Provide theoretical recovery guarantees: a sufficient amount of unlabeled data $m$ scales as $O\left(\frac{d}{c^2 \kappa^2} \log \frac{nd}{\delta}\right)$, where $d$ is the maximum number of dependencies per variable; a corollary gives $O(n \log n)$ when only pairwise correlations and accuracy dependencies are present.
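
The steps above can be illustrated with a small Python sketch. It is not the authors' implementation, and it makes several simplifying assumptions that are not in the paper summary: labeling functions are binary in {-1, +1} with no abstentions, only pairwise agreement dependencies are considered, accuracy weights are held fixed at an assumed value (`acc_weight`) rather than learned, and the truncation step is a plain soft-threshold applied after every SGD update. All function names, hyperparameters, and the synthetic data generator are hypothetical. What it does show is the core idea: the latent label is summed out exactly for one labeling function at a time, so the gradient needs no Gibbs sampling.

```python
import numpy as np

def features(lam, j, v, y):
    """Features when labeling function j takes value v and the latent label is y."""
    row = lam.astype(float).copy()
    row[j] = v
    acc = row * y                      # accuracy factors, one per labeling function
    corr = (lam == v).astype(float)    # agreement of function j with every other function
    corr[j] = 0.0                      # no self-dependency
    return np.concatenate([acc, corr])

def neg_log_pl_grad(theta, lam, j):
    """Exact gradient of -log p(lam_j | lam_-j), with the latent label y summed out."""
    combos = [(v, y) for v in (-1, 1) for y in (-1, 1)]
    phi = np.array([features(lam, j, v, y) for v, y in combos])
    scores = phi @ theta
    p_all = np.exp(scores - scores.max())
    p_all /= p_all.sum()               # p(v, y | lam_-j)
    observed = np.array([v == lam[j] for v, _ in combos])
    p_obs = p_all * observed
    p_obs /= p_obs.sum()               # p(y | lam_j, lam_-j)
    return p_all @ phi - p_obs @ phi   # model expectation minus clamped expectation

def learn_dependencies(L, j, acc_weight=0.7, eps=0.01, lr=0.05, epochs=3, seed=0):
    """SGD on the marginal pseudolikelihood of function j with l1 soft-thresholding."""
    m, n = L.shape
    rng = np.random.default_rng(seed)
    # Assumption: accuracy weights stay fixed; only dependency weights are learned.
    theta = np.concatenate([np.full(n, acc_weight), np.zeros(n)])
    for _ in range(epochs):
        for i in rng.permutation(m):
            g = neg_log_pl_grad(theta, L[i], j)
            w = theta[n:] - lr * g[n:]
            # Truncation step: shrink each dependency weight toward zero.
            theta[n:] = np.sign(w) * np.maximum(np.abs(w) - lr * eps, 0.0)
    return theta[n:]

def simulate(m=2000, seed=1):
    """Synthetic data: four functions with 80% accuracy; function 1 mostly copies function 0."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=m)
    flip = rng.random((m, 4)) < 0.2
    L = np.where(flip, -y[:, None], y[:, None])
    copy = rng.random(m) < 0.9
    L[copy, 1] = L[copy, 0]
    return L

L = simulate()
weights = learn_dependencies(L, j=1)
print("dependency weights for labeling function 1:", weights.round(2))
```

On this synthetic data the weight linking function 1 to function 0 should dominate the others, which is the dependency that was planted. Repeating the call for each `j` and keeping weights above a chosen threshold would mirror the sequential, per-function loop described for Algorithm 1.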

Experimental Results

Research Questions

RQ1: Can the structure of a generative model for weak supervision be learned automatically from unlabeled data alone?
RQ2: What is the data efficiency (sample complexity) to reliably recover true dependency structure under various dependency settings?
RQ3: Does learning dependencies improve downstream weak supervision tasks compared to assuming conditional independence?
RQ4: How does the proposed pseudolikelihood-based structure learning compare to full marginal likelihood with Gibbs sampling in terms of speed and accuracy?
RQ5: Are the theoretical guarantees borne out in synthetic and real-world weak supervision applications?

Key Findings

The proposed l1-regularized marginal pseudolikelihood method (for each labeling function conditioned on others) recovers dependencies efficiently without labeled data.
Algorithm 1 is about 100x faster than full marginal likelihood with Gibbs sampling for structure learning.
When focusing on pairwise correlations and accuracy dependencies, the sufficient amount of unlabeled data is O(n log n), which is sublinear in the number of potential dependencies (this number grows quadratically in n for pairwise correlations).
Empirical results show an average improvement of about 1.5 F1 points on real-world weak supervision tasks (PubMed abstracts, hardware specs) compared to models with independent labeling functions.
The method speeds up learning and reduces extraneous dependencies (about 1/4 as many extraneous correlations) versus full dependency search in synthetic settings.
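
To make the sublinearity claim concrete, the following Python sketch compares the growth of n log n with the number of possible pairwise dependencies among n labeling functions, n(n-1)/2. The helper names are hypothetical, and the constants in the bound (including the factors involving c, κ, and δ) are deliberately omitted, so only the trend of the ratio is meaningful.

```python
import math

def possible_pairwise(n):
    """Number of distinct pairs among n labeling functions."""
    return n * (n - 1) // 2

def n_log_n(n):
    """Growth rate of the sufficient sample size, constants omitted."""
    return n * math.log(n)

sizes = (10, 100, 1000)
ratios = [n_log_n(n) / possible_pairwise(n) for n in sizes]
for n, r in zip(sizes, ratios):
    print(f"n={n}: possible pairs={possible_pairwise(n)}, n log n / pairs={r:.4f}")
```

The ratio shrinks as n grows, which is what "sublinear in the number of potential dependencies" means here: the data requirement grows more slowly than the number of candidate dependencies being tested.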

This review was generated by AI and checked by a human editor.