QUICK REVIEW

[Paper Review] Syntax-Directed Variational Autoencoder for Structured Data

Hanjun Dai, Yingtao Tian | arXiv (Cornell University) | Feb 24, 2018

Topic Modeling | References: 22 | Cited by: 201

One-Sentence Summary

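The paper proposes SD-VAE, which uses stochastic lazy attributes to enforce both syntactic and semantic constraints when generating structured data. This improves reconstruction and prior validity, and yields better optimization of programs and molecules.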

ABSTRACT

Deep generative models have been enjoying success in modeling continuous data. However, it remains challenging to capture the representations for discrete structures with formal grammars and semantics, e.g., computer programs and molecular structures. How to generate both syntactically and semantically correct data still remains largely an open problem. Inspired by the theory of compilers, where syntax and semantics checks are done via syntax-directed translation (SDT), we propose a novel syntax-directed variational autoencoder (SD-VAE) by introducing stochastic lazy attributes. This approach converts the offline SDT check into on-the-fly generated guidance for constraining the decoder. Compared to state-of-the-art methods, our approach enforces constraints on the output space so that the output will be not only syntactically valid, but also semantically reasonable. We evaluate the proposed model with applications in programming languages and molecules, including reconstruction and program/molecule optimization. The results demonstrate the effectiveness of incorporating syntactic and semantic constraints in discrete generative models, performing significantly better than current state-of-the-art approaches.
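The abstract's central move, turning an offline semantic check into guidance applied while decoding, can be illustrated with a define-before-use rule of the kind found in programs. This is a minimal, hypothetical sketch rather than the paper's grammar or model: the statement format, `OPS`, `MAX_VARS`, and all function names are assumptions, and uniform random choices stand in for a neural decoder's output distribution. `offline_check` can only reject a program after it is finished, while `sample_guided` carries the set of defined variables through generation, much like an inherited attribute, and only offers choices that keep the program valid.

```python
import random

# Hypothetical toy language (an assumption, not the paper's grammar):
# statements "v<i>=<op>(<arg>)" joined by ";" and ending in "return:<var>".
# Semantic rule: every argument and the returned name must be "x" or a
# variable defined by an earlier statement.
OPS = ["sin", "cos", "exp"]
MAX_VARS = 3


def offline_check(program: str) -> bool:
    """Post-hoc semantic check on a finished program (define-before-use)."""
    defined = {"x"}
    *stmts, ret = program.split(";")
    for stmt in stmts:
        lhs, rhs = stmt.split("=")
        arg = rhs[rhs.index("(") + 1 : -1]
        if arg not in defined:
            return False
        defined.add(lhs)
    return ret.split(":")[1] in defined


def sample_unconstrained(rng: random.Random, n_stmts: int) -> str:
    """Syntactically valid sampler that ignores the semantic rule."""
    names = ["x"] + [f"v{i}" for i in range(MAX_VARS)]
    stmts = [f"v{i}={rng.choice(OPS)}({rng.choice(names)})" for i in range(n_stmts)]
    return ";".join(stmts + [f"return:{rng.choice(names)}"])


def sample_guided(rng: random.Random, n_stmts: int) -> str:
    """Sampler that masks choices with the set of defined variables, which is
    passed forward through generation like an inherited attribute."""
    defined = ["x"]
    stmts = []
    for i in range(n_stmts):
        stmts.append(f"v{i}={rng.choice(OPS)}({rng.choice(defined)})")
        defined.append(f"v{i}")
    return ";".join(stmts + [f"return:{rng.choice(defined)}"])


if __name__ == "__main__":
    rng = random.Random(0)
    free = [sample_unconstrained(rng, 3) for _ in range(1000)]
    guided = [sample_guided(rng, 3) for _ in range(1000)]
    print(sum(map(offline_check, free)) / len(free))      # about 3/32 on average
    print(sum(map(offline_check, guided)) / len(guided))  # 1.0
```

With three statements, the unconstrained sampler passes the offline check with probability 1/4 × 2/4 × 3/4 = 3/32, whereas every guided sample passes by construction; the price of guidance is only the bookkeeping of the attribute during generation.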

Research Motivation and Objectives

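Motivates and addresses the challenge of generating structured data that is both syntactically correct and semantically meaningful.
Introduces semantic constraints into the decoder using syntax-directed (attribute) grammars.
Converts offline semantic checks into online guidance during generation through stochastic lazy attributes.
Demonstrates improved reconstruction, prior validity, and optimization performance on programs and molecules.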

Proposed Method

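Introduces a stochastic syntax-directed decoder that samples production rules conditioned on the context and on stochastic attributes.
Extends attribute grammars with stochastic lazy attributes so that synthesized attributes can be generated on the fly.
Propagates semantic constraints through inherited attributes during tree-structured generation, and enforces semantic consistency through lazy linking.
Uses a structure-based encoder that maps CFG-derived parse representations into the latent space.
Trains with the variational autoencoder objective (ELBO), where the likelihood pθ(x|z) is computed by syntax-directed decoding.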
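Inherited attributes flow top-down, so a decoder can use them directly; synthesized attributes flow bottom-up and are only known once a subtree is finished. A stochastic lazy attribute, as summarized in the method list, samples such a value first, uses it to constrain generation, and checks it afterwards (lazy linking). The sketch below illustrates that pattern on a hypothetical SMILES-like toy, not the paper's grammar: the synthesized attribute is the number of rings in the string, drawn uniformly by `decode_lazy` (a name invented for this example) where SD-VAE would use a learned distribution. The helpers `synthesized_num_rings` and `rings_paired` and the `budget` parameter are likewise assumptions.

```python
import random

# Hypothetical toy (not the paper's grammar or model): a chain of "C" atoms
# in which each ring-bond digit must appear exactly twice, on two different
# atoms. Uniform random choices stand in for a neural decoder's logits.


def synthesized_num_rings(s: str) -> int:
    """Synthesized (bottom-up) attribute of a finished string: digits / 2."""
    return sum(ch.isdigit() for ch in s) // 2


def rings_paired(s: str) -> bool:
    """Offline semantic check: every ring-bond digit occurs exactly twice."""
    digits = [ch for ch in s if ch.isdigit()]
    return all(digits.count(d) == 2 for d in set(digits))


def decode_lazy(rng: random.Random, max_rings: int = 3,
                budget: int = 8) -> tuple[str, int]:
    # Ring labels are single digits 1..k, so keep max_rings <= 9.
    # 1) Stochastic lazy attribute: sample the root's synthesized attribute
    #    (ring count k) before anything below the root is generated.
    k = rng.randint(0, max_rings)
    out: list[str] = []
    open_digits: list[str] = []  # rings opened on earlier atoms, not yet closed
    opened = 0
    while True:
        out.append("C")
        # 2) Mask actions using k and the currently open rings.
        choices = []
        if opened < k:
            choices.append("open")
        if open_digits:
            choices.append("close")
        if len(out) < budget or not choices:
            choices.append("none")  # past the budget, force progress
        action = rng.choice(choices)
        if action == "open":
            opened += 1
            out.append(str(opened))
            open_digits.append(str(opened))
        elif action == "close":
            out.append(open_digits.pop(rng.randrange(len(open_digits))))
        if opened == k and not open_digits:
            break
    s = "".join(out)
    # 3) Lazy linking: the attribute the string actually synthesizes must
    #    match the value sampled up front.
    assert synthesized_num_rings(s) == k
    return s, k


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        s, k = decode_lazy(rng)
        print(k, s, rings_paired(s))
```

Because every mask is derived from the sampled ring count and the rings still open, the lazy-linking assertion never fails here; in a learned decoder, that final comparison is the point where a mismatch between the sampled and the synthesized attribute would surface.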

Experimental Results

Research Questions

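RQ1: How can semantic constraints (beyond a CFG) be integrated into neural generative decoders for structured data?
RQ2: Does on-the-fly semantic guidance via stochastic lazy attributes improve syntactic and semantic validity without sacrificing efficiency?
RQ3: Compared with CVAE and GVAE, do SD-VAE representations yield better reconstruction, prior validity, and optimization performance on programs and molecules?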

Key Findings

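SD-VAE achieves near-perfect reconstruction on programs, and very high prior validity on both programs and SMILES strings.
In the evaluation, the prior validity of the SD-VAE decoder is nearly perfect, significantly outperforming the baselines.
SD-VAE enables more effective Bayesian optimization, finding programs and molecules with better target properties than CVAE and GVAE.
The latent representations learned by SD-VAE are more discriminative than those of the baselines (lower RMSE and higher log-likelihood when predicting target properties from the latent features).
The method restricts the output space to semantically valid structures while preserving the diversity of the generated molecules.
Visualizations show smoother, more coherent latent-space interpolations than those of previous methods.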

This review was generated by AI and checked by human editors.