QUICK REVIEW

[Paper Review] RobustFill: Neural Program Learning under Noisy I/O

Jacob Devlin, Jonathan Uesato|arXiv (Cornell University)|Mar 21, 2017

Advanced Neural Network Applications|References: 28|Cited by: 108

ONE-SENTENCE SUMMARY

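The paper compares neural program synthesis and neural program induction on a real-world string-transformation task (FlashFill), proposes an attention RNN that can encode variable-sized sets of I/O examples, reaches 92% generalization accuracy on FlashFillTest, and shows greater robustness and noise tolerance than rule-based and induction approaches.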

ABSTRACT

The problem of automatically generating a computer program from some specification has been studied since the early days of AI. Recently, two competing approaches for automatic program learning have received significant attention: (1) neural program synthesis, where a neural network is conditioned on input/output (I/O) examples and learns to generate a program, and (2) neural program induction, where a neural network generates new outputs directly using a latent program representation. Here, for the first time, we directly compare both approaches on a large-scale, real-world learning task. We additionally contrast to rule-based program synthesis, which uses hand-crafted semantics to guide the program generation. Our neural models use a modified attention RNN to allow encoding of variable-sized sets of I/O pairs. Our best synthesis model achieves 92% accuracy on a real-world test set, compared to the 34% accuracy of the previous best neural synthesis approach. The synthesis model also outperforms a comparable induction model on this task, but we more importantly demonstrate that the strength of each approach is highly dependent on the evaluation metric and end-user application. Finally, we show that we can train our neural models to remain very robust to the type of noise expected in real-world data (e.g., typos), while a highly-engineered rule-based system fails entirely.

RESEARCH MOTIVATION AND GOALS

Motivate and compare neural program synthesis and neural program induction on a real-world, noisy I/O transformation task.
Develop an attention-based neural architecture capable of encoding variable-sized sets of I/O examples.
Evaluate end-to-end performance against a hand-crafted rule-based system and an induction-based approach.
Assess robustness to realistic noise (typos) in I/O examples.
Quantify how evaluation metrics (all-example vs. average-example) influence observed strengths of each approach.

PROPOSED METHOD

Propose a novel variant of the attention-based RNN to encode variable-length, unordered I/O example sets via late pooling.
Represent the program in a domain-specific language (DSL) for string transformations including nested expressions and regex-based extractions.
Train end-to-end on synthetically generated I/O-Program pairs and decode with beam search, validating consistency against observed I/O pairs.
Compare program synthesis (generate a program P and execute it on the inputs) with program induction (generate the outputs O^y directly) and with a hand-crafted rule-based system.
Introduce a dynamic programming-like constraint (DP-Beam) during decoding to prune inconsistent partial programs based on observed outputs.
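
The consistency check described above (decode several candidate programs, then keep one that reproduces every observed I/O pair) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the three-operation toy DSL, the tuple encoding of expressions, and the names run_program, is_consistent and first_consistent are all assumptions made for this example, and the hand-written "beam" stands in for the ranked output of a neural decoder.

```python
from typing import List, Optional, Sequence, Tuple

# Toy stand-in for a string-transformation DSL (hypothetical, NOT the paper's DSL).
# A program is a list of expressions whose outputs are concatenated:
#   ("const", s)           -> the literal string s
#   ("substr", start, end) -> text[start:end]
#   ("upper", start, end)  -> text[start:end].upper()
Expr = Tuple
Program = List[Expr]
IOPair = Tuple[str, str]


def run_program(program: Program, text: str) -> str:
    """Execute a toy program on a single input string."""
    parts = []
    for expr in program:
        op = expr[0]
        if op == "const":
            parts.append(expr[1])
        elif op == "substr":
            parts.append(text[expr[1]:expr[2]])
        elif op == "upper":
            parts.append(text[expr[1]:expr[2]].upper())
        else:
            raise ValueError(f"unknown op: {op}")
    return "".join(parts)


def is_consistent(program: Program, examples: Sequence[IOPair]) -> bool:
    """True if the program reproduces every observed output exactly."""
    return all(run_program(program, inp) == out for inp, out in examples)


def first_consistent(candidates: Sequence[Program],
                     examples: Sequence[IOPair]) -> Optional[Program]:
    """Return the highest-ranked candidate consistent with all examples, else None."""
    for program in candidates:
        if is_consistent(program, examples):
            return program
    return None


# Observed I/O pairs supplied by the user.
observed = [("john smith", "J. smith"), ("jane brown", "J. brown")]

# Hand-written candidates in rank order, standing in for beam-search output.
beam = [
    [("const", "J. smith")],                                   # fits only the first pair
    [("upper", 0, 1), ("const", ". "), ("substr", 5, None)],   # fits both pairs
]

chosen = first_consistent(beam, observed)
print(chosen)
print(run_program(chosen, "mary jones"))
```

In this sketch the top-ranked candidate is rejected because it memorizes one output, and the second candidate is kept and then executed on an unseen input. An induction model would skip the explicit program and emit the output string directly, so no such filter is available to it.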

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can neural program synthesis outperform neural program induction on real-world FlashFill-like tasks?
RQ2: How does encoding a variable-sized set of I/O examples with attention affect synthesis accuracy?
RQ3: What is the impact of noise (typos) in I/O examples on synthesis, induction, and rule-based systems?
RQ4: How do different evaluation metrics (all-example vs. average-example accuracy) shape perceived strengths of synthesis vs. induction?
RQ5: Does the DSL’s expressiveness (e.g., GetSpan) contribute to generalization on real-world instances?

KEY FINDINGS

System	Beam	Generalization Accuracy (test)	All-Example Accuracy (test)	Average-Example Accuracy (test)
Parisotto et al. 2017 (neural synthesis baseline)	100	34%	—	—
Basic Seq-to-Seq	100	56%	—	—
Attention-C	100	86%	—	—
Attention-C-DP	1000	92%	—	—
Induction (synthesis architecture variant)	3	—	53%	—

Attentional architectures significantly outperform basic seq-to-seq baselines (≈25 percentage points gain).
Best synthesis model achieves 92% generalization accuracy on FlashFillTest, outperforming the previous best neural approach (34%).
The neural synthesis model is far more robust to noise than a hand-crafted rule-based system (with noise, 80% vs. 6% accuracy).
Compared to neural induction, synthesis provides higher all-example generalization, while induction can offer partial correctness across assessment examples; both have complementary strengths depending on metric.
DP-Beam decoding and late pooling with double attention yield the strongest results (Attention-C-DP with Beam=1000 achieves 92% generalization).
Induction (O^y generation) achieves 53% generalization vs. 81% for synthesis under similar settings; induction performs better on average-example accuracy but lags on all-example accuracy.
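
The difference between the two metrics named in the findings can be made concrete with a small sketch. It assumes, for illustration only, that each task's evaluation is recorded as a list of booleans (one per held-out assessment example) and that every task has at least one assessment example; the function names and the sample data are hypothetical and do not come from the paper.

```python
from typing import Sequence


def all_example_accuracy(results: Sequence[Sequence[bool]]) -> float:
    """Fraction of tasks in which every assessment example is correct."""
    return sum(all(task) for task in results) / len(results)


def average_example_accuracy(results: Sequence[Sequence[bool]]) -> float:
    """Mean over tasks of the per-task fraction of correct assessment examples."""
    return sum(sum(task) / len(task) for task in results) / len(results)


# Made-up outcomes for three tasks with four assessment examples each.
results = [
    [True, True, True, True],     # fully correct
    [True, True, True, False],    # one miss: counts as a failure under all-example
    [False, False, True, True],   # half correct
]

print(all_example_accuracy(results))
print(average_example_accuracy(results))
```

With this made-up data only one task in three is fully correct, while three quarters of the individual examples are right. A model that is often partially correct, as induction is reported to be, scores noticeably better under the average-example metric than under the all-example one.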


This review was generated by AI and reviewed by human editors.