QUICK REVIEW

[Paper Review] RobustFill: Neural Program Learning under Noisy I/O

Jacob Devlin, Jonathan Uesato | arXiv (Cornell University) | Mar 21, 2017
Advanced Neural Network Applications | References: 28 | Cited by: 108
One-Sentence Summary

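The paper compares neural program synthesis and neural program induction on a real-world string-transformation task (FlashFill), proposes an attention RNN that can encode variable-sized sets of I/O examples, reaches 92% generalization accuracy on FlashFillTest, and shows greater robustness to noise than rule-based and induction approaches.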

ABSTRACT

The problem of automatically generating a computer program from some specification has been studied since the early days of AI. Recently, two competing approaches for automatic program learning have received significant attention: (1) neural program synthesis, where a neural network is conditioned on input/output (I/O) examples and learns to generate a program, and (2) neural program induction, where a neural network generates new outputs directly using a latent program representation. Here, for the first time, we directly compare both approaches on a large-scale, real-world learning task. We additionally contrast to rule-based program synthesis, which uses hand-crafted semantics to guide the program generation. Our neural models use a modified attention RNN to allow encoding of variable-sized sets of I/O pairs. Our best synthesis model achieves 92% accuracy on a real-world test set, compared to the 34% accuracy of the previous best neural synthesis approach. The synthesis model also outperforms a comparable induction model on this task, but we more importantly demonstrate that the strength of each approach is highly dependent on the evaluation metric and end-user application. Finally, we show that we can train our neural models to remain very robust to the type of noise expected in real-world data (e.g., typos), while a highly-engineered rule-based system fails entirely.

Motivation and Objectives

  • Motivate and compare neural program synthesis and neural program induction on a real-world, noisy I/O transformation task.
  • Develop an attention-based neural architecture capable of encoding variable-sized sets of I/O examples.
  • Evaluate end-to-end performance against a hand-crafted rule-based system and an induction-based approach.
  • Assess robustness to realistic noise (typos) in I/O examples.
  • Quantify how evaluation metrics (all-example vs. average-example) influence observed strengths of each approach.
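The difference between the two metrics in the last objective can be made concrete with a small sketch. It assumes that each task is scored as a list of per-example correctness flags; the function names and the sample data are hypothetical and are not taken from the paper.

```python
from typing import List


def all_example_accuracy(results: List[List[bool]]) -> float:
    """Fraction of tasks in which every assessment example is correct."""
    return sum(all(task) for task in results) / len(results)


def average_example_accuracy(results: List[List[bool]]) -> float:
    """Mean over tasks of the fraction of correct assessment examples."""
    return sum(sum(task) / len(task) for task in results) / len(results)


# Hypothetical correctness flags for three tasks (illustrative only).
results = [
    [True, True, True, True],    # fully correct task
    [True, True, False, True],   # one assessment example wrong
    [False, True, True, False],  # half of the examples wrong
]

print(all_example_accuracy(results))
print(average_example_accuracy(results))
```

Under the all-example metric a task counts only if every output is right, so partially correct tasks contribute nothing; under the average-example metric they receive partial credit.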

Proposed Method

  • Propose a novel variant of the attention-based RNN to encode variable-length, unordered I/O example sets via late pooling.
  • Represent the program in a domain-specific language (DSL) for string transformations including nested expressions and regex-based extractions.
  • Train end-to-end on synthetically generated I/O-Program pairs and decode with beam search, validating consistency against observed I/O pairs.
  • Compare program synthesis (generate a program P and execute it on new inputs) with program induction (generate the outputs O_y directly) and with a hand-crafted rule-based system.
  • Introduce a dynamic programming-like constraint (DP-Beam) during decoding to prune inconsistent partial programs based on observed outputs.
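The synthesis pipeline above (a DSL program, ranked candidates from beam search, and a consistency check against the observed I/O pairs) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three-operator mini-DSL (`const_str`, `get_token`, `concat`) is far smaller than the paper's DSL, the candidate list stands in for a beam produced by a trained model, and the check is a simple post-hoc filter rather than the DP-Beam constraint.

```python
import re
from typing import Callable, List, Optional, Tuple

Program = Callable[[str], str]


def const_str(s: str) -> Program:
    """Program that ignores its input and returns a constant string."""
    return lambda inp: s


def get_token(pattern: str, index: int) -> Program:
    """Program that returns the index-th regex match in the input."""
    def run(inp: str) -> str:
        return re.findall(pattern, inp)[index]  # IndexError if missing
    return run


def concat(*parts: Program) -> Program:
    """Program that concatenates the outputs of its sub-programs."""
    return lambda inp: "".join(p(inp) for p in parts)


def is_consistent(program: Program, examples: List[Tuple[str, str]]) -> bool:
    """True if the program reproduces every observed output."""
    for inp, out in examples:
        try:
            if program(inp) != out:
                return False
        except IndexError:
            return False
    return True


def first_consistent(candidates: List[Tuple[str, Program]],
                     examples: List[Tuple[str, str]]) -> Optional[str]:
    """Return the name of the highest-ranked consistent candidate."""
    for name, program in candidates:
        if is_consistent(program, examples):
            return name
    return None


WORD, UPPER = r"[A-Za-z]+", r"[A-Z]"
examples = [("Ada Lovelace", "Lovelace, A"), ("Grace Hopper", "Hopper, G")]

# Hypothetical beam, best-scored first (a real beam would come from the model).
candidates = [
    ("constant", const_str("Lovelace, A")),
    ("first_comma_initial",
     concat(get_token(WORD, 0), const_str(", "), get_token(UPPER, 0))),
    ("last_comma_initial",
     concat(get_token(WORD, 1), const_str(", "), get_token(UPPER, 0))),
]

print(first_consistent(candidates, examples))
```

Because the selected candidate is an executable program, it can be run on unseen inputs, which is what distinguishes synthesis from induction in the comparison above.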

Experimental Results

Research Questions

  • RQ1: Can neural program synthesis outperform neural program induction on real-world FlashFill-like tasks?
  • RQ2: How does encoding a variable-sized set of I/O examples with attention affect synthesis accuracy?
  • RQ3: What is the impact of noise (typos) in I/O examples on synthesis, induction, and rule-based systems?
  • RQ4: How do different evaluation metrics (all-example vs. average-example accuracy) shape perceived strengths of synthesis vs. induction?
  • RQ5: Does the DSL’s expressiveness (e.g., GetSpan) contribute to generalization on real-world instances?
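To make the set-encoding question in RQ2 concrete, here is a minimal sketch of pooling over a variable-sized, unordered set of examples. It assumes each I/O example has already been encoded into a fixed-length state vector and that pooling is an element-wise max; the function name `late_pool` and the numbers are hypothetical, and the real model pools hidden states inside a neural decoder rather than plain lists.

```python
from typing import List, Sequence


def late_pool(per_example_states: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise max over per-example state vectors.

    The result has the same length regardless of how many examples are
    given, and it does not depend on the order of the examples.
    """
    if not per_example_states:
        raise ValueError("need at least one example state")
    return [max(column) for column in zip(*per_example_states)]


# Hypothetical state vectors for two and three I/O examples.
two_examples = [[0.1, 0.9, -0.3], [0.4, 0.2, 0.5]]
three_examples = two_examples + [[0.0, 1.5, 0.1]]

print(late_pool(two_examples))
print(late_pool(three_examples))
print(late_pool(list(reversed(three_examples))))
```

A pooling operation of this kind is one way to obtain a single representation from any number of examples without imposing an order on them.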

Key Findings

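| System | Beam | Generalization accuracy (test) | All-example accuracy (test) | Average-example accuracy (test) |
|---|---|---|---|---|
| Parisotto et al. 2017 (neural synthesis baseline) | 100 | 34% | | |
| Basic Seq-to-Seq | 100 | 56% | | |
| Attention-C | 100 | 86% | | |
| Attention-C-DP | 1000 | 92% | | |
| Induction (synthesis architecture variant) | 3 | 53% | | |
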
  • Attentional architectures significantly outperform the basic seq-to-seq baseline (86% for Attention-C vs. 56% for Basic Seq-to-Seq at beam 100, a 30-percentage-point gain).
  • Best synthesis model achieves 92% generalization accuracy on FlashFillTest, outperforming the previous best neural approach (34%).
  • The neural synthesis model is far more robust to noise than a hand-crafted rule-based system (with noise, 80% vs. 6% accuracy).
  • Compared to neural induction, synthesis provides higher all-example generalization, while induction can offer partial correctness across assessment examples; both have complementary strengths depending on metric.
  • DP-Beam decoding and late pooling with double attention yield the strongest results (Attention-C-DP with Beam=1000 achieves 92% generalization).
  • Induction (generating the outputs O_y directly) achieves 53% generalization vs. 81% for synthesis under similar settings; induction performs better on average-example accuracy but lags on all-example accuracy.
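The noise-robustness finding above refers to typos in the observed I/O examples. A rough sketch of how such noise could be simulated follows; the noise model (one random character insertion, deletion, or substitution applied to an output string) and the names `add_typo` and `corrupt_examples` are assumptions for illustration and may differ from the procedure used in the paper.

```python
import random
import string
from typing import List, Tuple


def add_typo(text: str, rng: random.Random) -> str:
    """Apply one random character-level edit (hypothetical noise model)."""
    if not text:
        return rng.choice(string.ascii_lowercase)
    pos = rng.randrange(len(text))
    op = rng.choice(["insert", "delete", "substitute"])
    if op == "insert":
        return text[:pos] + rng.choice(string.ascii_lowercase) + text[pos:]
    if op == "delete":
        return text[:pos] + text[pos + 1:]
    # Substitute with a letter that differs from the original character.
    choices = [c for c in string.ascii_lowercase if c != text[pos]]
    return text[:pos] + rng.choice(choices) + text[pos + 1:]


def corrupt_examples(examples: List[Tuple[str, str]], n_noisy: int,
                     rng: random.Random) -> List[Tuple[str, str]]:
    """Return a copy in which n_noisy randomly chosen outputs carry a typo."""
    noisy = list(examples)
    for i in rng.sample(range(len(noisy)), n_noisy):
        inp, out = noisy[i]
        noisy[i] = (inp, add_typo(out, rng))
    return noisy


clean = [("Ada Lovelace", "Lovelace, A"),
         ("Grace Hopper", "Hopper, G"),
         ("Alan Turing", "Turing, A")]
print(corrupt_examples(clean, n_noisy=1, rng=random.Random(0)))
```

A system that requires every observed example to be satisfied exactly has no consistent program once a single output is corrupted, which illustrates why noise of this kind is hard for strictly rule-based search.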


This review was generated by AI and reviewed by human editors.