QUICK REVIEW

[Paper Review] Memorize or generalize? Searching for a compositional RNN in a haystack

Adam Liska, Germán Kruszewski|arXiv (Cornell University)|Feb 18, 2018
Reinforcement Learning in Robotics | References: 26 | Citations: 60
One-line summary

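This paper examines whether standard RNNs can learn compositional generalization in the lookup table domain. A small but non-negligible proportion of randomly initialized RNNs converge to compositional solutions under gradient-descent training, but most do not.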

ABSTRACT

Neural networks are very powerful learning systems, but they do not readily generalize from one task to the other. This is partly due to the fact that they do not learn in a compositional way, that is, by discovering skills that are shared by different tasks, and recombining them to solve new problems. In this paper, we explore the compositional generalization capabilities of recurrent neural networks (RNNs). We first propose the lookup table composition domain as a simple setup to test compositional behaviour and show that it is theoretically possible for a standard RNN to learn to behave compositionally in this domain when trained with standard gradient descent and provided with additional supervision. We then remove this additional supervision and perform a search over a large number of model initializations to investigate the proportion of RNNs that can still converge to a compositional solution. We discover that a small but non-negligible proportion of RNNs do reach partial compositional solutions even without special architectural constraints. This suggests that a combination of gradient descent and evolutionary strategies directly favouring the minority models that developed more compositional approaches might suffice to lead standard RNNs towards compositional solutions.

Research motivation and objectives

  • Motivate compositional learning as a route to generalization and lifelong learning.
  • Propose a simple lookup-table composition domain to test compositional behavior.
  • Demonstrate that with supervision an RNN can encode a compositional finite-state solution.
  • Investigate whether standard training can yield compositional solutions without architectural constraints.
  • Assess factors that influence discovery of compositional solutions (initialization, task order, training regime).

Proposed method

  • Introduce a lookup-table composition domain using atomic and composed bit-string table lookups.
  • Model as a character-level sequence-to-sequence RNN with an architecture designed to reflect compositional state transitions.
  • Phase 1: supervise recurrent layer transitions to encode a finite-state automaton solving compositions, then train mapping from states to outputs.
  • Phase 2: train with standard cross-entropy on atomic and composed tasks to test discovery of compositional solutions without explicit supervision on internal states.
  • Conduct a large random search over 50k random initializations to assess zero-shot compositional generalization after training on atomic and composed tasks.
  • Evaluate generalization on withheld composed inputs and compare with baselines.
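The lookup-table composition domain described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the 3-bit string length, the eight tables, the names `t1` to `t8`, the left-to-right order of application, and the way examples are held out are all choices made here for illustration.

```python
import itertools
import random

N_BITS = 3    # assumed length of each bit string
N_TABLES = 8  # assumed number of atomic lookup tables


def make_atomic_tables(n_tables=N_TABLES, n_bits=N_BITS, seed=0):
    """Build random bijective lookup tables over all bit strings of length n_bits."""
    rng = random.Random(seed)
    strings = ["".join(bits) for bits in itertools.product("01", repeat=n_bits)]
    tables = {}
    for i in range(1, n_tables + 1):
        outputs = strings[:]
        rng.shuffle(outputs)  # a random permutation of the inputs gives a bijection
        tables[f"t{i}"] = dict(zip(strings, outputs))
    return tables


def apply_composed(tables, task_names, x):
    """Apply the named atomic tables to x one after another (assumed left to right)."""
    for name in task_names:
        x = tables[name][x]
    return x


def split_composed(tables, n_heldout=10, seed=0):
    """Enumerate every (pair of tables, input) example and hold some out for zero-shot tests."""
    rng = random.Random(seed)
    examples = []
    for a, b in itertools.product(tables, repeat=2):
        for x in tables[a]:
            examples.append(((a, b), x, apply_composed(tables, (a, b), x)))
    rng.shuffle(examples)
    return examples[n_heldout:], examples[:n_heldout]


tables = make_atomic_tables()
train, heldout = split_composed(tables)
print(len(train), len(heldout))
```

A model that has learned the atomic tables compositionally could answer the held-out examples by chaining two lookups, whereas a model that memorized each composed mapping separately has no basis for answering them.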

Experimental results

Research questions

  • RQ1: Can a standard RNN learn to behave compositionally in a finite-domain composition task under supervision?
  • RQ2: To what extent can standard gradient-descent training discover compositional solutions without explicit architectural constraints?
  • RQ3: What factors (initialization, task ordering, curriculum) influence the emergence of compositional solutions?
  • RQ4: Do models that generalize compositionally rely on decomposing prompts or on memorized mappings?
  • RQ5: How does zero-shot generalization performance compare to baseline random strategies?

Key findings

Model | Generalization performance (%)
RNN (random search) | 19.60
random-output | 0.00
random-wellformed-output | 0.01
random-task-codes | 4.56
  • Experiment 1 shows a network can implement a compositional finite-state automaton and achieve 96% correct on atomic and composed tasks when supervised on the automaton’s state transitions.
  • A large random search (50k models) finds a small but non-negligible fraction of models that generalize compositionally in zero-shot tests after training on atomic and composed tasks.
  • Approximately 2% of models reach zero-shot accuracy above 80%, and about 0.75% reach above 90% in compositional generalization.
  • Initializations have little effect on success odds; instead, the random order of task presentations and weight updates largely determine whether a model becomes memorization-based or compositional.
  • Many converged compositional models do not decode prompts into decomposed atomic tasks; some rely on arbitrary task codes and memorization rather than parsing the prompt structure.
  • Training solely on composed tasks can boost the share of models achieving strong zero-shot generalization (e.g., 5.5% with zero-shot >90% in a subset).
  • All baselines fall far below the 19.60% generalization of the RNN random search: random-task-codes reaches 4.56%, while random-output and random-wellformed-output reach only 0.00% and 0.01%.
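To make the reported shares concrete, the sketch below shows how the fraction of models above a zero-shot accuracy threshold could be computed, and what the quoted percentages would mean in absolute terms for a search over 50k models. The function names and the toy accuracy list are hypothetical; only the 50k, 2%, and 0.75% figures come from the findings above, and the resulting counts are approximate because the percentages themselves are rounded.

```python
def share_above(accuracies, threshold):
    """Return the fraction of models whose zero-shot accuracy exceeds threshold."""
    if not accuracies:
        raise ValueError("accuracies must be non-empty")
    return sum(acc > threshold for acc in accuracies) / len(accuracies)


def expected_count(n_models, share):
    """Convert a share of models into an approximate absolute count."""
    return round(n_models * share)


# Hypothetical accuracies for five models (illustrative only, not from the paper).
toy_accuracies = [0.05, 0.12, 0.85, 0.40, 0.93]
print(share_above(toy_accuracies, 0.80))  # 0.4
print(share_above(toy_accuracies, 0.90))  # 0.2

# Shares quoted in the findings, applied to a 50k-model search.
print(expected_count(50_000, 0.02))    # 1000
print(expected_count(50_000, 0.0075))  # 375
```

Read this way, "small but non-negligible" corresponds to roughly a thousand models above 80% and a few hundred above 90%, which is why a selection step that favours such models is a plausible complement to gradient descent.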


This review was generated by AI and checked by a human editor.