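[Paper Summary] Memorize or generalize? Searching for a compositional RNN in a haystack
This paper investigates whether standard RNNs can learn compositional generalization in a lookup-table domain. After randomly initialized RNNs are trained with gradient descent, a small but non-negligible proportion converge to compositional solutions, although most do not.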
Neural networks are very powerful learning systems, but they do not readily generalize from one task to the other. This is partly due to the fact that they do not learn in a compositional way, that is, by discovering skills that are shared by different tasks, and recombining them to solve new problems. In this paper, we explore the compositional generalization capabilities of recurrent neural networks (RNNs). We first propose the lookup table composition domain as a simple setup to test compositional behaviour and show that it is theoretically possible for a standard RNN to learn to behave compositionally in this domain when trained with standard gradient descent and provided with additional supervision. We then remove this additional supervision and perform a search over a large number of model initializations to investigate the proportion of RNNs that can still converge to a compositional solution. We discover that a small but non-negligible proportion of RNNs do reach partial compositional solutions even without special architectural constraints. This suggests that a combination of gradient descent and evolutionary strategies directly favouring the minority models that developed more compositional approaches might suffice to lead standard RNNs towards compositional solutions.
Research Motivation and Goals
- Motivate compositional learning as a route to generalization and lifelong learning.
- Propose a simple lookup-table composition domain to test compositional behavior.
- Demonstrate that with supervision an RNN can encode a compositional finite-state solution.
- Investigate whether standard training can yield compositional solutions without architectural constraints.
- Assess factors that influence discovery of compositional solutions (initialization, task order, training regime).
Proposed Method
- Introduce a lookup-table composition domain using atomic and composed bit-string table lookups.
- Model as a character-level sequence-to-sequence RNN with an architecture designed to reflect compositional state transitions.
- Phase 1: supervise recurrent layer transitions to encode a finite-state automaton solving compositions, then train mapping from states to outputs.
- Phase 2: train with standard cross-entropy on atomic and composed tasks to test discovery of compositional solutions without explicit supervision on internal states.
- Conduct a large random search over 50k random initializations to assess zero-shot compositional generalization after training on atomic and composed tasks.
- Evaluate generalization on withheld composed inputs and compare with baselines.
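The lookup-table composition domain described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the bit width (`BITS`), the number of atomic tables (`NUM_TABLES`), the use of random bijections, the left-to-right application order, the 80/20 split, and every function name are assumptions made for the example.

```python
import itertools
import random

BITS = 3         # assumed bit-string width (illustrative)
NUM_TABLES = 8   # assumed number of atomic tables (illustrative)


def make_atomic_tables(num_tables, bits, seed=0):
    """Build random bijective lookup tables over all bit strings of a given width."""
    rng = random.Random(seed)
    inputs = ["".join(p) for p in itertools.product("01", repeat=bits)]
    tables = {}
    for i in range(num_tables):
        outputs = inputs[:]
        rng.shuffle(outputs)  # a random permutation of the inputs
        tables[f"t{i + 1}"] = dict(zip(inputs, outputs))
    return tables


def apply_composition(tables, task_names, x):
    """Apply the named tables left to right: ('t1', 't2') means t2(t1(x))."""
    for name in task_names:
        x = tables[name][x]
    return x


tables = make_atomic_tables(NUM_TABLES, BITS)
inputs = sorted(tables["t1"])

# Every ordered pair of atomic tasks applied to every input.
composed = [((a, b), x, apply_composition(tables, (a, b), x))
            for a in tables for b in tables for x in inputs]

# Hold out part of the composed examples for zero-shot evaluation
# (the 80/20 split is an arbitrary choice for this sketch).
rng = random.Random(1)
rng.shuffle(composed)
split = int(0.8 * len(composed))
train, held_out = composed[:split], composed[split:]
```

A model that has only memorized the training pairs has no way to answer the `held_out` examples, whereas a model that has learned the atomic tables and chains them can, which is what makes this split a test of compositional behavior.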
Experimental Results
Research Questions
- RQ1: Can a standard RNN learn to behave compositionally in a finite-domain composition task under supervision?
- RQ2: To what extent can standard gradient-descent training discover compositional solutions without explicit architectural constraints?
- RQ3: What factors (initialization, task ordering, curriculum) influence the emergence of compositional solutions?
- RQ4: Do models that generalize compositionally rely on decomposing prompts or on memorized mappings?
- RQ5: How does zero-shot generalization performance compare to baseline random strategies?
Main Findings
| Model | Generalization performance (%) |
|---|---|
| RNN (random search) | 19.60 |
| random-output | 0.00 |
| random-wellformed-output | 0.01 |
| random-task-codes | 4.56 |
- Experiment 1 shows a network can implement a compositional finite-state automaton and achieve 96% correct on atomic and composed tasks when supervised on the automaton’s state transitions.
- A large random search (50k models) finds a small but non-negligible fraction of models that generalize compositionally in zero-shot tests after training on atomic and composed tasks.
- Approximately 2% of models reach zero-shot accuracy above 80%, and about 0.75% reach above 90% in compositional generalization.
- Initializations have little effect on success odds; instead, the random order of task presentations and weight updates largely determine whether a model becomes memorization-based or compositional.
- Many converged compositional models do not decode prompts into decomposed atomic tasks; some rely on arbitrary task codes and memorization rather than parsing the prompt structure.
- Training solely on composed tasks can increase the share of models that achieve strong zero-shot generalization (e.g., 5.5% of models in a subset exceed 90% zero-shot accuracy).
- Baselines show zero-shot generalization far below learned models; random-task-codes baselines perform better than purely random outputs but far from learned models.
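The "share of models above a threshold" figures quoted in the findings are simple proportions over the population of trained models. A minimal sketch of that calculation follows; the function name `share_above` and the accuracy values are hypothetical and are not data from the paper.

```python
def share_above(accuracies, threshold):
    """Return the percentage of models whose zero-shot accuracy exceeds threshold."""
    if not accuracies:
        raise ValueError("need at least one model")
    return 100.0 * sum(a > threshold for a in accuracies) / len(accuracies)


# Hypothetical per-model zero-shot accuracies (made up for illustration only).
zero_shot = [0.05, 0.12, 0.95, 0.40, 0.85, 0.10, 0.02, 0.30, 0.00, 0.20]

print(share_above(zero_shot, 0.80))  # 20.0
print(share_above(zero_shot, 0.90))  # 10.0
```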
This summary was generated by AI and reviewed by human editors.