QUICK REVIEW

[Paper Review] Modular Networks: Learning to Decompose Neural Computation

Louis Kirsch, Julius Kunze|arXiv (Cornell University)|Nov 13, 2018

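Explainable Artificial Intelligence (XAI) | Cited by 40

One-Sentence Summary

This paper introduces modular networks that learn to decompose neural computation into reusable modules, trained with a generalized EM framework that yields deterministic module selection without regularization and shows improvements on language modeling and image classification.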

ABSTRACT

Scaling model capacity has been vital in the success of deep learning. For a typical network, necessary compute resources and training time grow dramatically with model size. Conditional computation is a promising way to increase the number of parameters with a relatively small increase in resources. We propose a training algorithm that flexibly chooses neural modules based on the data to be processed. Both the decomposition and modules are learned end-to-end. In contrast to existing approaches, training does not rely on regularization to enforce diversity in module use. We apply modular networks both to image recognition and language modeling tasks, where we achieve superior performance compared to several baselines. Introspection reveals that modules specialize in interpretable contexts.

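Research Motivation and Goals

Advance scalable neural networks by decomposing computation into reusable modules.
Develop a probabilistic, end-to-end trainable framework that learns the modules and their decomposition jointly.
Achieve deterministic module selection to reduce computation and improve training stability.
Demonstrate the method on language modeling and image classification, and show interpretable module specialization.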

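Proposed Method

Represent the network as a set of M modules plus a controller that selects K modules in each layer.
Model the module selection a as a latent variable and maximize a variational lower bound on the likelihood.
Use generalized EM with a partial (Viterbi-style) E-step that keeps q(a) deterministic, q(a) = delta(a, a*).
Compute gradients for θ (module parameters) and φ (controller parameters) from E[log p(y, a | x, θ, φ)].
Train with a partial E-step that draws S candidate module compositions and selects the best one, retaining the previous a* when no candidate improves on it.
Support deterministic selection of modules shared across layers, enabling dynamic parameter sharing and reuse.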
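
The partial E-step and the gradient-based M-step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single layer with K = 1, scalar modules of the form x -> w * x, a unit-variance Gaussian likelihood, and an input-independent categorical controller. All function names (`joint_log_prob`, `partial_e_step`, `m_step`) are hypothetical.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def joint_log_prob(x, y, a, weights, logits):
    """log p(y, a | x) = log p(y | x, a, theta) + log p(a | x, phi).

    Assumption: module a computes weights[a] * x with unit-variance
    Gaussian noise (additive constant dropped), and the controller is
    an input-independent categorical distribution over modules.
    """
    log_lik = -0.5 * (y - weights[a] * x) ** 2
    log_ctrl = log_softmax(logits)[a]
    return log_lik + log_ctrl


def partial_e_step(x, y, a_star, weights, logits, num_samples, rng):
    """Sample candidates from the controller; keep the best one only
    if it beats the stored assignment a_star."""
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    best_a = a_star
    best_score = joint_log_prob(x, y, a_star, weights, logits)
    for _ in range(num_samples):
        cand = rng.choices(range(len(weights)), weights=probs)[0]
        score = joint_log_prob(x, y, cand, weights, logits)
        if score > best_score:
            best_a, best_score = cand, score
    return best_a


def m_step(x, y, a_star, weights, logits, lr):
    """One gradient-ascent step on log p(y, a* | x), updating in place."""
    # Gradient of the Gaussian log-likelihood w.r.t. the chosen module.
    weights[a_star] += lr * (y - weights[a_star] * x) * x
    # Gradient of log-softmax w.r.t. the controller logits.
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    for m in range(len(logits)):
        indicator = 1.0 if m == a_star else 0.0
        logits[m] += lr * (indicator - probs[m])
```

Because the E-step only replaces a* when a sampled candidate scores strictly higher, it can never lower the joint log-probability of a data point, which is the property that makes the deterministic q(a) workable in this toy setting.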

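Experimental Results

Research Questions

RQ1: Can neural networks learn to decompose computation into reusable modules without explicit regularization?
RQ2: Does learning module selection and module parameters end-to-end achieve competitive performance on language modeling and image classification?
RQ3: Do modular networks exhibit interpretable module specialization across contexts or data subsets?
RQ4: How does the proposed training compare with REINFORCE and noisy top-k gating in stability and efficiency?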

Key Findings

Type	Number of modules (M)	Parallel modules (K)	Test perplexity
EM Modular Networks	15	1	229.651
EM Modular Networks	5	1	236.809
EM Modular Networks	15	3	246.493
EM Modular Networks	5	3	236.314
REINFORCE	15	1	240.760
REINFORCE	5	1	240.450
REINFORCE	15	3	274.060
REINFORCE	5	3	267.585
Noisy Top-k (k=4)	15	1	422.636
Noisy Top-k (k=4)	5	1	338.275
Baseline	1	1	247.408
Baseline	3	3	241.294
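
To make the rows easier to compare, the following minimal sketch picks the best configuration per method and computes a relative perplexity reduction. The perplexity values are copied from the table above; the helper names `best_by_method` and `relative_reduction` are illustrative and do not come from the paper.

```python
# Rows: (method, M, K, test perplexity), copied from the table above.
RESULTS = [
    ("EM Modular Networks", 15, 1, 229.651),
    ("EM Modular Networks", 5, 1, 236.809),
    ("EM Modular Networks", 15, 3, 246.493),
    ("EM Modular Networks", 5, 3, 236.314),
    ("REINFORCE", 15, 1, 240.760),
    ("REINFORCE", 5, 1, 240.450),
    ("REINFORCE", 15, 3, 274.060),
    ("REINFORCE", 5, 3, 267.585),
    ("Noisy Top-k (k=4)", 15, 1, 422.636),
    ("Noisy Top-k (k=4)", 5, 1, 338.275),
    ("Baseline", 1, 1, 247.408),
    ("Baseline", 3, 3, 241.294),
]


def best_by_method(rows):
    """Return {method: (M, K, perplexity)} for the lowest perplexity per method."""
    best = {}
    for method, m, k, ppl in rows:
        if method not in best or ppl < best[method][2]:
            best[method] = (m, k, ppl)
    return best


def relative_reduction(reference, candidate):
    """Fractional perplexity reduction of candidate relative to reference."""
    return (reference - candidate) / reference


best = best_by_method(RESULTS)
em_vs_baseline = relative_reduction(
    best["Baseline"][2], best["EM Modular Networks"][2]
)
```

With these numbers, the best EM configuration (M = 15, K = 1) comes out roughly 4.8% below the best baseline in test perplexity.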

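On Penn Treebank, modular networks reach perplexities competitive with the baselines and the REINFORCE-based approach, with less training noise.
Language-modeling modules specialize in syntactic and semantic contexts, indicating interpretable usage patterns.
On CIFAR-10, modular networks improve training accuracy over a non-modular baseline, although the generalization gains vary with controller design.
By the end of training the method uses all modules, and the batch module selection entropy is high, indicating diverse usage.
Compared with REINFORCE and noisy top-k gating, the EM-based method yields lower test perplexity in every matched (M, K) configuration in the table above, along with more deterministic module selection.
The approach avoids explicit regularizers for diversity, relying on partial EM updates to prevent module collapse.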
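
The batch module selection entropy mentioned above can be sketched as follows. The exact definition used in the paper is not given here, so this assumes the Shannon entropy (in nats) of the empirical module-usage frequencies within one batch; the function name `batch_selection_entropy` is hypothetical.

```python
import math
from collections import Counter


def batch_selection_entropy(selections, num_modules):
    """Entropy of the empirical module-usage distribution in a batch.

    selections: list of chosen module indices, one per batch element.
    Returns 0 when a single module is always chosen (collapse) and
    log(num_modules) when all modules are used equally often.
    """
    if not selections:
        raise ValueError("selections must be non-empty")
    counts = Counter(selections)
    total = len(selections)
    entropy = 0.0
    for m in range(num_modules):
        p = counts[m] / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy
```

Under this assumed definition, a value near log(M) at the end of training is what "diverse usage" would look like, while a value near zero would signal module collapse.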

This review was generated by AI and reviewed by human editors.