QUICK REVIEW

[Paper Review] Sluice networks: Learning what to share between loosely related tasks.

Sebastian Ruder, Joachim Bingel|arXiv (Cornell University)|May 23, 2017

Domain Adaptation and Few-Shot Learning|References: 25|Cited by: 117

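One-Sentence Summary

Sluice Networks introduce a learnable parameterization that dynamically controls which parts of the neural networks are shared between loosely related tasks, allowing hard or soft sharing across layers, subspaces, and skip connections. Using data from OntoNotes 5.0 across seven domains, the framework achieves up to 15% average error reduction over common multi-task learning approaches.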

ABSTRACT

Multi-task learning is partly motivated by the observation that humans bring to bear what they know about related problems when solving new ones. Similarly, deep neural networks can profit from related tasks by sharing parameters with other networks. However, humans do not consciously decide to transfer knowledge between tasks (and are typically not aware of the transfer). In machine learning, it is hard to estimate if sharing will lead to improvements; especially if tasks are only loosely related. To overcome this, we introduce Sluice Networks, a general framework for multi-task learning where trainable parameters control the amount of sharing -- including which parts of the models to share. Our framework goes beyond and generalizes over previous proposals in enabling hard or soft sharing of all combinations of subspaces, layers, and skip connections. We perform experiments on three task pairs from natural language processing, and across seven different domains, using data from OntoNotes 5.0, and achieve up to 15% average error reductions over common approaches to multi-task learning. We analyze when the architecture is particularly helpful, as well as its ability to fit noise. We show that a) label entropy is predictive of gains in sluice networks, confirming findings for hard parameter sharing, and b) while sluice networks easily fit noise, they are robust across domains in practice.

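Motivation and Objectives

Address the difficulty of deciding when and how to share parameters in multi-task learning, especially when the tasks are only loosely related.
Develop a framework in which the sharing of model components (layers, subspaces, skip connections) is controlled automatically by trainable parameters.
Generalize over earlier proposals by enabling hard or soft sharing of all combinations of these architectural components.
Evaluate the framework across diverse NLP tasks and domains, and analyze when the architecture is particularly helpful.
Study the relationship between label entropy and performance gains, and assess the model's ability to fit noise.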

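Proposed Method

Introduce a parameterized gating mechanism that learns which parts of the networks to share between tasks, supporting both hard and soft sharing.
Design a modular architecture in which each layer or subspace has a learnable gate that controls access to shared parameters.
Allow sharing across any combination of layers, subspaces, and skip connections, giving fine-grained control over parameter sharing.
Train the whole model end to end with standard backpropagation; the gating parameters are updated by optimizing joint task performance.
Use a differentiable relaxation of hard sharing to obtain soft sharing, so that sharing decisions can be optimized with gradients.
Apply the framework to three NLP task pairs on OntoNotes 5.0, training across seven domains to assess how well it generalizes.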
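
The sharing mechanism described above can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `mix_layer`, `mix_skips`, `alpha`, and `beta` are hypothetical, each layer output is assumed to split into two equal-width subspaces, and the softmax over `beta` is an assumption made here for illustration. In a trained model the mixing weights would be updated by backpropagation together with the other parameters; the sketch only shows how particular weight settings recover "no sharing" and "full sharing" as special cases.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_subspaces(h, n_subspaces):
    """Split a layer output of shape (batch, dim) into equal-width subspaces."""
    return np.split(h, n_subspaces, axis=-1)

def mix_layer(h_a, h_b, alpha, n_subspaces=2):
    """Recombine the subspaces of two task networks with a mixing matrix.

    alpha has shape (2 * n_subspaces, 2 * n_subspaces); row i holds the
    weights that build output subspace i from every input subspace of
    both tasks (hypothetical layout chosen for this sketch).
    """
    parts = split_subspaces(h_a, n_subspaces) + split_subspaces(h_b, n_subspaces)
    stacked = np.stack(parts, axis=0)                  # (2k, batch, width)
    mixed = np.einsum("ij,jbw->ibw", alpha, stacked)   # weighted sums of subspaces
    new_a = np.concatenate(list(mixed[:n_subspaces]), axis=-1)
    new_b = np.concatenate(list(mixed[n_subspaces:]), axis=-1)
    return new_a, new_b

def mix_skips(layer_outputs, beta):
    """Weighted sum of per-layer outputs, i.e. a learned skip-connection mixture.

    The softmax normalization is an assumption of this sketch.
    """
    weights = np.exp(beta) / np.exp(beta).sum()
    return sum(w * h for w, h in zip(weights, layer_outputs))

# Toy layer outputs for task A and task B: batch of 3, hidden size 4.
h_a = rng.normal(size=(3, 4))
h_b = rng.normal(size=(3, 4))

# Identity mixing: each task keeps its own representation (no sharing).
alpha_none = np.eye(4)
a0, b0 = mix_layer(h_a, h_b, alpha_none)

# Averaging matching subspaces: both tasks see the same representation,
# which behaves like hard sharing at this layer.
alpha_shared = 0.5 * np.tile(np.eye(2), (2, 2))
a1, b1 = mix_layer(h_a, h_b, alpha_shared)

# Equal skip weights: the final representation is the mean of the layers.
skip_out = mix_skips([h_a, h_b], np.zeros(2))
```

Values of `alpha` between these two extremes give soft sharing, and because every operation is a differentiable weighted sum, the amount of sharing can be learned with gradients rather than fixed in advance.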

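Experimental Results

Research Questions

RQ1: Does a learnable, adaptive sharing mechanism improve performance on loosely related multi-task learning problems in NLP?
RQ2: How does label entropy correlate with performance gains in the proposed framework?
RQ3: How well do Sluice Networks generalize across domains, and can they cope with noisy data?
RQ4: Does controlling sharing at the level of layers, subspaces, and skip connections yield better performance than fixed sharing strategies?
RQ5: How robust are the learned sharing patterns to label noise and domain shift?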

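Key Findings

Across seven domains of OntoNotes 5.0, Sluice Networks achieve up to 15% average error reduction over common multi-task learning baselines.
Label entropy is predictive of the gains, confirming earlier findings for hard parameter sharing.
Although the model easily fits noise, it is robust across domains in practice.
The framework learns which architectural components to share and outperforms fixed sharing strategies.
Learning hard or soft sharing across layers, subspaces, and skip connections leads to better generalization and performance.
The method remains effective when the relationship between tasks is weak or unclear, which suggests that it learns a useful inductive bias.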
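
Two quantities in the findings above are simple to compute, and a short sketch may make them concrete. It assumes that "label entropy" means the Shannon entropy of a task's empirical label distribution and that "error reduction" is meant in the usual relative sense; the review does not spell out either definition. The function names and all example values are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def relative_error_reduction(baseline_error, model_error):
    """Fraction of the baseline's error that the model removes."""
    return (baseline_error - model_error) / baseline_error

# A task whose tokens all carry the same label has zero entropy.
uniform_labels = ["O", "O", "O", "O"]
h_low = label_entropy(uniform_labels)

# A more varied (made-up) label sequence has higher entropy: 1.5 bits.
varied_labels = ["B", "I", "O", "O"]
h_high = label_entropy(varied_labels)

# Made-up error rates: going from 20% to 17% error is a 15% relative reduction.
reduction = relative_error_reduction(0.20, 0.17)
```

Under these assumptions, a 15% error reduction is a relative change: it does not mean that accuracy rises by 15 percentage points.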

This review was generated by AI and reviewed by a human editor.