QUICK REVIEW

[Paper Review] Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

Jinguo Zhu, Xizhou Zhu|arXiv (Cornell University)|Jun 9, 2022

Multimodal Machine Learning Applications | Cited by 28

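One-Sentence Summary

Introduces Conditional Mixture-of-Experts (Conditional MoEs) to mitigate task interference in multi-task generalist models, integrates them into Uni-Perceiver, and achieves state-of-the-art results with prompt tuning on only 1% of downstream data while preserving zero-shot generalization.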

ABSTRACT

To build an artificial neural network like the biological intelligence system, recent works have unified numerous tasks into a generalist model, which can process various tasks with shared parameters and does not have any task-specific modules. While generalist models achieve promising results on various benchmarks, they have performance degradation on some tasks compared with task-specialized models. In this work, we find that interference among different tasks and modalities is the main cause of this phenomenon. To mitigate such interference, we introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models. Routing strategies under different levels of conditions are proposed to take both the training/inference cost and generalization ability into account. By incorporating the proposed Conditional MoEs, the recently proposed generalist model Uni-Perceiver can effectively mitigate the interference across tasks and modalities, and achieves state-of-the-art results on a series of downstream tasks via prompt tuning on 1% of downstream data. Moreover, the introduction of Conditional MoEs still preserves the generalization ability of generalist models to conduct zero-shot inference on new tasks, e.g., video-text retrieval and video caption. Code and pre-trained generalist models will be released.

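Motivation and Objectives

Explain the task-interference problem in generalist multi-task models and its effect on performance.
Propose Conditional MoEs with different routing strategies that reduce interference while preserving generalization.
Show that Uni-Perceiver equipped with Conditional MoEs achieves strong performance with limited downstream data and supports zero-shot generalization to new tasks.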

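Proposed Method

Analyze task interference with a gradient-direction metric to quantify cross-task effects.
Define Conditional MoEs whose routing is conditioned at the token, context, modality, task, or attribute level.
Replace the linear projections in Uni-Perceiver's self-attention and FFN blocks with Conditional MoE layers.
Introduce an 8-dimensional token attribute embedding so that routing decisions generalize across data and tasks.
Compare data-dependent and data-independent routing variants in terms of training/inference cost and generalization.
Run large-scale pre-training and downstream evaluation, including prompt tuning on 1% of downstream data.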
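
The review does not spell out the gradient-direction interference metric. A common way to reason about interference from gradient directions, used here purely as an assumed stand-in and not necessarily the paper's exact definition, is to compare the gradients different tasks produce for the shared parameters: a negative cosine similarity, or equivalently a positive first-order estimate of how much a step on task j raises task i's loss, signals conflict. The helper names and toy gradients below are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two flattened gradient vectors."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def first_order_loss_change(g_i, g_j, lr):
    """Estimated change in task i's loss after one SGD step on task j.

    First-order Taylor expansion:
        L_i(theta - lr * g_j) - L_i(theta) ~= -lr * <g_i, g_j>
    A positive value means the step on task j is expected to hurt task i.
    """
    return -lr * dot(g_i, g_j)


def gradient_cosine_matrix(grads):
    """Pairwise cosine similarities for a dict {task_name: gradient}."""
    names = list(grads)
    return {(a, b): cosine(grads[a], grads[b]) for a in names for b in names}


# Toy gradients of the *shared* parameters, made up for illustration only.
grads = {
    "classification": [1.0, 0.0, 0.5],
    "captioning": [-0.8, 0.1, -0.4],
    "mlm": [0.9, 0.2, 0.4],
}
sims = gradient_cosine_matrix(grads)
delta = first_order_loss_change(grads["classification"], grads["captioning"], lr=0.1)
print(sims[("classification", "captioning")] < 0, delta > 0)  # True True
```

In this toy setup, a captioning step is predicted to increase the classification loss, which is the kind of conflict that motivates giving tasks separate experts.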
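
The routing strategies and the layer replacement described above can be pictured with a minimal, hypothetical Python sketch of one linear projection turned into a conditional MoE layer. The class name `ConditionalMoELinear`, the linear gate, top-2 gating with a renormalized softmax, and the example attribute vector are all assumptions for illustration, not the paper's implementation. The only point is that the same layer performs token-level or attribute-level routing depending on what is fed to the gate.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class ConditionalMoELinear:
    """Hypothetical conditional MoE replacement for one linear projection.

    Each expert is a d_out x d_in weight matrix. A linear gate scores the
    experts from a routing condition vector `cond`:
      - token-level routing: cond is the token feature x itself;
      - task/attribute-level routing: cond is a fixed embedding of the task
        or of the token's attributes, independent of x.
    Top-k gating with a renormalized softmax is an assumption of this sketch.
    """

    def __init__(self, d_in, d_out, num_experts, d_cond, top_k=2, seed=0):
        rng = random.Random(seed)
        self.experts = [
            [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
            for _ in range(num_experts)
        ]
        self.gate = [[rng.gauss(0.0, 1.0) for _ in range(d_cond)] for _ in range(num_experts)]
        self.top_k = top_k
        self.d_out = d_out

    def route(self, cond):
        """Return [(expert_index, weight), ...] for the top-k experts."""
        logits = matvec(self.gate, cond)
        top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[: self.top_k]
        weights = softmax([logits[e] for e in top])
        return list(zip(top, weights))

    def forward(self, x, cond):
        """Weighted sum of the selected experts' outputs for one token x."""
        out = [0.0] * self.d_out
        for e, w in self.route(cond):
            y = matvec(self.experts[e], x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


# Token-level routing: the gate sees the token itself (d_cond == d_in).
token_moe = ConditionalMoELinear(d_in=4, d_out=3, num_experts=4, d_cond=4)
x = [0.5, -1.0, 0.2, 0.3]
y_token = token_moe.forward(x, cond=x)

# Attribute-level routing: the gate sees a made-up 8-dim attribute vector.
attr_moe = ConditionalMoELinear(d_in=4, d_out=3, num_experts=4, d_cond=8)
attr = [1, 0, 0, 1, 0, 1, 0, 0]  # hypothetical attribute encoding
y_attr = attr_moe.forward(x, cond=attr)
```

With attribute routing, every token that shares the same attribute vector is sent to the same experts regardless of its content, which is what makes that variant data-independent.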

Experimental Results

Research Questions

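RQ1: How does cross-task interference affect the performance of a generalist model when parameters are shared across tasks and modalities?
RQ2: Can Conditional MoEs reduce interference while maintaining or improving generalization to unseen tasks?
RQ3: Which routing strategy (token, context, modality, task, attribute) gives Uni-Perceiver the best trade-off between efficiency and accuracy?
RQ4: For generalist models with Conditional MoEs, how do prompt tuning and its data efficiency compare with fully supervised fine-tuning?
RQ5: Do models with Conditional MoEs retain zero-shot capability on new tasks such as video-text retrieval and video captioning?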

Key Findings

Model	Training time (relative)	Inference time (relative)	ImageNet-1k (training accuracy)	COCO Caption (B@4, val)	MLM (training accuracy)	MLM (val perplexity)
Uni-Perceiver-Ti	1.0×	1.0×	47.3	68.3	49.2	5.86
Uni-Perceiver-Ti + Conditional MoEs (token)	1.8×	2.2×	53.1	72.7	52.9	4.96
Uni-Perceiver-Ti + Conditional MoEs (context)	2.2×	2.6×	52.5	73.1	52.8	4.86
Uni-Perceiver-Ti + Conditional MoEs (modality)	1.4×	1.0×	51.7	72.6	52.1	5.06
Uni-Perceiver-Ti + Conditional MoEs (task)	1.4×	1.0×	52.9	73.2	52.7	4.56
Uni-Perceiver-Ti + Conditional MoEs (attribute)	1.4×	1.0×	52.8	73.3	53.1	4.56

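Conditional MoEs mitigate task interference and improve on the fully shared Uni-Perceiver baseline: all five routing variants beat the baseline on every metric in the table.
Among the routing variants, attribute-level MoEs (with the 8-dimensional token attribute embedding) give the best balance of efficiency and generalization: they match or beat the data-dependent token and context variants on COCO Caption and both MLM metrics while costing only 1.4× training and 1.0× inference time.
The data-independent variants (modality, task, attribute) are efficient because their experts can be merged into a single projection via reparameterization, which is why their inference time stays at 1.0×. The data-dependent variants (token, context) incur higher cost, at 1.8 to 2.2× training and 2.2 to 2.6× inference time.
With prompt tuning on 1% of downstream data, Uni-Perceiver-MoE achieves results competitive with state-of-the-art methods that use more data and compute.
Uni-Perceiver-MoE retains zero-shot generalization to new tasks such as video captioning and video-text retrieval, and also improves performance on the GLUE benchmark.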
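
The reparameterization point follows from linearity: when the routing weights p_e depend only on a condition known in advance (modality, task, or attribute), sum_e p_e (W_e x) = (sum_e p_e W_e) x, so the selected experts can be folded into one matrix per condition value before inference. Below is a minimal sketch for a bias-free linear projection; the expert matrices, routing weights, and helper names are made up for illustration and are not taken from the paper.

```python
def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def mixture_output(experts, routes, x):
    """Run every selected expert and mix: sum_e p_e * (W_e @ x)."""
    out = [0.0] * len(experts[0])
    for e, p in routes:
        out = [o + p * y for o, y in zip(out, matvec(experts[e], x))]
    return out


def merge_experts(experts, routes):
    """Fold the routing weights into one matrix: W = sum_e p_e * W_e.

    Valid only when the routes depend on a condition known before the
    forward pass (e.g., task or attribute), not on the input x itself.
    """
    d_out, d_in = len(experts[0]), len(experts[0][0])
    merged = [[0.0] * d_in for _ in range(d_out)]
    for e, p in routes:
        for r in range(d_out):
            for c in range(d_in):
                merged[r][c] += p * experts[e][r][c]
    return merged


experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
routes = [(0, 0.25), (2, 0.75)]  # hypothetical weights for one task/attribute
x = [1.0, 2.0]

W = merge_experts(experts, routes)
print(mixture_output(experts, routes, x))  # [1.75, 1.25]
print(matvec(W, x))                        # [1.75, 1.25]
```

For token- or context-level routing this folding does not apply, because the routing weights change with every input, which is consistent with those variants costing more at inference in the table.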

This review was generated by AI and checked by human editors.