QUICK REVIEW

[Paper Review] What Gets Activated: Uncovering Domain and Driver Experts in MoE Language Models

Guimin Hu, Meng Li|arXiv (Cornell University)|Jan 15, 2026

Explainable Artificial Intelligence (XAI) | Cited by 0

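One-Sentence Summary

The paper introduces entropy-based and causal-effect metrics to identify domain experts and driver experts in MoE language models, analyzes which tokens trigger them, and shows that adjusting the weights of domain and driver experts improves performance across three MoE LLMs and three domains.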

ABSTRACT

Most interpretability work focuses on layer- or neuron-level mechanisms in Transformers, leaving expert-level behavior in MoE LLMs underexplored. Motivated by functional specialization in the human brain, we analyze expert activation by distinguishing domain and driver experts. In this work, we study expert activation in MoE models across three public domains and address two key questions: (1) which experts are activated, and whether certain expert types exhibit consistent activation patterns; and (2) how tokens are associated with and trigger the activation of specific experts. To answer these questions, we introduce entropy-based and causal-effect metrics to assess whether an expert is strongly favored for a particular domain, and how strongly expert activation contributes causally to the model's output, thereby identifying domain and driver experts, respectively. Furthermore, we explore how individual tokens are associated with the activation of specific experts. Our analysis reveals that (1) among the activated experts, some show clear domain preferences, while others exert strong causal influence on model performance, underscoring their decisive roles; (2) tokens occurring earlier in a sentence are more likely to trigger the driver experts; and (3) adjusting the weights of domain and driver experts leads to significant performance gains across all three models and domains. These findings shed light on the internal mechanisms of MoE models and enhance their interpretability.

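Motivation and Objectives

Advance the interpretability of MoE-based LLMs by focusing on expert-level activation rather than only layer- or neuron-level analysis.
Define domain experts as experts that specialize in a particular domain, and driver experts as experts that exert strong causal influence on the model's output.
Develop entropy-based and causal-effect metrics to identify domain and driver experts across multiple MoE LLMs and domains.
Study how tokens trigger specific experts and how adjusting expert weights affects performance.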

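Proposed Method

Drawing on neuroscience, define domain experts and driver experts as MoE experts that are, respectively, domain-specialized and causally influential.
Define a domain-specific activation entropy H_i(D_j) and activation rate A_i(D_j) for expert i on domain D_j, and combine them into a certainty-weighted activation score S_i(D_j).
Estimate a driver expert's causal effect by perturbing its gating logits and measuring the change in the output with KL divergence, that is, the divergence between the output distributions P(X) and Q(X) before and after the perturbation.
Compute domain activation with a simplified, binarized Top-k routing, and use a Pearl-inspired causal graph for mediation analysis.
Evaluate on three MoE LLMs (Mixtral, DeepSeek-MoE, Qwen-MoE) and three domains (SA, MMLU, Math), and analyze token-expert mappings.
Assess the performance impact of up-weighting or down-weighting domain or driver experts, and of fine-tuning the router with LoRA.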
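
The activation score described in the method list can be illustrated with a small sketch. This review does not give the paper's exact formulas, so the forms used here are assumptions: A_i(D_j) is taken as the fraction of a domain's tokens for which expert i falls in the binarized Top-k set, H_i(D_j) as the binary entropy of that rate, and S_i(D_j) as the rate multiplied by the certainty 1 - H_i(D_j). All function names are hypothetical.

```python
import math
from typing import List, Sequence


def topk_binary_activations(gate_logits: Sequence[float], k: int) -> List[int]:
    """Return a 0/1 indicator per expert: 1 if the expert is among the top-k logits."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = set(ranked[:k])
    return [1 if i in chosen else 0 for i in range(len(gate_logits))]


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable; taken as 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def domain_scores(token_gate_logits: Sequence[Sequence[float]], k: int) -> List[float]:
    """Certainty-weighted activation score per expert for one domain's tokens."""
    num_tokens = len(token_gate_logits)
    num_experts = len(token_gate_logits[0])
    counts = [0] * num_experts
    for logits in token_gate_logits:
        for i, active in enumerate(topk_binary_activations(logits, k)):
            counts[i] += active
    scores = []
    for count in counts:
        rate = count / num_tokens               # A_i(D_j): activation rate
        entropy = binary_entropy(rate)          # assumed form of H_i(D_j)
        scores.append(rate * (1.0 - entropy))   # assumed form of S_i(D_j)
    return scores


# Toy gate logits for four tokens of one domain and four experts (made-up numbers).
toy_logits = [
    [2.0, 1.0, 0.1, -1.0],
    [1.5, 0.2, 0.9, -0.5],
    [3.0, -0.3, 0.4, 0.1],
    [0.8, 1.2, -0.2, 0.0],
]
scores = domain_scores(toy_logits, k=2)
print(scores)
```

In this toy input the first expert is selected for every token, so it gets the highest score; experts selected for half of the tokens score zero under the assumed entropy weighting, because an activation rate of 0.5 carries maximal uncertainty.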

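Experimental Results

Research Questions

RQ1: Which experts are activated across MoE LLMs and domains, and do certain expert types show consistent activation patterns?
RQ2: How are tokens associated with the activation of specific domain and driver experts, and how do they trigger it?
RQ3: Does the activation of domain and driver experts causally influence the model's output, and can adjusting their routing weights improve performance?
RQ4: Are earlier tokens more influential in activating driver and domain experts, and which tokens characterize domain and driver activation?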

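Key Findings

Generalist experts dominate activation in every domain; domain experts and driver experts are a minority but are more influential in the SA and Math domains.
Driver experts tend to sit in the middle layers, where their causal influence increases; of the three models, DeepSeek shows the strongest causal sensitivity.
Across the SA and Math domains and all three models, up-weighting domain or driver experts yields consistent performance gains, while down-weighting lowers performance, especially when driver experts are weakened.
Tokens near the start of a sentence are more likely to activate driver experts, indicating that token position affects expert routing.
Representative domain and driver tokens differ by domain (e.g., sentiment and contextual cues in SA, operation terms in Math), revealing differences in domain-expert associations that could guide task-aware routing strategies.
Across models and domains, adjusting expert weights yields measurable accuracy/F1 gains (e.g., a reported average accuracy gain of 2.08% from domain experts and 3.00% from driver experts).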
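
The interventions behind these findings (shifting an expert's gate logit up or down, then measuring how far the output moves with KL divergence) can be sketched on a toy mixture. This is a minimal illustration under stated assumptions, not the paper's implementation: the single-layer mixing of per-expert output logits stands in for a full MoE model, P is taken as the unperturbed output distribution and Q as the perturbed one, and every name and number below is hypothetical.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_routing_weights(gate_logits: Sequence[float], k: int) -> List[float]:
    """Softmax over the top-k gate logits; all other experts get weight 0."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    probs = softmax([gate_logits[i] for i in chosen])
    weights = [0.0] * len(gate_logits)
    for i, p in zip(chosen, probs):
        weights[i] = p
    return weights


def output_distribution(gate_logits, expert_logits, k: int) -> List[float]:
    """Mix per-expert output logits with the routing weights, then apply softmax."""
    weights = topk_routing_weights(gate_logits, k)
    vocab_size = len(expert_logits[0])
    mixed = [sum(w * e[v] for w, e in zip(weights, expert_logits)) for v in range(vocab_size)]
    return softmax(mixed)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(P || Q) in nats; q is a softmax output here, so it is strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def causal_effect(gate_logits, expert_logits, expert_idx: int, delta: float, k: int) -> float:
    """Shift one expert's gate logit by delta and return KL(P || Q) of the outputs."""
    perturbed = list(gate_logits)
    perturbed[expert_idx] += delta
    p = output_distribution(gate_logits, expert_logits, k)   # original output
    q = output_distribution(perturbed, expert_logits, k)     # perturbed output
    return kl_divergence(p, q)


# Toy setup: three experts, three output classes (made-up numbers).
gate = [1.0, 0.5, -0.2]
experts = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
effect_down = causal_effect(gate, experts, expert_idx=0, delta=-2.0, k=2)
effect_up = causal_effect(gate, experts, expert_idx=0, delta=2.0, k=2)
effect_none = causal_effect(gate, experts, expert_idx=0, delta=0.0, k=2)
print(effect_down, effect_up, effect_none)
```

In this toy setting a negative delta removes the first expert from the Top-k set entirely, while a positive delta keeps the same set but concentrates the routing weight on that expert; both change the output distribution, and a zero shift leaves it unchanged.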


This review was generated by AI and reviewed by a human editor.