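[Paper Review] MeGU: Machine-Guided Unlearning with Target Feature Disentanglement
MeGU introduces machine-guided unlearning. A multimodal large language model (MLLM) guides semantic perturbations, and a Fragment-Align strategy disentangles target features while preserving performance on retained data.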
The growing concern over training data privacy has elevated the "Right to be Forgotten" into a critical requirement, raising the demand for effective Machine Unlearning. However, existing unlearning approaches commonly suffer from a fundamental trade-off: aggressively erasing the influence of target data often degrades model utility on retained data, while conservative strategies leave residual target information intact. In this work, we analyze the intrinsic representation properties learned during model pretraining. We show that semantic class concepts are entangled at the feature-pattern level: they share associated features while preserving concept-specific discriminative components. This entanglement fundamentally limits the effectiveness of existing unlearning paradigms. Motivated by this insight, we propose Machine-Guided Unlearning (MeGU), a novel framework that guides unlearning through concept-aware re-alignment. Specifically, we leverage Multi-modal Large Language Models (MLLMs) to explicitly determine re-alignment directions for target samples by assigning semantically meaningful perturbing labels. To improve efficiency, the inter-class conceptual similarities estimated by the MLLM are encoded into a lightweight transition matrix. Furthermore, MeGU introduces a positive-negative feature noise pair to explicitly disentangle the influence of the target concept. During finetuning, the negative noise suppresses target-specific feature patterns. The positive noise reinforces the remaining associated features and aligns them with the perturbing concepts. This coordinated design selectively disrupts target-specific representations while preserving shared semantic structures. As a result, MeGU enables controlled and selective forgetting, mitigating both under-unlearning and over-unlearning.
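A minimal Python sketch can make the transition-matrix idea concrete. It assumes the MLLM's pairwise class similarities have already been collected into a K × K matrix `sim`; how the MLLM is prompted is out of scope here. The sketch turns `sim` into a row-stochastic matrix T with a zero diagonal, so a class never maps to itself. `select_perturbing_label` ranks candidate classes by the product of T[y][j] and the model's predicted probability for class j. That scoring rule is a hypothetical choice for illustration, not the paper's published formula. The function names and the example similarity values are also invented.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def build_transition_matrix(sim: Sequence[Sequence[float]]) -> Matrix:
    """Row-normalize a K x K class-similarity matrix into a transition matrix T.

    Assumption: sim[i][j] >= 0 is an MLLM-estimated similarity between classes i and j.
    The diagonal is zeroed so that no class transitions to itself.
    """
    k = len(sim)
    if k < 2:
        raise ValueError("need at least two classes")
    t: Matrix = []
    for i in range(k):
        row = [0.0 if j == i else max(float(sim[i][j]), 0.0) for j in range(k)]
        total = sum(row)
        if total > 0.0:
            row = [v / total for v in row]
        else:
            # Fallback: uniform over the other classes if no similarity is reported
            row = [0.0 if j == i else 1.0 / (k - 1) for j in range(k)]
        t.append(row)
    return t


def select_perturbing_label(t: Matrix, y: int, probs: Sequence[float]) -> int:
    """Pick a perturbing label j != y by ranking candidates with T[y][j] * probs[j].

    Hypothetical scoring rule: the product of the class-level prior from T and
    the model's predicted probability for this specific instance.
    """
    candidates = [j for j in range(len(t)) if j != y]
    return max(candidates, key=lambda j: t[y][j] * probs[j])


if __name__ == "__main__":
    classes = ["cat", "dog", "truck"]
    sim = [[1.0, 0.8, 0.1],  # made-up similarity scores
           [0.8, 1.0, 0.2],
           [0.1, 0.2, 1.0]]
    T = build_transition_matrix(sim)
    # A "cat" sample that the model also finds dog-like
    print(classes[select_perturbing_label(T, 0, [0.7, 0.2, 0.1])])    # dog
    # A "cat" sample that the model finds truck-like
    print(classes[select_perturbing_label(T, 0, [0.5, 0.05, 0.45])])  # truck
```

In the second call, the model's own prediction outweighs the class-level prior. This shows one way that mixing T with model outputs can produce instance-aware perturbing labels rather than a single fixed label per class.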
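Research Motivation and Goals
- Motivate and analyze the limitations of existing machine unlearning methods, which stem from the entanglement between target and retained concepts.
- Propose a framework that uses MLLMs to induce semantically meaningful perturbations for unlearning.
- Disentangle target-concept features with the Fragment-Align strategy, which uses positive and negative feature noise.
- Demonstrate effectiveness across multiple unlearning tasks and datasets, supported by ablation and sensitivity analyses.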
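Proposed Method
- Use a zero-shot MLLM to estimate inter-concept similarities and build a lightweight transition matrix T that captures semantic similarity.
- Generate a perturbing label for each forget instance by ranking perturbation candidates with the transition matrix and the model's predictions. The chosen perturbation is semantically meaningful yet differs from the original label.
- Introduce Fragment-Align, which pairs two feature noises. The positive noise NPos is aligned with the perturbing label. The negative noise NNeg suppresses the original target features. Together they disentangle target features from retained concepts.
- Create the perturbed forget data Df^p by combining target inputs with NPos and NNeg. This steers finetuning toward the perturbing concepts while preserving performance on retained data.
- Finetune on the perturbed forget data and the retained data to reshape the decision boundary without full retraining.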
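A toy model can illustrate the noise pair. The sketch below rests on several assumptions that the summary does not confirm:
- The classifier is a linear softmax model over a 2-D input.
- Both noises live in input space and are found by projected gradient steps on a fixed model. Descent on the cross-entropy for the perturbing label gives NPos. Ascent on the cross-entropy for the original label gives NNeg.
- Each noise is clipped to an L∞ budget `eps`.

The helper names, weights, and hyperparameters (`eps`, `lr`, `steps`) are hypothetical. The resulting pair (x + NPos + NNeg, y_pert) plays the role of one sample in Df^p.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def softmax(z: Sequence[float]) -> Vector:
    # Numerically stable softmax
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear_logits(w: Matrix, x: Sequence[float]) -> Vector:
    # Toy stand-in for the model: one logit per class, logits = W x
    return [sum(wd * xd for wd, xd in zip(row, x)) for row in w]


def ce_input_grad(w: Matrix, x: Sequence[float], y: int) -> Vector:
    # Gradient of CE(softmax(W x), y) with respect to x, i.e. W^T (p - onehot(y))
    p = softmax(linear_logits(w, x))
    grad = [0.0] * len(x)
    for k, row in enumerate(w):
        coef = p[k] - (1.0 if k == y else 0.0)
        for d, wd in enumerate(row):
            grad[d] += coef * wd
    return grad


def add(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [u + v for u, v in zip(a, b)]


def clip(v: Sequence[float], eps: float) -> Vector:
    # Project onto the L-infinity ball of radius eps
    return [min(max(a, -eps), eps) for a in v]


def noise_pair(w: Matrix, x: Sequence[float], y_orig: int, y_pert: int,
               eps: float = 0.5, lr: float = 0.1, steps: int = 10) -> Tuple[Vector, Vector]:
    """Hypothetical positive/negative noise pair for one forget sample."""
    n_pos = [0.0] * len(x)
    n_neg = [0.0] * len(x)
    for _ in range(steps):
        # NPos: gradient descent on CE for the perturbing label (pull toward y_pert)
        g = ce_input_grad(w, add(x, n_pos), y_pert)
        n_pos = clip([n - lr * gd for n, gd in zip(n_pos, g)], eps)
        # NNeg: gradient ascent on CE for the original label (push away from y_orig)
        g = ce_input_grad(w, add(x, n_neg), y_orig)
        n_neg = clip([n + lr * gd for n, gd in zip(n_neg, g)], eps)
    return n_pos, n_neg


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # 3 classes, 2 input dims (made-up weights)
    x, y_orig, y_pert = [1.0, 0.0], 0, 1
    n_pos, n_neg = noise_pair(W, x, y_orig, y_pert)
    x_p = add(x, add(n_pos, n_neg))  # perturbed forget sample, labeled y_pert
    print("p(x)  :", softmax(linear_logits(W, x)))
    print("p(x_p):", softmax(linear_logits(W, x_p)))
```

In this toy run, the positive noise raises the probability of the perturbing class and the negative noise lowers that of the original class. Finetuning on such samples alongside the retained data would then reshape the decision boundary.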

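Experimental Results
Research Questions
- RQ1: How does the entanglement of target feature patterns with semantic concepts limit existing unlearning methods?
- RQ2: Can an MLLM provide reliable, semantically meaningful perturbing labels to guide unlearning?
- RQ3: Can the Fragment-Align strategy achieve selective forgetting while maintaining generalization on retained data?
- RQ4: What effects does perturbing-label-guided unlearning have across different datasets and unlearning scenarios?
Main Findings
- Across three unlearning tasks and diverse datasets, MeGU consistently outperforms state-of-the-art baselines at removing target data while maintaining strong generalization on retained data.
- Perturbing labels guided by the transition matrix provide semantically meaningful, instance-aware unlearning directions.
- Fragment-Align's positive and negative feature noises disentangle target features from the original concept and strengthen alignment with the perturbing concept.
- Compared with existing methods, the controlled unlearning process better prevents both under-unlearning and over-unlearning.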

This review was generated by AI and reviewed by human editors.