QUICK REVIEW

[Paper Review] Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data

Eugenia Iofinova, Dan Alistarh|arXiv (Cornell University)|Jan 30, 2026

Topic: Topic Modeling | Citations: 0

One-Line Summary

The paper presents Behemoth, a fully synthetic data framework to study and benchmark knowledge editing and unlearning in LLMs, using a small GPT-style model and controlled facts stored as {subject, relation, object} tuples.

ABSTRACT

As artificial neural networks, and specifically large language models, have improved rapidly in capabilities and quality, they have increasingly been deployed in real-world applications, from customer service to Google search, despite the fact that they frequently make factually incorrect or undesirable statements. This trend has inspired practical and academic interest in model editing, that is, in adjusting the weights of the model to modify its likely outputs for queries relating to a specific fact or set of facts. This may be done either to amend a fact or set of facts, for instance, to fix a frequent error in the training data, or to suppress a fact or set of facts entirely, for instance, in case of dangerous knowledge. Multiple methods have been proposed to do such edits. However, at the same time, it has been shown that such model editing can be brittle and incomplete. Moreover the effectiveness of any model editing method necessarily depends on the data on which the model is trained, and, therefore, a good understanding of the interaction of the training data distribution and the way it is stored in the network is necessary and helpful to reliably perform model editing. However, working with large language models trained on real-world data does not allow us to understand this relationship or fully measure the effects of model editing. We therefore propose Behemoth, a fully synthetic data generation framework. To demonstrate the practical insights from the framework, we explore model editing in the context of simple tabular data, demonstrating surprising findings that, in some cases, echo real-world results, for instance, that in some cases restricting the update rank results in a more effective update. The code is available at https://github.com/IST-DASLab/behemoth.git.

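Motivation and Objectives

Introduces Behemoth, a fully synthetic data generator for studying knowledge editing in LLMs.
Investigates how different editing methods (full-parameter fine-tuning, low-rank fine-tuning, ROME) affect the targeted edit and the collateral damage.
Analyzes how edit effectiveness changes with the data distribution (independent, correlated, nested relations).
Explores which model layers to fine-tune in order to achieve the edit while preserving overall performance.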

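Proposed Method

Generates fully synthetic {subject, relationship, object} facts using a custom grammar and vocabulary.
Trains a GPT-style Pythia-31m model on synthetic sentences built from these tuples.
Fine-tunes the model on edits that replace or forget specific facts, mixing in data that preserves the non-edited content.
Compares editing methods (full-rank updates, LoRA low-rank fine-tuning, ROME) across a range of scenarios.
Tests edits in simple, correlated, and nested relation settings, evaluating both direct and downstream effects.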
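
The generation step described above can be pictured with a minimal sketch. Everything in it is a hypothetical illustration rather than Behemoth's actual grammar or code: the `make_vocab`, `make_facts`, and `to_sentence` helpers, the made-up token format, and the sentence template are all assumptions. It covers only the independent case, where each object is drawn uniformly at random.

```python
import random

def make_vocab(prefix, n):
    """Build n made-up tokens such as 'subj_0', 'subj_1' (hypothetical format)."""
    return [f"{prefix}_{i}" for i in range(n)]

def make_facts(subjects, relations, objects, seed=0):
    """Assign one object to every (subject, relation) pair, independently at random."""
    rng = random.Random(seed)
    return {(s, r): rng.choice(objects) for s in subjects for r in relations}

def to_sentence(subject, relation, obj):
    """Render a tuple as a training sentence using an assumed fixed template."""
    return f"{subject} {relation} {obj} ."

subjects = make_vocab("subj", 4)
relations = make_vocab("rel", 2)
objects = make_vocab("obj", 5)

# One fact per (subject, relation) pair, then one sentence per fact.
facts = make_facts(subjects, relations, objects)
corpus = [to_sentence(s, r, o) for (s, r), o in facts.items()]
```

Because the generator is seeded, the same fact table can be rebuilt after an edit and compared against what the model outputs, which is the kind of control a real-world corpus does not offer.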

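Experimental Results

Research Questions

RQ1: How effectively can specific facts in a synthetic LLM be edited while the remaining knowledge is preserved?
RQ2: How do edit effectiveness and damage to the model vary with the data distribution pattern (independent, correlated, nested)?
RQ3: Which layers and subcomponents (MLP vs. attention) are best fine-tuned for reliable edits with minimal collateral effects?
RQ4: How does forgetting one or several facts, as opposed to forgetting or updating all or part of a relation, affect how much the model forgets and how accurate it remains?
RQ5: Does LoRA preserve more of the model's general performance during editing than full fine-tuning?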
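
Several of these questions come down to two numbers: how often the edited facts now produce the new target, and how often untouched facts still produce their original object. The sketch below computes both on a toy lookup table. The `predict(subject, relation)` callable is a hypothetical stand-in for querying the model, and the paper's own metrics may be defined differently.

```python
def edit_success(predict, edits):
    """Fraction of edited (subject, relation) pairs that return the new object."""
    hits = sum(predict(s, r) == new_obj for (s, r), new_obj in edits.items())
    return hits / len(edits)

def retained_accuracy(predict, facts, edits):
    """Fraction of non-edited facts that still return their original object."""
    kept = {k: v for k, v in facts.items() if k not in edits}
    hits = sum(predict(s, r) == obj for (s, r), obj in kept.items())
    return hits / len(kept)

# Original facts and the single requested edit (all values are made up).
facts = {("a", "r"): "x", ("b", "r"): "y", ("c", "r"): "z"}
edits = {("a", "r"): "w"}

# Toy "edited model": the edit landed, but one unrelated fact was damaged.
model_table = {("a", "r"): "w", ("b", "r"): "y", ("c", "r"): "q"}

def predict(subject, relation):
    return model_table[(subject, relation)]

success = edit_success(predict, edits)                # 1.0
retained = retained_accuracy(predict, facts, edits)   # 0.5
```

Here the edit succeeds, yet only one of the two untouched facts survives, which is the collateral damage that RQ1 to RQ3 ask about.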

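Key Findings

For a single edit or several identical edits, a rank-32 update is enough to achieve the edit; ROME performs slightly worse in some cases.
Ten distinct edits require rank 64 or higher, and higher ranks can reduce accuracy on the remaining facts.
When forgetting an entire relation, the simple and nested configurations often require a full-rank update to avoid a large drop in accuracy.
Layer choice matters: edit effectiveness and retained accuracy depend on which Transformer block is tuned and on whether the MLP or the attention sublayer is tuned.
LoRA enables edits at low rank, and attention-only fine-tuning sometimes retains higher accuracy than MLP-centric updates.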
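
To make the role of "rank" in these findings concrete, recall that a LoRA-style update replaces a full weight change of shape d_out x d_in with a product B·A, where B is d_out x r and A is r x d_in. The sketch below counts the trainable parameters in each case. The 256 x 256 shape is an illustrative assumption, not a dimension taken from the paper, and the helper names are hypothetical.

```python
def full_update_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in

def low_rank_update_params(d_out, d_in, rank):
    """Trainable parameters for a rank-r update B @ A (B: d_out x r, A: r x d_in)."""
    return rank * (d_out + d_in)

d_out, d_in = 256, 256  # illustrative layer shape (assumption)

full = full_update_params(d_out, d_in)  # 65536
by_rank = {r: low_rank_update_params(d_out, d_in, r) for r in (32, 64)}
# by_rank == {32: 16384, 64: 32768}
```

Under this assumed shape, a rank-32 update trains a quarter of the parameters of a full update and a rank-64 update trains half, so the rank acts as a cap on how much of the layer an edit is allowed to move.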


This review was generated by AI and checked by a human editor.