QUICK REVIEW

[Paper Review] Transformer-Patcher: One Mistake worth One Neuron

Zeyu Huang, Yikang Shen|arXiv (Cornell University)|Jan 24, 2023

Software Engineering Research | Citations: 10

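One-Line Summary

Introduces Transformer-Patcher, which edits a transformer sequentially by adding and training a few neurons in its last FFN layer, continually correcting mistakes with high reliability and locality.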

ABSTRACT

Large Transformer-based Pretrained Language Models (PLMs) dominate almost all Natural Language Processing (NLP) tasks. Nevertheless, they still make mistakes from time to time. For a model deployed in an industrial environment, fixing these mistakes quickly and robustly is vital to improve user experiences. Previous works formalize such problems as Model Editing (ME) and mostly focus on fixing one mistake. However, the one-mistake-fixing scenario is not an accurate abstraction of the real-world challenge. In the deployment of AI services, there are ever-emerging mistakes, and the same mistake may recur if not corrected in time. Thus a preferable solution is to rectify the mistakes as soon as they appear nonstop. Therefore, we extend the existing ME into Sequential Model Editing (SME) to help develop more practical editing methods. Our study shows that most current ME methods could yield unsatisfying results in this scenario. We then introduce Transformer-Patcher, a novel model editor that can shift the behavior of transformer-based models by simply adding and training a few neurons in the last Feed-Forward Network layer. Experimental results on both classification and generation tasks show that Transformer-Patcher can successively correct up to thousands of errors (Reliability) and generalize to their equivalent inputs (Generality) while retaining the model's accuracy on irrelevant inputs (Locality). Our method outperforms previous fine-tuning and HyperNetwork-based methods and achieves state-of-the-art performance for Sequential Model Editing (SME). The code is available at https://github.com/ZeroYuHuang/Transformer-Patcher.

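Motivation and Objectives

Motivate the need for continual online correction of model mistakes in deployed transformers (Sequential Model Editing, SME).
Define a standard SME pipeline with metrics for evaluating reliability, generality, and locality.
Propose Transformer-Patcher, which adds trainable neurons (patches) to the last FFN layer without changing the original parameters.
Demonstrate its effectiveness for SME on classification and generation tasks and compare it with baselines.
Discuss the efficiency and scalability of patch-based editing in production environments.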

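Proposed Method

Formalize Sequential Model Editing (SME) with three desiderata: reliability, generality, and locality.
Introduce Transformer-Patcher, which freezes the original parameters and adds a small number of trainable neurons (patches) to the last FFN layer.
Define a patch as a key-value memory with a patch key k_p, a patch value v_p, and a scalar bias b_p; the patch adjusts the FFN output through its activation a_p.
Train each patch with a combination of an edit loss l_e, an activation loss l_a, and a memory loss l_m, so that locality is enforced.
The memory loss uses a memory M of past queries to constrain the patch's activation on unrelated inputs; it has two components, l_m1 and l_m2.
Provide a standard SME evaluation pipeline and five metrics: Success Rate (SR), Generalization Rate (GR), Edit Retain Rate (ER), Training Retain Rate (TrainR), and Test Retain Rate (TestR).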
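
The patch mechanism and its training objective can be illustrated with a small NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the ReLU activation, the exponential forms of the activation and memory losses, the `margin` parameter, and the weights `w_a` and `w_m` are all hypothetical choices made here, since this review does not give the exact formulas. The function names are likewise invented for the example.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def ffn(q, K, b_k, V, b_v):
    # Frozen FFN read as a key-value memory: Act(q K + b_k) V + b_v.
    return relu(q @ K + b_k) @ V + b_v

def patched_ffn(q, K, b_k, V, b_v, k_p, b_p, v_p):
    # One extra neuron: the activation a_p gates how much of v_p is added.
    a_p = relu(q @ k_p + b_p)
    return ffn(q, K, b_k, V, b_v) + a_p * v_p, float(a_p)

def activation_loss(q_e, k_p, b_p):
    # Assumed form: shrinks as the patch fires more strongly on the edit query.
    return float(np.exp(-(q_e @ k_p + b_p)))

def memory_loss(M, q_e, k_p, b_p, margin=1.0):
    # Assumed two-term form over remembered queries M (one query per row).
    pre_mem = M @ k_p + b_p      # patch pre-activation on past queries
    pre_edit = q_e @ k_p + b_p   # patch pre-activation on the edit query
    l_m1 = np.mean(np.exp(pre_mem))                      # keep past queries inactive
    l_m2 = np.mean(np.exp(pre_mem - pre_edit + margin))  # keep them below the edit query
    return float(l_m1 + l_m2)

def patch_loss(l_e, q_e, M, k_p, b_p, w_a=1.0, w_m=1.0):
    # Weighted sum of the three terms; w_a and w_m are hypothetical weights.
    return l_e + w_a * activation_loss(q_e, k_p, b_p) + w_m * memory_loss(M, q_e, k_p, b_p)

rng = np.random.default_rng(0)
d, d_ff = 4, 8
K, b_k = rng.normal(size=(d, d_ff)), np.zeros(d_ff)   # frozen original weights
V, b_v = rng.normal(size=(d_ff, d)), np.zeros(d)

q_edit = np.array([1.0, 0.0, 0.0, 0.0])    # query that triggered the mistake
q_other = np.array([0.0, 1.0, 0.0, 0.0])   # unrelated query
k_p, b_p, v_p = q_edit.copy(), -0.5, np.ones(d)

out_edit, a_edit = patched_ffn(q_edit, K, b_k, V, b_v, k_p, b_p, v_p)
out_other, a_other = patched_ffn(q_other, K, b_k, V, b_v, k_p, b_p, v_p)
print(a_edit, a_other)
```

In this toy setup the patch fires on the edit query and stays at zero on the unrelated one, so the unrelated query's FFN output is identical to the unpatched model. That is the locality property the memory loss is meant to protect as more queries accumulate in M.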

Experimental Results

Research Questions

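RQ1: How well can a patch-based editor edit a transformer model sequentially without forgetting earlier edits?
RQ2: Can Transformer-Patcher achieve high reliability and generality over thousands of edits while preserving locality?
RQ3: How does Transformer-Patcher compare with fine-tuning and HyperNetwork-based editors in the SME setting?
RQ4: How do patch position and memory size affect editing effectiveness and generalization?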

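Key Findings

Transformer-Patcher achieves SR ≈ 1 and ER ≈ 1 across tasks, with near-perfect TrainR and TestR, while supporting thousands of edits.
On both FEVER and zsRE, Transformer-Patcher outperforms fine-tuning and HyperNetwork-based editors on the SME metrics.
The memory loss is important for preserving locality and overall performance; ablations show a large drop without it.
Patching the last FFN layer gives better generalization and editing efficiency than patching lower layers.
The method is robust to memory size (5k–40k), with only modest changes in performance.
Editing speed is practical, at 7.1 seconds for fact checking (FC) and 18.9 seconds for question answering (QA) on a V100, and scales to thousands of edits.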
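
To make the sequential-editing metrics concrete, the following is a rough sketch of an SME loop on a toy model. It assumes simplified definitions that this review does not spell out: SR as the fraction of edits that are correct immediately after being applied, ER as accuracy on all edited examples after the final edit, and TestR as test accuracy after editing divided by test accuracy before. `ToyModel`, `accuracy`, and `sequential_edit` are hypothetical names, and the dictionary lookup merely stands in for a patched network.

```python
def accuracy(model, examples):
    # Fraction of (input, label) pairs the model currently gets right.
    if not examples:
        return 1.0
    return sum(model.predict(x) == y for x, y in examples) / len(examples)

class ToyModel:
    # Stand-in for a frozen model plus patches; lookups replace neurons here.
    def __init__(self, base):
        self.base = dict(base)   # frozen original behavior
        self.patches = {}        # one entry per corrected mistake

    def predict(self, x):
        return self.patches.get(x, self.base.get(x))

    def edit(self, x, y):
        self.patches[x] = y

def sequential_edit(model, stream, test_set):
    acc_before = accuracy(model, test_set)
    edits, successes = [], 0
    for x, y in stream:
        if model.predict(x) != y:      # a mistake appears in the stream
            model.edit(x, y)           # correct it immediately
            edits.append((x, y))
            successes += int(model.predict(x) == y)
    return {
        "n_edits": len(edits),
        "SR": successes / len(edits) if edits else 1.0,
        "ER": accuracy(model, edits),
        "TestR": accuracy(model, test_set) / acc_before if acc_before else float("nan"),
    }

model = ToyModel({"a": 1, "b": 2, "c": 3})
stream = [("a", 1), ("b", 5), ("d", 4)]   # "b" and "d" are mistakes to fix
test_set = [("a", 1), ("c", 3)]           # unrelated inputs used to check locality
report = sequential_edit(model, stream, test_set)
print(report)
```

The toy model has perfect locality by construction, because an exact-match lookup never touches other inputs. A real editor has to earn that property, which is why the retain rates are measured separately from the success rate.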

This review was generated by AI and checked by a human editor.