QUICK REVIEW

[Paper Review] The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

Lukas Berglund, Meg Tong|arXiv (Cornell University)|Sep 21, 2023

Law, AI, and Intellectual Property|Citations: 31

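One-line summary

Auto-regressive LLMs trained on "A is B" fail to generalize to "B is A": reverse-direction accuracy is near zero, is not improved by data augmentation, and the failure holds across model sizes and model families.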

ABSTRACT

We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will not be higher than for a random name. Thus, models do not generalize a prevalent pattern in their training set: if "A is B" occurs, "B is A" is more likely to occur. It is worth noting, however, that if "A is B" appears in-context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. Code available at: https://github.com/lukasberglund/reversal_curse.

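Motivation and objectives

Demonstrate that auto-regressive LLMs trained on "A is B" do not generalize to "B is A" (the Reversal Curse).
Show that this reversal failure persists across model sizes and model families and is not resolved by data augmentation.
Measure the practical impact through evaluations on synthetic fine-tuning data and on real-world celebrity facts.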

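Proposed method

Fine-tune GPT-3 and Llama-1 models on synthetic facts of the form '<name> is <description>' and test reverse generalization to '<description> is <name>'.
Evaluate with two measures: exact-match accuracy, and whether the likelihood of the correct name increases when the description is given in the prompt.
Augment the data with paraphrases and a "Both orders" subset to encourage meta-learning, and compare the results.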
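
The setup above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the `Fact` class, the prompt templates, and the `exact_match_accuracy` helper are hypothetical names, and "exact match" is assumed here to mean that the completion starts with the target string. The single example fact is the one quoted in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    name: str          # e.g. "Uriah Hawthorne"
    description: str   # e.g. "the composer of Abyssal Melodies"


def _capitalize_first(text: str) -> str:
    """Upper-case only the first character, leaving the rest untouched."""
    return text[:1].upper() + text[1:]


def name_to_description(fact: Fact) -> str:
    """Training sentence in '<name> is <description>' order."""
    return f"{fact.name} is {fact.description}."


def description_to_name(fact: Fact) -> str:
    """The same fact in '<description> is <name>' order."""
    return f"{_capitalize_first(fact.description)} is {fact.name}."


def reverse_prompt(fact: Fact) -> str:
    """Prompt that supplies the description and leaves the name to the model."""
    return f"{_capitalize_first(fact.description)} is"


def exact_match_accuracy(completions: list[str], targets: list[str]) -> float:
    """Fraction of completions that begin with the target (assumed definition)."""
    if not targets:
        return 0.0
    hits = sum(c.strip().startswith(t) for c, t in zip(completions, targets))
    return hits / len(targets)


fact = Fact("Uriah Hawthorne", "the composer of Abyssal Melodies")
train_sentence = name_to_description(fact)   # order seen during fine-tuning
test_prompt = reverse_prompt(fact)           # order probed at test time
```

A model fine-tuned only on `train_sentence` would then be prompted with `test_prompt`, and its completions scored against `fact.name`.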

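Experimental results

Research questions

RQ1: After fine-tuning on synthetic data, do auto-regressive LLMs generalize from 'A is B' to the reverse 'B is A'?
RQ2: Can the Reversal Curse be observed in real-world knowledge, such as parent-child relationships of celebrities?
RQ3: Can data augmentation or mixed-order training mitigate the reversal failure?
RQ4: Does the reversal effect vary with model size or model family (GPT-3, Llama, etc.)?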

Key findings

Subset	Same direction (%)	Reverse direction (%)
NameToDescription	50.0 ± 2.1	0.0 ± 0.0
DescriptionToName	96.7 ± 1.2	0.1 ± 0.1

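Models generalize only when the test prompt matches the order used in fine-tuning; in the reverse direction they fail completely (accuracy near 0).
For DescriptionToName facts, GPT-3-175B reaches 96.7% exact-match accuracy in the forward direction and 0.1% in the reverse direction.
For NameToDescription facts, GPT-3-175B reaches 50.0% exact-match accuracy in the forward direction and 0.0% in the reverse direction.
In the likelihood-increase test, no significant difference is detected between the log-probabilities of the correct name and a random name when the order is reversed.
Experiment 2, on real-world celebrities, shows a marked asymmetry: GPT-4 answers questions about a celebrity's parent well (79%) but struggles to identify the celebrity from the parent (33%), which suggests a practical manifestation of the Reversal Curse.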
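
The likelihood comparison in the findings above can be illustrated with a paired test on log-probabilities. The sketch below is an assumption-laden example, not the paper's analysis: this review does not state which statistical test was used, `paired_t_statistic` is a hypothetical helper, and the numbers are made up purely to show the calculation.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(correct_logprobs, random_logprobs):
    """t statistic for the mean paired difference (correct minus random)."""
    if len(correct_logprobs) != len(random_logprobs):
        raise ValueError("inputs must be paired")
    diffs = [c - r for c, r in zip(correct_logprobs, random_logprobs)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(n))


# Made-up log-probabilities for illustration only; not results from the paper.
correct = [-10.2, -9.8, -11.1, -10.5]   # log p(correct name | reversed prompt)
random_ = [-10.0, -10.1, -10.9, -10.6]  # log p(random name  | reversed prompt)
t = paired_t_statistic(correct, random_)
```

A t statistic close to zero, as in this toy input, is the pattern one would expect if the model assigns the correct name no more probability than a random one.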

This review was generated by AI and checked by a human editor.