[Paper Review] How Language Model Hallucinations Can Snowball
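This paper shows that language model hallucinations can snowball: an initial wrong answer often leads to incorrect supporting claims, yet the model can sometimes recognize those snowballed errors as wrong when asked about them separately. The authors provide three QA datasets and analyze detection and mitigation strategies for ChatGPT and GPT-4.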
A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements. Hallucinations are often attributed to knowledge gaps in LMs, but we hypothesize that in some cases, when justifying previously generated hallucinations, LMs output false claims that they can separately recognize as incorrect. We construct three question-answering datasets where ChatGPT and GPT-4 often state an incorrect answer and offer an explanation with at least one incorrect claim. Crucially, we find that ChatGPT and GPT-4 can identify 67% and 87% of their own mistakes, respectively. We refer to this phenomenon as hallucination snowballing: an LM over-commits to early mistakes, leading to more mistakes that it otherwise would not make.
Research Motivation and Objectives
- Motivate and characterize hallucination snowballing as a failure mode in LMs.
- Empirically demonstrate the prevalence of snowballing using three QA datasets.
- Quantify how often state-of-the-art models recognize their own snowballed errors in isolation.
Proposed Method
- Automatically construct three yes/no QA datasets (primality, senator alma mater, graph connectivity) where incorrect answers lead to verifiable incorrect claims in explanations.
- Evaluate ChatGPT (gpt-3.5-turbo) and GPT-4 with greedy decoding on zero-shot prompts.
- Extract and verify the model’s incorrect claims and test whether the model can recognize these claims in a separate session.
- Assess the impact of prompting (e.g., “Let’s think step-by-step”) and decoding strategies (temperature, top-k, nucleus, beam search) on snowballing.
- Provide datasets and code publicly to enable replication and further study.
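The primality dataset and its claim check can be illustrated with a minimal sketch. It assumes every question asks about a number that really is prime (so the gold answer is always "Yes") and that a wrong "No" answer is justified by naming a factor, which can then be tested by plain division. The function names, prompt wording, sample size, and number range are hypothetical and not taken from the paper's released code.

```python
import random


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small integers used in this sketch."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def build_primality_questions(count: int, low: int, high: int, seed: int = 0) -> list[dict]:
    """Sample distinct primes in [low, high] and wrap each in a yes/no question.

    The prompt wording is an assumption; the gold answer is always "Yes".
    """
    rng = random.Random(seed)
    primes = [n for n in range(low, high + 1) if is_prime(n)]
    chosen = rng.sample(primes, count)
    return [{"n": n, "question": f"Is {n} a prime number?", "gold": "Yes"} for n in chosen]


def factor_claim_is_true(n: int, claimed_factor: int) -> bool:
    """Verify a claim of the form 'n is divisible by claimed_factor'."""
    return 1 < claimed_factor < n and n % claimed_factor == 0
```

If a model answers "No, 7919 is divisible by 13", then `factor_claim_is_true(7919, 13)` returns `False`, so the justification is a verifiably incorrect claim that can be put back to the model in a separate session.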
Experimental Results
Research Questions
- RQ1: Do LMs frequently produce incorrect answers that are accompanied by incorrect, yet testable, justifications?
- RQ2: Can LMs identify and verify their own snowballed hallucinations when prompted to check the incorrect claims in isolation?
- RQ3: How effective are prompting and decoding strategies in reducing snowball hallucinations?
- RQ4: What are the limitations of current models in avoiding snowballing during reasoning tasks?
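What makes a justification "testable" in RQ1 can be sketched for the graph connectivity task. The example below assumes the question supplies a list of directed edges and that the model's explanation names a path; each hop of that path can then be checked against the edge list. The edge representation and function names are illustrative assumptions, not the paper's implementation.

```python
from collections import deque


def reachable(edges: list[tuple[str, str]], src: str, dst: str) -> bool:
    """Breadth-first search over directed edges; gives the gold yes/no answer."""
    adjacency: dict[str, list[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def unsupported_hops(edges: list[tuple[str, str]], path: list[str]) -> list[tuple[str, str]]:
    """Return the hops of a claimed path that do not appear in the edge list."""
    edge_set = set(edges)
    return [(a, b) for a, b in zip(path, path[1:]) if (a, b) not in edge_set]


edges = [("A", "B"), ("B", "C"), ("D", "E")]
print(reachable(edges, "A", "E"))                # gold answer for "can you get from A to E?"
print(unsupported_hops(edges, ["A", "B", "E"]))  # hops the model would have invented
```

In this toy graph, E is not reachable from A, and the claimed path A, B, E relies on the hop (B, E), which is not in the edge list. Each invented hop is a single claim that can be checked on its own.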
Key Findings
Error counts and error rates (incorrect answers / samples) by task:

| Model | Graph Connectivity | Primality Testing | Senator Search | Average |
|---|---|---|---|---|
| ChatGPT | 410/500 (82.0%) | 339/500 (67.8%) | 153/500 (30.6%) | 60.13% |
| GPT-4 | 442/500 (88.4%) | 374/500 (74.8%) | 435/500 (87.0%) | 83.40% |
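
The averages in the table can be reproduced with a short calculation. The sketch assumes the three count cells in each row are per-task error counts out of 500 samples, listed in the order the task names appear in the header, and that the average is the unweighted mean of the three error rates. Average accuracy is then 100 minus that value.

```python
SAMPLES_PER_TASK = 500

# Error counts per task, read from the table rows (assumed column order).
error_counts = {
    "ChatGPT": {"Graph Connectivity": 410, "Primality Testing": 339, "Senator Search": 153},
    "GPT-4": {"Graph Connectivity": 442, "Primality Testing": 374, "Senator Search": 435},
}


def average_error_rate(counts: dict[str, int]) -> float:
    """Unweighted mean of the per-task error rates, in percent."""
    rates = [100.0 * c / SAMPLES_PER_TASK for c in counts.values()]
    return sum(rates) / len(rates)


for model, counts in error_counts.items():
    avg_error = average_error_rate(counts)
    print(f"{model}: average error {avg_error:.2f}%, average accuracy {100 - avg_error:.2f}%")
```

This prints an average error of 60.13% (accuracy 39.87%) for ChatGPT and 83.40% (accuracy 16.60%) for GPT-4, which matches the figures reported in this review.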
- ChatGPT and GPT-4 exhibit low overall QA accuracy across the three datasets (average accuracy: ChatGPT ~39.87%, GPT-4 ~16.6%).
- Both models commit to an answer within the first token (Yes/No) in over 95% of cases, and these initial commits are frequently incorrect.
- ChatGPT recognizes 67.37% of its snowballed incorrect claims; GPT-4 recognizes 87.03% of such claims when evaluated in isolated verification.
- Prompting with step-by-step reasoning improves task accuracy on some datasets (e.g., Senator Search) but can introduce reasoning errors and still leave snowballing at high levels.
- Higher-temperature decoding and sampling methods do not eliminate snowballing; beam search (unavailable in the OpenAI API) could potentially mitigate it, while backtracking prompts may help in some cases.
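The two headline measurements above, first-token commitment and recognition of snowballed claims in isolation, can be computed from logged results roughly as follows. The `Record` structure and the rule for detecting an immediate Yes/No are hypothetical simplifications, and the sample records are invented for illustration rather than taken from the paper's data.

```python
from dataclasses import dataclass


@dataclass
class Record:
    response: str                       # full model answer to the original question
    gold: str                           # gold label, "Yes" or "No"
    claim_rejected_in_isolation: bool   # did a fresh session judge the extracted claim false?


def first_word(text: str) -> str:
    """Lower-cased first word with trailing punctuation removed."""
    words = text.strip().split()
    return words[0].strip(".,:;!").lower() if words else ""


def summarize(records: list[Record]) -> dict[str, float]:
    """Commit rate over all records; recognition rate over immediately wrong answers."""
    committed = [r for r in records if first_word(r.response) in {"yes", "no"}]
    wrong = [r for r in committed if first_word(r.response) != r.gold.lower()]
    recognized = [r for r in wrong if r.claim_rejected_in_isolation]
    return {
        "commit_rate": len(committed) / len(records) if records else 0.0,
        "recognition_rate": len(recognized) / len(wrong) if wrong else 0.0,
    }


records = [
    Record("No, 7919 is divisible by 13.", "Yes", True),
    Record("No. It has a small factor.", "Yes", False),
    Record("Yes, it is prime.", "Yes", False),
    Record("Let me check. Yes.", "Yes", False),
]
print(summarize(records))
```

On these four invented records, three start with Yes or No (commit rate 0.75), two of those are wrong, and one of the two wrong justifications is rejected in isolation (recognition rate 0.5).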
This review was generated by AI and checked by a human editor.