QUICK REVIEW

[Paper Review] Answer Bubbles: Information Exposure in AI-Mediated Search

Michelle Huang, Agam Goyal|arXiv (Cornell University)|Mar 17, 2026

Information Retrieval and Search Behavior | Citations: 0

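One-Line Summary

This paper compares four search systems (vanilla GPT, Search GPT, Google AI Overviews, and Google Search) on 11,000 queries, evaluating source diversity, the linguistic characteristics of generated summaries, and source-summary fidelity. It reveals systematic biases in AI-mediated search and the potential for "answer bubbles".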

ABSTRACT

Generative search systems are increasingly replacing link-based retrieval with AI-generated summaries, yet little is known about how these systems differ in sources, language, and fidelity to cited material. We examine responses to 11,000 real search queries across four systems -- vanilla GPT, Search GPT, Google AI Overviews, and traditional Google Search -- at three levels: source diversity, linguistic characterization of the generated summary, and source-summary fidelity. We find that generative search systems exhibit significant source-selection biases in their citations, favoring certain sources over others. Incorporating search also selectively attenuates epistemic markers, reducing hedging by up to 60% while preserving confidence language in the AI-generated summaries. At the same time, AI summaries further compound the citation biases: Wikipedia and longer sources are disproportionately overrepresented, whereas cited social media content and negatively framed sources are substantially underrepresented. Our findings highlight the potential for answer bubbles, in which identical queries yield structurally different information realities across systems, with implications for user trust, source visibility, and the transparency of AI-mediated information access.

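Research Motivation and Objectives

Investigate how generative and traditional search systems differ in the sources they cite, their domain composition, and their topic coverage.
Characterize the lexical and epistemic properties of AI-generated summaries across systems.
Evaluate how faithfully AI summaries represent the information in the sources they cite.
Quantify the biases in source selection and synthesis that shape information exposure.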

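Proposed Method

Submit 11,000 real user search queries spanning 11 topics to four systems (vanilla GPT, Search GPT, Google AI Overviews, and traditional Google Search).
Annotate sources, topic distributions, and citation patterns using domain-level analysis (top-100 domains, Jaccard overlap).
Compute lexical and epistemic features, including LIWC categories, verbosity, readability, and transformer-based politeness, formality, polarity, and subjectivity scores.
Decompose answers into atomic content units (ACUs) and measure source-ACU fidelity by scoring entailment probability with a RoBERTa-large model fine-tuned on MNLI.
Apply the Equal Coverage (EC) and Coverage Parity (CP) metrics to quantify how evenly sources are represented in the generated summaries.
Assess significance using bootstrap resampling and nonparametric Mann-Whitney tests, with Benjamini-Hochberg correction for multiple comparisons.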
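The multiple-comparison step in the last item can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure. This is not the authors' code: the function name `benjamini_hochberg`, the significance level, and the example p-values are assumptions made for illustration, and in practice the p-values would come from the Mann-Whitney tests run on each feature.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if that hypothesis is
    rejected while controlling the false discovery rate at `alpha`.

    Illustrative helper (hypothetical name), not the paper's implementation.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) such that
    # p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k, even if its
    # own p-value missed its individual threshold.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, e.g. one per linguistic feature compared.
    example = [0.01, 0.04, 0.03, 0.5]
    print(benjamini_hochberg(example, alpha=0.05))
```

The step-up behaviour matters when many features are tested at once: a p-value that misses its own rank threshold is still rejected if a larger p-value further down the ordering meets its threshold.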

Figure 1: Paper Overview. Traditional search returns a ranked list of links for users to evaluate, while generative search produces an answer bubble containing AI-generated summaries synthesized from multiple sources. We study these answer bubbles along three dimensions: the sources they cite (RQ1),

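Experimental Results

Research Questions

RQ1: How do the sources cited by generative search systems differ from those of traditional search in terms of source diversity, concentration, and domain composition?
RQ2: How do the epistemic, psycholinguistic, and stylistic properties of generative answers vary across systems and topics?
RQ3: How faithfully do generative search summaries represent the information in their cited sources?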

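Key Findings

Generative systems draw on different source pools. For example, Search GPT shares only 24–25% of its top-100 domains with traditional Google Search, while Wikipedia is the dominant cited domain across all systems.
Search grounding reduces hedging language by up to 60% while preserving confidence markers, with topic-dependent variation.
Wikipedia is the most cited and most overrepresented domain, whereas social media content is underrepresented in AI-generated summaries by up to roughly 22 percentage points.
Source usage shows a length bias: longer sources are overrepresented and shorter sources are underrepresented.
Cited sources that contain causal language and certainty markers are over-covered in summaries, while subjective sources are under-covered, skewing summaries toward an assertive, explanatory style.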
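The top-100 domain overlap in the first finding can be sketched as a Jaccard index over two sets of cited domains, assuming the overlap is computed as the method section's "Jaccard overlap" suggests. The helper names (`domain_of`, `top_k_domains`, `jaccard`) and the example URLs are hypothetical, and the paper may normalize domains differently.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Extract a lowercase host from a URL, dropping a leading 'www.'.

    Simplified normalization chosen for illustration only.
    """
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def top_k_domains(urls, k=100):
    """Return the set of the k most frequently cited domains."""
    counts = Counter(domain_of(u) for u in urls)
    return {domain for domain, _ in counts.most_common(k)}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; defined here as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical citation lists for two systems.
    system_a = [
        "https://en.wikipedia.org/wiki/A",
        "https://www.imdb.com/title/x",
        "https://www.espn.com/nba",
    ]
    system_b = [
        "https://en.wikipedia.org/wiki/B",
        "https://www.reddit.com/r/x",
        "https://espn.com/nfl",
    ]
    overlap = jaccard(top_k_domains(system_a), top_k_domains(system_b))
    print(f"Jaccard overlap: {overlap:.2f}")
```

Because the index divides by the union, two systems can both cite Wikipedia heavily and still score low if the rest of their domain lists diverge.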

Figure 2: Top-15 cited domains by query topic for each source (% of queries citing each domain). Cell values $\geq$ 1% are shown. Domain preferences are strongly topic-dependent: IMDB dominates entertainment, ESPN dominates sports, and Spotify/Genius dominate music, but only in Google’s systems and

This review was generated by AI and checked by a human editor.