QUICK REVIEW

[Paper Review] Answer Bubbles: Information Exposure in AI-Mediated Search

Michelle Huang, Agam Goyal | arXiv (Cornell University) | 2026. 03. 17.

Information Retrieval and Search Behavior | Citations: 0

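One-Line Summary

This paper compares four search systems (vanilla GPT, Search GPT, Google AI Overview, Google Search) across 11,000 queries, evaluating source diversity, linguistic characteristics, and source-summary fidelity, and reveals systematic biases and potential "answer bubbles" in AI-mediated search.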

ABSTRACT

Generative search systems are increasingly replacing link-based retrieval with AI-generated summaries, yet little is known about how these systems differ in sources, language, and fidelity to cited material. We examine responses to 11,000 real search queries across four systems -- vanilla GPT, Search GPT, Google AI Overviews, and traditional Google Search -- at three levels: source diversity, linguistic characterization of the generated summary, and source-summary fidelity. We find that generative search systems exhibit significant source-selection biases in their citations, favoring certain sources over others. Incorporating search also selectively attenuates epistemic markers, reducing hedging by up to 60% while preserving confidence language in the AI-generated summaries. At the same time, AI summaries further compound the citation biases: Wikipedia and longer sources are disproportionately overrepresented, whereas cited social media content and negatively framed sources are substantially underrepresented. Our findings highlight the potential for answer bubbles, in which identical queries yield structurally different information realities across systems, with implications for user trust, source visibility, and the transparency of AI-mediated information access.

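Research Motivation and Goals

Investigate how generative and traditional search systems differ in cited sources, domain composition, and topical coverage.
Characterize the linguistic and epistemic properties of AI-generated summaries across systems.
Evaluate how faithfully AI summaries represent the information in their cited sources.
Quantify the biases in source selection and synthesis that affect information exposure.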

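Proposed Method

Query four systems (vanilla GPT, Search GPT, Google AI Overview, Google Search) with 11,000 real user searches spanning 11 topics.
Annotate sources, topic distributions, and citation patterns using domain-level analysis (top 100 domains, with overlap measured by Jaccard similarity).
Compute linguistic and epistemic features, including LIWC categories, speech-style features, readability, and transformer-based politeness, formality, polarity, and subjectivity scores.
Decompose responses into Atomic Content Units (ACUs) and measure source-ACU fidelity by scoring entailment probabilities with a RoBERTa-large model fine-tuned on MNLI.
Apply the Equal Coverage (EC) and Coverage Parity (CP) metrics to quantify how evenly and fairly sources are represented in the generated summaries.
Assess significance using bootstrap resampling and nonparametric tests (Mann-Whitney, with Benjamini-Hochberg correction for multiple comparisons).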
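The significance-testing step above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the resampled statistic (a difference in means), and the sample numbers are all assumptions made for illustration. The raw p-values are hypothetical stand-ins for what a test such as Mann-Whitney would return, one per comparison.

```python
import random
from statistics import mean


def bootstrap_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(a) - mean(b)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(mean(resampled_a) - mean(resampled_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up per-topic hedging rates, used only to exercise the functions.
hedging_without_search = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16]
hedging_with_search = [0.05, 0.07, 0.06, 0.04, 0.08, 0.05]
ci_low, ci_high = bootstrap_diff_ci(hedging_without_search, hedging_with_search)

# Hypothetical raw p-values from four separate comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
print(ci_low, ci_high, adjusted_p)
```

A comparison would then be reported as significant only if its adjusted p-value falls below the chosen false discovery rate, which keeps the expected share of false positives bounded when many system and topic pairs are tested at once.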

Figure 1: Paper Overview. Traditional search returns a ranked list of links for users to evaluate, while generative search produces an answer bubble containing AI-generated summaries synthesized from multiple sources. We study these answer bubbles along three dimensions: the sources they cite (RQ1),

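Experimental Results

Research Questions

RQ1: How do the sources cited by generative search systems differ from traditional search in diversity, concentration, and domain composition?
RQ2: How do the epistemic, psycholinguistic, and stylistic characteristics of generative responses vary across systems and topics?
RQ3: How faithfully do generative search summaries represent the information in their cited sources?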

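Key Findings

Generative systems draw on distinct source pools; for example, Search GPT overlaps with traditional Google Search by only 24–25% among the top 100 domains, and Wikipedia dominates citations across systems.
Search grounding reduces hedging by up to 60% while preserving confidence signals, with variation across topics.
Wikipedia is the most-cited and most overrepresented domain, while social media content is underrepresented in AI-generated summaries by up to 22 percentage points.
Longer sources are overrepresented and shorter sources underrepresented, suggesting a length bias in source use.
Citable sources containing causal language and certainty signals are over-covered in summaries, whereas subjective sources are under-covered, revealing a bias toward assertive, explanatory prose.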
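The domain-overlap figure in the first finding can be made concrete with a small sketch of a Jaccard comparison between two systems' top-k cited domains. This assumes that overlap is computed on sets of hostnames with a leading "www." stripped; the review does not describe the paper's exact normalization. The helper names and URLs below are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Extract a lowercase hostname, dropping a leading 'www.' (an assumption)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def top_domains(urls, k=100):
    """Return the set of the k most frequently cited domains."""
    counts = Counter(domain_of(u) for u in urls)
    return {domain for domain, _ in counts.most_common(k)}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical citation lists for two systems.
system_a_urls = [
    "https://en.wikipedia.org/wiki/Search_engine",
    "https://www.imdb.com/title/tt0000001/",
    "https://www.espn.com/nba/",
]
system_b_urls = [
    "https://en.wikipedia.org/wiki/Web_search",
    "https://www.reddit.com/r/movies/",
    "https://imdb.com/chart/top/",
    "https://www.nytimes.com/section/sports",
]

top_a = top_domains(system_a_urls)
top_b = top_domains(system_b_urls)
overlap = jaccard(top_a, top_b)
print(f"Jaccard overlap: {overlap:.2f}")
```

In this toy input the two systems share two of five distinct domains. An overlap of 24–25% on real top-100 lists would mean that roughly three quarters of the combined domain pool is cited by only one of the two systems.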

Figure 2: Top-15 cited domains by query topic for each source (% of queries citing each domain). Cell values ≥ 1% are shown. Domain preferences are strongly topic-dependent: IMDB dominates entertainment, ESPN dominates sports, and Spotify/Genius dominate music, but only in Google’s systems and

This review was generated by AI and reviewed by a human editor.