QUICK REVIEW

[Paper Review] The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation

Sheza Munir, Benjamin Mah | arXiv (Cornell University) | 2026-02-11

Ethics and Social Impacts of AI · Citations: 0

ONE-LINE SUMMARY

This paper analyzes how data annotation practices manufacture a single "ground truth" and advocates for pluralistic, epistemically just annotation infrastructures.

ABSTRACT

In machine learning, "ground truth" refers to the assumed correct labels used to train and evaluate models. However, the foundational "ground truth" paradigm rests on a positivistic fallacy that treats human disagreement as technical noise rather than a vital sociotechnical signal. This systematic literature review analyzes research published between 2020 and 2025 across seven premier venues: ACL, AIES, CHI, CSCW, EAAMO, FAccT, and NeurIPS, investigating the mechanisms in data annotation practices that facilitate this "consensus trap". Our identification phase captured 30,897 records, which were refined via a tiered keyword filtration schema to a high-recall corpus of 3,042 records for manual screening, resulting in a final included corpus of 346 papers for qualitative synthesis. Our reflexive thematic analysis reveals that systemic failures in positional legibility, combined with the recent architectural shift toward human-as-verifier models, specifically the reliance on model-mediated annotations, introduce deep-seated anchoring bias and effectively remove human voices from the loop. We further demonstrate how geographic hegemony imposes Western norms as universal benchmarks, often enforced by the performative alignment of precarious data workers who prioritize requester compliance over honest subjectivity to avoid economic penalties. Critiquing the "noisy sensor" fallacy, where statistical models misdiagnose cultural pluralism as random error, we argue for reclaiming disagreement as a high-fidelity signal essential for building culturally competent models. To address these systemic tensions, we propose a roadmap for pluralistic annotation infrastructures that shift the objective from discovering a singular "right" answer to mapping the diversity of human experience.
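The screening funnel reported in the abstract can be checked with a little arithmetic. This short Python sketch only restates the three reported counts and computes how many records survive each stage; the stage labels are paraphrases, not the paper's terminology.

```python
# Screening funnel counts as reported in the paper's abstract.
STAGES = [
    ("identified", 30_897),
    ("after tiered keyword filtration", 3_042),
    ("included after manual screening", 346),
]


def retention(stages):
    """Return (stage, count, share of previous stage, share of first stage)."""
    first = stages[0][1]
    prev = first
    rows = []
    for name, count in stages:
        rows.append((name, count, count / prev, count / first))
        prev = count
    return rows


for name, count, step, overall in retention(STAGES):
    print(f"{name:<32} {count:>6,}  step={step:6.1%}  overall={overall:6.2%}")
```

Roughly 9.8% of identified records pass keyword filtration, about 11.4% of those pass manual screening, and about 1.1% of the original pool ends up in the final corpus.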

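RESEARCH MOTIVATION AND GOALS

Assess the institutional and structural barriers to realizing justice in data annotation across major ML/HCI venues.
Identify the pre- and post-annotation decisions that erase subjective experience and impose Western-centric ground truth.
Map how annotator selection, labor, and aggregation practices affect downstream models and inequities.
Propose a roadmap for pluralistic annotation infrastructures centered on epistemic justice.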

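METHOD

Conduct a systematic literature review of research published 2020–2025 at ACL, AIES, CHI, CSCW, EAAMO, FAccT, and NeurIPS, yielding a final included corpus of 346 papers.
Apply PICOC (Population, Intervention, Comparison, Outcome, Context) criteria to select studies involving human or machine annotators and annotation processes.
Perform reflexive thematic analysis to identify sociotechnical tensions and trends in annotation practices.
Develop a taxonomy of pre- and post-annotation decisions that undermine or preserve subjective knowledge.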
Synthesize a multi-stakeholder roadmap for shifting from extractive data labor to the stewardship of situated knowledge.
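The abstract's tiered keyword filtration, which narrowed the identified records to a high-recall pool for manual PICOC screening, could be approximated along these lines. This is a minimal, hypothetical sketch: the tier names, the stems in `TIERS`, and the `Record` and `passes_filter` names are illustrative assumptions rather than the authors' actual schema, and the PICOC judgments themselves happened during manual screening, not in code.

```python
from dataclasses import dataclass

# Hypothetical keyword tiers; the paper's actual filtration schema may differ.
# Lowercase stems are matched as substrings to favor recall over precision.
TIERS = {
    "annotation": ("annotat", "label", "crowdsourc", "rater"),
    "subjectivity": ("disagree", "subjectiv", "perspectiv", "consensus", "ground truth"),
}

# Venue and year scope taken from the review.
VENUES = {"ACL", "AIES", "CHI", "CSCW", "EAAMO", "FAccT", "NeurIPS"}
YEARS = range(2020, 2026)


@dataclass
class Record:
    venue: str
    year: int
    title: str
    abstract: str


def passes_filter(rec: Record) -> bool:
    """Keep a record if it is in scope and matches at least one stem in every tier."""
    if rec.venue not in VENUES or rec.year not in YEARS:
        return False
    text = f"{rec.title} {rec.abstract}".lower()
    return all(any(stem in text for stem in stems) for stems in TIERS.values())


# Toy records (invented) to show the filter's behavior.
records = [
    Record("CHI", 2023, "Annotator disagreement in toxicity labels", "..."),
    Record("NeurIPS", 2022, "Faster transformers", "We speed up attention."),
    Record("ACL", 2019, "Label aggregation for subjective tasks", "..."),
]

# Survivors would then go to manual screening against the PICOC criteria.
screening_pool = [r for r in records if passes_filter(r)]
```

Requiring a hit in every tier keeps the pool on topic, while matching loose stems instead of exact phrases trades precision for the recall a manual screening stage can afford.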

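RESULTS

RESEARCH QUESTIONS

RQ1: How is annotator suitability conceptualized and operationalized, and to what extent do these methods reflect cultural expertise or lived experience?
RQ2: How are consensus and label aggregation handled, and do these methods distinguish noise from epistemic pluralism?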
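RQ2 turns on whether aggregation can tell noise from pluralism. A minimal Python illustration of why majority voting cannot: two items with very different label patterns collapse to the same hard label, while a soft label (the empirical label distribution) and its normalized entropy keep the split visible. The toy labels are invented, and the functions are generic illustrations, not methods taken from the reviewed papers.

```python
import math
from collections import Counter


def majority_vote(labels):
    """Collapse an item's labels to the most frequent one (ties: first seen)."""
    return Counter(labels).most_common(1)[0][0]


def soft_label(labels):
    """Keep the item's full empirical label distribution."""
    n = len(labels)
    return {label: c / n for label, c in Counter(labels).items()}


def normalized_entropy(dist, num_classes):
    """Shannon entropy of a label distribution, scaled to [0, 1]."""
    h = sum(-p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(num_classes)


# Invented toy items: unanimous vs. split 3-to-2.
unanimous = ["toxic"] * 5
split = ["toxic", "toxic", "toxic", "not_toxic", "not_toxic"]

for name, labels in [("unanimous", unanimous), ("split", split)]:
    dist = soft_label(labels)
    print(name, majority_vote(labels), dist, round(normalized_entropy(dist, 2), 3))
```

Both items receive the hard label "toxic", but the split item's normalized entropy is about 0.97 versus 0 for the unanimous one. Entropy alone still cannot say why annotators disagree: careless labeling and genuinely different viewpoints both raise it, so separating them needs annotator context or rationales.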

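KEY FINDINGS

Ground truth is a sociotechnical artifact manufactured by the structural design and governance choices of annotation pipelines.
Annotator positionality, labor dynamics, and Western-centric data practices systematically erase subjective voices and diverse perspectives.
The shift toward human-as-verifier models introduces anchoring bias and reduces human input to intermittent verification rather than substantive dissent.
Model-mediated annotation and synthetic-data loops risk homogenizing perspectives and entrenching normative biases.
Geographic hegemony and infrastructural filters make Western norms function as universal ground truth and marginalize Global South contexts.
Pluralistic aggregation, rationale-aware approaches, and deliberative annotation can preserve disagreement as a high-fidelity signal and support epistemic justice.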
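One way to read "pluralistic aggregation" in these findings is to keep label distributions per annotator group instead of pooling them. This Python sketch assumes each annotation carries a hypothetical group attribute (for example, a self-reported region) and uses total variation distance between group distributions as an illustrative divergence measure; neither choice comes from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def group_distributions(annotations):
    """Map each annotator group to its own label distribution.

    annotations: list of (group, label) pairs for a single item.
    """
    by_group = defaultdict(list)
    for group, label in annotations:
        by_group[group].append(label)
    return {
        group: {lab: c / len(labs) for lab, c in Counter(labs).items()}
        for group, labs in by_group.items()
    }


def total_variation(p, q):
    """Total variation distance between two label distributions (0 to 1)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(lab, 0.0) - q.get(lab, 0.0)) for lab in labels)


def max_group_divergence(annotations):
    """Largest pairwise distance between groups' distributions on one item."""
    dists = list(group_distributions(annotations).values())
    return max(
        (total_variation(p, q) for p, q in combinations(dists, 2)),
        default=0.0,
    )


# Invented toy items: both pool to 3 "toxic" vs. 2 "not_toxic".
structured = [("A", "toxic")] * 3 + [("B", "not_toxic")] * 2
mixed = [("A", "toxic"), ("A", "not_toxic"), ("A", "toxic"),
         ("B", "toxic"), ("B", "not_toxic")]

print(max_group_divergence(structured), round(max_group_divergence(mixed), 3))
```

Both toy items have the same pooled 3-to-2 split, yet the first diverges completely by group (distance 1.0) and the second only slightly (about 0.17). That is the kind of structured, group-aligned disagreement a single majority label would erase.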

This review was generated by AI and reviewed by a human editor.