QUICK REVIEW

[Paper Review] LLM-based Human Simulations Have Not Yet Been Reliable

Qian Wang, Jiaying Wu | ArXiv.org | 2025. 01. 15.

Engineering Technology and Methodologies | Citations: 3

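One-Line Summary

This paper argues that reliable LLM-based human simulation requires addressing both the inherent limitations of LLMs and the design flaws of simulation frameworks, and it presents a unified framework, concrete solutions, and future directions.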

ABSTRACT

Large Language Models (LLMs) are increasingly employed for simulating human behaviors across diverse domains. However, our position is that current LLM-based human simulations remain insufficiently reliable, as evidenced by significant discrepancies between their outcomes and authentic human actions. Our investigation begins with a systematic review of LLM-based human simulations in social, economic, policy, and psychological contexts, identifying their common frameworks, recent advances, and persistent limitations. This review reveals that such discrepancies primarily stem from inherent limitations of LLMs and flaws in simulation design, both of which are examined in detail. Building on these insights, we propose a systematic solution framework that emphasizes enriching data foundations, advancing LLM capabilities, and ensuring robust simulation design to enhance reliability. Finally, we introduce a structured algorithm that operationalizes the proposed framework, aiming to guide credible and human-aligned LLM-based simulations. To facilitate further research, we provide a curated list of related literature and resources at https://github.com/Persdre/awesome-llm-human-simulation.

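Research Motivation and Objectives

Identify the fundamental challenges of LLM-based human simulation that stem from LLM limitations and simulation design problems.
Propose a unified simulation framework that clearly defines LLM behavior and human involvement.
Present concrete solutions for improving data, validation, and evaluation for reliable simulation.
Outline future directions centered on data collection, data synthesis, and the LLM-as-judge role for quality control.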

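Proposed Method

Formalize a general framework in terms of environment, agents, and rules (Algorithm 1).
Categorize existing simulations into social, economic, policy, and psychological domains, and analyze LLM behavior and human involvement in each.
Systematically analyze inherent LLM limitations, including bias, cognitive consistency, memory, and interaction mechanisms.
Systematically analyze design flaws in simulation frameworks, including oversimplified psychology, validation gaps, and incentive modeling.
Present comprehensive solutions that address both LLM limitations and framework design (Section 5).
Outline future directions, including multidimensional human data collection and LLM-based data quality assessment.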
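
The environment, agents, and rules decomposition named in the first item can be pictured as a simple step loop. The sketch below is only an illustration of that decomposition and is not the paper's Algorithm 1: every name in it (Agent, Environment, contribution_rule, run_simulation) is hypothetical, and the policy functions are deterministic stand-ins for what would be LLM calls in a real simulation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names here are hypothetical; this is not the paper's Algorithm 1.
State = Dict[str, int]
Policy = Callable[[str, State], str]


@dataclass
class Agent:
    name: str
    policy: Policy  # stand-in for an LLM call: (agent name, observation) -> action
    history: List[str] = field(default_factory=list)

    def act(self, observation: State) -> str:
        action = self.policy(self.name, observation)
        self.history.append(action)  # keep a per-agent action trace
        return action


@dataclass
class Environment:
    state: State

    def observe(self) -> State:
        # Return a copy so agents cannot mutate the shared state directly.
        return dict(self.state)


def contribution_rule(state: State, actions: Dict[str, str]) -> State:
    """Rule: every 'contribute' action adds 1 to the shared pool."""
    new_state = dict(state)
    new_state["pool"] += sum(1 for a in actions.values() if a == "contribute")
    new_state["round"] += 1
    return new_state


def run_simulation(env: Environment, agents: List[Agent],
                   rule: Callable[[State, Dict[str, str]], State],
                   steps: int) -> List[Dict[str, str]]:
    log = []
    for _ in range(steps):
        obs = env.observe()
        actions = {agent.name: agent.act(obs) for agent in agents}
        env.state = rule(env.state, actions)  # rules map (state, actions) -> next state
        log.append(actions)
    return log


def always_contribute(name: str, obs: State) -> str:
    return "contribute"


def contribute_on_even_rounds(name: str, obs: State) -> str:
    return "contribute" if obs["round"] % 2 == 0 else "defect"


env = Environment(state={"pool": 0, "round": 0})
agents = [Agent("a1", always_contribute), Agent("a2", contribute_on_even_rounds)]
log = run_simulation(env, agents, contribution_rule, steps=4)
print(env.state)
```

Keeping the rule as a separate function, rather than letting agents write to the state, is one way to make the human-defined parts of a simulation explicit and easy to audit.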

Figure 1: LLM-based Human Simulation Applications

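Experimental Results

Research Questions

RQ1: What are the main inherent limitations of LLMs that hinder sound human simulation?
RQ2: How do design flaws in current simulation frameworks undermine the reliability and validity of LLM-based simulations?
RQ3: How can LLM limitations and framework design be addressed together to improve reliability, validation, and evaluation?
RQ4: Which future directions and data strategies can raise the quality and reliability of LLM-based human simulation?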

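Key Findings

LLM-based human simulations suffer from bias, inconsistent cognition, memory and long-term consistency problems, and weak handling of multi-agent interaction.
Current simulation frameworks oversimplify complex human states and struggle with real-time validation, monitoring, and the integration of expert knowledge.
A unified framework is proposed to separate LLM behavior from human involvement and to guide systematic validation.
The paper includes concrete solutions such as bias-mitigation training, improved cognitive consistency, external memory for LLMs, modular validation, and better incentive modeling.
Future directions emphasize richer multimodal human data, high-quality synthetic data, and LLMs as judges of data quality.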
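
One simple way to picture a validation step is to compare the distribution of simulated choices against the distribution of human choices on the same task. The sketch below uses total variation distance for that comparison. This metric is an assumption chosen for illustration and is not taken from the paper; the function names are hypothetical, and the two choice lists are made-up toy values, not results.

```python
from collections import Counter
from typing import Dict, List


def choice_distribution(choices: List[str]) -> Dict[str, float]:
    """Turn a list of categorical choices into relative frequencies."""
    if not choices:
        raise ValueError("choices must not be empty")
    counts = Counter(choices)
    total = len(choices)
    return {option: n / total for option, n in counts.items()}


def total_variation_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    options = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in options)


# Toy, made-up data for illustration only.
human = ["accept"] * 6 + ["reject"] * 4
simulated = ["accept"] * 9 + ["reject"] * 1

tvd = total_variation_distance(choice_distribution(human),
                               choice_distribution(simulated))
print(round(tvd, 3))
```

A check like this could run as one module among several, with a threshold on the distance deciding whether a simulated population is close enough to the human reference to be used further.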

Figure 2: Solutions for Reliable LLM-based Human Simulation


This review was generated by AI and checked by a human editor.