QUICK REVIEW

[Paper Review] Beyond Static Snapshots: Dynamic Modeling and Forecasting of Group-Level Value Evolution with Large Language Models

Qiankun Pi, Guixin Su|arXiv (Cornell University)|Feb 15, 2026

Computational and Text Analysis Methods | Cited by 0

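ONE-SENTENCE SUMMARY

The paper proposes a dynamic, event-aware framework that uses large language models to model and forecast group-level value evolution, conditioning on historical value trajectories from World Values Survey (WVS) data for China and the United States and on social events.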

ABSTRACT

Social simulation is critical for mining complex social dynamics and supporting data-driven decision making. LLM-based methods have emerged as powerful tools for this task by leveraging human-like social questionnaire responses to model group behaviors. Existing LLM-based approaches predominantly focus on group-level values at discrete time points, treating them as static snapshots rather than dynamic processes. However, group-level values are not fixed but shaped by long-term social changes. Modeling their dynamics is thus crucial for accurate social evolution prediction, a key challenge in both data mining and social science. This problem remains underexplored due to limited longitudinal data, group heterogeneity, and intricate historical event impacts. To bridge this gap, we propose a novel framework for group-level dynamic social simulation by integrating historical value trajectories into LLM-based human response modeling. We select China and the U.S. as representative contexts, conducting stratified simulations across four core sociodemographic dimensions (gender, age, education, income). Using the World Values Survey, we construct a multi-wave, group-level longitudinal dataset to capture historical value evolution, and then propose the first event-based prediction method for this task, unifying social events, current value states, and group attributes into a single framework. Evaluations across five LLM families show substantial gains: a maximum 30.88% improvement on seen questions and 33.97% on unseen questions over the Vanilla baseline. We further find notable cross-group heterogeneity: U.S. groups are more volatile than Chinese groups, and younger groups in both countries are more sensitive to external changes. These findings advance LLM-based social simulation and provide new insights for social scientists to understand and predict social value changes.

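MOTIVATION AND OBJECTIVES

Bridge the gap between static LLM-based social simulation and dynamic, longitudinal value evolution.
Build a multi-wave, group-stratified dataset covering China and the United States from the World Values Survey (WVS).
Develop a two-stage framework, Value Trajectory Prediction (VTP) followed by Event-Aware Prediction (EAP), to forecast future group-level values.
Make predictions interpretable by aligning external events with value dimensions.
Analyze cross-national and demographic heterogeneity in value dynamics.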

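PROPOSED METHOD

Build a multi-wave, group-stratified longitudinal dataset for China and the United States from WVS waves 5–7, stratified along four sociodemographic dimensions (gender, age, education, income).
Fine-tune LLMs with longitudinal prompts so that, conditioned on a demographic vector and the previous wave's answers, they learn dynamic value shifts (VTP).
Add an event-aware extension (EAP) that retrieves semantically aligned events from a domain-specific event library and reasons about their effect on value trajectories.
Map each questionnaire item to a value dimension, encode values and events in a shared embedding space, and perform value-driven event matching by cosine similarity.
Evaluate Vanilla, VTP, and EAP on seen and unseen questions using Exact Match (EM) and Proximity Score (PS), and report a combined Overall metric.
Run ablation studies to quantify how much the history and event components each contribute to prediction performance.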
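
The value-driven event matching step can be illustrated with a minimal sketch. It assumes that value dimensions and events have already been encoded as vectors in the same space; the toy 3-dimensional vectors, the event names, and the functions `cosine_similarity` and `match_events` are hypothetical and are not taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

def match_events(value_vec, event_vecs, k=2):
    """Return the names of the k events most similar to the value vector."""
    scored = [(cosine_similarity(value_vec, vec), name)
              for name, vec in event_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]

# Toy embeddings, invented for illustration only.
value_vec = [1.0, 0.0, 0.0]
event_vecs = {
    "event_a": [0.9, 0.1, 0.0],
    "event_b": [0.0, 1.0, 0.0],
    "event_c": [0.5, 0.5, 0.0],
}
top_events = match_events(value_vec, event_vecs, k=2)
print(top_events)
```

The parameter `k` corresponds to the question raised in RQ4: a larger `k` supplies more context but also admits less relevant events.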

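EXPERIMENTAL RESULTS

Research Questions

RQ1: Can LLM-driven social simulation capture the dynamic evolution of group-level values using historical trajectories?
RQ2: Do external events significantly affect value trajectories, and does event-aware modeling improve prediction?
RQ3: How do the dynamics differ between China and the United States and across demographic groups (gender, age, education, income)?
RQ4: In event-driven value prediction, how many events should be selected to balance informativeness against noise?
RQ5: In this dynamic, cross-cultural setting, can fine-tuned open-source models match or exceed closed-source models?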

Key Findings

Model	Methods	China Seen EM	China Seen PS	China Seen Overall	China Unseen EM	China Unseen PS	China Unseen Overall	United States Seen EM	United States Seen PS	United States Seen Overall	United States Unseen EM	United States Unseen PS	United States Unseen Overall
Qwen3-8B	Vanilla	55.78	40.74	51.17	49.53	42.59	47.74	52.77	55.60	53.56	53.71	62.72	55.42
Qwen3-8B	VTP	68.28	61.26	66.13	75.31	77.14	75.78	62.50	75.01	65.99	63.17	76.74	65.76
Qwen3-8B	EAP	80.51	80.64	80.55	79.35	80.09	79.54	71.63	78.51	73.55	72.25	80.33	73.79
Qwen3-14B	Vanilla	64.81	48.43	59.80	70.96	63.66	69.08	48.48	62.27	52.33	57.67	70.87	60.19
Qwen3-14B	VTP	75.63	61.07	71.17	82.76	83.75	83.02	66.74	69.93	67.63	67.65	78.37	69.69
Qwen3-14B	EAP	82.72	79.71	81.80	84.16	81.16	83.39	77.12	79.33	77.74	70.72	76.96	71.90
GLM4-9B	Vanilla	53.68	51.60	53.04	52.95	71.52	57.74	51.74	60.28	54.12	47.31	63.15	50.33
GLM4-9B	VTP	60.71	70.64	63.75	84.94	82.86	84.40	74.62	79.80	76.07	71.87	74.89	72.44
GLM4-9B	EAP	77.10	83.36	79.02	87.42	85.18	86.84	79.18	82.97	80.24	76.21	81.63	77.25
Llama3.1-8B	Vanilla	48.16	42.62	46.47	49.07	36.34	45.78	51.68	55.54	52.76	46.93	42.93	46.17
Llama3.1-8B	VTP	53.94	51.17	53.09	75.31	80.18	76.57	77.77	80.31	78.48	59.08	79.02	62.88
Llama3.1-8B	EAP	78.68	74.36	77.35	79.35	80.89	79.75	77.45	81.77	78.65	70.72	81.63	77.25
Mistral3-7B	Vanilla	38.60	55.93	43.91	64.44	58.48	62.90	49.13	61.94	52.71	46.04	60.87	48.86
Mistral3-7B	VTP	59.24	62.29	60.17	70.96	73.84	71.71	67.50	71.87	68.72	59.08	70.76	61.30
Mistral3-7B	EAP	74.11	67.33	72.03	70.50	76.16	71.96	73.21	70.13	72.35	65.22	70.43	66.21
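
This review names the EM and PS columns but does not define them. The sketch below shows one plausible reading, assuming answers lie on an ordinal scale: EM is the percentage of predictions that equal the reference answer, and PS is taken here to be one minus the absolute error normalized by the scale range. The PS formula, the function names, and the sample answers are assumptions; the paper's definition may differ.

```python
def exact_match(preds, golds):
    """Percentage of predictions identical to the reference answers."""
    hits = sum(p == g for p, g in zip(preds, golds))
    return 100.0 * hits / len(golds)

def proximity_score(preds, golds, scale_min, scale_max):
    """Assumed PS: mean of 1 - |pred - gold| / (scale range), as a percentage."""
    span = scale_max - scale_min
    total = sum(1.0 - abs(p - g) / span for p, g in zip(preds, golds))
    return 100.0 * total / len(golds)

# Invented answers on a 1-4 ordinal scale.
preds = [1, 3, 4, 2]
golds = [1, 2, 4, 4]

em = exact_match(preds, golds)
ps = proximity_score(preds, golds, scale_min=1, scale_max=4)
print(f"EM={em:.2f} PS={ps:.2f}")
```

Under this reading, PS gives partial credit to near misses, which is one way PS could exceed EM in rows of the table.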

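VTP, and especially EAP, outperforms the Vanilla baseline on both seen and unseen questions for China and the United States.
For Chinese groups, EAP with Qwen3-14B gives the best seen-question Overall score (81.80, +22.00 points over Vanilla).
For U.S. groups, EAP with GLM4-9B gives the best seen-question Overall score (80.24, +26.12 points over Vanilla).
Fine-tuned open-source models can match or exceed some closed-source models on this task.
Overall, EAP is generally more robust and generalizes better across cultures than VTP alone.
Ablations show that removing either the event or the history component degrades performance, with a larger drop for U.S. groups, suggesting they are more sensitive to external shocks.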
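
The gains quoted here and in the abstract appear to be absolute differences in the Overall score between EAP and Vanilla. That reading is inferred from the table, not stated in the review. The short check below recomputes the differences for the China Seen Overall column, with values copied from the table; the variable names are illustrative.

```python
# (Vanilla, EAP) China Seen Overall scores, copied from the results table.
china_seen_overall = {
    "Qwen3-8B": (51.17, 80.55),
    "Qwen3-14B": (59.80, 81.80),
    "GLM4-9B": (53.04, 79.02),
    "Llama3.1-8B": (46.47, 77.35),
    "Mistral3-7B": (43.91, 72.03),
}

# Absolute gain in points, rounded to the table's two decimals.
gains = {
    model: round(eap - vanilla, 2)
    for model, (vanilla, eap) in china_seen_overall.items()
}

largest_gain_model = max(gains, key=gains.get)
best_score_model = max(china_seen_overall, key=lambda m: china_seen_overall[m][1])

for model, gain in gains.items():
    print(f"{model}: +{gain:.2f}")
print("largest gain:", largest_gain_model)
print("best EAP score:", best_score_model)
```

The largest gain and the best absolute score come from different models: Llama3.1-8B improves the most because its Vanilla baseline is low, while Qwen3-14B reaches the highest EAP score.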


This review was generated by AI and reviewed by a human editor.