QUICK REVIEW

[Paper Digest] Scaling Laws for Moral Machine Judgment in Large Language Models

Kazuhiro Takemoto|arXiv (Cornell University)|Jan 25, 2026
Ethics and Social Impacts of AI | Cited by 0
One-Sentence Summary

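The study shows that the alignment of LLM moral judgments with human preferences improves with model size following a power law (D ∝ S^{-0.10}) that is robust across architectures; extended reasoning improves alignment, especially for smaller models, and variance decreases at larger scales.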

ABSTRACT

Autonomous systems increasingly require moral judgment capabilities, yet whether these capabilities scale predictably with model size remains unexplored. We systematically evaluate 75 large language model configurations (0.27B--1000B parameters) using the Moral Machine framework, measuring alignment with human preferences in life-death dilemmas. We observe a consistent power-law relationship with distance from human preferences ($D$) decreasing as $D \propto S^{-0.10\pm0.01}$ ($R^2=0.50$, $p<0.001$) where $S$ is model size. Mixed-effects models confirm this relationship persists after controlling for model family and reasoning capabilities. Extended reasoning models show significantly better alignment, with this effect being more pronounced in smaller models (size$\times$reasoning interaction: $p = 0.024$). The relationship holds across diverse architectures, while variance decreases at larger scales, indicating systematic emergence of more reliable moral judgment with computational scale. These findings extend scaling law research to value-based judgments and provide empirical foundations for artificial intelligence governance.

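Motivation and Objectives

  • Investigate whether the moral judgment capabilities of LLMs scale with model size as other cognitive capabilities do
  • Quantify alignment with human moral preferences across 75 model configurations using the Moral Machine framework
  • Assess the robustness of the scaling law to model family, architecture, and reasoning mechanism
  • Determine whether extended reasoning and temporal factors independently affect moral alignment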

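Proposed Method

  • Evaluate 75 LLM configurations spanning 0.27B–1000B parameters using prompts and the Moral Machine framework
  • Measure alignment with human preferences via AMCE (average marginal component effect) vectors, computing the Euclidean distance D between model and human AMCEs
  • Test power-law scaling by fitting D ∝ S^{-α} and comparing against linear, logarithmic, and exponential alternatives
  • Use linear mixed-effects models with model family as a random effect and release date and reasoning capability as predictors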
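
The distance measure and the power-law fit described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the digest does not say how the fit was performed, so ordinary least squares on log-log axes is an assumption, and the function names, the prefactor 1.2, and the model sizes are hypothetical values chosen only to exercise the code.

```python
import math

def distance_from_human(model_amce, human_amce):
    """Euclidean distance D between a model's AMCE vector and the human one."""
    return math.dist(model_amce, human_amce)

def fit_power_law(sizes, distances):
    """Fit D = c * S**(-alpha) by least squares on log10-log10 axes.

    Assumption: a straight-line fit in log-log space; the source does not
    state the fitting procedure. Returns (alpha, c, r_squared).
    """
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(d) for d in distances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, 10 ** intercept, r_squared

# Hypothetical, noise-free data generated from D = 1.2 * S**(-0.10);
# these are NOT measurements from the paper.
sizes_b = [0.27, 1.0, 7.0, 70.0, 1000.0]  # model sizes in billions of parameters
synthetic_d = [1.2 * s ** -0.10 for s in sizes_b]

alpha, c, r2 = fit_power_law(sizes_b, synthetic_d)
```

On noise-free synthetic data the fit recovers the exponent it was generated with. Real measurements scatter around the line, which is what an R² of 0.50 reflects.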
Figure 1: Scaling relationship between model size ( $S$ ) and moral judgment alignment with human preferences (distance from human, $D$ ). Each point represents one LLM, colored by model family. The dashed line shows the fitted power-law relationship ( $D\propto S^{-0.10\pm 0.01}$ )

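Experimental Results

Research Questions

  • RQ1: Does moral judgment alignment with human preferences scale with model size across different LLM architectures?
  • RQ2: Is the observed scaling relationship robust after controlling for confounders such as model family, release date, and reasoning capability?
  • RQ3: Do extended reasoning methods provide additional alignment beyond scale, and how do they interact with model size?
  • RQ4: How does model size affect the variance of alignment?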

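Key Findings

  • Larger models align more closely with human moral preferences, following the power law D ∝ S^{-0.10±0.01} (R²=0.50, p<0.001)
  • The power-law relationship still holds after controlling for model family with mixed-effects models
  • Extended reasoning models are closer to human preferences (β=-0.16, p=0.001), and there is a significant size×reasoning interaction (β=0.057, p=0.024), indicating larger gains for smaller models
  • Alignment variance decreases with model size, so scaling yields more reliable moral judgment
  • Temporal factors (release date) do not significantly improve alignment beyond scale and reasoning capability
  • The final model supports a consistent scaling pattern across the main families (DeepSeek, Llama, Gemma, Qwen, Other)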
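
To get a feel for how shallow an exponent of 0.10 is, the short sketch below computes the change in D that the fitted trend implies for a given increase in model size. It uses only the reported exponent and the reported size range. The function name is hypothetical, and the outputs describe the fitted line, not any individual model (with R²=0.50, single models deviate substantially from it).

```python
def predicted_distance_ratio(size_factor, alpha=0.10):
    """Ratio D(k*S) / D(S) implied by D ∝ S**(-alpha) for a size multiplier k."""
    return size_factor ** (-alpha)

# A tenfold increase in parameters: D shrinks to about 0.79 of its value,
# i.e. roughly a 21% reduction in distance from human preferences.
tenfold = predicted_distance_ratio(10)

# Across the full evaluated range (0.27B to 1000B parameters), the trend
# implies D falls to about 0.44 of its starting value.
full_range = predicted_distance_ratio(1000 / 0.27)
```

Under this fitted trend, each order of magnitude in size brings a modest gain, so halving the distance takes roughly three orders of magnitude of scale.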
Figure S1: Family-specific scaling relationships. Log-log plot showing the relationship between model size and distance from human preferences for each model family. Points represent individual models, lines show linear regression fits with 95% confidence intervals. All families exhibit negative sca

This digest was generated by AI and reviewed by a human editor.