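[Paper Summary] Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour
This paper models trust as reduced monitoring in a repeated user-developer game. It analyses how user trust and safe or unsafe AI development co-evolve under different monitoring costs and regulatory regimes, using infinite- and finite-population dynamics together with reinforcement learning.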
AI safety is an increasingly urgent concern as the capabilities and adoption of AI systems grow. Existing evolutionary models of AI governance have primarily examined incentives for safe development and effective regulation, typically representing users' trust as a one-shot adoption choice rather than as a dynamic, evolving process shaped by repeated interactions. We instead model trust as reduced monitoring in a repeated, asymmetric interaction between users and AI developers, where checking AI behaviour is costly. Using evolutionary game theory, we study how user trust strategies and developer choices between safe (compliant) and unsafe (non-compliant) AI co-evolve under different levels of monitoring cost and institutional regimes. We complement the infinite-population replicator analysis with stochastic finite-population dynamics and reinforcement learning (Q-learning) simulations. Across these approaches, we find three robust long-run regimes: no adoption with unsafe development, unsafe but widely adopted systems, and safe systems that are widely adopted. Only the last is desirable, and it arises when penalties for unsafe behaviour exceed the extra cost of safety and users can still afford to monitor at least occasionally. Our results formally support governance proposals that emphasise transparency, low-cost monitoring, and meaningful sanctions, and they show that neither regulation alone nor blind user trust is sufficient to prevent evolutionary drift towards unsafe or low-adoption outcomes.
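Research Motivation and Objectives
- Explain why modelling trust as a dynamic, monitoring-based mechanism matters for AI governance.
- Build an asymmetric repeated game between users and AI developers that includes trust-based monitoring strategies.
- Analyse evolutionary dynamics in infinite (replicator) and finite populations to identify the long-run stable states of adoption and safety.
- Add reinforcement learning simulations to test whether the conclusions remain robust under learning dynamics.
- Draw out what governance levers such as transparency, monitoring cost, and sanctions imply for preventing unsafe AI development.
Proposed Method
- Define a repeated two-player game in which users follow trust-based monitoring strategies and developers choose between safe and unsafe AI.
- Include benefit, cost, monitoring-cost, and institutional-penalty parameters in the payoffs.
- Apply infinite-population replicator dynamics to derive equilibria and analyse their stability.
- Use stochastic finite-population dynamics (fixation probabilities and Markov chains) to capture drift and mutation.
- Run reinforcement learning (Q-learning) simulations to compare against the replicator results and assess robustness.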
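The replicator step listed above can be illustrated with a minimal sketch. It does not reproduce the paper's payoff matrix, which this summary does not give. It assumes a deliberately simplified stage game: users either adopt (and monitor with probability `p`, so lower `p` means more trust) or do not adopt, and developers are either safe or unsafe. All parameter names (`b_u`, `u_unsafe`, `monitor_cost`, `c_s`, `v`, `p`) and all payoff forms are hypothetical choices made for illustration.

```python
def simulate(b_u, u_unsafe, monitor_cost, c_s, v, p,
             x0=0.8, y0=0.8, dt=0.01, steps=20000):
    """Euler-integrate a toy two-population replicator system.

    x: share of users who adopt (monitoring with probability p)
    y: share of developers who build safe AI
    Assumed payoffs (hypothetical, not taken from the paper):
      adopter vs safe developer:    b_u - p * monitor_cost
      adopter vs unsafe developer:  u_unsafe - p * monitor_cost
      non-adopter:                  0
      safe developer pays c_s always; an unsafe developer facing an
      adopter pays the expected penalty p * v.
    """
    x, y = x0, y0
    for _ in range(steps):
        # Payoff advantage of adopting over not adopting.
        user_gain = y * b_u + (1 - y) * u_unsafe - p * monitor_cost
        # Payoff advantage of safe over unsafe development: expected
        # penalty avoided minus the extra cost of safety. The developer's
        # adoption benefit is the same for both types and cancels out.
        dev_gain = x * p * v - c_s
        x += dt * x * (1 - x) * user_gain
        y += dt * y * (1 - y) * dev_gain
    return x, y


def regime(x, y, tol=0.05):
    """Label the end state with one of the three regimes from the abstract."""
    if x < tol and y < tol:
        return "no adoption, unsafe development"
    if x > 1 - tol and y < tol:
        return "unsafe but widely adopted"
    if x > 1 - tol and y > 1 - tol:
        return "safe and widely adopted"
    return "other"


if __name__ == "__main__":
    base = dict(b_u=2.0, monitor_cost=0.2, c_s=1.0, p=0.5)
    # Strong penalty: the expected sanction p * v exceeds the safety cost.
    print(regime(*simulate(u_unsafe=-1.0, v=4.0, **base)))
    # Weak penalty, unsafe systems harm users.
    print(regime(*simulate(u_unsafe=-1.0, v=1.0, **base)))
    # Weak penalty, unsafe systems still benefit users.
    print(regime(*simulate(u_unsafe=1.0, v=1.0, **base)))
```

In this toy setting the developer side turns on the sign of `x * p * v - c_s`, which mirrors the abstract's condition that penalties must exceed the extra cost of safety while users still monitor at least occasionally (`p > 0`).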

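Experimental Results
Research Questions
- RQ1: How does trust (defined as reduced monitoring) affect the co-evolution of user adoption and developer safety under different monitoring costs and regulation?
- RQ2: Under what conditions does each long-run regime arise (no adoption with unsafe development, unsafe but widely adopted systems, safe and widely adopted systems)?
- RQ3: How do finite (stochastic) and infinite (deterministic) population dynamics agree or differ in their predictions for trust-based strategies?
- RQ4: Which governance choices on transparency, monitoring cost, and sanctions sustain safe AI development?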
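For the finite-population side of RQ3, a standard quantity is the fixation probability of a single mutant strategy in a population of size `n`. The sketch below uses the textbook closed form for a pairwise-comparison (Fermi) imitation process. The update rule, the selection intensity `beta`, and the `payoff_diff` callback are assumptions for illustration, since this summary does not say which update rule the paper uses.

```python
import math


def fixation_probability(n, payoff_diff, beta=1.0):
    """Fixation probability of one mutant in a resident population of size n.

    Assumes a pairwise-comparison (Fermi) process, for which
        rho = 1 / (1 + sum_{k=1}^{n-1} prod_{j=1}^{k} exp(-beta * payoff_diff(j)))
    where payoff_diff(j) is the mutant's payoff minus the resident's payoff
    when j mutants are present (a hypothetical callback).
    """
    total = 1.0
    log_prod = 0.0
    for j in range(1, n):
        # Accumulate the running product in log space.
        log_prod += -beta * payoff_diff(j)
        total += math.exp(log_prod)
    return 1.0 / total


if __name__ == "__main__":
    n = 50
    # Neutral mutant: fixation happens purely by drift, with probability 1/n.
    print(fixation_probability(n, lambda j: 0.0))
    # In an asymmetric game against a monomorphic opposing population, the
    # payoff difference does not depend on j, so a constant is used here.
    print(fixation_probability(n, lambda j: 0.5))
    print(fixation_probability(n, lambda j: -0.5))
```

A neutral mutant fixes with probability `1/n`, which is the drift baseline that the deterministic replicator analysis cannot capture. Advantageous mutants exceed that baseline and disadvantageous ones fall below it.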
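Key Findings
- Three robust long-run regimes emerge: no adoption with unsafe development, unsafe but widely adopted systems, and safe systems that are widely adopted.
- The safe and widely adopted regime arises when the penalty for unsafe behaviour exceeds the extra cost of safety and monitoring remains affordable for users.
- Trust-based monitoring strategies raise user adoption when monitoring costs are low, and higher institutional penalties raise adoption further.
- Higher monitoring costs weaken trust-based strategies and can shift outcomes towards non-adoption or unsafe development.
- Reinforcement learning dynamics agree with the replicator results when monitoring is free; as monitoring cost rises, the learning outcomes become more sensitive and the agreement is less robust.
- The governance implications support transparency, low-cost monitoring, and meaningful sanctions as ways to incentivise safe AI development and to prevent drift towards unsafe or low-adoption equilibria.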
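The reinforcement learning comparison mentioned in the findings can be sketched with two independent, stateless (bandit-style) Q-learners, one user and one developer, playing a toy stage game. This is a simplification and not the paper's simulation setup: the payoffs, the use of the expected penalty `p * v` rather than a sampled monitoring event, and every parameter name and default are assumptions.

```python
import random


def q_learning(b_u, u_unsafe, monitor_cost, b_d, c_s, v, p,
               alpha=0.1, explore=0.1, rounds=5000, seed=0):
    """Two independent epsilon-greedy Q-learners on a toy user-developer game.

    User actions:      0 = adopt (monitor with probability p), 1 = do not adopt
    Developer actions: 0 = safe, 1 = unsafe
    All payoffs are hypothetical and used only for illustration.
    """
    rng = random.Random(seed)
    q_user = [0.0, 0.0]
    q_dev = [0.0, 0.0]

    def choose(q):
        # Explore uniformly with probability `explore`, otherwise act greedily
        # (ties go to action 0).
        if rng.random() < explore:
            return rng.randrange(2)
        return 0 if q[0] >= q[1] else 1

    for _ in range(rounds):
        a_user, a_dev = choose(q_user), choose(q_dev)
        adopt, safe = a_user == 0, a_dev == 0
        if adopt:
            r_user = (b_u if safe else u_unsafe) - p * monitor_cost
            # Unsafe developers pay the expected penalty p * v when adopted.
            r_dev = b_d - c_s if safe else b_d - p * v
        else:
            r_user = 0.0
            r_dev = -c_s if safe else 0.0
        # Stateless Q-update: move the estimate towards the observed reward.
        q_user[a_user] += alpha * (r_user - q_user[a_user])
        q_dev[a_dev] += alpha * (r_dev - q_dev[a_dev])
    return q_user, q_dev


if __name__ == "__main__":
    q_user, q_dev = q_learning(b_u=2.0, u_unsafe=1.0, monitor_cost=0.2,
                               b_d=2.0, c_s=1.0, v=1.0, p=0.5)
    print("user Q (adopt, not adopt):", q_user)
    print("developer Q (safe, unsafe):", q_dev)
```

Comparing the greedy actions implied by the learned Q-values with the replicator end state for the same parameters, across several seeds and values of `monitor_cost`, is one way to probe how well the two approaches agree.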

This summary was generated by AI and reviewed by a human editor.