[Paper Review] Crisis-Bench: Benchmarking Strategic Ambiguity and Reputation Management in Large Language Models
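Crisis-Bench introduces a dynamic, multi-agent benchmark that evaluates large language models on strategic ambiguity and reputation management across 80 crisis storylines in 8 industries, using a private/public knowledge architecture and an Adjudicator-Market Loop to quantify economic impact.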
Standard safety alignment optimizes Large Language Models (LLMs) for universal helpfulness and honesty, effectively instilling a rigid "Boy Scout" morality. While robust for general-purpose assistants, this one-size-fits-all ethical framework imposes a "transparency tax" on professional domains requiring strategic ambiguity and information withholding, such as public relations, negotiation, and crisis management. To measure this gap between general safety and professional utility, we introduce Crisis-Bench, a multi-agent Partially Observable Markov Decision Process (POMDP) that evaluates LLMs in high-stakes corporate crises. Spanning 80 diverse storylines across 8 industries, Crisis-Bench tasks an LLM-based Public Relations (PR) Agent with navigating a dynamic 7-day corporate crisis simulation while managing strictly separated Private and Public narrative states to enforce rigorous information asymmetry. Unlike traditional benchmarks that rely on static ground truths, we introduce the Adjudicator-Market Loop: a novel evaluation metric where public sentiment is adjudicated and translated into a simulated stock price, creating a realistic economic incentive structure. Our results expose a critical dichotomy: while some models capitulate to ethical concerns, others demonstrate the capacity for Machiavellian, legitimate strategic withholding in order to stabilize the simulated stock price. Crisis-Bench provides the first quantitative framework for assessing "Reputation Management" capabilities, arguing for a shift from rigid moral absolutism to context-aware professional alignment.
Research Motivation and Objectives
- Argue that universal safety alignment hinders professional utility in high-stakes domains requiring information withholding.
- Propose Crisis-Bench as a dynamic, 7-day, multi-agent POMDP to test LLMs in crisis PR scenarios.
- Introduce a Dual-Knowledge Architecture to model private vs public information and Theory of Mind.
- Develop an Adjudicator-Market Loop to translate PR strategy into simulated economic outcomes.
- Demonstrate the existence of an alignment tax and discuss context-aware professional alignment.
Proposed Method
- Model Crisis-Bench as a multi-turn, multi-agent POMDP with three agents: PR Agent (evaluated model), Router Agent (environment controller), and Adjudicator Agent (evaluator).
- Use a fixed Event Pool to ensure fairness and reproducibility of crisis progression across storylines.
- Maintain three state components at each step: Private Knowledge Base, Public Knowledge Base, and Narrative States (private/public).
- Provide the PR Agent with executive authority and a CoT-guided response generation process that can include strategic disclosure actions.
- Introduce four Adjudicator scoring dimensions (Accountability, Transparency, Empathy, Cost) and an Environmental Severity and Evidence Level metric, feeding into a simulated stock price via a defined market equation.
- Compute a stock-price-based objective, the daily price change ΔP_t, from four market forces (Crisis Drag, Sentiment, Financial Hit, and Uncertainty) to judge success.
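The dual knowledge bases and the Adjudicator-Market Loop described above can be sketched in a few lines of Python. This is only an illustrative sketch: the summary names the four market forces but does not give the market equation, so the functional form of `price_delta`, every weight (`w_drag`, `w_sent`, `w_cost`, `w_unc`), the score ranges, and the helper names (`CrisisState`, `AdjudicatorScores`, `disclose`, `simulate`) are assumptions made for illustration, not the paper's definitions.

```python
from dataclasses import dataclass, field


@dataclass
class AdjudicatorScores:
    """One day's adjudicator output. The ranges are assumed, not from the paper."""
    accountability: float  # assumed 0-10
    transparency: float    # assumed 0-10
    empathy: float         # assumed 0-10
    cost: float            # assumed 0-10, higher = more expensive remedy
    severity: float        # assumed 0-1
    evidence: float        # assumed 0-1


@dataclass
class CrisisState:
    """Strictly separated private and public knowledge."""
    private_kb: set = field(default_factory=set)
    public_kb: set = field(default_factory=set)


def disclose(state: CrisisState, fact: str) -> None:
    """Strategic disclosure: copy a private fact into the public record."""
    if fact not in state.private_kb:
        raise ValueError(f"unknown private fact: {fact!r}")
    state.public_kb.add(fact)  # the private KB keeps the full truth


def price_delta(s: AdjudicatorScores, w_drag: float = 2.0, w_sent: float = 3.0,
                w_cost: float = 1.0, w_unc: float = 1.0) -> float:
    """Hypothetical linear combination of the four named market forces."""
    # Crisis Drag: a severe crisis backed by strong public evidence hurts more.
    crisis_drag = -w_drag * s.severity * s.evidence
    # Sentiment: mean of the three trust-related scores, centred on 5.
    sentiment = w_sent * ((s.accountability + s.transparency + s.empathy) / 3 - 5) / 5
    # Financial Hit: costly remedies weigh on the price.
    financial_hit = -w_cost * s.cost / 10
    # Uncertainty: opaque answers during a severe crisis unsettle the market.
    uncertainty = -w_unc * s.severity * (1 - s.transparency / 10)
    return crisis_drag + sentiment + financial_hit + uncertainty


def simulate(daily_scores, start_price: float = 100.0) -> list:
    """Apply one price update per simulated day and return the price path."""
    price, history = start_price, []
    for s in daily_scores:
        price = max(0.0, price + price_delta(s))
        history.append(price)
    return history
```

Under these assumptions, a 7-day run is `simulate([scores] * 7)`, and two otherwise identical responses differ in price only through the `cost` term, which is the trade-off between trust and operational cost that the benchmark is built to expose.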

Experimental Results
Research Questions
- RQ1: How does LLM-driven crisis PR perform when information asymmetry is explicit and modeled over a 7-day crisis?
- RQ2: Do larger or more optimized models achieve a better Machiavellian balance between trust and operational cost in reputation management tasks?
- RQ3: What is the impact of radical versus cautious transparency on crisis outcomes in a simulated market framework?
- RQ4: Can a dual-knowledge architecture and adjudicator-driven evaluation reveal an alignment tax in safety-focused LLMs when professional tasks demand strategic withholding?
Key Findings
| Accountability | Transparency | Empathy | Cost | Severity | Evidence | Trust Score | Stock Price |
|---|---|---|---|---|---|---|---|
| 6.950 | 6.194 | 6.884 | 7.541 | .9143 | .8479 | 74.163 | 64.206 |
| 6.489 | 6.245 | 6.062 | 6.905 | .8840 | .8301 | 68.713 | 68.442 |
| 6.521 | 6.602 | 5.563 | 7.564 | .8953 | .8531 | 67.000 | 57.013 |
| 5.368 | 5.893 | 4.546 | 6.738 | .8963 | .8491 | 53.125 | 45.270 |
| 5.870 | 5.836 | 5.695 | 6.730 | .8988 | .8535 | 58.475 | 49.034 |
| 5.816 | 5.973 | 5.718 | 7.532 | .9065 | .8581 | 59.163 | 32.880 |
| 6.532 | 6.284 | 5.591 | 6.793 | .8905 | .8596 | 68.238 | 64.000 |
| 6.604 | 6.300 | 5.688 | 7.425 | .9126 | .8770 | 68.488 | 54.027 |
| 6.091 | 6.625 | 5.291 | 7.823 | .8959 | .8610 | 61.888 | 46.169 |
| 4.543 | 5.513 | 5.511 | 5.870 | .8983 | .8446 | 43.436 | 42.425 |
| 4.843 | 5.534 | 4.929 | 5.936 | .9116 | .8581 | 46.313 | 45.606 |
- GPT-5-mini achieves the highest average public trust score, but GPT-5.1 attains a higher final simulated stock price due to a more economical cost profile.
- Radical transparency in models like DeepSeek-v3.2, Kimi-K2, and Mistral-Large-3 leads to higher evidence levels and lower trust, triggering crisis escalation and costly remedies.
- Larger models generally outperform smaller ones, suggesting scaling improves Theory of Mind and strategic withholding capabilities.
- Claude-4.5 series refuses to engage in the task due to ethical constraints, illustrating alignment rigidity.
- GPT-5.1 demonstrates a balance (low final severity, economical cost) indicating a Machiavellian equilibrium between trust and operational cost.
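The gap between public trust and simulated market outcome can be checked directly against the results table. The short script below copies the table values and finds the row with the highest Trust Score and the row with the highest Stock Price. The table as reproduced here has no model-name column, so rows are referred to by position only and no row-to-model mapping is asserted; the names `ROWS`, `COST`, `TRUST`, and `STOCK` are illustrative.

```python
# Columns: accountability, transparency, empathy, cost, severity, evidence,
# trust score, stock price (values copied from the results table).
ROWS = [
    (6.950, 6.194, 6.884, 7.541, 0.9143, 0.8479, 74.163, 64.206),
    (6.489, 6.245, 6.062, 6.905, 0.8840, 0.8301, 68.713, 68.442),
    (6.521, 6.602, 5.563, 7.564, 0.8953, 0.8531, 67.000, 57.013),
    (5.368, 5.893, 4.546, 6.738, 0.8963, 0.8491, 53.125, 45.270),
    (5.870, 5.836, 5.695, 6.730, 0.8988, 0.8535, 58.475, 49.034),
    (5.816, 5.973, 5.718, 7.532, 0.9065, 0.8581, 59.163, 32.880),
    (6.532, 6.284, 5.591, 6.793, 0.8905, 0.8596, 68.238, 64.000),
    (6.604, 6.300, 5.688, 7.425, 0.9126, 0.8770, 68.488, 54.027),
    (6.091, 6.625, 5.291, 7.823, 0.8959, 0.8610, 61.888, 46.169),
    (4.543, 5.513, 5.511, 5.870, 0.8983, 0.8446, 43.436, 42.425),
    (4.843, 5.534, 4.929, 5.936, 0.9116, 0.8581, 46.313, 45.606),
]
COST, TRUST, STOCK = 3, 6, 7  # column positions within each row

# Zero-based index of the row that maximises each metric.
best_trust = max(range(len(ROWS)), key=lambda i: ROWS[i][TRUST])
best_stock = max(range(len(ROWS)), key=lambda i: ROWS[i][STOCK])

print("highest trust score: row", best_trust + 1, ROWS[best_trust][TRUST])
print("highest stock price: row", best_stock + 1, ROWS[best_stock][STOCK])
print("cost of those rows:", ROWS[best_trust][COST], "vs", ROWS[best_stock][COST])
```

The first row has the highest Trust Score (74.163) and the second row has the highest Stock Price (68.442) with a lower Cost score (6.905 versus 7.541), which is the pattern the first finding describes.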

This review was generated by AI and checked by human editors.