[Paper Digest] Game-Theoretic Modeling of Stealthy Intrusion Defense against MDP-Based Attackers
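The paper extends the Cut-The-Rope (CTR) framework by modeling the attacker's progress under defensive deployments as a Markov decision process (MDP), and analyzes three attacker information regimes (Stackelberg, probabilistic belief, and blind) to derive optimal defensive strategies.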
The rapid expansion of Internet use has increased system exposure to cyber threats, with advanced persistent threats (APTs) being especially challenging due to their stealth, prolonged duration, and multi-stage attacks targeting high-value assets. In this study, we model APT evolution as a strategic interaction between an attacker and a defender on an attack graph. With limited information about the attacker's position and progress, the defender acts at random intervals by deploying intrusion detection sensors across the network. Once a compromise is detected, affected components are immediately secured through measures such as backdoor removal, patching, or system reconfiguration. Meanwhile, the attacker begins with reconnaissance and then proceeds through the network, exploiting vulnerabilities and installing backdoors to maintain persistent access and adaptive movement. Furthermore, the attacker may take several steps between consecutive defensive operations, resulting in an asymmetric temporal dynamic. The defender's goal is to reduce the likelihood that the attacker will gain access to a critical asset, whereas the attacker's purpose is to increase this likelihood. We investigate this interaction under three informational regimes, reflecting varying levels of attacker knowledge prior to action: (i) a Stackelberg scenario, in which the attacker has full knowledge of the defender's strategy and can optimize accordingly; (ii) a blind regime, where the attacker has no information and assumes uniform beliefs about defensive deployments; and (iii) a belief-based framework, where the attacker holds accurate probabilistic beliefs about the defender's actions. For each regime, we derive optimal defensive strategies by solving the corresponding optimization problems.
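Research Motivation and Goals
- Motivate proactive defense against stealthy APTs that advance in stages across an attack graph.
- Model the attacker's evolution as an MDP whose state-dependent routing decisions are shaped by the defender's deployments.
- Study three attacker information regimes to derive defensive strategies that minimize the probability of a successful attack.
- Provide computational methods for solving the defender-attacker optimization problem under resource constraints.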
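Proposed Method
- Model the interaction as a two-player zero-sum game on a directed acyclic attack graph.
- Extend the CTR framework by letting an MDP govern the attacker's decisions at compromised nodes.
- Define three information regimes: Stackelberg (full information), probabilistic inference, and blind (reconnaissance disrupted).
- Express the defender's strategies as pure deployments of h sensors over A1 = V \ F, subject to resource constraints; treat the attacker's strategy as a distribution over A2, the set of attack paths.
- Use a modified transition function P^x to capture detection when a protected node is traversed, and compute the attacker's value by linear programming with Bellman-based constraints.
- Convert the defender's bilevel problem into a MILP, using auxiliary variables to linearize the nonlinear terms (Big-M formulation).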
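The Bellman-style value computation in the list above can be illustrated on a toy graph. This is a minimal sketch, not the paper's formulation: it swaps the linear program for backward induction, which solves the same optimality equations on a DAG, and it assumes a hypothetical per-node detection probability `detect_prob` as a stand-in for the modified transition function P^x. The graph, node names, and function name are illustrative assumptions.

```python
from functools import lru_cache

def attacker_value(graph, target, detect_prob, start):
    """Return the attacker's maximum probability of reaching `target` from `start`.

    graph: dict mapping each node to a list of successors (assumed acyclic).
    detect_prob: dict mapping node -> assumed probability that entering
                 the node is detected, which ends the attack.
    """
    @lru_cache(maxsize=None)
    def value(node):
        if node == target:
            return 1.0
        best = 0.0  # a dead end never reaches the target
        for nxt in graph.get(node, []):
            survive = 1.0 - detect_prob.get(nxt, 0.0)
            # Bellman optimality: take the successor with the best survival-weighted value.
            best = max(best, survive * value(nxt))
        return best

    return value(start)

# Hypothetical four-node attack graph with sensors of different strength.
toy_graph = {"entry": ["web", "mail"], "web": ["db"], "mail": ["db"], "db": []}
toy_detect = {"web": 0.5, "mail": 0.2}
toy_value = attacker_value(toy_graph, "db", toy_detect, "entry")
print(round(toy_value, 3))  # 0.8
```

Here the attacker routes through `mail`, the weaker sensor, which is the kind of state-dependent routing decision the MDP model is meant to capture.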
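Experimental Results
Research Questions
- RQ1: How does the defender's sensor placement affect the attacker's optimal MDP-based routing decisions under different information regimes?
- RQ2: What are the optimal defensive strategies in the Stackelberg, probabilistic-belief, and blind information settings?
- RQ3: How should the defender allocate a limited number of sensors to minimize the probability that the attacker reaches a critical asset?
- RQ4: Under resource constraints, which computational methods (linear programming, MILP, Monte Carlo) solve the attacker's MDP and the defender's optimization effectively?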
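RQ3 can be made concrete on a small example. The sketch below is an assumption-laden illustration rather than the paper's MILP: it enumerates every placement of `budget` sensors by brute force and keeps the one that minimizes a fully informed attacker's best-response success probability. The graph, the `detect_prob` value, and the choice to exclude the entry and target nodes from the candidate set are all hypothetical.

```python
from itertools import combinations

def reach_probability(graph, target, node, sensors, detect_prob):
    """Attacker's best probability of reaching `target` from `node` on a DAG."""
    if node == target:
        return 1.0
    best = 0.0
    for nxt in graph.get(node, []):
        # Entering a monitored node is survived with probability 1 - detect_prob.
        survive = 1.0 - detect_prob if nxt in sensors else 1.0
        best = max(best, survive * reach_probability(graph, target, nxt, sensors, detect_prob))
    return best

def best_deployment(graph, start, target, budget, detect_prob=0.9):
    """Brute-force the sensor set of size `budget` that minimizes attacker success."""
    candidates = [v for v in graph if v not in (start, target)]  # assumed placeable nodes
    best = None
    for sensors in combinations(candidates, budget):
        value = reach_probability(graph, target, start, frozenset(sensors), detect_prob)
        if best is None or value < best[1]:
            best = (sensors, value)
    return best

# Hypothetical attack graph: three routes from "entry" to the asset "db".
demo_graph = {
    "entry": ["web", "mail"],
    "web": ["app"],
    "mail": ["app", "files"],
    "app": ["db"],
    "files": ["db"],
    "db": [],
}
one_sensor = best_deployment(demo_graph, "entry", "db", budget=1)
two_sensors = best_deployment(demo_graph, "entry", "db", budget=2)
print(one_sensor[1], two_sensors[0], round(two_sensors[1], 3))
```

In this toy graph a single sensor never helps, because an informed attacker always has an unmonitored route, while two sensors placed on a cut of the graph do. Enumeration grows combinatorially with the number of nodes, which is why an exact method such as a MILP is needed at realistic scale.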
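Key Findings
- Optimized sensor deployments reduce the attacker's success probability significantly more than baseline heuristics do.
- Modeling the attacker's progress as an MDP with reconnaissance-aware transitions captures the asymmetric temporal dynamics and improves defense planning.
- Under the Stackelberg regime the defender is at a disadvantage when the attacker is well informed; the MILP formulation quantifies this disadvantage.
- Dirichlet signaling (the belief-based approach) can mitigate the Stackelberg disadvantage by shaping the attacker's beliefs about the defense.
- Under budget constraints, the framework obtains feasible defensive deployments using LP/MILP techniques.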
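The gap between an informed and a blind attacker can be seen in a toy calculation. This is a rough sketch under strong assumptions that are not taken from the paper: the defender places a single, perfectly accurate sensor according to a mixed strategy, and the attacker commits to one path in advance instead of adapting. The path set, node names, and probabilities are hypothetical; only the uniform belief of the blind attacker follows the abstract.

```python
def path_success(path, placement_probs):
    """Probability that the single sensor is not on any node of `path`."""
    return 1.0 - sum(placement_probs.get(node, 0.0) for node in path)

def chosen_path(paths, belief):
    """Path the attacker picks when it believes the defender plays `belief`."""
    return max(paths, key=lambda path: path_success(path, belief))

# Hypothetical attack paths, listed by their intermediate nodes.
paths = [("web",), ("mail", "files")]
true_strategy = {"web": 0.6, "mail": 0.2, "files": 0.2}
uniform_belief = {node: 1.0 / len(true_strategy) for node in true_strategy}

# Stackelberg attacker: best-responds to the defender's actual mixed strategy.
informed = path_success(chosen_path(paths, true_strategy), true_strategy)
# Blind attacker: plans against a uniform belief but faces the actual strategy.
blind = path_success(chosen_path(paths, uniform_belief), true_strategy)
print(round(informed, 3), round(blind, 3))  # 0.6 0.4
```

The informed attacker avoids the heavily monitored `web` node, while the blind attacker walks into it, which is the intuition behind shaping the attacker's beliefs in the belief-based regime.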
This digest was generated by AI and reviewed by a human editor.