QUICK REVIEW

[Paper Review] Identifying Adversary Characteristics from an Observed Attack

Soyon Michelle Choi, Scott Alfeld|arXiv (Cornell University)|Mar 5, 2026

Adversarial Robustness in Machine Learning | Cited by 0

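One-Sentence Summary

The paper proposes a domain-agnostic framework for inferring attacker parameters in reverse from an observed data-manipulation attack on an ML system. It proves that the attacker is in general non-identifiable, shows how probabilistic inference can identify the most probable attacker parameters, and validates the approach empirically.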

ABSTRACT

When used in automated decision-making systems, machine learning (ML) models are vulnerable to data-manipulation attacks. Some defense mechanisms (e.g., adversarial regularization) directly affect the ML models while others (e.g., anomaly detection) act within the broader system. In this paper we consider a different task for defending the adversary, focusing on the attacker, rather than the attack. We present and demonstrate a framework for identifying characteristics about the attacker from an observed attack. We prove that, without additional knowledge, the attacker is non-identifiable (multiple potential attackers would perform the same observed attack). To address this challenge, we propose a domain-agnostic framework to identify the most probable attacker. This framework aids the defender in two ways. First, knowledge about the attacker can be leveraged for exogenous mitigation (i.e., addressing the vulnerability by altering the decision-making system outside the learning algorithm and/or limiting the attacker's capability). Second, when implementing defense methods that directly affect the learning process (e.g., adversarial regularization), knowledge of the specific attacker improves performance. We present the details of our framework and illustrate its applicability through specific instantiations on a variety of learners.

Motivation and Objectives

Motivate understanding of the attacker beyond fixed threat models to inform risk assessment and robust design.
Develop a domain-agnostic reverse-engineering framework to infer attacker knowledge, capability, and objective from observed attacks.
Show that attacker characteristics are non-identifiable in general and propose probabilistic inference to identify the most probable attacker parameters.
Demonstrate feasibility and utility of the framework across several learner types (linear, logistic, and MLP).

Proposed Method

Formulate the defender's task as reverse optimization: infer the attacker parameters (K, C, O), i.e. knowledge, capability, and objective, from the observed attack alpha_obs.
Prove non-identifiability: for a linear defender (DFDR) and attacker (ATKR), any attack can be produced by multiple attackers (Theorem 3.2).
Introduce a probabilistic framework with a prior p(K,C,O) and a likelihood p(alpha_obs | alpha_opt(K,C,O)), plus a tuning parameter lambda that balances the prior against the data.
Instantiate three attacker-defender configurations: linear regression with Mahalanobis constraints, logistic regression with box constraints, and an MLP with box constraints.
Derive a quadratic-form reduction of the objective in the linear case (Lemma 3.3) and outline surrogates for the non-linear cases (Lemmas 3.4, 3. for NN).
Evaluate via bi-level optimization using projected gradient methods on synthetic and real datasets, comparing against prior-mode baselines.
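
The probabilistic inference step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the paper's instantiation: a hypothetical attacker who maximizes w . alpha under an L2 budget c (so only the capability is unknown), Gaussian observation noise with scale sigma, a Gaussian prior on c with mode mu and scale tau, and lam standing in for the tuning parameter lambda. The function names are invented here.

```python
import math

def optimal_attack(w, c):
    """Hypothetical attacker: maximize w . alpha subject to ||alpha||_2 <= c.
    The maximizer points along w and uses the whole budget."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [c * wi / norm for wi in w]

def neg_log_posterior(c, alpha_obs, w, mu, sigma=0.1, tau=1.0, lam=1.0):
    """Assumed objective: Gaussian negative log-likelihood of the observed
    attack plus lam times a Gaussian negative log-prior on the budget c."""
    alpha_opt = optimal_attack(w, c)
    nll = sum((a - b) ** 2 for a, b in zip(alpha_obs, alpha_opt)) / (2 * sigma ** 2)
    prior = (c - mu) ** 2 / (2 * tau ** 2)
    return nll + lam * prior

def map_capability(alpha_obs, w, mu, sigma=0.1, tau=1.0, lam=1.0,
                   steps=500, lr=1e-3):
    """Projected gradient descent on neg_log_posterior over c >= 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    # Projection of the observed attack onto the attack direction w / ||w||.
    proj = sum(wi * ai for wi, ai in zip(w, alpha_obs)) / norm
    c = mu  # start from the prior mode
    for _ in range(steps):
        grad = (c - proj) / sigma ** 2 + lam * (c - mu) / tau ** 2
        c = max(0.0, c - lr * grad)  # project back onto the feasible set
    return c

# Toy data: the true budget is 2.0, the prior mode is 1.0,
# and the observed attack is slightly perturbed.
w = [3.0, 4.0]
alpha_obs = [1.25, 1.55]
c_hat = map_capability(alpha_obs, w, mu=1.0)
print(c_hat)
```

In this toy setting the estimate lands between the prior mode and the value implied by the observation, and raising lam pulls it toward the prior mode. The settings described in the summary are bi-level because alpha_opt generally has no closed form there; the closed form used here is a simplification.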

Figure 1: Schematic overview of our framework within the overall attacker-defender system. In this paper, we consider the example cases where $f$ is linear regression, logistic regression, or a multi-layer perceptron.

Experimental Results

Research Questions

RQ1: Can the attacker parameters (K, C, O) be uniquely identified from an observed attack alpha_obs?
RQ2: How can a defender reliably infer the most probable attacker parameters when, in general, identifiability does not hold?
RQ3: Does incorporating priors improve the accuracy of attacker parameter inference across linear, logistic, and neural settings?
RQ4: What is the impact of attacker optimality and nonlinearity on the stability of the inference framework?

Key Findings

Defender Type	Attacker Type	Median PER (%)	Max PER (%)	Trials with PER > 0 (%)
Linear Regression	Repulsive	99.14	99.65	91
Logistic Regression	Attractive	13.35	84.56	66
Multi-layer Perceptron	Attractive	25.25	71.68	84

Attacker characteristics are non-identifiable: Theorem 3.2 shows that multiple attacker parameterizations can yield the same observed attack against a linear defender.
The probabilistic framework with priors enables the defender to recover attacker parameters more accurately than using priors alone (positive PER in experiments).
Parameterization 1 (Linear Regression) achieved a median PER of 99.14% and max 99.65% across trials.
Parameterizations 2 (Logistic Regression) and 3 (MLP) also show substantial improvements, with max PER of 84.56% and 71.68% respectively, but with higher variance across trials.
Experiments indicate higher variance for non-linear models, highlighting the need for strong priors and noting attacker suboptimality can affect identifiability.
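
A small numerical illustration of non-identifiability, under assumptions that are not taken from the paper: a hypothetical "attractive" attacker perturbs an input x against a linear model w by minimizing (w . (x + alpha) - target)^2 + cost * ||alpha||^2. Here target stands in for the objective and cost for the capability; this is not the construction used in Theorem 3.2, only a toy case with the same flavor.

```python
def attractive_attack(w, x, target, cost):
    """Closed-form minimizer (for cost > 0) of
    (w . (x + alpha) - target)^2 + cost * ||alpha||^2.
    Setting the gradient to zero gives alpha = s * w with
    s = (target - w . x) / (||w||^2 + cost)."""
    wx = sum(wi * xi for wi, xi in zip(w, x))
    w_sq = sum(wi * wi for wi in w)
    scale = (target - wx) / (w_sq + cost)
    return [scale * wi for wi in w]

w = [1.0, 2.0]
x = [1.0, 1.0]

# Attacker A: modest target, cheap perturbations.
attack_a = attractive_attack(w, x, target=6.0, cost=1.0)
# Attacker B: ambitious target, expensive perturbations.
attack_b = attractive_attack(w, x, target=10.5, cost=10.0)

print(attack_a, attack_b)
```

The two attackers differ in both target and cost yet produce the same perturbation, because only the ratio (target - w . x) / (||w||^2 + cost) reaches the attack. Observing alpha alone cannot separate them, which is why a prior over attacker parameters is needed to pick the most probable one.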

This review was generated by AI and reviewed by a human editor.