QUICK REVIEW

[Paper Review] Fingerprinting AI Coding Agents on GitHub

Taher A. Ghaleb|arXiv (Cornell University)|Jan 24, 2026

Software Engineering Research | Cited by 0

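One-Sentence Summary

This paper presents the first empirical study on fingerprinting AI coding agents: analyzing 33,580 pull requests from five agents, it achieves a 97.2% F1-score in multi-class agent identification and reveals agent-specific behavioral signatures.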

ABSTRACT

AI coding agents are reshaping software development through both autonomous and human-mediated pull requests (PRs). When developers use AI agents to generate code under their own accounts, code authorship attribution becomes critical for repository governance, research validity, and understanding modern development practices. We present the first study on fingerprinting AI coding agents, analyzing 33,580 PRs from five major agents (OpenAI Codex, GitHub Copilot, Devin, Cursor, Claude Code) to identify behavioral signatures. With 41 features spanning commit messages, PR structure, and code characteristics, we achieve 97.2% F1-score in multi-class agent identification. We uncover distinct fingerprints: Codex shows unique multiline commit patterns (67.5% feature importance), and Claude Code exhibits distinctive code structure (27.2% importance of conditional statements). These signatures reveal that AI coding tools produce detectable behavioral patterns, suggesting potential for identifying AI contributions in software repositories.

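Research Motivation and Goals

AI-generated or AI-submitted code contributions need to be identified for repository governance and research validity.
Propose fingerprinting based on behavioral features of PRs as a way to attribute authorship to a specific agent.
Evaluate how well the submitting AI agent can be identified in a multi-agent setting, aiming for robust performance.
Characterize agent-specific fingerprints to support policy enforcement and the understanding of agent-driven development practices.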

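Proposed Method

Build a labeled PR dataset from the AIDev dataset covering five AI coding agents (OpenAI Codex, GitHub Copilot, Devin, Cursor, Claude Code).
Engineer discriminative features covering commit messages, PR structure, code changes, per-patch code, and temporal patterns drawn from PR metadata and diffs.
Reduce the candidate features to the final set of 41 via hierarchical clustering and R^2 redundancy analysis, keeping enough events per variable (EPV) for each class.
Train tree-ensemble classifiers (XGBoost and random forest) for multi-class agent identification under 5-fold stratified cross-validation.
Extract agent-specific feature importances with one-vs-rest binary classifiers to obtain interpretable fingerprints.
Report performance with macro-averaged metrics and analyze confusion matrices and feature importances.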
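To make the feature-engineering step above more concrete, here is a minimal sketch of how a few commit-message and PR-description features might be computed for one PR. The feature names, the definition of a "multiline" commit (more than one non-empty line), and the bullet and link patterns are assumptions made for illustration; the paper's exact definitions are not given in this review.

```python
import re
from statistics import mean

# Illustrative patterns (assumptions, not the paper's definitions).
URL_RE = re.compile(r"https?://\S+")
BULLET_RE = re.compile(r"^[ \t]*[-*+][ \t]+", re.MULTILINE)


def is_multiline(message: str) -> bool:
    """Return True if a commit message has more than one non-empty line."""
    lines = [ln for ln in message.splitlines() if ln.strip()]
    return len(lines) > 1


def pr_features(commit_messages: list[str], description: str) -> dict[str, float]:
    """Compute a handful of hypothetical per-PR features."""
    n = len(commit_messages)
    return {
        # Share of commits in the PR whose message spans several lines.
        "multiline_commit_ratio": (
            sum(is_multiline(m) for m in commit_messages) / n if n else 0.0
        ),
        # Average commit message length in characters.
        "mean_commit_msg_length": (
            mean(len(m) for m in commit_messages) if n else 0.0
        ),
        # PR description length, bullet count, and hyperlink count.
        "description_length": float(len(description)),
        "description_bullets": float(len(BULLET_RE.findall(description))),
        "description_links": float(len(URL_RE.findall(description))),
    }
```

Each PR would yield one such feature dictionary, and stacking them row by row gives the table that a classifier is trained on.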

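Experimental Results

Research Questions

RQ1: How accurately can we identify which AI coding agent submitted a pull request?
RQ2: What characteristics distinguish AI coding agents from one another?
RQ3: Assess overall agent identifiability and examine universal versus agent-specific fingerprints.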

Key Findings

Agent	Samples	Precision	Recall	F1	EPV
OpenAI Codex	21,793	0.99	0.99	0.99	531.5
Copilot	4,967	0.99	0.98	0.99	121.1
Devin	4,822	0.93	0.96	0.94	117.6
Cursor	1,540	0.88	0.83	0.85	37.6
Claude Code	458	0.82	0.57	0.67	11.2
Weighted Avg.	33,580	0.97	0.97	0.97	—
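The table's derived columns can be checked with a few lines of arithmetic. The sketch below assumes that EPV is the per-class sample count divided by the 41 final features, which reproduces every value in the EPV column. It also recomputes F1 from the rounded precision and recall, and compares the sample-weighted and unweighted (macro) averages of the per-class F1 scores. With the rounded values shown, the weighted average comes out at about 0.972, matching the headline figure, while the unweighted mean is about 0.888.

```python
N_FEATURES = 41  # final feature count reported in the abstract

# (samples, precision, recall, F1, EPV) copied from the table above.
TABLE = {
    "OpenAI Codex": (21793, 0.99, 0.99, 0.99, 531.5),
    "Copilot": (4967, 0.99, 0.98, 0.99, 121.1),
    "Devin": (4822, 0.93, 0.96, 0.94, 117.6),
    "Cursor": (1540, 0.88, 0.83, 0.85, 37.6),
    "Claude Code": (458, 0.82, 0.57, 0.67, 11.2),
}


def epv(samples: int, n_features: int = N_FEATURES) -> float:
    """Events per variable, assumed here to be samples / features."""
    return samples / n_features


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total = sum(row[0] for row in TABLE.values())

# Sample-weighted average: large classes (Codex) dominate the result.
weighted_f1 = sum(row[0] * row[3] for row in TABLE.values()) / total

# Unweighted (macro) average: every agent counts equally.
macro_f1 = sum(row[3] for row in TABLE.values()) / len(TABLE)

if __name__ == "__main__":
    for agent, (n, p, r, f, _) in TABLE.items():
        print(f"{agent:13s} EPV={epv(n):6.1f} F1(from P,R)={f1(p, r):.3f} reported={f}")
    print(f"weighted F1={weighted_f1:.3f} macro F1={macro_f1:.3f}")
```

The gap between the two averages comes from class imbalance: Codex accounts for roughly two thirds of the PRs, so its near-perfect score pulls the weighted average up, while the small Claude Code class pulls the unweighted mean down.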

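XGBoost achieves a 97.2% F1-score in five-class agent identification (2.3% higher than random forest).
The majority classes achieve near-perfect precision and recall (98-99% for OpenAI Codex and Copilot, 93-96% for Devin), while Cursor and Claude Code perform worse because of their small sample sizes (Cursor: precision 88%, recall 83%; Claude Code: precision 82%, recall 57%).
Global feature importance is dominated by commit-message features (multiline commit ratio: 44.7%); code-content features such as comments, conditional statements, and functions rank lower.
The one-vs-rest analysis reveals agent-specific fingerprints: Codex (multiline commits); Copilot (longer descriptions and more concentrated changes); Cursor (bullet points and hyperlinks); Devin (multiline commits and distributed changes); Claude Code (high conditional-statement and comment density, with 27.2% and 19.8% importance, respectively).
These fingerprints make it possible to detect undisclosed agent use, with implications for policy enforcement, dataset validity (e.g., AIDev), and agent design.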
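The one-vs-rest analysis mentioned above starts by turning the five-class labels into one binary target per agent. The sketch below shows only that label-construction step, using the five agent names from the abstract. The function names are hypothetical. Fitting a binary tree-ensemble model on each target and reading its feature importances requires an ML library and is deliberately left out.

```python
AGENTS = ["OpenAI Codex", "GitHub Copilot", "Devin", "Cursor", "Claude Code"]


def one_vs_rest_targets(
    labels: list[str], agents: list[str] = AGENTS
) -> dict[str, list[int]]:
    """For each agent, build a 0/1 vector: 1 = this agent, 0 = any other."""
    return {agent: [int(lbl == agent) for lbl in labels] for agent in agents}


def positive_counts(targets: dict[str, list[int]]) -> dict[str, int]:
    """Number of positive examples per binary task (shows class imbalance)."""
    return {agent: sum(y) for agent, y in targets.items()}


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the paper.
    labels = ["Devin", "Cursor", "Devin", "Claude Code"]
    targets = one_vs_rest_targets(labels)
    print(positive_counts(targets))
```

Each binary task asks "this agent versus all others", so the features that matter most for one task can be read as that agent's fingerprint, independently of what separates the other agents from each other.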

This review was generated by AI and reviewed by a human editor.