QUICK REVIEW

[Paper Review] Defect Category Prediction Based on Multi-Source Domain Adaptation

Ying Xing, Mengci Zhao|arXiv (Cornell University)|May 16, 2024

Industrial Vision Systems and Defect Detection | Cited by 1

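One-Sentence Summary

This paper proposes COPILOT, a multi-source domain adaptation framework that improves defect category prediction by combining adversarial training with a weighted maximum mean discrepancy (WMMD) attention mechanism. It models multiple source projects as distinct domains and aligns their feature distributions with that of the target project. On a public dataset of 8 open-source projects, COPILOT achieves state-of-the-art performance, with F1, MCC, and Kappa scores significantly higher than those of existing methods across multiple defect types and data-sparse scenarios.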

ABSTRACT

In recent years, defect prediction techniques based on deep learning have become a prominent research topic in the field of software engineering. These techniques can identify potential defects without executing the code. However, existing approaches mostly concentrate on determining the presence of defects at the method level, lacking the ability to precisely classify specific defect categories. Consequently, this undermines the efficiency of developers in locating and rectifying defects. Furthermore, in practical software development, new projects often lack sufficient defect data to train high-accuracy deep learning models. Models trained on historical data from existing projects frequently struggle to achieve satisfactory generalization performance on new projects. Hence, this paper first reformulates the traditional binary defect prediction task into a multi-label classification problem, employing defect categories described in the Common Weakness Enumeration (CWE) as fine-grained predictive labels. To enhance the model performance in cross-project scenarios, this paper proposes a multi-source domain adaptation framework that integrates adversarial training and attention mechanisms. Specifically, the proposed framework employs adversarial training to mitigate domain (i.e., software project) discrepancies, and further utilizes domain-invariant features to capture feature correlations between each source domain and the target domain. Simultaneously, the proposed framework employs a weighted maximum mean discrepancy as an attention mechanism to minimize the representation distance between source and target domain features, helping the model learn more domain-independent features. The experiments on 8 real-world open-source projects show that the proposed approach achieves significant performance improvements compared to state-of-the-art baselines.

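Research Motivation and Objectives

Address the limitations of traditional binary defect prediction by reformulating it as a multi-label classification task that uses CWE defect categories as fine-grained labels.
Improve cross-project defect category prediction when the target project lacks sufficient labeled defect data.
Mitigate the domain shift between source and target projects by drawing on knowledge from multiple heterogeneous software projects.
Improve generalization and reduce negative transfer by learning domain-invariant feature representations and adaptive attention weights.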
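
To make the multi-label reformulation concrete, the sketch below encodes the CWE categories attached to one method as a multi-hot vector and decodes per-label probabilities back into categories. The three CWE IDs, the function names, and the 0.5 threshold are illustrative assumptions; the review does not list the six categories or the decision rule used in the paper.

```python
# Hypothetical label set for illustration; the paper's actual six
# CWE categories are not listed in this review.
CWE_CATEGORIES = ["CWE-20", "CWE-119", "CWE-476"]


def encode_labels(cwe_ids, categories=CWE_CATEGORIES):
    """Turn the set of CWE IDs attached to one method into a multi-hot vector."""
    return [1 if category in cwe_ids else 0 for category in categories]


def decode_predictions(probabilities, categories=CWE_CATEGORIES, threshold=0.5):
    """Keep every category whose predicted probability reaches the threshold.

    The 0.5 threshold is an assumption, not a value taken from the paper.
    """
    return [c for c, p in zip(categories, probabilities) if p >= threshold]


# A method can carry several defect categories at once, or none at all.
example_vector = encode_labels({"CWE-20", "CWE-476"})
example_labels = decode_predictions([0.9, 0.2, 0.7])
```

Unlike binary defect prediction, a clean method maps to the all-zero vector, and a method with several weaknesses sets several positions at once.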

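Proposed Method

Reformulates traditional binary defect prediction as a multi-label classification problem over CWE categories.
Proposes a multi-source domain adaptation framework that uses adversarial training to reduce the domain discrepancy between source and target projects.
Introduces a weighted maximum mean discrepancy (WMMD) mechanism as an attention module to minimize the distance between source-domain and target-domain feature representations.
Weights the contribution of each source domain by the domain relevance scores obtained from adversarial training, giving adaptive feature alignment.
Trains a deep neural network with a shared encoder and task-specific classification heads, jointly optimizing domain alignment and defect category prediction.
Adopts a two-stage training procedure: adversarial domain adaptation first, followed by end-to-end fine-tuning with attention-driven feature refinement.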
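
The review does not give the exact WMMD formulation, so the following is only a minimal sketch of the general idea: compute a squared maximum mean discrepancy between each source domain and the target, then combine the per-source values with weights derived from relevance scores. The Gaussian kernel, the biased estimator, the softmax weighting, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs drawn from two samples."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two feature samples."""
    return (mean_kernel(source, source, bandwidth)
            + mean_kernel(target, target, bandwidth)
            - 2.0 * mean_kernel(source, target, bandwidth))


def softmax(scores):
    """Normalize raw relevance scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_mmd(sources, target, relevance_scores, bandwidth=1.0):
    """Relevance-weighted sum of per-source squared MMD values.

    Assumption: a higher relevance score gives a source domain more weight.
    """
    weights = softmax(relevance_scores)
    return sum(w * mmd_squared(src, target, bandwidth)
               for w, src in zip(weights, sources))


# Toy 2-D features: one source matches the target, the other is far away.
target_features = [[0.0, 0.0], [1.0, 1.0]]
near_source = [[0.0, 0.0], [1.0, 1.0]]
far_source = [[5.0, 5.0], [6.0, 6.0]]
loss = weighted_mmd([near_source, far_source], target_features, [2.0, 0.0])
```

In a real model the feature vectors would come from the shared encoder and the loss would be minimized with automatic differentiation; plain lists are used here only to keep the arithmetic visible.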

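Experimental Results

Research Questions

RQ1: Compared with state-of-the-art baselines, does the proposed COPILOT framework significantly improve defect category prediction in cross-project settings?
RQ2: How effective is COPILOT at handling multiple defect types, including rare or complex categories such as input validation and buffer overflow?
RQ3: How does data scarcity affect COPILOT's performance, and how does it compare with the baselines in low-data scenarios?
RQ4: To what extent does integrating adversarial training with the WMMD attention mechanism improve the model's robustness and generalization?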

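Key Findings

Across six CWE defect categories, COPILOT achieves an average F1 score of 0.932, a 36.4% improvement over the best baseline, ABMSDA.
On the w_F1 metric for severe defects, COPILOT averages 0.877, which is 44.9% higher than ABMSDA and 23.2% higher than μVulDeePecker.
In the RQ2 ablation study, removing adversarial training or the WMMD attention mechanism lowers the average Kappa to 0.935 and 0.927, respectively, confirming that both components are essential.
COPILOT maintains superior performance at every level of defect data volume and is most stable when a defect category has more than 36 samples.
The Scott-Knott ESD test confirms that COPILOT ranks first on all evaluation metrics (Acc, MCC, Kappa, F1, w_F1), with large effect sizes (Cohen's d > 1.0) in most comparisons.
The model generalizes well, achieving the best performance on all eight target projects in the dataset, including Apache JMeter, Elasticsearch, and JTree.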
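
For readers unfamiliar with the metrics reported above, the sketch below computes F1, MCC, and Cohen's kappa from the confusion counts of a single binary label, plus Cohen's d between two sets of scores. How the paper aggregates these values across CWE categories and projects is not stated in this review, and the sample counts and scores are made-up inputs for illustration only.

```python
import math
from statistics import mean, variance


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for one label."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for one label."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(tp, fp, fn, tn):
    """Agreement between predictions and ground truth, corrected for chance."""
    n = tp + fp + fn + tn
    if n == 0:
        return 0.0
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance.

    Needs at least two values per group and some spread in the scores.
    """
    pooled_var = (((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b))
                  / (len(a) + len(b) - 2))
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Hypothetical confusion counts for one label, not taken from the paper.
counts = {"tp": 40, "fp": 10, "fn": 10, "tn": 40}
label_f1 = f1_score(counts["tp"], counts["fp"], counts["fn"])
label_mcc = mcc(**counts)
label_kappa = cohen_kappa(**counts)

# Hypothetical per-project scores for two methods, not taken from the paper.
effect_size = cohens_d([0.90, 0.92, 0.94], [0.60, 0.62, 0.64])
```

With these toy counts F1 is 0.8 while MCC and kappa are both about 0.6, which shows why the three metrics are reported together: MCC and kappa also account for true negatives and chance agreement.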

This review was generated by AI and reviewed by a human editor.