QUICK REVIEW

[Paper Review] Fast Static Analyses of Software Product Lines -- An Example With More Than 42,000 Metrics

Sascha El-Sharkawy, Adam Krafczyk|arXiv (Cornell University)|Oct 12, 2021

Advanced Software Engineering Methodologies | References: 28 | Cited by: 5

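One-Sentence Summary

This paper proposes a novel partial parsing technique that uses Reduced Abstract Syntax Trees (RASTs) for fast, scalable static analysis of Software Product Lines (SPLs). It supports more than 42,000 metric variations, covering both traditional single-system metrics and variability-aware metrics, and achieves sub-second run time per metric on large systems such as the Linux kernel.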

ABSTRACT

Context: Software metrics, as one form of static analyses, is a commonly used approach in software engineering in order to understand the state of a software system, in particular to identify potential areas prone to defects. Family-based techniques extract variability information from code artifacts in Software Product Lines (SPLs) to perform static analysis for all available variants. Many different types of metrics with numerous variants have been defined in literature. When counting all metrics including such variants, easily thousands of metrics can be defined. Computing all of them for large product lines can be an extremely expensive process in terms of performance and resource consumption. Objective: We address these performance and resource challenges while supporting customizable metric suites, which allow running both, single system and variability-aware code metrics. Method: In this paper, we introduce a partial parsing approach used for the efficient measurement of more than 42,000 code metric variations. The approach covers variability information and restricts parsing to the relevant parts of the Abstract Syntax Tree (AST). Conclusions: This partial parsing approach is designed to cover all relevant information to compute a broad variety of variability-aware code metrics on code artifacts containing annotation-based variability, e.g., realized with C-preprocessor statements. It allows for the flexible combination of single system and variability-aware metrics, which is not supported by existing tools. This is achieved by a novel representation of partially parsed product line code artifacts, which is tailored to the computation of the metrics. Our approach consumes considerably less resources, especially when computing many metric variants in parallel.

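Research Motivation and Objectives

Address the high computational cost of analyzing large-scale Software Product Lines (SPLs) with thousands of code metrics and variability-aware metrics.
Overcome the limited support in existing tools for combining traditional software metrics with SPL-specific variability-aware metrics.
Enable scalable, high-performance static analysis of SPLs by minimizing resource overhead during metric computation.
Support flexible, customizable metric suites that allow arbitrary combinations of feature-level and code-level metrics.
Provide a foundation for systematically evaluating SPL metric suites (such as MetricHaven) on real-world systems.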

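Proposed Method

Introduce Reduced Abstract Syntax Trees (RASTs), which abstract away irrelevant syntactic detail while retaining the information needed for metric computation.
Parse unpreprocessed source code directly, so that variability information such as C-preprocessor directives is preserved and no full preprocessing is required.
Build a coarse-grained AST representation that omits non-essential language constructs, reducing memory and processing overhead.
Compute metric values in a single traversal of the RAST, making use of CPU caches and minimizing reconstruction of data structures.
Support hybrid metrics by integrating feature-model data with the code-level AST, giving a unified analysis of feature and code metrics.
Tailor the RAST representation to the needs of common SPL metrics; metrics that require full AST fidelity (such as Halstead metrics) are not supported.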
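
The ideas in this list can be illustrated with a small sketch. It is not the authors' implementation: the `Node` class, the line-based `build_rast` parser, the `measure` function, the `CONFIG_` naming pattern, and the per-feature weights in `WEIGHTS` are all hypothetical and chosen only for illustration. The sketch reads unpreprocessed C-like text, keeps only preprocessor blocks, control-flow lines, and statements in a reduced tree, and then fills several metric counters (including one that mixes in feature-model data) in a single traversal.

```python
import re
from dataclasses import dataclass, field

FEATURE = re.compile(r"CONFIG_\w+")          # Kconfig-style feature names (assumed)
CONTROL = re.compile(r"(if|while|for)\b")    # control-flow keywords kept in the tree


@dataclass
class Node:
    kind: str                       # "function", "cpp_block", "control" or "stmt"
    text: str = ""                  # directive text, only set for "cpp_block"
    children: list = field(default_factory=list)


def build_rast(lines):
    """Toy partial parser over unpreprocessed code.

    Keeps #if/#ifdef/#ifndef nesting, control-flow lines and plain statements;
    everything else (braces, signatures, #else, #include) is dropped.
    """
    root = Node("function")
    stack = [root]
    for raw in lines:
        line = raw.strip()
        if line.startswith("#if"):
            block = Node("cpp_block", text=line)
            stack[-1].children.append(block)
            stack.append(block)
        elif line.startswith("#endif"):
            stack.pop()
        elif line.startswith("#"):
            continue
        elif CONTROL.match(line):
            stack[-1].children.append(Node("control"))
        elif line.endswith(";"):
            stack[-1].children.append(Node("stmt"))
    return root


def measure(root, feature_weights):
    """Fill several metric counters in one traversal of the reduced tree."""
    m = {"statements": 0, "cyclomatic": 1, "variation_points": 0,
         "max_cpp_nesting": 0, "feature_weight": 0}

    def visit(node, depth):
        if node.kind == "stmt":
            m["statements"] += 1
        elif node.kind == "control":
            m["statements"] += 1
            m["cyclomatic"] += 1
        elif node.kind == "cpp_block":
            m["variation_points"] += 1
            depth += 1
            m["max_cpp_nesting"] = max(m["max_cpp_nesting"], depth)
            # Hybrid part: weight each referenced feature with model data.
            for feat in FEATURE.findall(node.text):
                m["feature_weight"] += feature_weights.get(feat, 0)
        for child in node.children:
            visit(child, depth)

    visit(root, 0)
    return m


SOURCE = """
int init(void) {
    int rc = 0;
#ifdef CONFIG_NET
    rc = net_init();
    if (rc)
        return rc;
#ifdef CONFIG_IPV6
    ipv6_init();
#endif
#endif
    while (pending())
        flush();
    return rc;
}
"""

# Hypothetical feature-model data, e.g. a constraint count per feature.
WEIGHTS = {"CONFIG_NET": 3, "CONFIG_IPV6": 1}

metrics = measure(build_rast(SOURCE.splitlines()), WEIGHTS)
print(metrics)
```

Because the tree stores only what the counters need, adding another metric means adding a key and a branch in `visit` rather than another pass over the code. A metric that needs operator and operand tokens, such as the Halstead family, cannot be computed from this structure, which mirrors the limitation stated in the last item above.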

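Experimental Results

Research Questions

RQ1: What are the key requirements for flexibly measuring single-system and variability-aware code metrics?
RQ2: How can existing variability-aware metrics for code and for variability models be combined effectively?
RQ3: What abstraction is needed to support scalable analysis of large SPLs while accommodating a highly diverse set of metrics?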

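Key Findings

The total analysis time for the entire Linux kernel was about 6 hours and 20 minutes, covering more than 42,000 metric variations.
On average, each metric took only 0.76 seconds; for 29,976 metrics, the per-function metric computation time was about 0.06 seconds.
Compared with processing full ASTs, RAST-based parsing substantially reduces resource consumption, especially when many metric variations are analyzed in parallel.
The framework supports flexible combination of variability-aware code metrics and feature-level metrics, a capability that existing tools lack.
Performance and scalability stem from computing metrics in a single traversal of the RAST, which uses caches efficiently.
The approach is highly configurable: reducing the number of metrics to analyze yields significant performance gains.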
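
The timing figures can be cross-checked with simple arithmetic. The sketch below assumes the total run time is exactly 6 hours 20 minutes and that it is divided evenly across metrics; both are simplifying assumptions, not statements from the paper. Under them, the 0.76-second average matches a division by 29,976 metrics, while a division by 42,000 would give a lower value, so the original paper should be consulted for which metric set the Linux kernel run covered.

```python
# Assumed total run time: exactly 6 h 20 min, expressed in seconds.
total_seconds = 6 * 3600 + 20 * 60

# Average time per metric under two candidate metric counts.
per_metric_42000 = total_seconds / 42_000
per_metric_29976 = total_seconds / 29_976

print(f"{total_seconds} s total")
print(f"{per_metric_42000:.2f} s per metric over 42,000 metrics")
print(f"{per_metric_29976:.2f} s per metric over 29,976 metrics")
```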


This review was generated by AI and reviewed by a human editor.