QUICK REVIEW

[Paper Review] Domain Adaptation for Code Model-based Unit Test Case Generation

Jiho Shin, Sepehr Hashtroudi|arXiv (Cornell University)|Aug 15, 2023
Software Engineering Research | Cited by 9
One-Sentence Summary

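This paper applies project-level domain adaptation to CodeT5 to generate compilable unit tests for Defects4j projects, and reports improvements in line coverage, BLEU/CodeBLEU, and complementarity with EvoSuite.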

ABSTRACT

Recently, deep learning-based test case generation approaches have been proposed to automate the generation of unit test cases. In this study, we leverage Transformer-based code models to generate unit tests with the help of Domain Adaptation (DA) at a project level. Specifically, we use CodeT5, a relatively small language model trained on source code data, and fine-tune it on the test generation task. Then, we apply domain adaptation to each target project data to learn project-specific knowledge (project-level DA). We use the Methods2test dataset to fine-tune CodeT5 for the test generation task and the Defects4j dataset for project-level domain adaptation and evaluation. We compare our approach with (a) CodeT5 fine-tuned on the test generation without DA, (b) the A3Test tool, and (c) GPT-4 on five projects from the Defects4j dataset. The results show that tests generated using DA can increase the line coverage by 18.62%, 19.88%, and 18.02% and mutation score by 16.45%, 16.01%, and 12.99% compared to the above (a), (b), and (c) baselines, respectively. The overall results show consistent improvements in metrics such as parse rate, compile rate, BLEU, and CodeBLEU. In addition, we show that our approach can be seen as a complementary solution alongside existing search-based test generation tools such as EvoSuite, to increase the overall coverage and mutation scores with an average of 34.42% and 6.8%, for line coverage and mutation score, respectively.

Motivation and Objectives

  • Address robustness and domain shift in neural unit test generation.
  • Leverage project-specific data to adapt a pretrained code model for test generation.
  • Evaluate against baselines (GPT-4, A3Test) and a search-based tool (EvoSuite).
  • Demonstrate improvements using real execution metrics like line coverage and mutation score.

Proposed Method

  • Fine-tune CodeT5 on a test-generation downstream task using the Methods2test dataset.
  • Build a project-specific dataset by mapping each source line to the tests that cover it, extracting three contexts for each line.
  • Apply project-level domain adaptation by further training on the project-specific dataset.
  • Post-process generated tests to ensure compilability using a tree-sitter-based parser and incremental compilation.
  • Evaluate generated tests by running them to measure line coverage and mutation score, plus BLEU/CodeBLEU metrics.
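
The dataset-construction step above can be pictured as inverting a per-test coverage report. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes coverage is already available as a hypothetical mapping from test name to the set of (file, line) pairs that test executes, and the function names `map_lines_to_tests` and `build_training_pairs` are invented for this example. It does not model the three contexts, which this summary does not describe.

```python
from collections import defaultdict


def map_lines_to_tests(coverage_by_test):
    """Invert {test_name: {(file, line), ...}} into {(file, line): [test_name, ...]}.

    The input format is an assumption made for illustration; real coverage
    tools emit their own report formats that would need parsing first.
    """
    line_to_tests = defaultdict(list)
    # Iterate tests in sorted order so each per-line list is deterministic.
    for test_name in sorted(coverage_by_test):
        for location in coverage_by_test[test_name]:
            line_to_tests[location].append(test_name)
    return dict(line_to_tests)


def build_training_pairs(line_to_tests, test_sources):
    """Pair each covered line with the source text of every test covering it."""
    pairs = []
    for location in sorted(line_to_tests):
        for test_name in line_to_tests[location]:
            pairs.append({"target_line": location, "test": test_sources[test_name]})
    return pairs
```

With toy input such as `{"testAdd": {("Calc.java", 10), ("Calc.java", 11)}, "testSub": {("Calc.java", 11)}}`, line 11 maps to both tests and line 10 maps only to `testAdd`, giving three training pairs in total. The file and test names here are made up for the example.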

Experimental Results

Research Questions

  • RQ1: Can Transformer-based code models generate high-quality compilable test cases with appreciable line coverage?
  • RQ2: How effective is project-level domain adaptation in improving code coverage of automatically generated tests?
  • RQ3: Can the proposed approach augment or complement a search-based test generation tool like EvoSuite?

Key Findings

  • Domain adaptation increases line coverage by 18.62%, 19.88%, and 18.02%, and mutation score by 16.45%, 16.01%, and 12.99%, over CodeT5 without DA, A3Test, and GPT-4, respectively.
  • The approach yields improvements in BLEU and CodeBLEU scores and complements EvoSuite, increasing overall coverage and mutation scores.
  • Using domain-adapted test generation alongside EvoSuite yields an average 34.42% gain in line coverage and 6.8% gain in mutation score.
  • Evaluation uses real test execution metrics (line coverage and mutation testing) rather than only NLP-based metrics.
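
To make the execution-based metrics concrete, here is a small sketch of how line coverage, mutation score, and the combined coverage of two test suites are conventionally computed. It uses the standard definitions (covered lines over total lines, killed mutants over total mutants) and treats "complementing EvoSuite" as taking the union of the lines each suite covers. The helper names are invented for this example, and the summary does not state which tools the authors used to collect these numbers.

```python
def line_coverage(covered_lines, total_lines):
    """Percentage of executable lines hit by at least one test."""
    if total_lines == 0:
        return 0.0
    return 100.0 * len(covered_lines) / total_lines


def mutation_score(killed_mutants, total_mutants):
    """Percentage of generated mutants that the test suite detects (kills)."""
    if total_mutants == 0:
        return 0.0
    return 100.0 * killed_mutants / total_mutants


def combined_coverage(covered_a, covered_b, total_lines):
    """Line coverage when two suites run together: the union of covered lines."""
    return line_coverage(covered_a | covered_b, total_lines)
```

For instance, if one suite covers lines `{1, 2, 3, 4}` and another covers `{3, 4, 5, 6}` in a 10-line class, each reaches 40% on its own while the two together reach 60%. These figures are illustrative only and are not taken from the paper.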

This review was generated by AI and reviewed by human editors.