QUICK REVIEW

[Paper Review] One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

Amal Akli, Maxime Cordy|arXiv (Cornell University)|Feb 3, 2026
Software Engineering Research | Cited by 0
One-Sentence Summary

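This paper shows that a single parameter-efficient fine-tuning (PEFT) module shared across multiple code-analysis tasks can match or exceed full multi-task fine-tuning while substantially reducing trainable parameters and compute.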

ABSTRACT

Large language models have recently surpassed specialized systems on code generation, yet their effectiveness on other code-analysis tasks remains less clear. At the same time, multi-task learning offers a way to unify diverse objectives within a single model, but fully fine-tuning LLMs across tasks is computationally prohibitive. Parameter-efficient fine-tuning mitigates this cost by updating only a small fraction of weights. Although PEFT has proven effective in single-task settings, its potential for multi-task learning has not yet been systematically explored. We present the first comprehensive evaluation of multi-task PEFT for code analysis, comparing several methods across diverse tasks and model architectures. Our experiments show that a single PEFT module shared across tasks can match, and in some cases surpass, full multi-task fine-tuning, confirming that the benefits of PEFT extend beyond isolated tasks. When comparing single-task and multi-task setups, we find that multi-task PEFT achieves a favorable performance-efficiency trade-off: it delivers accuracy close to single-task fine-tuning while reducing storage requirements, cutting the number of trainable parameters by a factor of the task count, and lowering computation costs by as much as 85%. At the same time, multi-task gains remain sensitive to task grouping. Through task-pairing experiments, we identify key factors shaping outcomes: task stability, model architecture, task complementarity, asymmetry, and dataset quality determine the success of co-fine-tuning. Finally, we benchmark efficient multi-task PEFT against direct prompting of open-source general-purpose LLMs, including DeepSeek, Qwen, Mistral, CodeLlama, and StarCoder. Despite their strong performance in code generation, these models underperform on analysis tasks, where even a 1B-parameter model with multi-task PEFT achieves significantly better results.

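Research Motivation and Objectives

  • Evaluate whether PEFT methods can effectively support multi-task learning for code-analysis tasks.
  • Compare multi-task PEFT with full multi-task fine-tuning across model architectures and tasks.
  • Identify the factors that determine whether co-fine-tuning succeeds, including task grouping and model type.
  • Benchmark multi-task PEFT against zero-shot prompting of open-source general-purpose large language models on code-analysis tasks.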

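Proposed Method

  • Fine-tune four code LLM backbones (UniXcoder-base, CodeT5+ Large, DeepSeek-Coder 1.3B, Qwen2.5-Coder-1.5B) in a multi-task setting with PEFT modules (serial adapters, parallel adapters, LoRA, prefix-tuning).
  • Freeze the encoder backbone and train only the PEFT modules and the task-specific heads.
  • Use hard parameter sharing with one shared encoder and per-task heads; inputs are batched by a round-robin multi-task sampler.
  • Introduce adaptive, learnable loss weights that balance the task losses through softmax-normalized weights.
  • Compare against full fine-tuning (SFT) and against zero-shot prompting of large instruction-tuned LLMs.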
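The round-robin sampling and softmax-normalized loss weighting described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the stop-when-exhausted sampling policy, and the toy batches are all assumptions, and in a real training loop the weight logits would be trainable parameters updated by the optimizer rather than fixed numbers.

```python
import math


def round_robin_batches(task_batches):
    """Yield (task_name, batch) pairs, visiting tasks in turn.

    Assumption: a task is dropped once its batches run out; the paper's
    sampler may instead re-cycle smaller datasets.
    """
    iterators = {name: iter(batches) for name, batches in task_batches.items()}
    while iterators:
        for name in list(iterators):
            try:
                yield name, next(iterators[name])
            except StopIteration:
                del iterators[name]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_multitask_loss(task_losses, weight_logits):
    """Combine per-task losses using softmax-normalized weights.

    weight_logits holds one value per task; softmax keeps the weights
    positive and summing to one.
    """
    weights = softmax(weight_logits)
    return sum(w * loss for w, loss in zip(weights, task_losses))


# Hypothetical toy data: two tasks with different numbers of batches.
schedule = list(round_robin_batches({"clone": [1, 2, 3], "vuln": ["a"]}))
combined = weighted_multitask_loss([1.0, 3.0], [0.0, 0.0])
```

With equal logits the combined loss is the plain average of the task losses; raising one logit shifts weight toward that task without letting any weight reach zero.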
Figure 1 : Overview of four PEFT integration patterns in a Transformer block: serial adapters, parallel adapters, prefix-tuning, and LoRA. Colored components denote the added trainable modules, and dashed insets illustrate their internal layouts.
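For readers without access to the figure, the four patterns named in the caption are commonly written as below. These are the standard textbook formulations, given here as an assumption; this summary does not state the paper's exact placement, bottleneck size, LoRA rank, or prefix length.

```latex
% x: sublayer input; f(.): frozen sublayer (attention or feed-forward)
% W_down in R^{r x d}, W_up in R^{d x r}, with r << d; sigma: nonlinearity
% W_0: frozen pretrained weight; P_K, P_V: trainable prefix vectors
\begin{align*}
\text{Serial adapter:}   \quad & h' = h + W_{\mathrm{up}}\,\sigma(W_{\mathrm{down}}\,h), \qquad h = f(x) \\
\text{Parallel adapter:} \quad & h' = f(x) + W_{\mathrm{up}}\,\sigma(W_{\mathrm{down}}\,x) \\
\text{LoRA:}             \quad & W' = W_0 + \tfrac{\alpha}{r}\,B A, \qquad
                                 B \in \mathbb{R}^{d_{\mathrm{out}} \times r},\;
                                 A \in \mathbb{R}^{r \times d_{\mathrm{in}}} \\
\text{Prefix-tuning:}    \quad & \mathrm{Attn}\big(Q,\,[P_K; K],\,[P_V; V]\big)
\end{align*}
```

In every case the pretrained weights stay frozen and only the small added matrices or prefix vectors are trained.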

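Experimental Results

Research Questions

  • RQ1: Are PEFT techniques effective for multi-task learning in code analysis?
  • RQ2: What is the performance-efficiency trade-off between multi-task PEFT and single-task PEFT?
  • RQ3: Which factors affect multi-task PEFT performance (task stability, architecture, dataset similarity, etc.)?
  • RQ4: How does multi-task PEFT compare with zero-shot prompting of general-purpose LLMs?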

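Key Findings

  • PEFT in the multi-task setting is competitive with full multi-task fine-tuning; on the clone and vulnerability tasks, the gap is typically within ±1 percentage point.
  • Compared with single-task PEFT, multi-task PEFT delivers substantial efficiency gains, cutting trainable parameters by a factor roughly equal to the number of tasks.
  • Serial adapters are the most reliable across tasks; LoRA helps retrieval-style objectives such as code search; prefix-tuning often performs worse in this setting.
  • Transfer dynamics strongly shape the results: task stability, model architecture, task complementarity, asymmetry, and dataset quality all affect whether co-fine-tuning succeeds.
  • Multi-task PEFT on compact code-specialized backbones consistently outperforms zero-shot prompting of larger general-purpose LLMs on code-analysis tasks, reaching near-SFT accuracy with lower storage and compute requirements.
  • Across four backbones and four tasks, PEFT is more efficient than full fine-tuning without a substantial loss in accuracy, and on some metrics it can exceed full fine-tuning.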
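The "factor of the task count" saving follows from simple counting: single-task PEFT stores one module per task, while multi-task PEFT stores one shared module. The sketch below illustrates this with LoRA. The hidden size, rank, layer count, and number of adapted matrices are hypothetical values chosen for illustration, not figures from the paper, and the small per-task heads are left out of the count.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The update is B @ A with B of shape (d_out, rank) and A of shape (rank, d_in).
    """
    return rank * (d_in + d_out)


def peft_storage(module_params: int, num_tasks: int) -> dict:
    """Compare one-module-per-task storage with a single shared module."""
    single_task_total = module_params * num_tasks  # one PEFT module per task
    multi_task_total = module_params               # one module shared by all tasks
    return {
        "single_task": single_task_total,
        "multi_task": multi_task_total,
        "reduction_factor": single_task_total / multi_task_total,
    }


# Hypothetical configuration: hidden size 768, rank 8, 12 layers,
# two adapted weight matrices per layer, four tasks.
per_matrix = lora_params(d_in=768, d_out=768, rank=8)
per_module = per_matrix * 2 * 12
summary = peft_storage(per_module, num_tasks=4)
```

Under these assumptions the reduction factor equals the number of tasks exactly; once per-task heads are included the real ratio is slightly lower.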
Figure 2 : Data processing pipeline for our multi-task fine-tuning.


This review was generated by AI and reviewed by human editors.