QUICK REVIEW

[Paper Review] One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

Amal Akli, Maxime Cordy|arXiv (Cornell University)|Feb 3, 2026

Software Engineering Research · Cited by 0

One-Sentence Summary

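This paper shows that, for multi-task code analysis, a single parameter-efficient fine-tuning (PEFT) module shared across tasks can match or exceed full multi-task fine-tuning while substantially reducing trainable parameters and compute.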

ABSTRACT

Large language models have recently surpassed specialized systems on code generation, yet their effectiveness on other code-analysis tasks remains less clear. At the same time, multi-task learning offers a way to unify diverse objectives within a single model, but fully fine-tuning LLMs across tasks is computationally prohibitive. Parameter-efficient fine-tuning mitigates this cost by updating only a small fraction of weights. Although PEFT has proven effective in single-task settings, its potential for multi-task learning has not yet been systematically explored. We present the first comprehensive evaluation of multi-task PEFT for code analysis, comparing several methods across diverse tasks and model architectures. Our experiments show that a single PEFT module shared across tasks can match, and in some cases surpass, full multi-task fine-tuning, confirming that the benefits of PEFT extend beyond isolated tasks. When comparing single-task and multi-task setups, we find that multi-task PEFT achieves a favorable performance-efficiency trade-off: it delivers accuracy close to single-task fine-tuning while reducing storage requirements, cutting the number of trainable parameters by a factor of the task count, and lowering computation costs by as much as 85%. At the same time, multi-task gains remain sensitive to task grouping. Through task-pairing experiments, we identify key factors shaping outcomes: task stability, model architecture, task complementarity, asymmetry, and dataset quality determine the success of co-fine-tuning. Finally, we benchmark efficient multi-task PEFT against direct prompting of open-source general-purpose LLMs, including DeepSeek, Qwen, Mistral, CodeLlama, and StarCoder. Despite their strong performance in code generation, these models underperform on analysis tasks, where even a 1B-parameter model with multi-task PEFT achieves significantly better results.

Motivation and Objectives

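Assess whether PEFT methods can effectively support multi-task learning for code-analysis tasks.
Compare multi-task PEFT with full multi-task fine-tuning across different model architectures and tasks.
Identify the factors that determine whether co-fine-tuning succeeds, including task grouping and model type.
Benchmark multi-task PEFT against zero-shot prompting of open-source general-purpose LLMs on code-analysis tasks.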

Proposed Method

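Fine-tune four code LLM backbones (UniXcoder-base, CodeT5+ Large, DeepSeek-Coder 1.3B, Qwen2.5-Coder-1.5B) in a multi-task setting with PEFT modules (serial adapters, parallel adapters, LoRA, prefix tuning).
Freeze the backbone encoder and train only the PEFT modules and the task-specific heads.
Use hard parameter sharing with one shared encoder and one head per task; inputs are batched through a round-robin multi-task sampler.
Introduce adaptive, learnable task-loss weights, normalized with a softmax, to balance the per-task losses.
Compare against full fine-tuning (SFT) and against zero-shot prompting of large instruction-tuned LLMs.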
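The round-robin sampling and softmax-normalized loss weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the task names, the helper names (`round_robin_batches`, `weighted_multitask_loss`), the rule of dropping a task once its batches run out, and the use of one raw logit per task are all hypothetical, since the review does not specify these details.

```python
import math

def round_robin_batches(task_batches):
    """Yield (task, batch) pairs, visiting tasks in a fixed rotating order.

    Hypothetical helper: when a task runs out of batches it is dropped and
    the remaining tasks keep rotating until all are exhausted.
    """
    iterators = {task: iter(batches) for task, batches in task_batches.items()}
    while iterators:
        for task in list(iterators):  # copy the keys so deletion is safe
            try:
                yield task, next(iterators[task])
            except StopIteration:
                del iterators[task]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_multitask_loss(task_losses, weight_logits):
    """Sum per-task losses weighted by softmax(weight_logits).

    Both arguments are dicts keyed by task name. In a real trainer the
    logits would be trainable parameters; here they are plain floats.
    """
    tasks = sorted(task_losses)
    weights = softmax([weight_logits[t] for t in tasks])
    return sum(w * task_losses[t] for w, t in zip(weights, tasks))

if __name__ == "__main__":
    # Hypothetical task names and batch labels.
    batches = {"clone": ["c1", "c2"], "vuln": ["v1"], "search": ["s1", "s2", "s3"]}
    print([task for task, _ in round_robin_batches(batches)])
    # ['clone', 'vuln', 'search', 'clone', 'search', 'search']
    losses = {"clone": 0.2, "vuln": 0.4, "search": 0.6}
    print(weighted_multitask_loss(losses, {"clone": 0.0, "vuln": 0.0, "search": 0.0}))
```

With equal logits the weights are uniform and the combined loss is the plain mean. Note that minimizing this objective directly with respect to the logits would shift all weight onto the lowest-loss task, so a practical version needs a regularizer or a separate update rule for the weights; the review does not say which one the paper uses.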

Figure 1 : Overview of four PEFT integration patterns in a Transformer block: serial adapters, parallel adapters, prefix-tuning, and LoRA. Colored components denote the added trainable modules, and dashed insets illustrate their internal layouts.
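The integration patterns in Figure 1 differ mainly in where the trainable module sits relative to the frozen weights. The sketch below shows three of them on toy vectors in plain Python: a serial adapter (applied to the frozen sublayer's output), a parallel adapter (reading the sublayer's input and added alongside it), and a LoRA-wrapped linear layer (W x plus a scaled low-rank update B A x). It is an illustrative assumption, not the paper's code; the bottleneck shape, the ReLU, and the zero initialization of the up-projection and of B are common choices assumed here. Prefix tuning, which prepends trainable key/value vectors inside attention, is omitted for brevity.

```python
import random

# Toy linear algebra on Python lists so the sketch needs no third-party packages.
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def add(a, b):
    """Element-wise vector addition."""
    return [x + y for x, y in zip(a, b)]

def relu(v):
    return [max(0.0, x) for x in v]

def rand_matrix(rows, cols, scale=0.1, seed=0):
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

class BottleneckAdapter:
    """Trainable bottleneck: down-project d -> r, ReLU, up-project r -> d.

    The up-projection starts at zero (an assumed, common choice), so a freshly
    inserted adapter leaves the frozen model's output unchanged.
    """
    def __init__(self, d, r, seed=0):
        self.down = rand_matrix(r, d, seed=seed)
        self.up = zeros(d, r)

    def __call__(self, h):
        return matvec(self.up, relu(matvec(self.down, h)))

def serial_adapter(sublayer, adapter, x):
    """Serial: the adapter transforms the frozen sublayer's output, plus a residual."""
    h = sublayer(x)
    return add(h, adapter(h))

def parallel_adapter(sublayer, adapter, x):
    """Parallel: the adapter reads the sublayer's input; both outputs are summed."""
    return add(sublayer(x), adapter(x))

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update scaled by alpha / r."""
    def __init__(self, W, r, alpha, seed=0):
        d_out, d_in = len(W), len(W[0])
        self.W = W                                # frozen pretrained weight
        self.A = rand_matrix(r, d_in, seed=seed)  # trainable, r x d_in
        self.B = zeros(d_out, r)                  # trainable, d_out x r, zero init
        self.scale = alpha / r

    def __call__(self, x):
        delta = matvec(self.B, matvec(self.A, x))
        return add(matvec(self.W, x), [self.scale * v for v in delta])

if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 2.0]]

    def frozen(v):
        return matvec(W, v)

    x = [3.0, 4.0]
    # All three start out identical to the frozen layer because of the zero init.
    print(serial_adapter(frozen, BottleneckAdapter(2, 1), x))    # [3.0, 8.0]
    print(parallel_adapter(frozen, BottleneckAdapter(2, 1), x))  # [3.0, 8.0]
    print(LoRALinear(W, r=1, alpha=2)(x))                        # [3.0, 8.0]
```

Only `down`/`up` (adapters) or `A`/`B` (LoRA) would receive gradients during training; `W` stays frozen, which is what keeps the trainable-parameter count small.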

Experimental Results

Research Questions

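RQ1: Are PEFT techniques effective for multi-task learning in code analysis?
RQ2: What is the performance-efficiency trade-off between multi-task PEFT and single-task PEFT?
RQ3: Which factors affect the performance of multi-task PEFT (task stability, architecture, dataset similarity, etc.)?
RQ4: How does multi-task PEFT compare with zero-shot prompting of general-purpose LLMs?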

Key Findings

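In multi-task learning, PEFT is competitive with full multi-task fine-tuning; on the clone and vulnerability tasks, the difference is typically within ±1 percentage point.
Compared with single-task PEFT, multi-task PEFT delivers substantial efficiency gains, reducing the number of trainable parameters by a factor equal to the number of tasks.
Serial adapters are the most reliable across tasks; LoRA helps retrieval-style objectives such as code search; prefix tuning often underperforms in this setting.
Transfer dynamics strongly shape outcomes: task stability, model architecture, task complementarity, asymmetry, and dataset quality determine whether co-fine-tuning succeeds.
Multi-task PEFT on compact, code-specific backbones consistently outperforms zero-shot prompting of larger general-purpose LLMs on code-analysis tasks, reaching accuracy close to SFT with lower storage and compute requirements.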
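Across the four backbones and four tasks, PEFT is more efficient than full fine-tuning without a material loss in accuracy, and on some metrics it even surpasses full fine-tuning.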
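To make the "factor of the task count" reduction in trainable parameters concrete, the arithmetic can be sketched for LoRA. All sizes below are illustrative assumptions, not figures from the paper: LoRA is assumed to adapt two square projections per layer (for example query and value), and the per-task heads are left out because both setups keep one head per task.

```python
def lora_params_per_layer(d_model, rank, adapted_matrices=2):
    """Trainable LoRA parameters in one layer.

    Each adapted d_model x d_model projection gets A (rank x d_model) and
    B (d_model x rank), i.e. 2 * d_model * rank parameters.
    """
    return adapted_matrices * 2 * d_model * rank

def peft_trainable_params(num_tasks, num_layers, d_model, rank, shared):
    """PEFT parameters to train and store (task heads excluded).

    shared=False: single-task setup, one LoRA module per task.
    shared=True:  multi-task setup, one LoRA module shared by all tasks.
    """
    per_module = num_layers * lora_params_per_layer(d_model, rank)
    return per_module if shared else num_tasks * per_module

if __name__ == "__main__":
    # Illustrative sizes only; not taken from the paper.
    cfg = dict(num_tasks=4, num_layers=24, d_model=2048, rank=8)
    single = peft_trainable_params(shared=False, **cfg)
    multi = peft_trainable_params(shared=True, **cfg)
    print(single, multi, single // multi)  # 6291456 1572864 4
```

The ratio equals the task count by construction; the reported compute savings (up to 85%) depend on training schedules and cannot be derived from this count alone.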

Figure 2 : Data processing pipeline for our multi-task fine-tuning.

This review was generated by AI and reviewed by human editors.