QUICK REVIEW

[Paper Review] Offsite-Tuning: Transfer Learning without Full Model

Guangxuan Xiao, Ji Lin|arXiv (Cornell University)|Feb 9, 2023

Domain Adaptation and Few-Shot Learning|Citations: 21

One-Line Summary

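Offsite-Tuning makes it possible to adapt billion-parameter foundation models to downstream data without access to the full model weights. By using a trainable adapter and a lossy compressed emulator shared with the data owner, it preserves privacy and makes fine-tuning more efficient while nearly matching the accuracy of full fine-tuning.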

ABSTRACT

Transfer learning is important for foundation models to adapt to downstream tasks. However, many foundation models are proprietary, so users must share their data with model owners to fine-tune the models, which is costly and raises privacy concerns. Moreover, fine-tuning large foundation models is computation-intensive and impractical for most downstream users. In this paper, we propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation models to downstream data without access to the full model. In offsite-tuning, the model owner sends a light-weight adapter and a lossy compressed emulator to the data owner, who then fine-tunes the adapter on the downstream data with the emulator's assistance. The fine-tuned adapter is then returned to the model owner, who plugs it into the full model to create an adapted foundation model. Offsite-tuning preserves both parties' privacy and is computationally more efficient than the existing fine-tuning methods that require access to the full model weights. We demonstrate the effectiveness of offsite-tuning on various large language and vision foundation models. Offsite-tuning can achieve accuracy comparable to full model fine-tuning while being privacy-preserving and efficient, achieving 6.5x speedup and 5.6x memory reduction. Code is available at https://github.com/mit-han-lab/offsite-tuning.

Motivation and Objectives

Highlight the privacy and efficiency challenges of fine-tuning proprietary foundation models for downstream tasks.
Propose a framework that enables fine-tuning without exposing the full model weights to the data owner or the data to the model owner.
Demonstrate applicability to both language and vision foundation models across standard benchmarks.

Proposed Method

Split the foundation model into a small trainable adapter (A) and a frozen remainder (E); apply lossy compression to E to create an emulator (E*).
Supply [A, E*] to the data owner, who fine-tunes A using the approximate gradients provided by E*.
Return the updated adapter A' to the model owner, who injects it into the full model to yield M' = [A', E].
Explore layer-drop-based emulator compression, which drops middle layers of E while retaining its first and last layers; optionally distill E* from E when resources permit.
Design the adapter as a sandwich M = A1 ∘ E ∘ A2 so that it captures updates in both shallow and deep layers, which transfers better than updating only the top or only the bottom layers.
Combine Offsite-Tuning with parameter-efficient fine-tuning techniques (Adapter, LoRA, BitFit) by applying these methods on the adapter layers to further reduce trainable parameters.
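The split / emulate / fine-tune / plug-in steps above can be illustrated with a toy, pure-Python sketch. It rests on loudly stated assumptions: each "layer" is a scalar affine map h -> w*h + b standing in for a transformer block, the adapter is one bottom and one top layer, the emulator keeps the first and last frozen layers plus every other middle layer (a hypothetical drop rule), and training is plain gradient descent with hand-written backpropagation. All names (`Layer`, `layer_drop_emulator`, `finetune_adapters`) are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import List, Tuple

Data = List[Tuple[float, float]]


@dataclass(frozen=True)
class Layer:
    """Toy scalar affine layer h -> w * h + b (a stand-in for a transformer block)."""
    w: float
    b: float


def forward(layers: List[Layer], x: float) -> List[float]:
    """Return every activation, from the input x to the final output."""
    acts = [x]
    for layer in layers:
        acts.append(layer.w * acts[-1] + layer.b)
    return acts


def mse(layers: List[Layer], data: Data) -> float:
    return sum((forward(layers, x)[-1] - y) ** 2 for x, y in data) / len(data)


def grads(layers: List[Layer], data: Data) -> List[List[float]]:
    """Manual backprop under mean squared error: [dL/dw, dL/db] per layer."""
    out = [[0.0, 0.0] for _ in layers]
    for x, y in data:
        acts = forward(layers, x)
        g = 2.0 * (acts[-1] - y) / len(data)  # dL/d(output)
        for k in reversed(range(len(layers))):
            out[k][0] += g * acts[k]  # acts[k] is the input of layer k
            out[k][1] += g
            g *= layers[k].w  # gradient w.r.t. the input of layer k
    return out


def layer_drop_emulator(frozen: List[Layer], keep_every: int = 2) -> List[Layer]:
    """Hypothetical drop rule: keep the first and last frozen layers and
    every `keep_every`-th layer in between; drop the rest."""
    if len(frozen) < 2:
        return list(frozen)
    return [frozen[0], *frozen[1:-1][::keep_every], frozen[-1]]


def finetune_adapters(a1: Layer, emulator: List[Layer], a2: Layer, data: Data,
                      lr: float = 0.05, steps: int = 500) -> Tuple[Layer, Layer]:
    """Data-owner side: gradients flow through the frozen emulator layers,
    but only the adapter layers A1 and A2 are updated."""
    for _ in range(steps):
        g = grads([a1, *emulator, a2], data)
        a1 = Layer(a1.w - lr * g[0][0], a1.b - lr * g[0][1])
        a2 = Layer(a2.w - lr * g[-1][0], a2.b - lr * g[-1][1])
    return a1, a2


# Model owner: sandwich split [A1, E, A2] of an 8-layer toy model.
full = [Layer(1.0, 0.0)] + [Layer(1.0, 0.1) for _ in range(6)] + [Layer(1.0, 0.0)]
a1, frozen, a2 = full[0], full[1:-1], full[-1]
emulator = layer_drop_emulator(frozen)  # 6 frozen layers -> 4 emulator layers

# Data owner: the private data never leaves this side.
data = [(x, 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5)]
emu_loss_before = mse([a1, *emulator, a2], data)
a1_new, a2_new = finetune_adapters(a1, emulator, a2, data)
emu_loss_after = mse([a1_new, *emulator, a2_new], data)

# Model owner: plug the returned adapters into the full model M' = [A1', E, A2'].
plugged = [a1_new, *frozen, a2_new]
print(f"emulator loss {emu_loss_before:.3f} -> {emu_loss_after:.4f}")
print(f"plug-in loss  {mse(full, data):.3f} -> {mse(plugged, data):.4f}")
```

Because E* only approximates E, the plugged-in toy model does not exactly reproduce the fit achieved on the emulator; the sketch illustrates the data flow between the two parties, not the paper's accuracy results. Distilling E* from E, as mentioned above, would aim to reduce that mismatch.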

Figure 1: Comparing existing fine-tuning approaches (top and middle) and Offsite-Tuning (bottom). (a) Traditionally, users send labeled data to model owners for fine-tuning, raising privacy concerns and incurring high computational costs. (b) Model owner sending the full model to the data owner is n

Experimental Results

Research Questions

RQ1: Can a small adapter plus a compressed emulator enable effective fine-tuning of billion-parameter foundation models without sharing the full model weights or the data?
RQ2: How should the emulator be compressed to balance the usefulness of its gradients for adapter training against protection of model ownership?
RQ3: Does the plug-in approach (training the adapter on the data owner's data, then plugging it into the full model) approach full fine-tuning performance across language and vision tasks?
RQ4: What are the efficiency gains (speed, memory) of Offsite-Tuning, and how do they scale with model size and compression strategy?
RQ5: How does Offsite-Tuning interact with existing parameter-efficient fine-tuning methods?

Key Findings

Offsite-Tuning achieves plug-in performance comparable to full fine-tuning on several language and vision tasks while preserving privacy (the data owner never accesses the full model weights).
Layer-drop-based emulator compression provides the best balance between performance and privacy; the visible gap between emulator and plug-in performance helps preserve model ownership.
Distilling the emulator further raises plug-in performance relative to emulator performance, notably on OPT-1.3B and GPT2-XL.
Combining Offsite-Tuning with parameter-efficient fine-tuning methods (Adapter, LoRA) reduces trainable parameters while maintaining or improving plug-in performance; BitFit tends to underperform relative to full fine-tuning in some cases.
Efficiency gains are substantial: up to 6.5x throughput speedup and 5.6x memory reduction when combined with LoRA on single-GPU hardware.
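The reduction in trainable parameters from combining Offsite-Tuning with LoRA can be checked with back-of-envelope arithmetic. This sketch assumes a generic decoder block of about 12·d_model² weights (4·d² for the Q, K, V, O projections plus 8·d² for a 4x-wide MLP, biases and norms ignored), four adapter layers, and rank-8 LoRA on the Q and V projections of those adapter layers only. These settings and the helper names are illustrative assumptions, not the paper's configuration, and the arithmetic does not reproduce the reported 6.5x and 5.6x figures, which also depend on activations, optimizer state, and the dropped emulator layers.

```python
def transformer_block_params(d_model: int) -> int:
    """Rough weight count of one decoder block: 4*d^2 (attention Q, K, V, O)
    plus 8*d^2 (MLP with 4x hidden width); biases and norms are ignored."""
    return 12 * d_model * d_model


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA trains B (d_out x r) and A (r x d_in) instead of the d_out x d_in matrix."""
    return rank * (d_in + d_out)


# Illustrative configuration (assumed, not taken from the paper's tables).
d_model, n_layers, n_adapter_layers, rank = 2048, 24, 4, 8

full_ft = n_layers * transformer_block_params(d_model)
adapter_ft = n_adapter_layers * transformer_block_params(d_model)
# Assumption: LoRA is applied to the Q and V projections of each adapter layer only.
adapter_lora = n_adapter_layers * 2 * lora_params(d_model, d_model, rank)

for name, n in [("full fine-tuning", full_ft),
                ("offsite-tuning adapters", adapter_ft),
                ("adapters + LoRA", adapter_lora)]:
    print(f"{name:>24}: {n:>13,} trainable weights")
```

Under these assumptions, training only the adapters touches one sixth of the block weights (4 of 24 layers), and adding LoRA cuts the trainable count by a further factor of 768.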

Figure 2: Overview of Offsite-Tuning. Fine-tuning (left) requires access to the full model weights and needs both model and data to be in one location. In Offsite-tuning (right), the model owner sends an adapter and an emulator to the data owner, who fine-tunes the adapter on the downstream data wit


This review was generated by AI and checked by a human editor.