QUICK REVIEW

[Paper Review] Generalizable Knowledge Distillation from Vision Foundation Models for Semantic Segmentation

Chonghua Lv, Dong Zhao|arXiv (Cornell University)|Mar 3, 2026

Domain Adaptation and Few-Shot Learning | Citations: 0

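One-Line Summary

GKD decouples representation learning from task learning and uses query-based soft distillation to transfer the generalization ability of vision foundation models to lightweight semantic segmentation models, yielding cross-domain gains.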

ABSTRACT

Knowledge distillation (KD) has been widely applied in semantic segmentation to compress large models, but conventional approaches primarily preserve in-domain accuracy while neglecting out-of-domain generalization, which is essential under distribution shifts. This limitation becomes more severe with the emergence of vision foundation models (VFMs): although VFMs exhibit strong robustness on unseen data, distilling them with conventional KD often compromises this ability. We propose Generalizable Knowledge Distillation (GKD), a multi-stage framework that explicitly enhances generalization. GKD decouples representation learning from task learning. In the first stage, the student acquires domain-agnostic representations through selective feature distillation, and in the second stage, these representations are frozen for task adaptation, thereby mitigating overfitting to visible domains. To further support transfer, we introduce a query-based soft distillation mechanism, where student features act as queries to teacher representations to selectively retrieve transferable spatial knowledge from VFMs. Extensive experiments on five domain generalization benchmarks demonstrate that GKD consistently outperforms existing KD methods, achieving average gains of +1.9% in foundation-to-foundation (F2F) and +10.6% in foundation-to-local (F2L) distillation. The code will be available at https://github.com/Younger-hua/GKD.
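The abstract describes student features acting as queries over teacher representations. Below is a minimal sketch of that retrieval step in plain Python, so it runs without a deep-learning framework. It assumes standard single-head scaled dot-product attention with learned projections named w_q, w_k and w_v. These names, the single-head form and the toy shapes are illustrative assumptions, not the authors' code. The loss that compares the retrieved features with the teacher is left out because its exact form is not given here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


def query_based_retrieval(
    student: Matrix, teacher: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix
) -> Tuple[Matrix, Matrix]:
    """Hypothetical QSD-style step: student tokens query teacher tokens.

    Returns (retrieved features, attention map). Projections are assumed inputs.
    """
    q = matmul(student, w_q)  # queries come from the student (N_s x d)
    k = matmul(teacher, w_k)  # keys come from the teacher (N_t x d)
    v = matmul(teacher, w_v)  # values come from the teacher (N_t x d_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    attn = softmax_rows(scores)  # per student token: a distribution over teacher tokens
    return matmul(attn, v), attn
```

With identity projections and well-separated one-hot tokens, each student token attends almost entirely to its matching teacher token. The resulting attention map is then close to diagonal, and the retrieved features nearly reproduce the teacher's.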

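Research Motivation and Objectives

Motivate and diagnose the generalization bottleneck of conventional KD under distribution shifts.
Propose GKD, a two-stage domain-generalization distillation framework that decouples representation learning from task learning.
Introduce a query-based soft distillation mechanism that selectively retrieves transferable spatial knowledge from VFMs.
Validate GKD on multiple domain-generalization benchmarks in the F2F and F2L settings and show state-of-the-art generalization gains.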

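Proposed Method

Two-stage distillation: first, the student learns task-agnostic, domain-agnostic representations from vision foundation models (VFMs) on a proxy dataset; then the encoder is frozen and a task-specific decoder is trained on source data.
Domain-generalization distillation combines task-agnostic transfer on a proxy dataset (ImageNet) with source-domain distillation that incorporates domain-relevant features.
Query-based soft distillation (QSD) reconstructs student features via attention over teacher features using learned projections, aligning their spatial relationships.
QSD combines a feature distillation loss, a mask distillation loss, and a CLS-token distillation loss: L_QSD = α L_feat + β L_mask + γ L_cls.
During task learning, only the decoder is trained on labeled data; the encoder stays frozen to preserve its general representations.
Ablations and visualizations show that multi-stage distillation with QSD improves cross-domain consistency over single-stage KD.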
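The combined objective and the stage-wise freezing described above can be written as a small, framework-free sketch. The weight values, the stage names, the order of the proxy and source distillation phases, and the use of L_QSD in both distillation phases are illustrative assumptions; the paper's actual settings may differ.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class QSDWeights:
    # Hypothetical weights for L_QSD = alpha*L_feat + beta*L_mask + gamma*L_cls.
    alpha: float = 1.0
    beta: float = 1.0
    gamma: float = 1.0


def qsd_loss(l_feat: float, l_mask: float, l_cls: float, w: QSDWeights) -> float:
    """Weighted sum of the three QSD terms (scalar losses assumed precomputed)."""
    return w.alpha * l_feat + w.beta * l_mask + w.gamma * l_cls


@dataclass(frozen=True)
class Stage:
    name: str
    data: str
    trainable: FrozenSet[str]  # modules that receive gradient updates
    objective: str


# Assumed schedule: the encoder learns from the VFM teacher, then is frozen for task learning.
GKD_SCHEDULE: List[Stage] = [
    Stage("task-agnostic distillation", "proxy (ImageNet)", frozenset({"encoder"}), "L_QSD"),
    Stage("source-domain distillation", "source domain", frozenset({"encoder"}), "L_QSD"),
    Stage("task learning", "labeled source domain", frozenset({"decoder"}), "segmentation loss"),
]


def is_frozen(module: str, stage: Stage) -> bool:
    # A module is frozen in a stage if it is not listed as trainable there.
    return module not in stage.trainable
```

In a real framework, freezing the encoder for the last stage would correspond to disabling gradients on its parameters (for example `requires_grad_(False)` on a PyTorch module) so that only the decoder is updated by the segmentation loss.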

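Experimental Results

Research Questions

RQ1: Does conventional single-stage KD improve the cross-domain generalization of segmentation models distilled from VFMs?
RQ2: Does decoupling representation learning from task learning improve out-of-domain generalization in KD for semantic segmentation?
RQ3: Does the query-based soft distillation mechanism effectively transfer spatial and relational knowledge from VFMs to lightweight students?
RQ4: How large are GKD's cross-domain gains across diverse benchmarks in the F2F and F2L settings?
RQ5: How does GKD perform compared with conventional KD when labeled data is limited?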

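Key Findings

GKD consistently outperforms existing KD methods on five domain-generalization benchmarks in both the F2F and F2L settings.
Average gains are +1.9% for foundation-to-foundation (F2F) and +10.6% for foundation-to-local (F2L) distillation.
In the label-scarce F2L setting, GKD substantially improves label efficiency and cross-domain generalization.
It achieves notable improvements when distilling from DINOv2-B to ViT-S and also carries over to other VFMs such as EVA02.
Two-stage distillation with QSD converges more smoothly and yields better cross-domain performance than single-stage KD.
Visualizations show closer feature distances and diagonal attention patterns, reflecting preserved spatial structure and selective knowledge transfer.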
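One simple way to quantify a "diagonal" attention pattern is the fraction of attention mass on the diagonal of a square student-to-teacher attention map. Entry (i, j) is the weight that student token i places on teacher token j. The map is square only when the student and teacher have the same number of tokens. This metric is an illustrative assumption, not one reported in the paper.

```python
from typing import List


def diagonal_mass(attn: List[List[float]]) -> float:
    """Fraction of total attention weight on the diagonal of a square map.

    1.0 means each student token attends only to the teacher token at the
    same position; uniform attention over N tokens gives 1/N.
    """
    n = len(attn)
    if any(len(row) != n for row in attn):
        raise ValueError("attention map must be square")
    total = sum(sum(row) for row in attn)
    return sum(attn[i][i] for i in range(n)) / total
```

Dividing by the total mass rather than by N keeps the score meaningful even if the rows are not exactly normalized.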

This review was generated by AI and checked by a human editor.