QUICK REVIEW

[Paper Review] CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

Tongkun Xu, Weihua Chen|arXiv (Cornell University)|Sep 13, 2021

Domain Adaptation and Few-Shot Learning | References: 84 | Citations: 111

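One-Line Summary

CDTrans introduces a weight-sharing, triple-branch Transformer with cross-attention for unsupervised domain adaptation (UDA). It uses two-way center-aware pseudo-labeling to build the source-target pairs used for alignment, and reports state-of-the-art results on VisDA-2017 and DomainNet.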

ABSTRACT

Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled source domain to a different unlabeled target domain. Most existing UDA methods focus on learning domain-invariant feature representation, either from the domain level or category level, using convolution neural networks (CNNs)-based frameworks. One fundamental problem for the category level based UDA is the production of pseudo labels for samples in target domain, which are usually too noisy for accurate domain alignment, inevitably compromising the UDA performance. With the success of Transformer in various tasks, we find that the cross-attention in Transformer is robust to the noisy input pairs for better feature alignment, thus in this paper Transformer is adopted for the challenging UDA task. Specifically, to generate accurate input pairs, we design a two-way center-aware labeling algorithm to produce pseudo labels for target samples. Along with the pseudo labels, a weight-sharing triple-branch transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment, respectively. Such design explicitly enforces the framework to learn discriminative domain-specific and domain-invariant representations simultaneously. The proposed method is dubbed CDTrans (cross-domain transformer), and it provides one of the first attempts to solve UDA tasks with a pure transformer solution. Experiments show that our proposed method achieves the best performance on public UDA datasets, e.g. VisDA-2017 and DomainNet. Code and models are available at https://github.com/CDTrans/CDTrans.

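Motivation and Objectives

Address the pseudo-label noise that undermines category-level alignment in unsupervised domain adaptation (UDA).
Propose a pure Transformer-based framework (CDTrans) that uses cross-attention for robust source-target alignment.
Introduce a two-way center-aware pseudo-labeling method to generate high-quality pseudo labels.
Demonstrate state-of-the-art performance on major UDA benchmarks (VisDA-2017, DomainNet).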
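
The two-way center-aware labeling named in the objectives can be illustrated with a minimal NumPy sketch. It assumes cosine similarity between pre-extracted features, nearest-neighbor matching in both directions, and a k-means-style refinement of target labels around probability-weighted class centers. The function names (`two_way_pairs`, `center_aware_labels`, `filter_pairs`) and the `n_refine` parameter are hypothetical and are not taken from the CDTrans code.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def two_way_pairs(src_feat, tgt_feat):
    """Union of nearest-neighbor pairs found source->target and target->source."""
    sim = l2_normalize(src_feat) @ l2_normalize(tgt_feat).T  # (n_src, n_tgt)
    src_to_tgt = {(i, int(j)) for i, j in enumerate(sim.argmax(axis=1))}
    tgt_to_src = {(int(i), j) for j, i in enumerate(sim.argmax(axis=0))}
    return src_to_tgt | tgt_to_src

def center_aware_labels(tgt_feat, tgt_prob, n_refine=1):
    """Assign each target sample to its closest class center.

    Centers start as probability-weighted means of target features and are
    then recomputed from the hard assignments (assumed refinement scheme).
    """
    feat = l2_normalize(tgt_feat)
    num_classes = tgt_prob.shape[1]
    weights = tgt_prob
    labels = None
    for _ in range(n_refine + 1):
        centers = l2_normalize(weights.T @ feat)      # (num_classes, dim)
        labels = (feat @ centers.T).argmax(axis=1)    # nearest center per sample
        weights = np.eye(num_classes)[labels]         # one-hot for the next round
    return labels

def filter_pairs(pairs, src_labels, tgt_pseudo):
    """Keep only pairs whose source label agrees with the target pseudo label."""
    return sorted((i, j) for i, j in pairs if src_labels[i] == tgt_pseudo[j])
```

Matching in both directions ensures that every source sample and every target sample appears in at least one candidate pair, while the center-based check discards pairs whose labels disagree.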

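Proposed Method

Propose a Transformer whose weights are shared across three branches: source, target, and source-target.
Apply self-attention in the source and target branches for feature learning, and cross-attention in the source-target branch for domain alignment.
During training, supervise the target branch with a distillation loss from the source-target branch.
Design a two-way center-aware labeling algorithm that generates and filters pseudo labels using cross-domain similarity and center-based refinement.
Train with classification losses on the source and target branches plus a distillation loss between the source-target and target branches; use only the target branch at inference.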
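
The branch layout described above can be sketched as a single attention layer whose projection matrices are reused by all three branches. This is an illustrative single-head NumPy version, not the authors' implementation. It assumes that queries come from source tokens and keys and values from target tokens in the source-target branch. The `SharedAttention` class and `triple_branch` function are hypothetical names.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class SharedAttention:
    """Single-head attention; one set of projections serves every branch."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.dim = dim
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))

    def attend(self, query_tokens, context_tokens):
        """Scaled dot-product attention of query tokens over context tokens."""
        q = query_tokens @ self.w_q
        k = context_tokens @ self.w_k
        v = context_tokens @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(self.dim))
        return attn @ v

def triple_branch(layer, src_tokens, tgt_tokens):
    """Run the three branches through the same (weight-shared) layer."""
    h_src = layer.attend(src_tokens, src_tokens)  # source branch: self-attention
    h_tgt = layer.attend(tgt_tokens, tgt_tokens)  # target branch: self-attention
    h_mix = layer.attend(src_tokens, tgt_tokens)  # source-target: cross-attention (assumed Q=source, K/V=target)
    return h_src, h_tgt, h_mix
```

Because the projections are shared, the only difference between self-attention and cross-attention here is which token sequence supplies the keys and values. This is also why the target branch can be used on its own at inference.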

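Experimental Results

Research Questions

RQ1: Despite noisy pseudo labels, can a pure Transformer with cross-attention improve UDA by robustly aligning the two domains?
RQ2: Does the two-way center-aware pseudo-labeling strategy produce higher-quality input pairs for training the cross-domain Transformer?
RQ3: How does the CDTrans framework compare with state-of-the-art methods on standard UDA benchmarks?
RQ4: When cross-attention is used, how do the different losses (cls vs. dtl) affect UDA performance?

Key Findings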

Method	Airplane	Bicycle	Bus	Car	Horse	Knife	Motorcycle	Person	Plant	Skateboard	Train	Truck	Avg
ResNet-50	55.1	53.3	61.9	59.1	80.6	17.9	79.7	31.2	81.0	26.5	73.5	8.5	52.4
DANN	81.9	77.7	82.8	44.3	81.2	29.5	65.1	28.6	51.9	54.6	82.8	7.8	57.4
MinEnt	80.3	75.5	75.8	48.3	77.9	27.3	69.7	40.2	46.5	46.6	79.3	16.0	57.0
MCD	87.0	60.9	83.7	64.0	88.9	79.6	84.7	76.9	88.6	40.3	83.0	25.8	71.9
SWD	90.8	82.5	81.7	70.5	91.7	69.5	86.3	77.5	87.4	63.6	85.6	29.2	76.4
CDAN+E	85.2	66.9	83.0	50.8	84.2	74.9	88.1	74.5	83.4	76.0	81.9	38.0	73.9
BNM	89.6	61.5	76.9	55.0	89.3	69.1	81.3	65.5	90.0	47.3	89.1	30.1	70.4
MSTN+DSBN	94.7	86.7	76.0	72.0	95.2	75.1	87.9	81.3	91.1	68.9	88.3	45.5	80.2
CGDM	93.7	82.7	73.2	68.4	92.9	94.5	88.7	82.1	93.4	82.5	86.8	49.2	82.3
CGDM*	92.8	85.1	76.3	64.5	91.0	93.2	81.3	79.3	92.4	83.0	85.6	44.8	80.8
SHOT	94.3	88.5	80.1	57.3	93.1	93.1	80.7	80.3	91.5	89.1	86.3	58.2	82.9
SHOT*	95.5	87.5	80.1	54.5	93.6	94.2	80.2	80.9	90.0	89.9	87.1	58.4	82.7
TVT ∘	92.9	85.6	77.5	60.5	93.6	98.2	89.4	76.4	93.6	92.0	91.7	55.7	83.9
Baseline-S	55.6	73.0	79.4	70.6	72.9	76.3	67.5	51.0	81.0	74.5	53.2	86.0	69.8
Ours-S	94.6	93.5	78.4	98.2	78.0	81.0	82.3	76.7	84.4	77.0	59.1	85.5	74.7
Baseline-B	61.8	79.5	84.3	75.4	78.8	81.2	72.8	55.7	84.4	78.3	59.3	86.0	74.8
CGDM-B*	67.1	83.9	85.4	77.2	83.3	83.7	74.6	64.7	85.6	79.3	69.5	87.7	78.5
SHOT-B*	67.1	83.5	85.5	76.6	83.4	83.7	76.3	65.3	85.3	80.4	66.7	83.4	78.1
Ours-B	68.8	85.0	86.9	81.5	87.1	87.3	79.6	63.3	88.2	82.0	66.0	90.6	80.5

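With cross-attention and three weight-sharing branches, CDTrans achieves strong UDA performance and outperforms many state-of-the-art methods on public benchmarks.
The two-way center-aware labeling method improves pseudo-label quality, approaching the upper-bound performance obtained by training on ground-truth pairs.
Ablations show that combining the classification and distillation losses gives the best results on VisDA-2017.
Transformers generalize robustly in UDA, showing significant improvements over CNN-based baselines across multiple datasets (VisDA-2017, Office-Home, Office-31, DomainNet).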
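
The loss combination examined in the ablation can be written as a short NumPy sketch. It assumes that the distillation term is the KL divergence from the source-target branch's class probabilities (teacher) to the target branch's probabilities (student), and that all terms are summed with equal weight. Both the weighting and the function names are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, computed stably."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of the given integer labels."""
    log_p = log_softmax(logits)
    return -log_p[np.arange(len(labels)), labels].mean()

def distillation_loss(teacher_logits, student_logits):
    """Mean KL(q || p): q from the source-target branch, p from the target branch."""
    log_q = log_softmax(teacher_logits)
    log_p = log_softmax(student_logits)
    return (np.exp(log_q) * (log_q - log_p)).sum(axis=1).mean()

def total_loss(src_logits, tgt_logits, mix_logits, src_labels, tgt_pseudo):
    """Classification on source and target branches plus distillation (equal weights assumed)."""
    cls_src = cross_entropy(src_logits, src_labels)   # labeled source samples
    cls_tgt = cross_entropy(tgt_logits, tgt_pseudo)   # target samples with pseudo labels
    dtl = distillation_loss(mix_logits, tgt_logits)   # source-target branch teaches target branch
    return cls_src + cls_tgt + dtl
```

The distillation term is zero when the two branches agree exactly and grows as the target branch's predictions drift from those of the source-target branch.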


This review was generated by AI and checked by a human editor.