QUICK REVIEW

[論文レビュー] Few-Shot Segmentation via Cycle-Consistent Transformer

Gengwei Zhang, Guoliang Kang|arXiv (Cornell University)|Jun 4, 2021

Domain Adaptation and Few-Shot Learning参考文献 44被引用数 31

ひとこと要約

CyCTR はピクセル単位のサポート特徴をクエリ特徴へ統合する循環的一致性トランスフォーマを導入し、Pascal-5i および COCO-20i で最先端の結果を達成する。

ABSTRACT

Few-shot segmentation aims to train a segmentation model that can fast adapt to novel classes with few exemplars. The conventional training paradigm is to learn to make predictions on query images conditioned on the features from support images. Previous methods only utilized the semantic-level prototypes of support images as conditional information. These methods cannot utilize all pixel-wise support information for the query predictions, which is however critical for the segmentation task. In this paper, we focus on utilizing pixel-wise relationships between support and query images to facilitate the few-shot segmentation task. We design a novel Cycle-Consistent TRansformer (CyCTR) module to aggregate pixel-wise support features into query ones. CyCTR performs cross-attention between features from different images, i.e. support and query images. We observe that there may exist unexpected irrelevant pixel-level support features. Directly performing cross-attention may aggregate these features from support to query and bias the query features. Thus, we propose using a novel cycle-consistent attention mechanism to filter out possible harmful support features and encourage query features to attend to the most informative pixels from support images. Experiments on all few-shot segmentation benchmarks demonstrate that our proposed CyCTR leads to remarkable improvement compared to previous state-of-the-art methods. Specifically, on Pascal-$5^i$ and COCO-$20^i$ datasets, we achieve 67.5% and 45.6% mIoU for 5-shot segmentation, outperforming previous state-of-the-art methods by 5.6% and 7.1% respectively.

研究の動機と目的

各クエリピクセルに対してピクセルレベルのサポート情報を活用することをfew-shot segmentationの動機づけとして提案する。
クロスイメージアテンション中に有害なサポート特徴を除去するための循環的一致性アテンション機構を開発する。
CyCTR を提案し、自己整列ブロックとクロス整列ブロックを用いてピクセル単位のサポート特徴をクエリ特徴へ集約する。
標準的なfew-shot segmentationベンチマーク（Pascal-5i、COCO-20i）で最先端の性能を示す。

提案手法

エンコーダごとに2つのトランスフォーマーブロックを備えた CyCTR を導入する：自己整列ブロック（クエリコンテクスト）とクロス整列ブロック（クエリ–サポートアテンション）。
クロスアテンション中に循環一致性に反するサポートピクセルを抑制する循環一致性アテンションを実装する（式5）。
アフィニティ A = QK^T / sqrt(d) を計算し、集約時に循環一致性バイアス B を適用して非循環一致サポートトークンをゼロにする（式3〜5）。
K-shot 設定を扱うためのサンプリング戦略を用いて、前景/背景トークン（N_fg および N_s）を選択し、スケーラブルなクロスアテンションを実現する。
前提地図とグローバルサポート特徴を含む CyCTR を上に置いた共有バックボーン（ImageNet 事前学習済 ResNet）を採用し、その後に分類ヘッドを配置する。
Dice 損失とグラウンデッドサポートベースのセグメンテーションマップからの補助損失で訓練する；AdamW 最適化を用いる。

実験結果

リサーチクエスチョン

RQ1サポート画像とクエリ画像間のピクセルレベルのクロスアテンションは、プロトタイプベースの手法を超えてfew-shot segmentation を改善するか。
RQ2循環一致性アテンションを導入することで、有害なサポートピクセルを効果的に除去し、有益な背景ピクセルを保持できるか。
RQ3CyCTR は標準ベンチマーク（Pascal-5i、COCO-20i）で1-shotおよび5-shot設定でどのような性能を示すか。
RQ4エンコーダの深さ、隠れ次元、サンプリング戦略が性能と効率に与える影響はどのようになるか。

主な発見

CyCTR は1-shotおよび5-shot設定で Pascal-5i と COCO-20i の最先端の結果を達成する。
Pascal-5i の ResNet-50 で、1-shot mIoU = 64.0、5-shot mIoU = 69.3（平均）。
Pascal-5i の ResNet-101 で、1-shot mIoU = 63.7、5-shot mIoU = 67.4（平均）。
COCO-20i の ResNet-50 で、1-shot mIoU = 40.3、5-shot mIoU = 41.1（平均）。
循環一致性アテンションは、バニラのクロスアテンションやベースラインよりも顕著な改善をもたらす（アブレーション結果は、主要な変種で最大約0.6–0.9%のmIoUの利得を示す）。
従来法と比較して、CyCTR は評価分割（例：Pascal-5i の FB-IoU、ResNet-101 で1-shotが73.0%、5-shotが75.4%）でより大きな利得を示す。

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。