QUICK REVIEW

[Paper Review] Rethinking Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising

Junyi Li, Zhilu Zhang|arXiv (Cornell University)|Apr 11, 2024

Image and Signal Denoising Methods | Cited by 6

One-Line Summary

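Introduces TBSN, a Transformer-based blind-spot network for self-supervised image denoising. It incorporates masked window attention and grouped channel-wise attention, which satisfy the blind-spot constraint while enlarging the receptive field, and provides a knowledge-distilled U-Net for efficient inference.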

ABSTRACT

Blind-spot networks (BSN) have been prevalent neural architectures in self-supervised image denoising (SSID). However, most existing BSNs are conducted with convolution layers. Although transformers have shown the potential to overcome the limitations of convolutions in many image restoration tasks, the attention mechanisms may violate the blind-spot requirement, thereby restricting their applicability in BSN. To this end, we propose to analyze and redesign the channel and spatial attentions to meet the blind-spot requirement. Specifically, channel self-attention may leak the blind-spot information in multi-scale architectures, since the downsampling shuffles the spatial feature into channel dimensions. To alleviate this problem, we divide the channel into several groups and perform channel attention separately. For spatial self-attention, we apply an elaborate mask to the attention matrix to restrict and mimic the receptive field of dilated convolution. Based on the redesigned channel and window attentions, we build a Transformer-based Blind-Spot Network (TBSN), which shows strong local fitting and global perspective abilities. Furthermore, we introduce a knowledge distillation strategy that distills TBSN into smaller denoisers to improve computational efficiency while maintaining performance. Extensive experiments on real-world image denoising datasets show that TBSN largely extends the receptive field and exhibits favorable performance against state-of-the-art SSID methods.

Motivation and Objectives

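Improve self-supervised image denoising (SSID) by exploiting the capabilities of transformers while preserving the blind-spot constraint.
Design a Transformer-based Blind-Spot Network (TBSN) that enlarges the receptive field to handle real-world noise patterns.
Address potential information leakage in channel attention by splitting the channels into groups and applying attention within each group.
Make efficient inference practical by using a knowledge distillation strategy to produce an efficient U-Net student model (TBSN2UNet).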

Proposed Method

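Develop a masked window-based self-attention mechanism (M-WSA) that uses a carefully designed attention mask to restrict each position's attention to points at even coordinate offsets from it, mimicking dilated convolution.
Introduce grouped channel-wise self-attention (G-CSA), which processes channels in small groups to prevent blind-spot information leakage when the number of channels exceeds the spatial resolution.
Combine M-WSA, G-CSA, and an FFN into a Dilated Transformer Attention Block (DTAB), and use it to build a dilated transformer architecture for SSID inside an encoder-decoder U-Net.
Apply pixel-shuffle downsampling (PD) with asymmetric factors at training and inference time, breaking spatial noise correlation while keeping the blind spot intact.
Propose a knowledge distillation scheme that uses the pretrained TBSN as a teacher to train a compact U-Net student model (TBSN2UNet) for efficient inference.
Evaluate on the real-world denoising benchmarks SIDD and DND, and compare against state-of-the-art SSID methods.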
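The masked window attention item (M-WSA) can be illustrated with a small sketch. It assumes a square window of `win` x `win` tokens in row-major order and, following the description above, lets a query attend only to keys whose row and column offsets are both even. The helper names `even_offset_mask` and `receptive_offsets`, and the center-masked 3x3 convolution placed in front of the attention layers, are hypothetical choices for illustration, not the authors' code. The second function checks the blind-spot property: every reachable offset keeps at least one odd coordinate, so the pixel itself (offset (0, 0)) never enters the receptive field, even though the field grows with each layer.

```python
def even_offset_mask(win):
    """Hypothetical M-WSA-style mask for a win x win window (row-major tokens).

    Returns a (win*win) x (win*win) boolean matrix; True means query q may
    attend to key k, i.e. both their row and column offsets are even, which
    mirrors the sampling pattern of a dilation-2 convolution.
    """
    n = win * win
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        qy, qx = divmod(q, win)
        for k in range(n):
            ky, kx = divmod(k, win)
            mask[q][k] = (ky - qy) % 2 == 0 and (kx - qx) % 2 == 0
    return mask


def receptive_offsets(win, layers):
    """Offsets (dy, dx) that can influence an output pixel after an assumed
    center-masked 3x3 conv followed by `layers` even-offset attention layers."""
    # Center-masked 3x3 convolution: all neighbors except the pixel itself.
    field = {(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)} - {(0, 0)}
    # Every even offset that fits inside one window (a superset of what a
    # given query can actually reach, which is enough for the blind-spot check).
    steps = {(dy, dx)
             for dy in range(-(win - 1), win)
             for dx in range(-(win - 1), win)
             if dy % 2 == 0 and dx % 2 == 0}
    for _ in range(layers):
        field = {(fy + sy, fx + sx) for fy, fx in field for sy, sx in steps}
    return field


mask = even_offset_mask(4)
field = receptive_offsets(4, layers=2)
print("blind spot kept:", (0, 0) not in field)
print("max radius:", max(max(abs(dy), abs(dx)) for dy, dx in field))
```

With `win=4` and two attention layers the reachable offsets extend to 5 pixels in each direction, yet `(0, 0)` stays excluded: adding even offsets never changes the parity of the odd coordinate introduced by the masked convolution, which is the same reason stacked dilation-2 convolutions preserve the blind spot.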
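For the grouped channel attention item (G-CSA), here is a minimal plain-Python sketch. It assumes channels are split into contiguous, equal-size groups and omits learned projections, normalization, and multi-head details; `grouped_channel_attention` and its `temperature` argument are hypothetical names. What it demonstrates is deliberately narrow: channels in one group cannot influence another group's output, which is the isolation that grouping relies on to limit the mixing of spatial information that downsampling has moved into separate channels.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_channel_attention(q, k, v, groups, temperature=1.0):
    """Hypothetical grouped channel self-attention.

    q, k, v: lists of C channels, each a list of N spatial values (C x N).
    Channels are split into `groups` contiguous groups; a (C/groups) x
    (C/groups) attention map is computed over all spatial positions and
    applied only among channels of the same group.
    """
    c, n = len(q), len(q[0])
    if c % groups:
        raise ValueError("channel count must be divisible by groups")
    size = c // groups
    out = []
    for g in range(groups):
        idx = range(g * size, (g + 1) * size)
        for i in idx:
            # Similarity of channel i with each channel j in its own group.
            scores = [temperature * sum(a * b for a, b in zip(q[i], k[j])) for j in idx]
            weights = softmax(scores)
            # Output channel i is a weighted mix of same-group value channels.
            out.append([sum(w * v[j][p] for w, j in zip(weights, idx)) for p in range(n)])
    return out


# Toy 4-channel features over 4 spatial positions, split into 2 groups.
q = [[0.1 * (i + p) for p in range(4)] for i in range(4)]
k = [[0.2 * (i - p) for p in range(4)] for i in range(4)]
v = [[float(4 * i + p) for p in range(4)] for i in range(4)]
v_perturbed = [row[:] for row in v]
v_perturbed[3] = [100.0] * 4  # change only channel 3, which belongs to group 1

base = grouped_channel_attention(q, k, v, groups=2)
pert = grouped_channel_attention(q, k, v_perturbed, groups=2)
print("group 0 unchanged:", base[0] == pert[0] and base[1] == pert[1])
print("group 1 changed:", base[2] != pert[2])
```

Running the same perturbation with `groups=1` (ordinary channel attention) changes every output channel, because a single attention map mixes all channels.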
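The pixel-shuffle downsampling (PD) item can be sketched as a pair of pure-Python functions on a single-channel image stored as a list of rows. Sub-image `(i, j)` collects every pixel whose coordinates are congruent to `(i, j)` modulo the factor `f`, so pixels that are neighbors inside a sub-image were `f` apart in the original image, which weakens spatially correlated noise. The function names `pd_down` and `pd_up` and the factors in the example are illustrative assumptions; the text above only states that training and inference use different (asymmetric) factors.

```python
def pd_down(img, f):
    """Pixel-shuffle downsampling: split an H x W image (list of rows) into
    f*f sub-images; sub-image (i, j) holds pixels img[i + f*y][j + f*x]."""
    h, w = len(img), len(img[0])
    if h % f or w % f:
        raise ValueError("image size must be divisible by the PD factor")
    return [[[img[i + f * y][j + f * x] for x in range(w // f)]
             for y in range(h // f)]
            for i in range(f) for j in range(f)]


def pd_up(subs, f):
    """Inverse of pd_down: interleave the f*f sub-images back into one image."""
    sh, sw = len(subs[0]), len(subs[0][0])
    img = [[None] * (sw * f) for _ in range(sh * f)]
    for idx, sub in enumerate(subs):
        i, j = divmod(idx, f)  # same (i, j) ordering as pd_down
        for y in range(sh):
            for x in range(sw):
                img[i + f * y][j + f * x] = sub[y][x]
    return img


img = [[10 * r + c for c in range(6)] for r in range(6)]
for f in (3, 2):  # two different factors, e.g. one for training, one for inference (values arbitrary)
    subs = pd_down(img, f)
    assert pd_up(subs, f) == img  # PD is a lossless rearrangement of pixels
    print(f, "->", len(subs), "sub-images of size", len(subs[0]), "x", len(subs[0][0]))
```

Because PD only rearranges pixels, a network can denoise the sub-images and `pd_up` restores the original layout exactly; changing `f` between training and inference changes how far apart the grouped pixels were, not the amount of information.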
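The distillation item can be reduced to its core idea: the student is fitted to the teacher's outputs on noisy inputs, so no clean images are required. In this toy sketch, `teacher` is a fixed linear function standing in for the frozen TBSN, and the student is a two-parameter linear model trained by gradient descent on the mean squared error. The loss, both models, and all names are assumptions made for illustration, since the summary does not specify TBSN2UNet's training objective.

```python
import random


def teacher(y):
    # Hypothetical stand-in for the frozen, pretrained TBSN (any fixed mapping).
    return 0.8 * y + 0.1


def distill(inputs, steps=2000, lr=0.1):
    """Fit a tiny student s(y) = a*y + b to the teacher's outputs with plain
    gradient descent on the mean squared error (assumed loss)."""
    a, b = 0.0, 0.0
    targets = [teacher(y) for y in inputs]  # pseudo-labels from the teacher
    n = len(inputs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for y, t in zip(inputs, targets):
            err = a * y + b - t
            grad_a += 2.0 * err * y / n
            grad_b += 2.0 * err / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


rng = random.Random(0)
noisy_inputs = [rng.uniform(-1.0, 1.0) for _ in range(64)]  # no clean targets used
a, b = distill(noisy_inputs)
print(f"student: a={a:.3f}, b={b:.3f}")
```

Because the stand-in teacher is itself linear, the student can match it exactly and its parameters approach 0.8 and 0.1. A real U-Net student would be trained the same way in spirit, on batches of noisy images with the teacher's denoised outputs as targets.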

Experimental Results

Research Questions

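RQ1: Can Transformer-based operators be redesigned to satisfy the blind-spot requirement in SSID?
RQ2: How do the spatial and channel self-attention mechanisms affect blind-spot integrity and denoising performance?
RQ3: Can distilling TBSN into a smaller U-Net reduce computational cost while maintaining performance?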

Key Findings

Method	SIDD Benchmark PSNR (dB) / SSIM	DND Benchmark PSNR (dB) / SSIM
CVF-SID [43]	34.71 / 0.917	36.50 / 0.924
AP-BSN [30]	36.91 / 0.931	38.09 / 0.937
SASL [32]	37.41 / 0.934	38.18 / 0.938
LG-BPN [58]	37.28 / 0.936	38.43 / 0.942
PUCA [24]	37.54 / 0.936	38.83 / 0.942
TBSN (Ours)	37.78 / 0.940	39.08 / 0.945
TBSN2UNet (Ours)	37.79 / 0.940	39.01 / 0.945
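The PSNR gaps in the table are easier to interpret as changes in mean squared error (MSE), since PSNR = 10 * log10(MAX^2 / MSE). The short sketch below converts a PSNR gain into an MSE ratio. The ratio depends only on the dB difference, so the assumed peak value `max_val=255` (8-bit images) affects only the absolute `psnr` helper, not the comparison; the helper names are illustrative.

```python
import math


def psnr(mse, max_val=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio(gain_db):
    """Factor by which MSE shrinks for a PSNR gain of `gain_db` decibels."""
    return 10.0 ** (-gain_db / 10.0)


# SIDD benchmark PSNR values taken from the table above.
gain = 37.78 - 37.54  # TBSN vs. PUCA, in dB
print(f"{gain:.2f} dB gain -> MSE scaled by {mse_ratio(gain):.3f}")
```

For TBSN versus PUCA on SIDD (37.78 vs. 37.54 dB), the 0.24 dB gain corresponds to roughly 5% lower MSE.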

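Among self-supervised methods, TBSN achieves state-of-the-art SSID performance on the SIDD and DND benchmarks.
Masked window-based self-attention (M-WSA) enlarges the local receptive field while respecting the blind-spot constraint, improving denoising accuracy.
Grouped channel-wise self-attention (G-CSA) prevents blind-spot information leakage in the multi-scale architecture while maintaining performance.
DTAB integrates local and global features in a complementary way, substantially enlarging the receptive field and raising PSNR.
Knowledge distillation into TBSN2UNet substantially improves inference efficiency while keeping performance on par with the teacher model.
TBSN outperforms several earlier SSID methods and approaches the performance of supervised baselines on real-world datasets.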


This review was generated by AI and checked by a human editor.