QUICK REVIEW

[Paper Review] Rethinking Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising

Junyi Li, Zhilu Zhang|arXiv (Cornell University)|Apr 11, 2024

Image and Signal Denoising Methods | Cited by 6

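One-Sentence Summary

Introduces TBSN, a Transformer-based blind-spot network for self-supervised image denoising that uses masked window attention and grouped channel attention to satisfy the blind-spot requirement while enlarging the receptive field, and provides a knowledge-distilled U-Net for efficient inference.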

ABSTRACT

Blind-spot networks (BSN) have been prevalent neural architectures in self-supervised image denoising (SSID). However, most existing BSNs are conducted with convolution layers. Although transformers have shown the potential to overcome the limitations of convolutions in many image restoration tasks, the attention mechanisms may violate the blind-spot requirement, thereby restricting their applicability in BSN. To this end, we propose to analyze and redesign the channel and spatial attentions to meet the blind-spot requirement. Specifically, channel self-attention may leak the blind-spot information in multi-scale architectures, since the downsampling shuffles the spatial feature into channel dimensions. To alleviate this problem, we divide the channel into several groups and perform channel attention separately. For spatial self-attention, we apply an elaborate mask to the attention matrix to restrict and mimic the receptive field of dilated convolution. Based on the redesigned channel and window attentions, we build a Transformer-based Blind-Spot Network (TBSN), which shows strong local fitting and global perspective abilities. Furthermore, we introduce a knowledge distillation strategy that distills TBSN into smaller denoisers to improve computational efficiency while maintaining performance. Extensive experiments on real-world image denoising datasets show that TBSN largely extends the receptive field and exhibits favorable performance against state-of-the-art SSID methods.
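The channel grouping described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `grouped_channel_attention`, the `(channels, pixels)` layout, and the `1/sqrt(pixels)` scaling are assumptions made for illustration, and the learned query/key/value projections are omitted. What it shows is that attention weights are computed only between channels of the same group, so features in one group cannot reach another.

```python
import numpy as np


def grouped_channel_attention(q, k, v, num_groups):
    """Channel attention computed separately inside each channel group.

    q, k, v: arrays of shape (channels, pixels). Hypothetical layout;
    the learned projections that would produce them are omitted here.
    """
    c, n = q.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    g = c // num_groups  # channels per group

    # Split the channel axis into contiguous groups: (groups, g, pixels).
    qg = q.reshape(num_groups, g, n)
    kg = k.reshape(num_groups, g, n)
    vg = v.reshape(num_groups, g, n)

    # Channel-to-channel similarity inside each group: (groups, g, g).
    # The 1/sqrt(n) scaling is an assumption, not taken from the paper.
    scores = qg @ kg.transpose(0, 2, 1) / np.sqrt(n)

    # Numerically stable softmax over the key-channel axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Mix value channels within each group only, then merge the groups.
    return (weights @ vg).reshape(c, n)
```

With `num_groups=1` this reduces to ordinary channel attention, where every output channel mixes all input channels.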

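Research Motivation and Objectives

Advance self-supervised image denoising (SSID) by exploiting the capacity of transformers while preserving the blind-spot requirement.
Design a Transformer-based blind-spot network (TBSN) that enlarges the receptive field to cope with real-world noise patterns.
Address the potential blind-spot information leakage in channel attention by dividing the channels into groups and applying attention within each group.
Use a knowledge distillation strategy to obtain a U-Net student model (TBSN2UNet) with efficient inference, improving practicality.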

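Proposed Method

Develop masked window self-attention (M-WSA) with an elaborate attention mask that restricts attention to positions at even coordinates, mimicking the receptive field of dilated convolution.
Introduce grouped channel-wise self-attention (G-CSA), which prevents blind-spot information leakage when the number of channels exceeds the spatial resolution by processing channels in smaller groups.
Assemble a dilated transformer attention block (DTAB) that combines M-WSA, G-CSA, and an FFN, and stack these blocks in an encoder-decoder U-Net architecture for SSID.
Apply pixel-shuffle downsampling (PD) during training and inference to break the spatial correlation of noise while preserving blind-spot integrity.
Propose a knowledge distillation scheme in which a pre-trained TBSN serves as the teacher for training a compact U-Net student (TBSN2UNet) for efficient inference.
Evaluate on the real-world denoising benchmarks SIDD and DND, comparing against state-of-the-art SSID methods.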
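The masking idea behind M-WSA can be sketched as follows, as an illustration rather than the paper's code. The sketch assumes that "even coordinates" means even relative offsets between query and key inside a window, which is the sampling pattern of a convolution with dilation 2. The helper names `dilated_window_mask` and `masked_window_attention` are hypothetical, and relative position bias and multi-head splitting are left out.

```python
import numpy as np


def dilated_window_mask(window_size, dilation=2):
    """Boolean (N, N) mask with N = window_size ** 2.

    Entry [i, j] is True when query pixel i may attend to key pixel j,
    i.e. when both row and column offsets are multiples of `dilation`.
    This offset rule is an assumption about the paper's mask.
    """
    axis = np.arange(window_size)
    # (N, 2) array of (row, col) coordinates in row-major order.
    coords = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1)
    coords = coords.reshape(-1, 2)
    # Pairwise offsets between every query and key: (N, N, 2).
    offsets = coords[:, None, :] - coords[None, :, :]
    return np.all(offsets % dilation == 0, axis=-1)


def masked_window_attention(q, k, v, mask):
    """Single-head attention inside one window; q, k, v are (N, dim)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Disallowed pairs get -inf so they receive zero weight after softmax.
    scores = np.where(mask, scores, -np.inf)
    # Every row keeps its diagonal entry, so the row maximum is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

In this sketch each query in a 4x4 window with dilation 2 attends to 4 of the 16 positions, so the output at a pixel is unaffected by changes to its immediate neighbours' values.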

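Experimental Results

Research Questions

RQ1: Can transformer operators be redesigned to meet the blind-spot requirement of SSID?
RQ2: How do spatial and channel self-attention mechanisms affect blind-spot integrity and denoising performance?
RQ3: Can distilling TBSN into a smaller U-Net reduce computational cost while preserving performance?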

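Key Findings

TBSN achieves state-of-the-art performance among self-supervised methods on the SIDD and DND benchmarks.
Masked window self-attention (M-WSA) enlarges the local receptive field without violating the blind-spot requirement, improving denoising accuracy.
Grouped channel-wise self-attention (G-CSA) prevents blind-spot information leakage in multi-scale architectures, preserving performance.
DTAB integrates local and global features in a complementary way, substantially enlarging the receptive field and yielding PSNR gains.
Distillation into TBSN2UNet markedly improves inference efficiency while keeping performance close to the teacher's.
On real-world datasets, TBSN outperforms several earlier SSID methods and approaches supervised baselines.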

This review was generated by AI and reviewed by a human editor.