QUICK REVIEW

[Paper Review] Non-Local Video Denoising by CNN

Axel Davy, Thibaud Ehret|arXiv (Cornell University)|Nov 30, 2018

Image and Signal Denoising Methods | References: 54 | Cited by: 29

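One-Sentence Summary

The paper proposes VNLNet, a novel CNN architecture that integrates a non-local self-similarity search into video denoising: a non-trainable first layer identifies similar patches in a 3D spatio-temporal search volume, and the central values of these patches are fed to a CNN as a per-pixel feature vector to predict the clean image. By combining patch-based non-local methods with deep learning, the method achieves state-of-the-art video denoising performance and is, to the authors' knowledge, the first successful application of a CNN to video denoising.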

ABSTRACT

Non-local patch based methods were until recently state-of-the-art for image denoising but are now outperformed by CNNs. Yet they are still the state-of-the-art for video denoising, as video redundancy is a key factor to attain high denoising performance. The problem is that CNN architectures are hardly compatible with the search for self-similarities. In this work we propose a new and efficient way to feed video self-similarities to a CNN. The non-locality is incorporated into the network via a first non-trainable layer which finds for each patch in the input image its most similar patches in a search region. The central values of these patches are then gathered in a feature vector which is assigned to each image pixel. This information is presented to a CNN which is trained to predict the clean image. We apply the proposed architecture to image and video denoising. For the latter patches are searched for in a 3D spatio-temporal volume. The proposed architecture achieves state-of-the-art results. To the best of our knowledge, this is the first successful application of a CNN to video denoising.

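Motivation and Objectives

Address the fact that non-local patch-based methods still outperform CNNs in video denoising, because CNN architectures are hardly compatible with a self-similarity search.
Develop a CNN architecture that efficiently incorporates video self-similarity through a non-trainable non-local layer, enabling high-performance denoising.
Achieve state-of-the-art video denoising results by combining the strengths of non-local methods and deep learning.
Enable efficient, real-time video denoising by optimizing the non-local search for GPU acceleration.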

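Proposed Method

A non-trainable first layer performs a 3D spatio-temporal patch search, finding for the patch around each pixel its most similar patches within a search region.
For each patch, the central pixel values of its N most similar patches are gathered into a feature vector assigned to that pixel.
This feature vector, which encodes the non-local context, is fed to a standard CNN that is trained end to end to predict the clean video frame.
The patch-distance computation is optimized for the GPU, using shared memory and an ordered table held in registers to maintain the N best matches efficiently.
The architecture keeps full spatial resolution throughout, avoiding pooling and strided convolutions, and is compatible with existing CNN designs.
The non-local layer has no learnable parameters and takes no part in training, while the CNN is trained on pairs of clean and noisy videos.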
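
The first two steps of the method (patch search, then gathering central values) can be sketched in plain Python. This is a minimal brute-force illustration, not the authors' GPU implementation: the function names, the squared-difference distance, the default sizes, and the restriction to interior pixels (no border padding) are all assumptions made for the example.

```python
def patch_distance(video, t0, y0, x0, t1, y1, x1, half):
    """Sum of squared differences between two (2*half+1) x (2*half+1) patches."""
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = video[t0][y0 + dy][x0 + dx] - video[t1][y1 + dy][x1 + dx]
            total += diff * diff
    return total


def nonlocal_features(video, t, y, x, n_best=3, half=1, search=2, t_radius=1):
    """Central values of the n_best patches most similar to the patch at (t, y, x).

    video is a nested list indexed as video[frame][row][column]. Candidates are
    taken from a 3D spatio-temporal window around (t, y, x). Assumption: (y, x)
    is an interior pixel, at least `half` pixels away from the frame border.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    candidates = []
    for tt in range(max(0, t - t_radius), min(frames, t + t_radius + 1)):
        for yy in range(max(half, y - search), min(height - half, y + search + 1)):
            for xx in range(max(half, x - search), min(width - half, x + search + 1)):
                dist = patch_distance(video, t, y, x, tt, yy, xx, half)
                candidates.append((dist, tt, yy, xx))
    candidates.sort()  # smallest distance first
    return [video[tt][yy][xx] for _, tt, yy, xx in candidates[:n_best]]


# Usage: a static 3-frame video whose pixel value is 5 * row + column.
video = [[[y * 5 + x for x in range(5)] for y in range(5)] for _ in range(3)]
print(nonlocal_features(video, t=1, y=2, x=2))  # prints [12, 12, 12]
```

In the static toy video the three best matches are the same location in the three frames, so the feature vector repeats the pixel's own value. On a noisy video the gathered values would differ, and it is this per-pixel vector, computed for every pixel, that the CNN receives as input channels.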

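Experimental Results

Research Questions

RQ1: Can non-local self-similarity in video be integrated effectively into a CNN-based denoising framework?
RQ2: Does a non-trainable non-local layer that gathers features from similar patches improve video denoising performance over a standard CNN?
RQ3: Can this hybrid approach achieve state-of-the-art video denoising, surpassing both traditional non-local methods and end-to-end CNNs?
RQ4: How efficient is the non-local patch search implementation on modern GPUs?
RQ5: How do unreliable matches, such as in regions with complex motion, affect network performance, and can the effect be mitigated?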

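Key Findings

The proposed VNLNet achieves state-of-the-art video denoising performance, outperforming traditional non-local methods and standard CNNs.
To the authors' knowledge, it is the first method to apply a CNN successfully to video denoising while exploiting non-local self-similarity.
The non-local search implementation is 25 times faster than a naive GPU implementation that uses the same algorithm to maintain the N best matches.
In regions where matches are unreliable, such as regions with complex motion, network performance degrades to the level of single-image denoising, which suggests a need for adaptive patch sizes or a match-quality feedback mechanism.
The best performance is obtained with a 41×41 patch size, which underlines the importance of reliable patch matching.
The architecture keeps full spatial resolution and avoids pooling, which preserves fine detail in the denoised output.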
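
The idea of maintaining the N best matches in an ordered table can be illustrated on the CPU with a fixed-size sorted insertion. This is only a sketch of the data structure: the function names and the sentinel convention are hypothetical, and the review does not describe the actual GPU kernel, which keeps its table in registers.

```python
def insert_best(table, dist, index):
    """Insert (dist, index) into a fixed-size table sorted by ascending distance."""
    if dist >= table[-1][0]:
        return  # not better than the current worst match: rejected in one comparison
    pos = len(table) - 1
    while pos > 0 and table[pos - 1][0] > dist:
        table[pos] = table[pos - 1]  # shift the worse entry one slot down
        pos -= 1
    table[pos] = (dist, index)


def n_best_matches(distances, n_best):
    """Return the n_best (distance, index) pairs with the smallest distances."""
    table = [(float("inf"), -1)] * n_best  # sentinel entries mark empty slots
    for index, dist in enumerate(distances):
        insert_best(table, dist, index)
    return table


# Usage: keep the 3 closest of 5 candidate patches.
print(n_best_matches([5.0, 1.0, 4.0, 0.5, 3.0], 3))
# prints [(0.5, 3), (1.0, 1), (3.0, 4)]
```

Because the table has a fixed size and most candidates fail the first comparison, each candidate costs very little work and no dynamic memory, which is plausibly why such a structure suits per-thread storage on a GPU.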


This review was generated by AI and reviewed by a human editor.