QUICK REVIEW

[Paper Review] A Lightweight Optical Flow CNN - Revisiting Data Fidelity and Regularization

Tak-Wai Hui, Xiaoou Tang | arXiv (Cornell University) | Mar 15, 2019

Advanced Vision and Imaging | References: 44 | Cited by: 28

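One-Sentence Summary

LiteFlowNet2 is a lightweight, fast, and accurate optical flow CNN that revisits data fidelity and regularization from variational methods, using feature warping, cascaded flow inference, and feature-driven regularization. It achieves state-of-the-art accuracy on the Sintel and KITTI benchmarks while being 25.3 times smaller than FlowNet2 and 3.1 times faster at inference, and it improves on LiteFlowNet by 23.3% on Sintel Clean.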

ABSTRACT

For over four decades, most work has addressed the problem of optical flow estimation using variational methods. With the advance of machine learning, some recent works have attempted to address the problem using convolutional neural networks (CNNs) and have shown promising results. FlowNet2, the state-of-the-art CNN, requires over 160M parameters to achieve accurate flow estimation. Our LiteFlowNet2 outperforms FlowNet2 on Sintel and KITTI benchmarks, while being 25.3 times smaller in model size and 3.1 times faster in running speed. LiteFlowNet2 is built on the foundation laid by conventional methods and mirrors the roles of data fidelity and regularization in variational methods. We compute optical flow in a spatial-pyramid formulation, as in SPyNet, but through a novel lightweight cascaded flow inference. It provides high flow estimation accuracy through early correction with seamless incorporation of descriptor matching. Flow regularization is used to ameliorate the issue of outliers and vague flow boundaries through feature-driven local convolutions. Our network also has an effective structure for pyramidal feature extraction and embraces feature warping rather than image warping as practiced in FlowNet2 and SPyNet. Compared to LiteFlowNet, LiteFlowNet2 improves the optical flow accuracy on Sintel Clean by 23.3%, Sintel Final by 12.8%, KITTI 2012 by 19.6%, and KITTI 2015 by 18.8%, while being 2.2 times faster. Our network protocol and trained models are made publicly available on https://github.com/twhui/LiteFlowNet2.
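
As a rough sanity check on the size claim, and assuming that "model size" scales with parameter count (an assumption of this note, not something the abstract states), the two figures quoted above imply a parameter budget of roughly:

```latex
\[
\frac{160\,\text{M parameters}}{25.3} \approx 6.3\,\text{M parameters}
\]
```

Because the abstract says FlowNet2 needs over 160M parameters, this is best read as an order-of-magnitude estimate rather than an exact count.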

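Research Motivation and Objectives

Develop a lightweight optical flow CNN that substantially reduces model size and inference time while maintaining high accuracy relative to existing deep learning methods.
Bridge the gap between classical variational optical flow methods and modern CNNs by explicitly modeling data fidelity and regularization terms within a deep learning framework.
Improve flow estimation accuracy through a novel cascaded flow inference mechanism and effective feature-driven regularization.
Enable real-time deployment of optical flow networks in resource-constrained applications such as SLAM, video processing, and 3D reconstruction.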

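Proposed Method

Use a spatial-pyramid feature extraction network (NetC) to generate multi-scale features from the input image pair.
Apply a cascaded flow inference module (NetE) that progressively refines the flow prediction across pyramid levels through descriptor matching and sub-pixel refinement.
Use feature warping rather than image warping to propagate features across levels, improving efficiency and accuracy.
Introduce a flow regularization module that suppresses outliers and sharpens flow boundaries through feature-driven local convolutions.
Build a hybrid data fidelity term by combining learned feature descriptors with a correlation layer, making point correspondences more robust.
Adopt a lightweight architecture that shares weights in the encoder and uses efficient modules (such as fractionally strided convolutions) to minimize parameter count and computation.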
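
To make feature warping and descriptor matching more concrete, here is a minimal NumPy sketch of the two operations. It is not the authors' implementation: LiteFlowNet2 uses learned CNN layers, while the function names `warp_features` and `local_correlation`, the border clamping, and the zero padding are all assumptions made for illustration.

```python
import numpy as np


def warp_features(feat, flow):
    """Backward-warp a feature map with bilinear sampling.

    feat: (C, H, W) features of the second image.
    flow: (2, H, W) flow from image 1 to image 2; flow[0] is the
          x (column) displacement, flow[1] the y (row) displacement.
    Sample positions outside the map are clamped to the border
    (an assumption of this sketch).
    """
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = np.clip(xs + flow[0], 0, W - 1)
    y = np.clip(ys + flow[1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = x - x0  # horizontal interpolation weight
    wy = y - y0  # vertical interpolation weight
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def local_correlation(f1, f2, radius=1):
    """Correlate every position of f1 with a small neighborhood of f2.

    Returns a cost volume of shape (D, H, W) with D = (2*radius + 1)**2.
    Channel k = (dy + radius) * (2*radius + 1) + (dx + radius) holds the
    channel-averaged dot product between f1[:, y, x] and
    f2[:, y + dy, x + dx]. Positions outside f2 count as zero.
    """
    C, H, W = f1.shape
    padded = np.pad(f2, ((0, 0), (radius, radius), (radius, radius)))
    costs = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = padded[:, radius + dy:radius + dy + H,
                             radius + dx:radius + dx + W]
            costs.append((f1 * shifted).mean(axis=0))
    return np.stack(costs)
```

For instance, if the second feature map is the first one shifted one pixel to the right and the flow is (1, 0) everywhere, `warp_features` realigns the interior pixels, and the matching score moves from the `dx = 1` channel of the cost volume to the zero-displacement channel. This is the reason warping lets each pyramid level search only a small neighborhood.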

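Experimental Results

Research Questions

RQ1: Can a lightweight CNN achieve state-of-the-art optical flow accuracy by explicitly modeling the data fidelity and regularization terms of variational methods?
RQ2: How do feature warping and image warping differ in accuracy and efficiency within deep optical flow networks?
RQ3: How do cascaded flow inference and feature-driven regularization affect the accuracy and robustness of flow estimation?
RQ4: Can a smaller, faster network surpass larger state-of-the-art models (such as FlowNet2) in both speed and accuracy?
RQ5: To what extent can the design principles of classical variational methods be transferred effectively to modern CNN-based optical flow estimation?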

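Key Findings

LiteFlowNet2 improves accuracy over LiteFlowNet by 23.3% on the Sintel Clean benchmark while running 2.2 times faster.
Relative to LiteFlowNet, accuracy also improves by 12.8% on Sintel Final, 19.6% on KITTI 2012, and 18.8% on KITTI 2015. Relative to FlowNet2, the model is 25.3 times smaller and runs 3.1 times faster.
Feature warping rather than image warping markedly improves efficiency and gives better feature propagation across pyramid levels.
The feature-driven regularization module effectively reduces outliers and notably improves flow boundary accuracy, especially in richly textured and motion-blurred regions.
Cascaded flow inference, combining descriptor matching with sub-pixel refinement, enables early correction and high-accuracy flow estimation.
The model strikes a good balance between accuracy and efficiency, making it suitable for real-time applications such as SLAM, action recognition, and 3D reconstruction.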
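
The idea behind feature-driven regularization can be illustrated with a small NumPy sketch. This is a hand-crafted stand-in, not the paper's module: here each pixel's smoothing kernel is a softmax over negative squared feature distances to its neighbors (essentially a feature-guided, bilateral-style filter), whereas the network learns how to derive its kernels. The function name `feature_guided_smoothing` and the parameters `radius` and `sigma` are hypothetical.

```python
import numpy as np


def feature_guided_smoothing(flow, feat, radius=1, sigma=1.0):
    """Smooth a flow field with per-pixel kernels built from features.

    flow: (2, H, W) flow field to regularize.
    feat: (C, H, W) guidance features on the same grid.
    Neighbors whose features differ strongly from the center pixel get
    almost no weight, so smoothing does not cross feature boundaries.
    Borders use edge padding (an assumption of this sketch).
    """
    _, H, W = flow.shape
    pad = ((0, 0), (radius, radius), (radius, radius))
    flow_p = np.pad(flow, pad, mode="edge")
    feat_p = np.pad(feat, pad, mode="edge")

    logits, neighbors = [], []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            f_n = feat_p[:, dy:dy + H, dx:dx + W]
            # Negative squared feature distance to this neighbor.
            dist2 = ((feat - f_n) ** 2).sum(axis=0)
            logits.append(-dist2 / (2.0 * sigma ** 2))
            neighbors.append(flow_p[:, dy:dy + H, dx:dx + W])

    logits = np.stack(logits)                     # (K, H, W)
    logits -= logits.max(axis=0, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)  # softmax over neighbors
    neighbors = np.stack(neighbors)               # (K, 2, H, W)
    return (weights[:, None] * neighbors).sum(axis=0)
```

With a feature map that has two clearly different regions, an isolated outlier inside one region is pulled toward its neighbors, while flow values on either side of the region boundary stay essentially unchanged. That is the qualitative behavior the regularization finding above describes.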

This review was generated by AI and reviewed by human editors.