QUICK REVIEW

[Paper Review] Computer Stereo Vision for Autonomous Driving

Rui Fan, Li Wang|arXiv (Cornell University)|Dec 6, 2020

Advanced Vision and Imaging | References: 84 | Cited by: 28

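One-Sentence Summary

This work presents a comprehensive computer stereo vision framework for autonomous driving that integrates multi-threaded CPU and GPU architectures to improve the trade-off between speed and accuracy. It details the stereo vision pipeline for 3D scene reconstruction, object detection, and semantic segmentation, and uses parallel computing to reach real-time performance on embedded systems.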

ABSTRACT

As an important component of autonomous systems, autonomous car perception has had a big leap with recent advances in parallel computing architectures. With the use of tiny but full-feature embedded supercomputers, computer stereo vision has been prevalently applied in autonomous cars for depth perception. The two key aspects of computer stereo vision are speed and accuracy. They are both desirable but conflicting properties, as the algorithms with better disparity accuracy usually have higher computational complexity. Therefore, the main aim of developing a computer stereo vision algorithm for resource-limited hardware is to improve the trade-off between speed and accuracy. In this chapter, we introduce both the hardware and software aspects of computer stereo vision for autonomous car systems. Then, we discuss four autonomous car perception tasks, including 1) visual feature detection, description and matching, 2) 3D information acquisition, 3) object detection/recognition and 4) semantic image segmentation. The principles of computer stereo vision and parallel computing on multi-threading CPU and GPU architectures are then detailed.

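Research Motivation and Objectives

Address the key challenge of balancing speed and accuracy for stereo vision on resource-constrained autonomous vehicles.
Integrate hardware (cameras, LIDAR, GPS/IMU) and software (perception, planning, control) components into a unified autonomous car system.
Achieve real-time 3D perception through efficient parallel computing on multi-threaded CPUs and GPUs.
Give a systematic overview of stereo vision algorithms and their implementation on heterogeneous computing platforms.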

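Proposed Method

Use a stereo camera pair to extract depth information through essential-matrix (epipolar) geometry and disparity computation.
Use OpenMP for multithreading on the CPU, parallelizing otherwise serial processing tasks in the perception pipeline.
Use the GPU architecture through CUDA C for massively parallel computation, which is especially suited to training and inference of convolutional neural networks (CNNs).
Exploit streaming multiprocessors (SMs) and warps (groups of 32 threads) to maximize GPU throughput in stereo matching and feature extraction.
Use on-chip memory (shared, register, constant, texture) to optimize data access and reduce the latency of GPU kernels.
Fuse sensor data from LIDAR, radar, GPS, and IMU to improve robustness under adverse environmental conditions.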
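The disparity and depth step in the list above can be illustrated with a minimal Python sketch. It is not the chapter's implementation: it assumes a rectified image pair, uses plain sum-of-absolute-differences (SAD) block matching on a single scanline, and converts disparity to depth with the standard relation Z = f * B / d. The function names, the window size, and the camera values (focal length 720 px, baseline 0.5 m) are hypothetical and chosen only for illustration.

```python
def sad_disparity_row(left_row, right_row, x, half_window, max_disparity):
    """Estimate the disparity at column x of left_row with SAD block matching.

    Assumes a rectified pair, so the match lies on the same row, and assumes
    the caller keeps the window inside the row: half_window <= x and
    x + half_window < len(left_row).
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        # A point at column x in the left image appears at column x - d
        # in the right image; stop once the window would leave the row.
        if x - d - half_window < 0:
            break
        cost = 0
        for k in range(-half_window, half_window + 1):
            cost += abs(left_row[x + k] - right_row[x - d + k])
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres from Z = f * B / d; zero disparity means infinitely far."""
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Toy scanlines (hypothetical data): the bright pattern sits 3 pixels
# further right in the left image than in the right image.
left = [10, 10, 10, 10, 10, 10, 80, 200, 90, 10, 10, 10]
right = [10, 10, 10, 80, 200, 90, 10, 10, 10, 10, 10, 10]

d = sad_disparity_row(left, right, x=7, half_window=1, max_disparity=5)
z = depth_from_disparity(d, focal_px=720.0, baseline_m=0.5)
print(d, z)
```

Each pixel's disparity search is independent of every other pixel's, which is why this kind of loop maps naturally onto the OpenMP and CUDA parallelism the summary describes.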

Experimental Results

Research Questions

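RQ1: How can stereo vision achieve high accuracy on embedded systems while maintaining real-time performance?
RQ2: What roles do multi-threaded CPUs and GPUs each play in accelerating the stereo vision pipeline?
RQ3: How do different sensor modalities (camera, LIDAR, radar) jointly improve the robustness of 3D perception in autonomous driving?
RQ4: Which key architectural and algorithmic choices in stereo vision improve the speed-accuracy trade-off?
RQ5: How can heterogeneous computing (CPU + GPU) be used effectively to support perception tasks in autonomous vehicles?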

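Key Findings

Combining CPU multithreading through OpenMP with GPU parallel computing through CUDA allows stereo vision algorithms to run efficiently on embedded systems.
GPU-based processing significantly accelerates compute-intensive tasks such as CNN training and disparity computation, delivering the high throughput needed for real-time perception.
Stereo vision enables accurate 3D information acquisition, and its performance improves further when fused with LIDAR and radar data.
Using on-chip memory on the GPU (shared, constant, texture) reduces memory latency and improves kernel efficiency.
The framework supports four core perception tasks: visual feature matching, 3D reconstruction, object detection, and semantic segmentation.
Hardware-software co-design, including drive-by-wire systems and sensor fusion, enables reliable control and navigation of autonomous vehicles.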
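The GPU findings above depend on how work is divided into thread blocks and 32-thread warps. The Python sketch below only does the launch arithmetic for a per-pixel kernel. It assumes one thread per pixel and a 32 x 8 thread block; both are illustrative assumptions, not settings taken from the chapter, and `launch_geometry` is a hypothetical helper name.

```python
WARP_SIZE = 32  # threads per warp, as stated in the summary


def ceil_div(a, b):
    """Integer ceiling division without floating point."""
    return -(-a // b)


def launch_geometry(width, height, block_x=32, block_y=8):
    """Grid size and warp counts for a one-thread-per-pixel kernel (assumed layout)."""
    grid_x = ceil_div(width, block_x)
    grid_y = ceil_div(height, block_y)
    threads_per_block = block_x * block_y
    total_threads = grid_x * grid_y * threads_per_block
    return {
        "grid": (grid_x, grid_y),
        "threads_per_block": threads_per_block,
        "warps_per_block": ceil_div(threads_per_block, WARP_SIZE),
        # Threads launched beyond the image border must exit early in the kernel.
        "idle_threads": total_threads - width * height,
    }


print(launch_geometry(1280, 720))
print(launch_geometry(1242, 375))
```

For a 1280 x 720 image the grid tiles the image exactly. For a 1242 x 375 image the edge blocks overhang the border, so a real kernel would need a bounds check before reading or writing pixels.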


This review was generated by AI and reviewed by human editors.