QUICK REVIEW

[Paper Review] NoScope: Optimizing Neural Network Queries over Video at Scale

Daniel Kang, John Emmons|arXiv (Cornell University)|Mar 7, 2017

Advanced Neural Network Applications|References: 86|Cited by: 47

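One-Sentence Summary

NoScope accelerates neural network inference over video by automatically searching for and training a cascade of specialized models and difference detectors tailored to a specific video and target object class, achieving speedups of up to 15,500x over real time while staying within 1-5% of the accuracy of state-of-the-art networks.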

ABSTRACT

Recent advances in computer vision, in the form of deep neural networks, have made it possible to query increasing volumes of video data with high accuracy. However, neural network inference is computationally expensive at scale: applying a state-of-the-art object detector in real time (i.e., 30+ frames per second) to a single video requires a $4000 GPU. In response, we present NoScope, a system for querying videos that can reduce the cost of neural network video analysis by up to three orders of magnitude via inference-optimized model search. Given a target video, object to detect, and reference neural network, NoScope automatically searches for and trains a sequence, or cascade, of models that preserves the accuracy of the reference network but is specialized to the target video and is therefore far less computationally expensive. NoScope cascades two types of models: specialized models that forego the full generality of the reference model but faithfully mimic its behavior for the target video and object; and difference detectors that highlight temporal differences across frames. We show that the optimal cascade architecture differs across videos and objects, so NoScope uses an efficient cost-based optimizer to search across models and cascades. With this approach, NoScope achieves two to three orders of magnitude speed-ups (265-15,500x real-time) on binary classification tasks over fixed-angle webcam and surveillance video while maintaining accuracy within 1-5% of state-of-the-art neural networks.

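Motivation and Objectives

Address the high computational cost of running state-of-the-art deep neural networks (DNNs) for object detection over video at scale.
Reduce the cost and latency of neural network inference over video by exploiting query-specific patterns in fixed-angle video streams.
Develop an automated system that searches for and trains optimized model cascades for a specific video and target object class.
Substantially increase inference speed through model specialization and temporal difference detection while keeping accuracy within 1-5% of the reference model.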

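Proposed Method

Use a pretrained reference DNN to generate labeled training data for the target video and object class, which is then used for model specialization.
Train lightweight, specialized DNNs that mimic the reference model's behavior on the target video but are optimized for speed and low complexity.
Deploy difference detectors that identify temporal changes between frames, so that expensive inference can be skipped when frames are nearly identical.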
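Build a model cascade in which the difference detector runs first, the specialized network handles the frames that show meaningful change, and the reference DNN is invoked only when the specialized network's confidence is low.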
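Use a cost-based optimizer that searches over model architectures and confidence thresholds to maximize throughput under a specified accuracy constraint.
Use a distillation-style setup, with the reference model's outputs as training targets, to transfer its knowledge into the smaller, faster specialized models.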
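
The control flow of such a cascade can be illustrated with a minimal Python sketch. It is not NoScope's implementation: the function names, the flattened-list frame representation, the mean-squared-error difference metric, and the thresholds `delta_diff`, `c_low`, and `c_high` are all assumptions made for illustration. The sketch runs the difference detector first, then the specialized model, and falls back to the reference model only when the specialized model's score is inconclusive.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]  # hypothetical: a frame flattened to pixel intensities in [0, 1]


def mean_squared_error(a: Frame, b: Frame) -> float:
    """Average squared per-pixel difference between two equally sized frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def run_cascade(
    frames: List[Frame],
    specialized: Callable[[Frame], float],  # cheap model: score in [0, 1] for "object present"
    reference: Callable[[Frame], bool],     # expensive model: the trusted label
    delta_diff: float,                      # below this MSE, reuse the previous label
    c_low: float,                           # score at or below this: answer "absent"
    c_high: float,                          # score at or above this: answer "present"
) -> List[bool]:
    """Label every frame, calling the reference model as rarely as possible."""
    labels: List[bool] = []
    last_classified: Optional[Frame] = None
    for frame in frames:
        # Stage 1: difference detector. Compare against the most recent frame
        # that was actually classified (an assumption of this sketch).
        if (last_classified is not None
                and mean_squared_error(frame, last_classified) < delta_diff):
            labels.append(labels[-1])
            continue
        last_classified = frame

        # Stage 2: specialized model answers when it is confident either way.
        score = specialized(frame)
        if score <= c_low:
            labels.append(False)
        elif score >= c_high:
            labels.append(True)
        else:
            # Stage 3: inconclusive score, so defer to the reference model.
            labels.append(reference(frame))
    return labels
```

For example, with a toy `specialized` model that returns mean brightness, a frame that barely differs from the last classified one inherits its label, clearly dark or bright frames are answered by the cheap model, and only mid-brightness frames reach `reference`.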

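Experimental Results

Research Questions

RQ1: Can we reduce the cost of neural network inference over video by several orders of magnitude with almost no loss of accuracy?
RQ2: How can we automatically identify and exploit video-specific patterns, such as limited object viewpoints and temporal redundancy, to accelerate inference?
RQ3: For a given video and object class, what is the optimal cascade architecture of specialized models and difference detectors?
RQ4: To what extent do model specialization and temporal difference detection work together to improve inference efficiency on real-world video workloads?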

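Key Findings

NoScope achieves inference speedups of 265x to 15,500x over real time on fixed-angle webcam and surveillance video while keeping accuracy within 1-5% of the reference model.
The system cuts computational cost by up to three orders of magnitude, making large-scale deep-learning video analytics feasible on commodity hardware.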
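Specialized models alone run up to 340x faster than the full reference network, sharply reducing how often the original model must be invoked.
Difference detectors effectively identify temporally redundant frames, in some cases cutting the number of frames that require expensive inference by as much as 80%.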
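The cost-based optimizer finds the best cascade for each of a range of videos and object classes, adapting both the architecture and the confidence thresholds to maximize efficiency.
When the target video and object class are known in advance, model specialization delivers faster inference than general-purpose model compression techniques (such as knowledge distillation or pruning).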
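
To make the threshold-selection idea concrete, here is a simplified Python sketch of one step such an optimizer might perform. It is an assumption-laden illustration, not the paper's optimizer: `pick_thresholds`, the single combined error budget `max_error_rate`, and the use of "fraction of frames deferred to the reference model" as a stand-in for cost are all hypothetical choices. Given specialized-model scores and reference labels on held-out frames, it grid-searches for the threshold pair that defers the fewest frames while staying within the error budget.

```python
from itertools import product
from typing import List, Optional, Tuple


def pick_thresholds(
    scores: List[float],    # specialized-model scores on held-out frames
    truth: List[bool],      # reference-model labels for the same frames
    max_error_rate: float,  # tolerated fraction of mislabeled frames
    grid: List[float],      # candidate threshold values to try
) -> Optional[Tuple[float, float, float]]:
    """Return (c_low, c_high, deferred_fraction) with the fewest deferrals, or None."""
    best: Optional[Tuple[float, float, float]] = None
    n = len(scores)
    for c_low, c_high in product(grid, grid):
        if c_low >= c_high:
            continue  # only consider a proper "uncertain" band between the thresholds
        errors = 0
        deferred = 0
        for score, label in zip(scores, truth):
            if score <= c_low:
                errors += int(label)      # answered "absent", reference says present
            elif score >= c_high:
                errors += int(not label)  # answered "present", reference says absent
            else:
                deferred += 1             # would be sent to the reference model
        if errors / n <= max_error_rate:
            candidate = (c_low, c_high, deferred / n)
            if best is None or candidate[2] < best[2]:
                best = candidate
    return best
```

A fuller optimizer would repeat this for each candidate specialized architecture and difference detector and weigh each stage's measured per-frame cost; this sketch only shows how a looser accuracy budget lets the thresholds move closer together so that fewer frames reach the expensive model.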

This review was generated by AI and checked by a human editor.