QUICK REVIEW

[Paper Digest] Tree-Structured Reinforcement Learning for Sequential Object Localization

Zequn Jie, Xiaodan Liang|arXiv (Cornell University)|Mar 8, 2017

Robotics and Sensor-Based Localization|References: 24|Cited by: 85

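One-Sentence Summary

Tree-structured reinforcement learning localizes multiple objects by exploring windows top-down, step by step, balancing refinement against discovery to raise recall with fewer candidate regions. It uses a tree built from two action groups (scaling and translation) together with deep Q-learning to optimize multi-object localization on the VOC datasets.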

ABSTRACT

Existing object proposal algorithms usually search for possible object regions over multiple locations and scales separately, which ignore the interdependency among different objects and deviate from the human perception procedure. To incorporate global interdependency between objects into object localization, we propose an effective Tree-structured Reinforcement Learning (Tree-RL) approach to sequentially search for objects by fully exploiting both the current observation and historical search paths. The Tree-RL approach learns multiple searching policies through maximizing the long-term reward that reflects localization accuracies over all the objects. Starting with taking the entire image as a proposal, the Tree-RL approach allows the agent to sequentially discover multiple objects via a tree-structured traversing scheme. Allowing multiple near-optimal policies, Tree-RL offers more diversity in search paths and is able to find multiple objects with a single feed-forward pass. Therefore, Tree-RL can better cover different objects with various scales which is quite appealing in the context of object proposal. Experiments on PASCAL VOC 2007 and 2012 validate the effectiveness of the Tree-RL, which can achieve comparable recalls with current object proposal algorithms via much fewer candidate windows.

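Motivation and Objectives

Reduce the number of candidate windows by exploiting global interdependencies among objects, mimicking how humans understand a scene.
Propose a tree-structured RL framework that localizes multiple objects sequentially, starting from the whole image.
Design a reward that balances refining an object already attended to against discovering new objects.
Train with deep Q-learning to learn policies that maximize long-term localization accuracy across all objects.
Show that Tree-RL achieves competitive recall with fewer candidate regions, and that it improves localization and detection when combined with Fast R-CNN.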

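Proposed Method

Formulate object localization as a Markov decision process with two groups of actions: scaling to a sub-window and translating the current window.
Build the state as the concatenation of the current window's features, the global image features, and the action history.
Estimate action values with a deep Q-network trained using experience replay and ε-greedy exploration.
Use a tree-structured search that, at each state, picks the best action from each group, producing two next windows and therefore multiple near-optimal search paths.
Define the reward r(s,a) from the change in IoU with the ground-truth boxes: a binary +1/-1 signal for whether IoU improves, together with a +5 first-hit reward granted the first time an object is covered with IoU>0.5.
Train for 25 epochs on VOC 2007+2012 trainval with ε decayed from 1 to 0.1, γ=0.9, 50 steps per episode, and a large replay memory for the Q-learning updates.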
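The reward and Q-learning update described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the names `iou`, `step_reward`, and `td_target` are hypothetical, the handling of several ground-truth boxes (any improvement counts) is an assumption, and so is the choice that the +5 first-hit reward replaces the +1/-1 signal on that step rather than adding to it, since the digest does not specify either point.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def step_reward(prev: Box, new: Box, gts: Sequence[Box], hit: List[bool],
                thresh: float = 0.5, hit_bonus: float = 5.0) -> float:
    """Reward for moving from window `prev` to window `new`.

    `hit[i]` records whether ground-truth box i was already covered and is
    updated in place. Assumption: the bonus replaces the +1/-1 signal.
    """
    first_hit = False
    for i, gt in enumerate(gts):
        if not hit[i] and iou(new, gt) > thresh:
            hit[i] = True
            first_hit = True
    if first_hit:
        return hit_bonus
    # Assumption: improvement with respect to any ground-truth box counts.
    improved = any(iou(new, gt) > iou(prev, gt) for gt in gts)
    return 1.0 if improved else -1.0


def td_target(reward: float, next_q: Sequence[float], gamma: float = 0.9,
              terminal: bool = False) -> float:
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    return reward if terminal else reward + gamma * max(next_q)
```

Because the bonus is paid only once per object, a window that keeps tightening around an object it has already hit earns just the +1 refinement signal, which is what nudges the agent toward discovering objects it has not covered yet.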

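Experimental Results

Research Questions

RQ1: Compared with single-path RL and conventional region proposal methods, does a tree-structured top-down search with two action groups improve recall with fewer candidate regions?
RQ2: Does combining global image context with the action history yield better multi-object localization across scales on the VOC datasets?
RQ3: How does the proposed reward design affect the balance between exploring new objects and refining objects already found?
RQ4: How does combining Tree-RL with Fast R-CNN affect downstream detection performance?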

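Key Findings

On VOC 2007, Tree-RL achieves recall comparable to RPN with significantly fewer candidate regions.
Combined with Fast R-CNN (ResNet-101), Tree-RL yields higher localization accuracy than RPN.
In most settings, Tree-RL outperforms RL with a single optimal search path, especially on large objects.
Tree-RL's recall rises as the number of tree levels increases, indicating better coverage of objects at different scales.
At 63 steps, Tree-RL's recall on large objects reaches 78.9% at IoU=0.5, 69.8% at IoU=0.6, and 53.3% at IoU=0.7; these figures come from the example table for VOC07.
With VGG-16-based candidate regions, Tree-RL's detection mAP on VOC07/12 is competitive with the Faster R-CNN baseline.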
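To make the link between tree levels and the number of windows concrete, here is a rough sketch of the tree-structured traversal: every window spawns two children, one from the best scaling action and one from the best translation action. Everything in it is illustrative. The action sets, the 0.55 scaling ratio, the 0.25 shift step, and the `toy_q` scorer are hypothetical stand-ins for the learned Q-network, and the sketch ignores the action history that the real state includes.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Action = Callable[[Box], Box]


def make_scale(fx: float, fy: float, ratio: float = 0.55) -> Action:
    """Shrink to a sub-window; (fx, fy) in [0, 1] place it inside the parent."""
    def act(b: Box) -> Box:
        w, h = (b[2] - b[0]) * ratio, (b[3] - b[1]) * ratio
        x1 = b[0] + fx * ((b[2] - b[0]) - w)
        y1 = b[1] + fy * ((b[3] - b[1]) - h)
        return (x1, y1, x1 + w, y1 + h)
    return act


def make_shift(dx: int, dy: int, step: float = 0.25) -> Action:
    """Translate the window by a fraction of its own size (no clipping)."""
    def act(b: Box) -> Box:
        sx, sy = dx * step * (b[2] - b[0]), dy * step * (b[3] - b[1])
        return (b[0] + sx, b[1] + sy, b[2] + sx, b[3] + sy)
    return act


# Hypothetical action groups: four corners plus centre, and four shifts.
SCALING = [make_scale(fx, fy) for fx, fy in [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]]
TRANSLATION = [make_shift(dx, dy) for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]]


def tree_search(root: Box, q_value: Callable[[Box, Box], float], levels: int) -> List[Box]:
    """Expand level by level, keeping the best action from each group per window."""
    proposals = [root]
    frontier = [root]
    for _ in range(levels - 1):
        children = []
        for window in frontier:
            for group in (SCALING, TRANSLATION):
                best = max(group, key=lambda a: q_value(window, a(window)))
                children.append(best(window))
        proposals.extend(children)
        frontier = children
    return proposals


def toy_q(window: Box, candidate: Box) -> float:
    """Stand-in scorer: prefer candidates whose centre is near (30, 40)."""
    cx, cy = (candidate[0] + candidate[2]) / 2, (candidate[1] + candidate[3]) / 2
    return -((cx - 30.0) ** 2 + (cy - 40.0) ** 2)


proposals = tree_search((0.0, 0.0, 100.0, 100.0), toy_q, levels=6)
```

A tree with n levels built this way contains 2^n - 1 windows, so five levels give 31 and six give 63. That would match the 63 quoted in the findings if that figure counts the windows of a six-level tree, although the digest does not say so explicitly.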

This digest was generated by AI and reviewed by human editors.