QUICK REVIEW

[Paper Review] MetaAnchor: Learning to Detect Objects with Customized Anchors

Tong Yang, Xiangyu Zhang|arXiv (Cornell University)|Jul 3, 2018

Advanced Neural Network Applications|89 citations

TL;DR

MetaAnchor introduces an anchor function generator that dynamically maps customized prior boxes to anchor functions, which improves robustness to anchor settings and bounding-box distributions and raises COCO detection performance over RetinaNet baselines.

ABSTRACT

We propose a novel and flexible anchor mechanism named MetaAnchor for object detection frameworks. Unlike many previous detectors, which model anchors in a predefined manner, MetaAnchor can dynamically generate anchor functions from arbitrary customized prior boxes. Taking advantage of weight prediction, MetaAnchor is able to work with most anchor-based object detection systems, such as RetinaNet. Compared with the predefined anchor scheme, we empirically find that MetaAnchor is more robust to anchor settings and bounding box distributions; in addition, it also shows potential on transfer tasks. Our experiments on the COCO detection task show that MetaAnchor consistently outperforms its counterparts in various scenarios.

Motivation & Objective

Motivate flexible, robust anchors that are not fixed to a predefined set of priors.
Propose a mechanism to dynamically generate anchor functions from arbitrary prior boxes.
Show that weight-prediction-based anchor function generation improves detection robustness and transfer ability.
Demonstrate compatibility and gains across single-stage detectors like RetinaNet on COCO.

Proposed method

Introduce an anchor function generator G(b_i; w) that maps a prior box b_i to its anchor function: F_bi = G(b_i; w).
Model the anchor function as F_bi(x; θ_bi) with θ_bi = θ* + R(b_i; w), where θ* holds the shared parameters and R is a small neural network that predicts a residual from b_i.
Provide a data-independent and a data-dependent variant of G(·) to predict the parameters of F_bi.
Represent the prior b_i by the log-scale ratios of its height and width to a standard anchor box (AH, AW).
Apply MetaAnchor to RetinaNet by replacing the fixed anchor heads with generators for the cls and reg heads, sharing G(·) across levels with level-specific standard boxes.
Optionally augment training with random perturbations of b_i to improve robustness.
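
The data-independent generator described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-layer form of R (a linear map, ReLU, then another linear map, without biases), the toy layer sizes, the random weights, and every function name (encode_prior, generate_theta, perturb_prior) are hypothetical choices made for the example.

```python
import math
import random

def encode_prior(ah, aw, std_h, std_w):
    """Encode a prior box as log-scale height/width ratios to a standard box (AH, AW)."""
    return [math.log(ah / std_h), math.log(aw / std_w)]

def matvec(matrix, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def relu(vec):
    return [max(0.0, v) for v in vec]

def generate_theta(b, theta_star, w1, w2):
    """theta_b = theta* + R(b; w). R is ASSUMED to be W2 . relu(W1 . b)."""
    residual = matvec(w2, relu(matvec(w1, b)))
    return [t + r for t, r in zip(theta_star, residual)]

def perturb_prior(b, scale, rng):
    """Optional training-time augmentation: jitter the encoded prior (hypothetical form)."""
    return [v + rng.uniform(-scale, scale) for v in b]

# Toy sizes (assumptions): hidden width 3, four anchor-function parameters.
rng = random.Random(0)
hidden, n_params = 3, 4
w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(n_params)]
theta_star = [0.1, -0.2, 0.3, 0.0]

b_standard = encode_prior(64, 64, 64, 64)   # the standard box encodes to [0.0, 0.0]
b_tall = encode_prior(128, 32, 64, 64)      # twice as tall, half as wide
theta_standard = generate_theta(b_standard, theta_star, w1, w2)
theta_tall = generate_theta(b_tall, theta_star, w1, w2)
```

In this bias-free sketch, a prior equal to the standard box yields a zero residual, so theta_standard equals theta_star. Any other prior shifts the shared parameters by a residual that depends only on b_i, which is what allows new priors to be supplied at inference time without adding new heads.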

Experimental results

Research questions

RQ1: Can anchor functions be dynamically generated from arbitrary prior boxes rather than enumerated priors?
RQ2: Does MetaAnchor improve robustness to anchor box distributions and transferability across datasets?
RQ3: How do data-independent and data-dependent variants of the anchor function generator compare in performance?
RQ4: What impact does flexible inference-time anchor configuration have on detection performance?
RQ5: Can MetaAnchor be effectively integrated into existing single-stage detectors (e.g., RetinaNet) to improve detection metrics on COCO?

Key findings

MetaAnchor consistently outperforms RetinaNet baselines across multiple anchor configurations, e.g., mmAP gains of about 0.2–0.8% and AP50 gains of about 0.8–1.5%.
Using more anchors during training/inference generally improves performance for MetaAnchor, with diminishing returns beyond 7×7 or 9×9 configurations.
On COCO-full, MetaAnchor achieves 37.5% mmAP on minival, which is 1.7% better than the best RetinaNet implementation and 0.6% better than the best RetinaNet with a searched configuration; the data-dependent variant further improves by ~0.4%.
MetaAnchor shows stronger transfer ability than RetinaNet when transferring from COCO-full to VOC2007, with a notable reduction in performance degradation under distribution shifts.
A greedy search inference strategy further boosts MetaAnchor performance by selecting anchor configurations that yield score improvements during testing.
Data-dependent anchor function generators often perform slightly better than data-independent variants in several settings.
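
The greedy inference-time search mentioned above can be illustrated with a generic forward-selection loop. This is a sketch under assumptions, not the procedure from the paper: the selection rule (add the single candidate that most improves the score, stop when nothing improves), the function name greedy_anchor_search, and the toy scoring function are all hypothetical. In practice the score would come from evaluating the detector on a validation set.

```python
def greedy_anchor_search(candidates, evaluate):
    """Grow an anchor configuration one box at a time while the score improves."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every configuration obtained by adding one more candidate box.
        scored = [(evaluate(selected + [box]), box) for box in remaining]
        top_score, top_box = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves the current configuration
        selected.append(top_box)
        remaining.remove(top_box)
        best_score = top_score
    return selected, best_score

# Hypothetical stand-in for a validation metric: reward anchors that match
# a set of "useful" shapes and charge a small cost per anchor.
TARGETS = [(32, 32), (64, 128), (128, 64)]

def toy_score(anchors):
    hits = sum(1 for t in TARGETS if t in anchors)
    return hits - 0.1 * len(anchors)

candidates = [(32, 32), (48, 48), (64, 128), (128, 64), (256, 256)]
chosen, score = greedy_anchor_search(candidates, toy_score)
# chosen == [(32, 32), (64, 128), (128, 64)]
```

Because anchor functions are generated from the prior boxes rather than tied to fixed heads, a search like this only changes the inputs to the generator and needs no retraining.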

This review was created by AI and reviewed by human editors.