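[Paper Summary] Learning Generalisable Omni-Scale Representations for Person Re-Identification
OSNet learns omni-scale features through a dynamic aggregation gate and instance normalisation, achieving state-of-the-art results in same-domain re-ID and showing robust cross-domain generalisation without any target-domain data.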
An effective person re-identification (re-ID) model should learn feature representations that are both discriminative, for distinguishing similar-looking people, and generalisable, for deployment across datasets without any adaptation. In this paper, we develop novel CNN architectures to address both challenges. First, we present a re-ID CNN termed omni-scale network (OSNet) to learn features that not only capture different spatial scales but also encapsulate a synergistic combination of multiple scales, namely omni-scale features. The basic building block consists of multiple convolutional streams, each detecting features at a certain scale. For omni-scale feature learning, a unified aggregation gate is introduced to dynamically fuse multi-scale features with channel-wise weights. OSNet is lightweight as its building blocks comprise factorised convolutions. Second, to improve generalisable feature learning, we introduce instance normalisation (IN) layers into OSNet to cope with cross-dataset discrepancies. Further, to determine the optimal placements of these IN layers in the architecture, we formulate an efficient differentiable architecture search algorithm. Extensive experiments show that, in the conventional same-dataset setting, OSNet achieves state-of-the-art performance, despite being much smaller than existing re-ID models. In the more challenging yet practical cross-dataset setting, OSNet beats most recent unsupervised domain adaptation methods without using any target data. Our code and models are released at https://github.com/KaiyangZhou/deep-person-reid.
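The gated fusion described in the abstract can be illustrated with a small NumPy sketch. It is not taken from the released code. The gate design (global average pooling, a two-layer bottleneck, a sigmoid), the use of four streams, and the reading of "unified" as "one gate shared by all streams" are assumptions made for illustration, as are the names `aggregation_gate`, `fuse_streams`, `w1`, and `w2`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def aggregation_gate(x, w1, w2):
    """Channel-wise weights in (0, 1) for one stream's feature map x of shape (C, H, W)."""
    pooled = x.mean(axis=(1, 2))           # global average pooling -> (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)  # bottleneck layer + ReLU -> (C // r,)
    return sigmoid(w2 @ hidden)            # one weight per channel -> (C,)

def fuse_streams(streams, w1, w2):
    """Sum the streams after scaling each one by its own input-dependent gate.

    The same (w1, w2) are reused for every stream (assumed meaning of "unified").
    """
    out = np.zeros_like(streams[0])
    for x in streams:
        g = aggregation_gate(x, w1, w2)
        out += g[:, None, None] * x        # broadcast the gate over H and W
    return out

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4                    # toy sizes, hypothetical
streams = [rng.standard_normal((C, H, W)) for _ in range(4)]
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
fused = fuse_streams(streams, w1, w2)
print(fused.shape)
```

Because the gate is computed from each stream's own activations, the mixing weights change from image to image, which is what "dynamically fuse" refers to.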
Research Motivation and Objectives
- Motivate discriminative and domain-generalisable feature learning for person re-ID.
- Propose omni-scale feature learning to capture both local and global cues with heterogeneous scales.
- Introduce a lightweight OSNet architecture using factorised convolutions for efficiency.
- Incorporate instance normalisation (IN) to reduce style/domain gaps and optimise IN placement via differentiable architecture search.
- Demonstrate state-of-the-art performance in same-domain re-ID and strong cross-domain generalisation without target data.
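To see why instance normalisation can reduce style gaps, the following minimal sketch normalises each sample and channel over its spatial dimensions. The "style" model used here (a per-channel gain and offset applied to the same content) is a simplifying assumption for illustration, and the learnable affine parameters that IN layers usually carry are omitted.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalise x of shape (N, C, H, W) per sample and per channel over H and W."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
content = rng.standard_normal((1, 3, 16, 16))

# Hypothetical camera "styles": same content, different per-channel contrast and brightness.
gain = np.array([2.0, 0.5, 1.5]).reshape(1, 3, 1, 1)
shift = np.array([0.3, -0.2, 0.1]).reshape(1, 3, 1, 1)
style_a = content
style_b = content * gain + shift

gap_before = np.abs(style_a - style_b).max()
gap_after = np.abs(instance_norm(style_a) - instance_norm(style_b)).max()
print(f"max gap before IN: {gap_before:.4f}, after IN: {gap_after:.6f}")
```

Under this toy model the two styled inputs become nearly identical after IN, since the per-channel mean and variance that encode the style are removed.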
Proposed Method
- Develop omni-scale network (OSNet) with multiple convolutional streams of increasing receptive field sizes.
- Introduce a unified aggregation gate (AG) to dynamically fuse multi-scale features channel-wise conditioned on input.
- Adopt depthwise separable (Lite 3x3) convolutions to keep OSNet lightweight (2.2M parameters).
- Incorporate instance normalisation (IN) into OSNet to address cross-dataset style differences.
- Formulate a differentiable neural architecture search (NAS) over a four-block IN-design space to determine optimal IN placements (OSNet-AIN).
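The saving from factorised convolutions can be checked with simple arithmetic. The sketch below compares the weight count of a standard 3x3 convolution with a Lite 3x3 layer, assuming the latter is a 1x1 pointwise convolution followed by a 3x3 depthwise convolution and ignoring biases and normalisation parameters. The channel width of 256 is an arbitrary example, not a figure from the paper.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a dense k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def lite_conv_params(c_in, c_out, k=3):
    """Weights in a 1x1 pointwise conv followed by a k x k depthwise conv (assumed ordering)."""
    pointwise = c_in * c_out   # 1x1 conv mixes channels
    depthwise = k * k * c_out  # one k x k filter per output channel
    return pointwise + depthwise

c = 256  # hypothetical channel width
std = standard_conv_params(c, c)
lite = lite_conv_params(c, c)
print(std, lite, round(std / lite, 2))  # 589824 67840 8.69
```

At this width the factorised layer needs roughly one ninth of the weights, which is the kind of saving that keeps the overall model small.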
Experimental Results
Research Questions
- RQ1: Can omni-scale feature learning improve discrimination for re-ID across diverse appearances?
- RQ2: Does incorporating instance normalisation (IN) and NAS-guided IN placement enhance cross-dataset generalisation without target-domain data?
- RQ3: How does OSNet compare to larger, standard architectures in both within-dataset and cross-dataset settings?
- RQ4: Is a lightweight, multi-scale fusion approach effective for practical large-scale deployment?
Key Findings
- OSNet achieves results that are competitive with, or match, the state of the art on large re-ID datasets with only 2.2M parameters, far fewer than ResNet50-based models.
- On the large datasets, OSNet (ours) attains R1 93.6 and mAP 81.0 on Market1501, R1 57.1 and mAP 54.2 on CUHK03, R1 84.7 and mAP 68.6 on Duke, and R1 71.0 and mAP 43.3 on MSMT17.
- OSNet variants with IN (OSNet-AIN) generalise strongly across domains, beating many unsupervised domain adaptation methods without using target data.
- Comparisons with baselines such as MobileNetV2, HAN, and Auto-ReID show that OSNet is both competitive and efficient, highlighting the benefits of omni-scale features and AG fusion.
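The R1 and mAP figures above are ranking metrics. The sketch below shows how rank-1 accuracy and mean average precision are commonly computed from per-query ranked match lists. The toy `queries` data is invented for illustration, and standard benchmark details such as filtering gallery images from the same camera as the query are left out.

```python
def average_precision(ranked_matches):
    """AP for one query; ranked_matches[i] is True if the gallery item at rank i+1 is a correct match."""
    hits, precisions = 0, []
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)  # precision at each correct hit
    return sum(precisions) / len(precisions) if precisions else 0.0

def evaluate(queries):
    """Return (rank-1 accuracy, mean average precision) over all queries."""
    rank1 = sum(q[0] for q in queries) / len(queries)
    mean_ap = sum(average_precision(q) for q in queries) / len(queries)
    return rank1, mean_ap

# Toy ranked lists (hypothetical data), one per query.
queries = [
    [True, False, True, False],   # AP = (1/1 + 2/3) / 2
    [False, True, False, False],  # AP = 1/2
    [True, True, False, False],   # AP = 1
    [False, False, False, True],  # AP = 1/4
]
rank1, mean_ap = evaluate(queries)
print(f"R1={rank1:.3f} mAP={mean_ap:.3f}")  # R1=0.500 mAP=0.646
```

R1 only checks whether the top-ranked gallery item is correct, while mAP rewards ranking all correct matches early, which is why the two numbers can differ widely on the same dataset.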
This summary was generated by AI and reviewed by a human editor.