QUICK REVIEW

[Paper Review] YOLOv6 v3.0: A Full-Scale Reloading

Chuyi Li, Lulu Li | arXiv (Cornell University) | Jan 13, 2023

Advanced Neural Network Applications | Cited by 192

One-Sentence Summary

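YOLOv6 v3.0 redesigns the network and the training strategy. It adopts a BiC-based neck, anchor-aided training, self-distillation, and an extended backbone/neck to reach a new state of the art in real-time accuracy across model scales, including the N6/S6/M6/L6 variants.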

ABSTRACT

The YOLO community has been in high spirits since our first two releases! By the advent of Chinese New Year 2023, which sees the Year of the Rabbit, we refurnish YOLOv6 with numerous novel enhancements on the network architecture and the training scheme. This release is identified as YOLOv6 v3.0. For a glimpse of performance, our YOLOv6-N hits 37.5% AP on the COCO dataset at a throughput of 1187 FPS tested with an NVIDIA Tesla T4 GPU. YOLOv6-S strikes 45.0% AP at 484 FPS, outperforming other mainstream detectors at the same scale (YOLOv5-S, YOLOv8-S, YOLOX-S and PPYOLOE-S). Whereas, YOLOv6-M/L also achieve better accuracy performance (50.0%/52.8% respectively) than other detectors at a similar inference speed. Additionally, with an extended backbone and neck design, our YOLOv6-L6 achieves the state-of-the-art accuracy in real-time. Extensive experiments are carefully conducted to validate the effectiveness of each improving component. Our code is made available at https://github.com/meituan/YOLOv6.
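As a reading aid for the throughput figures in the abstract, the sketch below converts FPS into an average time per image. The helper name `per_image_ms` is hypothetical. Throughput is often measured with batched inference, so the result is an amortized per-image figure and should not be read as single-image latency.

```python
def per_image_ms(fps: float) -> float:
    """Average time per image in milliseconds implied by a throughput in FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Throughput figures quoted in the abstract (NVIDIA Tesla T4).
reported_fps = {"YOLOv6-N": 1187, "YOLOv6-S": 484}

# Amortized time per image; this is not a measured latency.
per_image = {name: per_image_ms(fps) for name, fps in reported_fps.items()}

for name, ms in per_image.items():
    print(f"{name}: {ms:.2f} ms per image")
```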

Motivation and Objectives

Renew and enhance YOLOv6 with architectural and training improvements for better real-time object detection performance.
Achieve higher accuracy while maintaining or improving inference speed across small to large models.
Provide a unified framework with training-time auxiliaries that do not affect inference latency.
Validate improvements through extensive ablations and comparisons against YOLOv5/YOLOv7/YOLOv8 and peers.

Proposed Method

Design an enhanced neck (BiC module) that fuses features from three adjacent layers to preserve more accurate localization signals.
Introduce SimCSPSPPF to replace or simplify SPPF blocks while preserving representational power.
Adopt anchor-aided training (AAT) to combine anchor-based and anchor-free benefits during training, with auxiliary branches removed at inference.
Apply self-distillation (including a special Decoupled Localization Distillation for small models) to boost small-model performance without sacrificing speed.
Extend backbone and neck with an extra stage (C6 features) and higher input resolution to improve accuracy on high-res images.
Evaluate models with FP16 TensorRT on Tesla T4 to report AP, FPS, latency, and compare with YOLOv5/YOLOX/PPYOLOE/YOLOv7/YOLOv8.
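The self-distillation step above can be illustrated with a generic knowledge-distillation loss. This review does not give the loss formula, so everything below is an assumption for illustration. It uses a KL-divergence term between teacher and student class distributions, and a cosine schedule that shrinks the distillation weight over training. In self-distillation the teacher is typically a pretrained copy of the same architecture. The names `cosine_weight` and `distillation_loss` are hypothetical and are not taken from the YOLOv6 code base.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cosine_weight(epoch, total_epochs):
    """Assumed schedule: decays from 1 at epoch 0 to 0 at epoch total_epochs."""
    return 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))


def distillation_loss(det_loss, student_logits, teacher_logits, epoch, total_epochs):
    """Detection loss plus a cosine-weighted KL term (illustrative only)."""
    kd = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    alpha = cosine_weight(epoch, total_epochs)
    return det_loss + alpha * kd


# Early in training the soft-label term counts fully; at the end it vanishes.
early = distillation_loss(1.0, [0.0, 0.0], [2.0, 0.0], epoch=0, total_epochs=10)
late = distillation_loss(1.0, [0.0, 0.0], [2.0, 0.0], epoch=10, total_epochs=10)
```

Because the teacher is used only to compute the loss, a scheme of this kind adds training cost but leaves the deployed student network unchanged.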

Experimental Results

Research Questions

RQ1: How do architectural changes (BiC neck, SimCSPSPPF) affect localization and small-object accuracy?
RQ2: Do anchor-based auxiliary branches during training improve detection performance without impacting inference speed?
RQ3: Can self-distillation and a dedicated heavy regression branch enhance small-model performance without throughput loss?
RQ4: What is the impact of adding a top stage (C6) and higher input resolution on COCO AP at real-time speeds?
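RQ2 and RQ3 both rest on the idea that an auxiliary branch can supply extra supervision during training and then be dropped at inference. The toy class below shows only that control flow. `ToyDetector` and both head functions are hypothetical placeholders, not the YOLOv6 heads.

```python
class ToyDetector:
    """Toy stand-in for a detector with one inference head and one training-only head."""

    def __init__(self):
        self.training = True

    def main_head(self, features):
        # Hypothetical main head: the only branch that runs at inference.
        return [2.0 * f for f in features]

    def aux_head(self, features):
        # Hypothetical auxiliary head: contributes extra loss terms in training only.
        return [f + 1.0 for f in features]

    def forward(self, features):
        outputs = {"main": self.main_head(features)}
        if self.training:
            outputs["aux"] = self.aux_head(features)
        return outputs


model = ToyDetector()
train_out = model.forward([0.5, 1.0])   # contains both "main" and "aux"

model.training = False
infer_out = model.forward([0.5, 1.0])   # contains "main" only
```

Since the inference path never calls the auxiliary head, its cost is paid only during training, which is the property the research questions set out to verify.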

Key Findings

YOLOv6-N/S/M/L achieve higher AP with competitive or superior speed compared to YOLOv5/YOLOv7/PPYOLOE/YOLOv8 at similar scales.
Expanded YOLOv6-N6/S6/M6/L6 with higher input resolution (640→1280) and added C6 features yield state-of-the-art accuracy for real-time detectors.
BiC improves localization signals and small-object AP with minimal efficiency cost, especially on small models.
Anchor-Aided Training (AAT) provides additional AP gains across scales, notably improving APs on small objects.
Self-distillation, including a Decoupled Localization Distillation strategy for small models, yields AP gains without slowing inference.
YOLOv6-L6 outperforms the prior state-of-the-art YOLOv7-E6E by 0.4% AP and runs 63% faster at batch size 1.
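To give a sense of what the 640→1280 input and the extra C6 stage imply, the sketch below computes feature-map grid sizes per pyramid level. The strides 8/16/32, plus 64 for the added stage, are the conventional YOLO pyramid strides and are an assumption here, since this review does not state them. `grid_sizes` is a hypothetical helper.

```python
def grid_sizes(input_size, strides):
    """Feature-map side length at each stride for a square input."""
    for s in strides:
        if input_size % s != 0:
            raise ValueError(f"input size {input_size} is not divisible by stride {s}")
    return {s: input_size // s for s in strides}


# Assumed three-level pyramid at 640 px versus four-level pyramid at 1280 px.
base = grid_sizes(640, (8, 16, 32))
extended = grid_sizes(1280, (8, 16, 32, 64))

# Total number of grid cells summed over all levels.
base_cells = sum(n * n for n in base.values())
extended_cells = sum(n * n for n in extended.values())

print(base, base_cells)
print(extended, extended_cells)
```

Under these assumptions the extended configuration evaluates roughly four times as many grid cells, which is consistent with the higher-resolution variants trading speed for accuracy.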


This review was generated by AI and checked by a human editor.