QUICK REVIEW

[Paper Review] A Computer Vision Enabled damage detection model with improved YOLOv5 based on Transformer Prediction Head

Arunabha M. Roy, Jayabrata Bhaduri|arXiv (Cornell University)|Mar 7, 2023

Infrastructure Maintenance and Monitoring | Citations: 22

One-line summary

DenseSPH-YOLOv5 integrates DenseNet blocks, CBAM, an additional tiny-object detection head, and Swin Transformer Prediction Heads into YOLOv5, achieving accurate, real-time road damage detection on RDD-2018.

ABSTRACT

Objective: Computer vision-based up-to-date accurate damage classification and localization are of decisive importance for infrastructure monitoring, safety, and the serviceability of civil infrastructure. Current state-of-the-art deep learning (DL)-based damage detection models, however, often lack superior feature extraction capability in complex and noisy environments, limiting the development of accurate and reliable object distinction. Method: To this end, we present DenseSPH-YOLOv5, a real-time DL-based high-performance damage detection model where DenseNet blocks have been integrated with the backbone to improve in preserving and reusing critical feature information. Additionally, convolutional block attention modules (CBAM) have been implemented to improve attention performance mechanisms for strong and discriminating deep spatial feature extraction that results in superior detection under various challenging environments. Moreover, additional feature fusion layers and a Swin-Transformer Prediction Head (SPH) have been added leveraging advanced self-attention mechanism for more efficient detection of multiscale object sizes and simultaneously reducing the computational complexity. Results: Evaluating the model performance in large-scale Road Damage Dataset (RDD-2018), at a detection rate of 62.4 FPS, DenseSPH-YOLOv5 obtains a mean average precision (mAP) value of 85.25 %, F1-score of 81.18 %, and precision (P) value of 89.51 % outperforming current state-of-the-art models. Significance: The present research provides an effective and efficient damage localization model addressing the shortcoming of existing DL-based damage detection models by providing highly accurate localized bounding box prediction. Current work constitutes a step towards an accurate and robust automated damage detection system in real-time in-field applications.

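Motivation and objectives

Improve the detection accuracy and localization of road damage in challenging environments.
Preserve and reuse discriminative feature information to mitigate the loss of semantic information in YOLOv5.
Achieve real-time performance suitable for in-field inspection.
Localize multiscale damage through efficient feature fusion and attention mechanisms.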

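Proposed method

Add DenseNet blocks to CSPDarknet53 so that feature maps are preserved and can be reused.
Integrate CBAM to refine feature maps along the channel and spatial dimensions and strengthen attention.
Add a detection head for tiny objects to improve the detection of small damage.
Replace the CNN heads with Swin Transformer Prediction Heads to exploit self-attention and capture multiscale objects.
Add Spatial Pyramid Pooling (SPP) to the backbone for multiscale receptive fields, and use an improved PANet for multiscale feature fusion.
Apply bounding-box regression with a CIoU-based loss (overlap, center-distance, and aspect-ratio terms; the center-distance term is the one shared with DIoU), and apply NMS to produce the final predictions.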
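
The last step above can be sketched in plain Python. This is a minimal illustration of the standard CIoU loss and greedy NMS, not the authors' implementation: the function names (`iou`, `ciou_loss`, `nms`), the `(x1, y1, x2, y2)` box format, and the 0.5 IoU threshold are assumptions made for the example.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss = 1 - IoU + center-distance term + aspect-ratio term."""
    overlap = iou(pred, target)

    # Center-distance term (shared with DIoU): squared distance between box
    # centers, normalized by the squared diagonal of the enclosing box.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio term: penalizes a mismatch between width/height ratios.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - overlap + v + eps)

    return 1 - overlap + rho2 / c2 + alpha * v


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    for k in order:
        if all(iou(boxes[k], boxes[j]) <= iou_threshold for j in keep):
            keep.append(k)
    return keep
```

Identical boxes give a loss of essentially zero, while disjoint boxes give a loss above 1 because the center-distance term still carries a signal when the IoU is zero.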

Figure 1: Sample images from the RDD-2018 dataset (Maeda et al., 2018): (a) to (g) correspond to each of the eight categories with the legends.

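Experimental results

Research questions

RQ1 Does CSPDarknet53 augmented with DenseNet blocks improve feature preservation and detection accuracy over standard YOLOv5 on road damage data?
RQ2 Does integrating CBAM improve detection performance in noisy environments, with multiple objects, and under varied lighting conditions?
RQ3 How do a dedicated tiny-object detection head and Swin Transformer Prediction Heads affect multiscale damage localization and speed?
RQ4 How do SPP and the improved PANet affect contextual feature representation and localization accuracy across the eight damage classes?
RQ5 What performance (mAP, precision, F1, IoU, FPS) does the model achieve on the RDD-2018 dataset under real-world conditions?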

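Key findings

DenseSPH-YOLOv5 achieves 85.25% mAP at 62.4 FPS on RDD-2018.
It reports an F1-score of 81.18% and a precision of 89.51%, indicating accurate detection and reliable localization.
The DenseNet blocks and the strengthened CSP backbone improve feature preservation and reuse, which raises detection performance.
CBAM improves attention in crowded or dense scenes and helps distinguish damage types.
Swin Transformer Prediction Heads strengthen multiscale object detection while reducing computational cost.
SPP and the improved PANet further enrich multiscale feature representation and localization.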
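
To put the reported numbers in context, the short sketch below converts the frame rate into a per-frame latency and back-solves the recall that the usual harmonic-mean definition of F1 would imply. This is an illustration only: the review does not state recall, and the back-solved value holds only under the assumption that F1 and precision were computed at the same operating point and averaged in the same way. The helper names are hypothetical.

```python
def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R (assumes F1 is the harmonic mean of P and R)."""
    return f1 * precision / (2 * precision - f1)


def frame_latency_ms(fps):
    """Average time budget per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Values reported in the abstract.
precision, f1, fps = 0.8951, 0.8118, 62.4

recall = implied_recall(precision, f1)
print(f"implied recall: {recall:.4f}")                            # about 0.74
print(f"per-frame latency: {frame_latency_ms(fps):.2f} ms")       # about 16 ms
```

Under that assumption the implied recall is roughly 74%, which would mean the model is more conservative (fewer false positives) than it is exhaustive, and 62.4 FPS corresponds to about 16 ms per frame.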

Figure 2: Schematic of (a) the YOLO object localization process for damage localization; (b) CIoU offset regression for target bounding-box (BB) predictions.
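
The localization step that Figure 2 depicts can be illustrated with the box decoding used by stock YOLOv5, where raw head outputs are mapped to a box center and size relative to a grid cell and an anchor. This is a sketch under the assumption that DenseSPH-YOLOv5 keeps the stock YOLOv5 decoding; the review does not say whether it does. The function name `decode_box` and the example cell, anchor, and stride values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell, anchor, stride):
    """Map raw outputs (tx, ty, tw, th) to a box (center x, center y, width, height) in pixels.

    cell   -- (cx, cy) integer grid-cell indices of the prediction
    anchor -- (aw, ah) anchor width and height in pixels
    stride -- downsampling factor of the detection head, in pixels per cell
    """
    tx, ty, tw, th = t
    cx, cy = cell
    aw, ah = anchor
    # Center offsets lie in (-0.5, 1.5) cells around the cell origin.
    bx = (2.0 * sigmoid(tx) - 0.5 + cx) * stride
    by = (2.0 * sigmoid(ty) - 0.5 + cy) * stride
    # Width and height are scaled by a factor in (0, 4) relative to the anchor.
    bw = (2.0 * sigmoid(tw)) ** 2 * aw
    bh = (2.0 * sigmoid(th)) ** 2 * ah
    return bx, by, bw, bh


# Zero raw outputs place the center in the middle of the cell at anchor size.
print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), anchor=(10.0, 13.0), stride=8))
```

The decoded box is what a CIoU-style regression loss compares against the ground-truth box during training.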


This review was generated by AI and checked by a human editor.