QUICK REVIEW

[Paper Review] Towards Cognitive Defect Analysis in Active Infrared Thermography with Vision-Text Cues

Mohammed Salah, Eman Ouda | arXiv (Cornell University) | Mar 11, 2026

Thermography and Photoacoustic Techniques · Citations: 0

One-Line Summary

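Summary: The paper proposes a zero-shot defect localization framework for active infrared thermography (AIRT) of CFRP based on vision-language models (VLMs). An AIRT-VLM adapter maps the thermographic data to a domain-aligned image that a VLM can consume, enabling defect grounding without any defect-specific training.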

ABSTRACT

Active infrared thermography (AIRT) is currently witnessing a surge of artificial intelligence (AI) methodologies being deployed for automated subsurface defect analysis of high-performance carbon fiber-reinforced polymers (CFRP). Deploying AI-based AIRT methodologies for inspecting CFRPs requires the creation of time-consuming and expensive datasets of CFRP inspection sequences to train neural networks. To address this challenge, this work introduces a novel language-guided framework for cognitive defect analysis in CFRPs using AIRT and vision-language models (VLMs). Unlike conventional learning-based approaches, the proposed framework does not require developing training datasets for extensive training of defect detectors; instead, it relies solely on pretrained multimodal VLM encoders coupled with a lightweight adapter to enable generative zero-shot understanding and localization of subsurface defects. By leveraging pretrained multimodal encoders, the proposed system enables generative zero-shot understanding of thermographic patterns and automatic detection of subsurface defects. Given the domain gap between thermographic data and natural images used to train VLMs, an AIRT-VLM Adapter is proposed to enhance the visibility of defects while aligning the thermographic domain with the learned representations of VLMs. The proposed framework is validated using three representative VLMs, specifically GroundingDINO, Qwen-VL-Chat, and CogVLM. Validation is performed on 25 CFRP inspection sequences with impacts introduced at different energy levels, reflecting realistic defects encountered in industrial scenarios. Experimental results demonstrate that the AIRT-VLM adapter achieves signal-to-noise ratio (SNR) gains exceeding 10 dB compared with conventional thermographic dimensionality-reduction methods, while enabling zero-shot defect detection with intersection-over-union values reaching 70%.

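Research Motivation and Objectives

Motivate zero-shot cognitive defect analysis of CFRP with AIRT, reducing the need to collect large labeled datasets and to train extensively.
Bridge thermographic data and pretrained vision-language models by introducing the AIRT-VLM adapter.
Enable reliable subsurface defect localization and grounding across different impact-energy scenarios without defect-specific training.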

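Proposed Method

Standardize the AIRT inspection sequences to normalize their spatiotemporal dynamics.
AIRT-VLM adapter: a masked autoencoder that compresses the thermographic sequence into a single domain-aligned, high-SNR image, T, using its latent vectors and global pooling.
The domain-aligned image I_VLM is fed to off-the-shelf VLMs (CogVLM, Qwen-VL-Chat, GroundingDINO), which predict defect bounding boxes under a fixed text prompt.
The autoencoder is trained with a reconstruction loss designed to avoid a trivial identity mapping and to emphasize defect-related cues.
The contrast and SNR gains of I_VLM are measured against the raw thermograms, and defect grounding is evaluated with IoU and normalized center distance (NCD).
To assess the trade-off between performance and speed, ablations compare pooling strategies (mean vs. max vs. PCA) and single-image vs. multi-image VLM input.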
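The standardization and pooling steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: standardization is taken to be a per-pixel z-score over time, the masked autoencoder itself is omitted and its output is assumed to be K latent maps of shape (K, H, W) (one K-dimensional latent vector per pixel), and the helper names `standardize_sequence`, `pool_latents`, and `to_uint8_image` are hypothetical.

```python
import numpy as np


def standardize_sequence(seq: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-pixel z-score over time (assumed normalization). seq: (T, H, W)."""
    mean = seq.mean(axis=0, keepdims=True)
    std = seq.std(axis=0, keepdims=True)
    return (seq - mean) / (std + eps)


def pool_latents(latents: np.ndarray, method: str = "mean") -> np.ndarray:
    """Collapse hypothetical latent maps (K, H, W) into a single (H, W) image."""
    if method == "mean":
        return latents.mean(axis=0)
    if method == "max":
        return latents.max(axis=0)
    if method == "pca":
        k, h, w = latents.shape
        x = latents.reshape(k, h * w).T          # pixels as samples, K features
        x = x - x.mean(axis=0, keepdims=True)    # center each latent channel
        # First right singular vector = first principal axis (sign is arbitrary).
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        return (x @ vt[0]).reshape(h, w)
    raise ValueError(f"unknown pooling method: {method}")


def to_uint8_image(img: np.ndarray) -> np.ndarray:
    """Min-max scale to 0..255 and replicate to 3 channels for an RGB VLM input."""
    lo, hi = img.min(), img.max()
    scaled = (img - lo) / (hi - lo + 1e-8)
    gray = np.round(scaled * 255).astype(np.uint8)
    return np.stack([gray] * 3, axis=-1)
```

For example, `to_uint8_image(pool_latents(latents, "mean"))` yields an (H, W, 3) image that could be passed to a VLM alongside a fixed text prompt. Mean and max pooling are single reductions over K, while the PCA variant needs an SVD of an (H·W, K) matrix, which is one plausible reason the review finds mean pooling cheaper than PCA.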

Figure 1: Front-side view of the impacted specimens, subjected to low-velocity impact at 5 J and 15 J.

Experimental Results

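Research Questions

RQ1: Can pretrained multimodal vision-language models localize subsurface defects in CFRP from AIRT data without defect-specific training?
RQ2: How does the AIRT-VLM adapter bridge the domain gap between thermographic and natural images to improve defect grounding?
RQ3: How do the latent-space pooling strategy and the way the latent images are used affect zero-shot defect detection performance?
RQ4: What detection performance (IoU, NCD) do different VLMs achieve when combined with the AIRT-VLM adapter?
RQ5: What are the limitations of a purely domain-aligned zero-shot approach that neither estimates defect depth nor classifies defect type in CFRP?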

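Key Findings

The AIRT-VLM adapter delivers substantially stronger signal enhancement, higher contrast, and higher SNR than the raw thermograms and several baselines; the abstract reports SNR gains exceeding 10 dB over conventional thermographic dimensionality-reduction methods.
Zero-shot grounding with Qwen-VL, CogVLM, and GroundingDINO reaches an IoU of about 70% and an NCD of about 0.015 (averaged over all sequences).
Compared with conventional dimensionality-reduction baselines, the proposed adapter produces sharper defect boundaries and suppresses background artifacts better.
Mean pooling of the latent space keeps IoU competitive while being considerably more computationally efficient than max pooling or PCA.
Feeding all latent images to the VLM and merging detections with NMS gives comparable accuracy but a much longer runtime, which justifies mean pooling in practice.
The framework enables defect grounding without defect-specific datasets, but it cannot estimate defect depth or classify defect type.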
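The evaluation quantities and the multi-image NMS variant described above can be sketched as follows. The review does not give the paper's exact formulas, so these are assumptions: boxes are (x1, y1, x2, y2) pixel coordinates; NCD is taken here as the distance between predicted and ground-truth box centers divided by the image diagonal; SNR uses one common thermography definition, 20·log10(|mean_defect - mean_background| / std_background); and `nms` is plain greedy non-maximum suppression over detections pooled from all latent images. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ncd(pred: Box, gt: Box, width: float, height: float) -> float:
    """Center distance normalized by the image diagonal (assumed definition)."""
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    return math.hypot(pcx - gcx, pcy - gcy) / math.hypot(width, height)


def snr_db(image: np.ndarray, defect_mask: np.ndarray) -> float:
    """Contrast-based SNR in dB; background std must be non-zero."""
    defect = image[defect_mask]
    background = image[~defect_mask]
    contrast = abs(defect.mean() - background.mean())
    return float(20.0 * np.log10(contrast / background.std()))


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In the multi-image setting, the VLM would be queried once per latent image and every returned box collected before calling `nms`, so runtime grows roughly with the number of latent images. The single mean-pooled image needs only one VLM call, which is consistent with the runtime gap the review reports.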

Figure 2: Top: Front-side heating with pulsed flash lamps (A.1-A.2) and front-side long-pulse halogen heating (A.3). Bottom: Schematic representation of inspection setup.


This review was generated by AI and checked by a human editor.