QUICK REVIEW

[Paper Review] Towards Cognitive Defect Analysis in Active Infrared Thermography with Vision-Text Cues

Mohammed Salah, Eman Ouda | arXiv (Cornell University) | Mar 11, 2026

Thermography and Photoacoustic Techniques | Cited by 0

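One-Sentence Summary

This paper proposes a zero-shot defect localization framework for CFRP that combines active infrared thermography with vision-language models. An AIRT-VLM adapter maps thermographic data to a domain-aligned, VLM-compatible image, so defects can be localized without defect-specific training.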

ABSTRACT

Active infrared thermography (AIRT) is currently witnessing a surge of artificial intelligence (AI) methodologies being deployed for automated subsurface defect analysis of high performance carbon fiber-reinforced polymers (CFRP). Deploying AI-based AIRT methodologies for inspecting CFRPs requires the creation of time consuming and expensive datasets of CFRP inspection sequences to train neural networks. To address this challenge, this work introduces a novel language-guided framework for cognitive defect analysis in CFRPs using AIRT and vision-language models (VLMs). Unlike conventional learning-based approaches, the proposed framework does not require developing training datasets for extensive training of defect detectors, instead it relies solely on pretrained multimodal VLM encoders coupled with a lightweight adapter to enable generative zero-shot understanding and localization of subsurface defects. By leveraging pretrained multimodal encoders, the proposed system enables generative zero-shot understanding of thermographic patterns and automatic detection of subsurface defects. Given the domain gap between thermographic data and natural images used to train VLMs, an AIRT-VLM Adapter is proposed to enhance the visibility of defects while aligning the thermographic domain with the learned representations of VLMs. The proposed framework is validated using three representative VLMs; specifically, GroundingDINO, Qwen-VL-Chat, and CogVLM. Validation is performed on 25 CFRP inspection sequences with impacts introduced at different energy levels, reflecting realistic defects encountered in industrial scenarios. Experimental results demonstrate that the AIRT-VLM adapter achieves signal-to-noise ratio (SNR) gains exceeding 10 dB compared with conventional thermographic dimensionality-reduction methods, while enabling zero-shot defect detection with intersection-over-union values reaching 70%.

Research Motivation and Objectives

Motivate zero-shot cognitive defect analysis in CFRP using AIRT to reduce the need for large labeled datasets and extensive training.
Bridge thermographic data with pretrained vision–language models by introducing an AIRT-VLM adapter.
Enable reliable subsurface defect localization and grounding without defect-specific training across varying energy-impact scenarios.

Proposed Method

Standardize AIRT inspection sequences to normalize temporal-spatial dynamics.
Propose the AIRT-VLM Adapter: a masked autoencoder that compresses a thermographic sequence into a single domain-aligned, high-SNR image, I_VLM, via latent vectors and global pooling.
Use the domain-aligned image I_VLM as input to off-the-shelf VLMs (CogVLM, Qwen-VL-Chat, GroundingDINO) to predict defect bounding boxes under a fixed textual prompt.
Train the autoencoder with a reconstruction loss to avoid trivial identity mapping and to emphasize defect-relevant cues.
Evaluate contrast and SNR improvements of I_VLM versus raw thermograms and compare defect grounding performance using IoU and normalized center distance (NCD).
Perform ablations on pooling strategies (average vs max vs PCA) and on single-image vs multi-image VLM inputs to assess trade-offs between performance and speed.
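The standardization and pooling steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not specify how sequences are standardized, so a per-pixel z-score over time is assumed here, and the function names `standardize_sequence` and `pool_latents` are hypothetical. The masked autoencoder itself is omitted; the latent stack is taken as given.

```python
import numpy as np


def standardize_sequence(seq):
    """Z-score each pixel's temporal profile (ASSUMED standardization).

    seq: array of shape (frames, height, width).
    """
    seq = np.asarray(seq, dtype=np.float64)
    mean = seq.mean(axis=0, keepdims=True)
    std = seq.std(axis=0, keepdims=True)
    # Small epsilon avoids division by zero for constant pixels.
    return (seq - mean) / (std + 1e-8)


def pool_latents(latents, mode="average"):
    """Collapse a stack of latent images (K, H, W) into one 8-bit image (H, W).

    'average' and 'max' correspond to two of the pooling strategies compared
    in the ablation; PCA is not covered by this sketch.
    """
    latents = np.asarray(latents, dtype=np.float64)
    if mode == "average":
        pooled = latents.mean(axis=0)
    elif mode == "max":
        pooled = latents.max(axis=0)
    else:
        raise ValueError(f"unsupported pooling mode: {mode}")
    # Rescale to [0, 255] so the result can be saved as an ordinary image
    # (the rescaling step is an assumption, not taken from the paper).
    lo, hi = pooled.min(), pooled.max()
    scaled = (pooled - lo) / (hi - lo + 1e-12)
    return (scaled * 255).round().astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sequence = rng.normal(size=(50, 64, 64))       # synthetic stand-in for an AIRT sequence
    latent_stack = standardize_sequence(sequence)  # stand-in for encoder outputs
    image = pool_latents(latent_stack, mode="average")
    print(image.shape, image.dtype)
```

Average pooling reduces the stack with one pass and a single VLM call on the result, which is consistent with the efficiency argument made for it in the findings.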

Figure 1: Front-side view of the impacted specimens, subjected to low-velocity impact at 5 J and 15 J.

Experimental Results

Research Questions

RQ1: Can pretrained multimodal vision–language models localize subsurface CFRP defects in AIRT without defect-specific training?
RQ2: How does the AIRT-VLM adapter bridge the domain gap between thermograms and natural images to improve defect grounding?
RQ3: What is the impact of pooling strategies and latent-space usage on zero-shot defect detection performance?
RQ4: What are the detection performance metrics (IoU, NCD) achieved by different VLMs when coupled with the AIRT-VLM adapter?
RQ5: What are the limitations of a purely domain-aligned, zero-shot approach for defect characterization in CFRP?
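The two metrics named in RQ4 can be sketched as follows. IoU is the standard intersection-over-union of two axis-aligned boxes. The summary does not define NCD, so this sketch assumes it is the Euclidean distance between box centers divided by the image diagonal; the function names are hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def normalized_center_distance(box_a, box_b, image_width, image_height):
    """Center-to-center distance divided by the image diagonal (ASSUMED definition)."""
    cax, cay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cbx, cby = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    dist = math.hypot(cax - cbx, cay - cby)
    return dist / math.hypot(image_width, image_height)


if __name__ == "__main__":
    predicted = (0, 0, 10, 10)
    ground_truth = (5, 0, 15, 10)
    print(iou(predicted, ground_truth))
    print(normalized_center_distance(predicted, ground_truth, 640, 480))
```

Under this assumed definition, a higher IoU and a lower NCD both indicate better localization.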

Key Findings

The AIRT-VLM adapter substantially enhances defect signals, raising contrast and SNR relative to raw thermograms and several baselines; the abstract reports SNR gains exceeding 10 dB over conventional thermographic dimensionality-reduction methods.
Zero-shot grounding with Qwen-VL-Chat, CogVLM, and GroundingDINO achieves IoUs around 70% and NCD ≈ 0.015 on average across sequences.
Compared to traditional dimensionality-reduction baselines, the proposed adapter provides sharper defect boundaries and better suppression of background artifacts.
Average pooling over the latent space maintains competitive IoU and is significantly more computationally efficient than max pooling and PCA.
Using all latent images with NMS yields similar accuracy but with substantially higher execution time, justifying average pooling for practical use.
The framework enables defect grounding without defect-specific datasets, though it cannot perform depth estimation or defect-type classification.
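To make the multi-image alternative concrete: when a VLM is run on every latent image, the resulting boxes overlap and must be merged, which is where NMS comes in. The following is a minimal sketch of standard greedy non-maximum suppression, not the authors' code; the function names, the per-box scores, and the 0.5 threshold are illustrative assumptions.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is kept only if it overlaps every already-kept box by at most
    iou_threshold (threshold value is an illustrative assumption).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Hypothetical detections gathered from several latent images.
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

The extra cost of this route comes mainly from running the VLM once per latent image rather than from NMS itself, which fits the reported finding that accuracy is similar while execution time is substantially higher.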

Figure 2: Top: Front-side heating with pulsed flash lamps (A.1-A.2) and front-side long-pulse halogen heating (A.3). Bottom: Schematic representation of inspection setup.

This review was generated by AI and reviewed by a human editor.