QUICK REVIEW

[Paper Review] Towards Cognitive Defect Analysis in Active Infrared Thermography with Vision-Text Cues

Mohammed Salah, Eman Ouda|arXiv (Cornell University)|Mar 11, 2026
Thermography and Photoacoustic Techniques | Cited by 0
One-Sentence Summary

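This paper proposes a zero-shot defect localization framework for CFRP that combines active infrared thermography with vision-language models. An AIRT-VLM adapter maps thermographic data to domain-aligned, VLM-compatible images, enabling defect localization without defect-specific training.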

ABSTRACT

Active infrared thermography (AIRT) is currently witnessing a surge of artificial intelligence (AI) methodologies being deployed for automated subsurface defect analysis of high performance carbon fiber-reinforced polymers (CFRP). Deploying AI-based AIRT methodologies for inspecting CFRPs requires the creation of time consuming and expensive datasets of CFRP inspection sequences to train neural networks. To address this challenge, this work introduces a novel language-guided framework for cognitive defect analysis in CFRPs using AIRT and vision-language models (VLMs). Unlike conventional learning-based approaches, the proposed framework does not require developing training datasets for extensive training of defect detectors, instead it relies solely on pretrained multimodal VLM encoders coupled with a lightweight adapter to enable generative zero-shot understanding and localization of subsurface defects. By leveraging pretrained multimodal encoders, the proposed system enables generative zero-shot understanding of thermographic patterns and automatic detection of subsurface defects. Given the domain gap between thermographic data and natural images used to train VLMs, an AIRT-VLM Adapter is proposed to enhance the visibility of defects while aligning the thermographic domain with the learned representations of VLMs. The proposed framework is validated using three representative VLMs; specifically, GroundingDINO, Qwen-VL-Chat, and CogVLM. Validation is performed on 25 CFRP inspection sequences with impacts introduced at different energy levels, reflecting realistic defects encountered in industrial scenarios. Experimental results demonstrate that the AIRT-VLM adapter achieves signal-to-noise ratio (SNR) gains exceeding 10 dB compared with conventional thermographic dimensionality-reduction methods, while enabling zero-shot defect detection with intersection-over-union values reaching 70%.

Research Motivation and Objectives

  • Motivate zero-shot cognitive defect analysis in CFRP using AIRT to reduce the need for large labeled datasets and extensive training.
  • Bridge thermographic data with pretrained vision–language models by introducing an AIRT-VLM adapter.
  • Enable reliable subsurface defect localization and grounding without defect-specific training across varying energy-impact scenarios.

Proposed Method

  • Standardize AIRT inspection sequences to normalize temporal-spatial dynamics.
  • Propose the AIRT-VLM Adapter: a masked autoencoder that compresses a thermographic sequence into a single domain-aligned, high-SNR image, I_VLM, via latent vectors and global pooling.
  • Use the domain-aligned image I_VLM as input to off-the-shelf VLMs (CogVLM, Qwen-VL-Chat, GroundingDINO) to predict defect bounding boxes under a fixed textual prompt.
  • Train the autoencoder with a reconstruction loss to avoid trivial identity mapping and to emphasize defect-relevant cues.
  • Evaluate contrast and SNR improvements of I_VLM versus raw thermograms and compare defect grounding performance using IoU and normalized center distance (NCD).
  • Perform ablations on pooling strategies (average vs max vs PCA) and on single-image vs multi-image VLM inputs to assess trade-offs between performance and speed.
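
The pooling ablation above can be illustrated with a small sketch. It is not the authors' implementation: it assumes the adapter's encoder yields a stack of K latent maps of shape (K, H, W), and the function names `pool_latents` and `to_uint8` are hypothetical. The PCA branch is one plausible reading of "PCA pooling" (projection of each pixel's K-vector onto the first principal component).

```python
import numpy as np


def pool_latents(latents: np.ndarray, mode: str = "average") -> np.ndarray:
    """Collapse a stack of latent maps (K, H, W) into a single image (H, W).

    Hypothetical helper: the (K, H, W) layout is an assumption for illustration.
    """
    if latents.ndim != 3:
        raise ValueError("expected latents with shape (K, H, W)")
    if mode == "average":
        return latents.mean(axis=0)
    if mode == "max":
        return latents.max(axis=0)
    if mode == "pca":
        k, h, w = latents.shape
        # Treat every pixel as a K-dimensional sample, center the samples,
        # and project them onto the first principal direction.
        flat = latents.reshape(k, h * w).T
        flat = flat - flat.mean(axis=0, keepdims=True)
        _, _, vt = np.linalg.svd(flat, full_matrices=False)
        return (flat @ vt[0]).reshape(h, w)
    raise ValueError(f"unknown pooling mode: {mode}")


def to_uint8(image: np.ndarray) -> np.ndarray:
    """Min-max scale an image to 0..255 so it can be saved and passed to a VLM."""
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:
        return np.zeros(image.shape, dtype=np.uint8)
    scaled = (image - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)


if __name__ == "__main__":
    # Synthetic latent stack with a brighter square standing in for a defect.
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(8, 64, 64))
    latents[:, 20:30, 20:30] += 3.0
    for mode in ("average", "max", "pca"):
        image = to_uint8(pool_latents(latents, mode))
        print(mode, image.shape, image.dtype)
```

Average and max pooling are single reductions over the stack, whereas the PCA branch needs an SVD over all pixels, which is consistent with the speed trade-off the ablation examines.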
Figure 1: Front-side view of the impacted specimens, subjected to low-velocity impact at 5 J and 15 J.

Experimental Results

Research Questions

  • RQ1: Can pretrained multimodal vision–language models localize subsurface CFRP defects in AIRT without defect-specific training?
  • RQ2: How does the AIRT-VLM adapter bridge the domain gap between thermograms and natural images to improve defect grounding?
  • RQ3: What is the impact of pooling strategies and latent-space usage on zero-shot defect detection performance?
  • RQ4: What are the detection performance metrics (IoU, NCD) achieved by different VLMs when coupled with the AIRT-VLM adapter?
  • RQ5: What are the limitations of a purely domain-aligned, zero-shot approach for defect characterization in CFRP?

Key Findings

  • The AIRT-VLM adapter improves defect contrast and SNR over raw thermograms and several baselines; the abstract reports SNR gains exceeding 10 dB over conventional thermographic dimensionality-reduction methods.
  • Zero-shot grounding with Qwen-VL-Chat, CogVLM, and GroundingDINO achieves IoUs around 70% and NCD ≈ 0.015 on average across sequences.
  • Compared to traditional dimensionality-reduction baselines, the proposed adapter provides sharper defect boundaries and better suppression of background artifacts.
  • Pooling the latent space via average pooling maintains competitive IoU and significantly improves computational efficiency over max pooling and PCA.
  • Using all latent images with NMS yields similar accuracy but with substantially higher execution time, justifying average pooling for practical use.
  • The framework enables defect grounding without defect-specific datasets, though it cannot perform depth estimation or defect-type classification.
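
For readers unfamiliar with the metrics quoted above, the following sketch shows how they might be computed. It rests on assumptions: this summary does not give the paper's exact definitions, so SNR follows one common thermography convention (20·log10 of the defect-to-sound mean difference over the standard deviation of the sound region), and NCD is taken as the distance between box centers divided by the image diagonal. All function names are hypothetical.

```python
import math
from typing import Tuple

import numpy as np

# Box layout assumed here: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def normalized_center_distance(a: Box, b: Box, width: float, height: float) -> float:
    """Distance between box centers divided by the image diagonal (assumed definition)."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by) / math.hypot(width, height)


def snr_db(image: np.ndarray, defect_mask: np.ndarray, sound_mask: np.ndarray) -> float:
    """SNR in dB: 20*log10(|mean_defect - mean_sound| / std_sound) (assumed convention)."""
    defect = image[defect_mask]
    sound = image[sound_mask]
    return 20.0 * math.log10(abs(defect.mean() - sound.mean()) / sound.std())


if __name__ == "__main__":
    predicted: Box = (10, 10, 50, 50)      # illustrative VLM output
    ground_truth: Box = (20, 20, 60, 60)   # illustrative annotation
    print("IoU:", iou(predicted, ground_truth))
    print("NCD:", normalized_center_distance(predicted, ground_truth, 640, 512))
```

Under these definitions an NCD near 0.015 means the predicted box center lies within roughly 1.5% of the image diagonal of the annotated center.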
Figure 2: Top: Front-side heating with pulsed flash lamps (A.1-A.2) and front-side long-pulse halogen heating (A.3). Bottom: Schematic representation of inspection setup.


This review was generated by AI and reviewed by a human editor.