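[Paper Explained] Towards Cognitive Defect Analysis in Active Infrared Thermography with Vision-Text Cues
This paper proposes a zero-shot framework for localizing defects in CFRP using active infrared thermography and vision-language models. An AIRT-VLM adapter maps thermographic data to domain-aligned images compatible with VLMs, enabling defect localization without defect-specific training.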
Active infrared thermography (AIRT) is currently witnessing a surge of artificial intelligence (AI) methodologies being deployed for automated subsurface defect analysis of high-performance carbon fiber-reinforced polymers (CFRP). Deploying AI-based AIRT methodologies for inspecting CFRPs requires creating time-consuming and expensive datasets of CFRP inspection sequences to train neural networks. To address this challenge, this work introduces a novel language-guided framework for cognitive defect analysis in CFRPs using AIRT and vision-language models (VLMs). Unlike conventional learning-based approaches, the proposed framework does not require training datasets for extensive training of defect detectors. Instead, it relies solely on pretrained multimodal VLM encoders coupled with a lightweight adapter to enable generative zero-shot understanding of thermographic patterns and localization of subsurface defects. Given the domain gap between thermographic data and the natural images used to train VLMs, an AIRT-VLM Adapter is proposed to enhance the visibility of defects while aligning the thermographic domain with the learned representations of VLMs. The proposed framework is validated using three representative VLMs: GroundingDINO, Qwen-VL-Chat, and CogVLM. Validation is performed on 25 CFRP inspection sequences with impacts introduced at different energy levels, reflecting realistic defects encountered in industrial scenarios. Experimental results demonstrate that the AIRT-VLM adapter achieves signal-to-noise ratio (SNR) gains exceeding 10 dB compared with conventional thermographic dimensionality-reduction methods, while enabling zero-shot defect detection with intersection-over-union values reaching 70%.
Research Motivation and Objectives
- Motivate zero-shot cognitive defect analysis in CFRP using AIRT to reduce the need for large labeled datasets and extensive training.
- Bridge thermographic data with pretrained vision–language models by introducing an AIRT-VLM adapter.
- Enable reliable subsurface defect localization and grounding without defect-specific training across varying energy-impact scenarios.
Proposed Method
- Standardize AIRT inspection sequences to normalize temporal-spatial dynamics.
- Propose the AIRT-VLM Adapter: a masked autoencoder that compresses a thermographic sequence into a single domain-aligned, high-SNR image, I_VLM, via latent vectors and global pooling.
- Use the domain-aligned image I_VLM as input to off-the-shelf VLMs (CogVLM, Qwen-VL-Chat, GroundingDINO) to predict defect bounding boxes under a fixed textual prompt.
- Train the autoencoder with a reconstruction loss to avoid trivial identity mapping and to emphasize defect-relevant cues.
- Evaluate contrast and SNR improvements of I_VLM versus raw thermograms and compare defect grounding performance using IoU and normalized center distance (NCD).
- Perform ablations on pooling strategies (average vs max vs PCA) and on single-image vs multi-image VLM inputs to assess trade-offs between performance and speed.
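The pooling ablation and the SNR evaluation listed above can be illustrated with a minimal NumPy sketch. Nothing here is taken from the paper's code: the function names (`pool_latents`, `to_uint8`, `snr_db`), the `(C, H, W)` latent layout, and the SNR definition (20·log10 of the defect-to-sound mean difference over the sound-region standard deviation, a convention common in thermography) are all assumptions made for illustration.

```python
import numpy as np


def pool_latents(latents: np.ndarray, mode: str = "average") -> np.ndarray:
    """Collapse a (C, H, W) stack of latent images into one (H, W) image.

    Hypothetical helper: the (C, H, W) layout is an assumption, not the
    paper's documented tensor format.
    """
    if latents.ndim != 3:
        raise ValueError("expected a (C, H, W) array")
    c, h, w = latents.shape
    if mode == "average":
        return latents.mean(axis=0)
    if mode == "max":
        return latents.max(axis=0)
    if mode == "pca":
        # Treat every pixel as a C-dimensional sample and project it onto
        # the first principal component (computed via SVD).
        flat = latents.reshape(c, h * w).T              # (H*W, C)
        flat = flat - flat.mean(axis=0, keepdims=True)  # center each channel
        _, _, vt = np.linalg.svd(flat, full_matrices=False)
        return (flat @ vt[0]).reshape(h, w)
    raise ValueError(f"unknown pooling mode: {mode}")


def to_uint8(img: np.ndarray) -> np.ndarray:
    """Min-max rescale to 0..255 so the image can be passed to a VLM."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round(255.0 * (img - lo) / (hi - lo)).astype(np.uint8)


def snr_db(img: np.ndarray, defect_mask: np.ndarray, sound_mask: np.ndarray) -> float:
    """SNR in dB under the assumed definition stated in the lead-in."""
    signal = abs(img[defect_mask].mean() - img[sound_mask].mean())
    noise = img[sound_mask].std()
    return float(20.0 * np.log10(signal / noise))
```

Average and max pooling need only one pass over the stack, while the PCA variant requires an SVD over all pixels, which is one plausible reason for the speed difference the ablation examines.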

Experimental Results
Research Questions
- RQ1: Can pretrained multimodal vision–language models localize subsurface CFRP defects in AIRT without defect-specific training?
- RQ2: How does the AIRT-VLM adapter bridge the domain gap between thermograms and natural images to improve defect grounding?
- RQ3: What is the impact of pooling strategies and latent-space usage on zero-shot defect detection performance?
- RQ4: What are the detection performance metrics (IoU, NCD) achieved by different VLMs when coupled with the AIRT-VLM adapter?
- RQ5: What are the limitations of a purely domain-aligned, zero-shot approach for defect characterization in CFRP?
Key Findings
- The AIRT-VLM adapter raises defect contrast and SNR relative to raw thermograms, with SNR gains exceeding 10 dB over conventional thermographic dimensionality-reduction methods.
- Zero-shot grounding with Qwen-VL-Chat, CogVLM, and GroundingDINO achieves IoUs around 70% and NCD ≈ 0.015 on average across sequences.
- Compared to traditional dimensionality-reduction baselines, the proposed adapter provides sharper defect boundaries and better suppression of background artifacts.
- Pooling the latent space via average pooling maintains competitive IoU and significantly improves computational efficiency over max pooling and PCA.
- Using all latent images with NMS yields similar accuracy but with substantially higher execution time, justifying average pooling for practical use.
- The framework enables defect grounding without defect-specific datasets, though it cannot perform depth estimation or defect-type classification.
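The grounding metrics and the NMS step mentioned in these findings can be sketched in plain Python. This is an illustrative sketch, not the paper's evaluation code: boxes are assumed to be `(x1, y1, x2, y2)` tuples in pixels, and the normalized center distance is assumed to be the distance between box centers divided by the image diagonal, since the summary does not state the normalization. The function names and the 0.5 NMS threshold are likewise hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def normalized_center_distance(pred, gt, width, height):
    """Center-to-center distance divided by the image diagonal.

    ASSUMPTION: normalizing by the diagonal is a guess at how NCD is defined.
    """
    pcx, pcy = (pred[0] + pred[2]) / 2.0, (pred[1] + pred[3]) / 2.0
    gcx, gcy = (gt[0] + gt[2]) / 2.0, (gt[1] + gt[3]) / 2.0
    return math.hypot(pcx - gcx, pcy - gcy) / math.hypot(width, height)


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```

In the multi-image setting, a VLM would be queried once per latent image and the resulting boxes merged with something like `nms`, which explains why that variant costs more time than pooling to a single image first.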

This explanation was generated by AI and reviewed by human editors.