Applications of vision Transformer in surface defect detection: Research progress and challenges

Affiliation:

College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

CLC number:

TP391.41; TH89; TP389.1

    Abstract:

    Conventional convolutional neural networks (CNNs) are constrained by local convolution operations and struggle to model long-range dependencies effectively. In contrast, vision Transformers model global dependencies explicitly through mechanisms such as self-attention. In surface defect detection, particularly in scenarios with complex background textures or highly variable defect morphologies, vision Transformers have shown better detection performance than CNNs. This article reviews recent domestic and international research progress and challenges in vision Transformer-based surface defect detection along two dimensions: technical advantages and application methods, and key challenges and the corresponding strategies. It aims to provide a theoretical basis and methodological support for applying vision Transformers to surface defect detection. First, the basic definition of surface defect detection is explained, and the technical characteristics and main bottlenecks of the field are summarized. Second, the technical advantages of vision Transformers in defect detection, and the key challenges they face in practical applications, are analyzed in depth. Third, building on these advantages, typical application directions in surface defect detection are examined, including handling interference from complex textured backgrounds, multimodal information fusion, and local-global feature fusion based on modular design. Fourth, the main optimization strategies and countermeasures are discussed for the key challenges that vision Transformers face in this task: scarce samples, high computational complexity and insufficient real-time performance, low training efficiency, and poor detection of small defects. Finally, development trends of vision Transformers in surface defect detection are outlined, including the construction of transfer-learning-driven pre-trained large vision models and the deep integration of vision Transformers with multimodal methods.
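
    The contrast drawn in the abstract between local convolution and global self-attention can be illustrated with a minimal sketch. It is not taken from the reviewed article: the function names `self_attention` and `num_patches`, the toy patch embeddings, and the use of identity query/key/value projections (a real vision Transformer learns these projections) are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections, so queries, keys and values
    are the patch embeddings themselves.
    tokens: list of n vectors, each of dimension d.
    Returns (outputs, weights), where weights is an n x n matrix.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in tokens:
        # Each query patch is scored against every patch in the image.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


def num_patches(height, width, patch):
    """Number of non-overlapping patch tokens for an image."""
    return (height // patch) * (width // patch)


# Hypothetical 2-D embeddings for three image patches.
patch_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(patch_embeddings)

# The attention matrix has n * n entries, so cost grows quadratically
# with the number of patches.
n = num_patches(224, 224, 16)
attention_entries = n * n
```

    Every row of `weights` is a distribution over all patches, so each output token mixes information from the whole image in a single layer, whereas a convolution only sees its kernel neighborhood. The same n x n matrix is also the source of the computational cost listed among the challenges: halving the patch size quadruples n and multiplies the number of attention entries by sixteen.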

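Cite this article

Yang Yang (杨洋), Wu Yiquan (吴一全). Applications of vision Transformer in surface defect detection: Research progress and challenges[J]. Chinese Journal of Scientific Instrument (仪器仪表学报), 2025, 46(12): 1-22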

History
  • Online publication date: 2026-03-02