Abstract: Accurate detection of steel surface defects is a critical aspect of industrial quality control. In precision manufacturing fields such as mechanical engineering, the automotive industry, electronics, aerospace, and artillery barrel production, surface quality directly determines the safety and reliability of end products. Existing steel surface defect detection methods suffer from insufficient multi-scale detection capability, high missed detection rates for small and low-contrast defects, and suboptimal bounding box regression accuracy. To address these limitations, this article proposes an improved multi-scale steel surface defect detection method based on YOLOv11n. First, a multi-scale dynamic convolution module is designed, which employs parallel heterogeneous convolutions and a dynamic weight fusion mechanism to enhance the model's ability to capture multi-scale defects. Second, a dynamic residual fusion module, built from grouped convolution and a dual-residual structure, replaces the baseline C3K2 module. This significantly reduces the parameter count while improving multi-scale feature fusion and gradient flow efficiency, alleviating the degradation issue in deep network training. Third, the triple attention mechanism is enhanced with deformable convolution and cross-dimensional interaction, yielding a deformable triple attention mechanism whose receptive field adjusts dynamically to defect morphology, so that it focuses precisely on small, low-contrast regions and suppresses complex background interference. Finally, the Shape-IoU loss function is adopted, which incorporates shape and scale factors to optimize bounding box regression accuracy and addresses the failure of the aspect-ratio penalty term in traditional CIoU, which vanishes when the predicted and ground-truth boxes have identical aspect ratios. Experimental results on the NEU-DET dataset show that the improved model achieves an mAP@0.5 of 81.9%, representing a 6% improvement over the baseline YOLOv11n.
The parameter count is only 2.3 M, and the computational cost is reduced to 5.9 GFLOPs, meeting the requirements for deployment on edge devices. Generalization experiments on the GC10-DET dataset show a 4.1% improvement over the baseline model. Together with the visualization analysis, these results further validate the robustness and practicality of the proposed method in complex industrial scenarios.
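
The CIoU limitation mentioned in the abstract can be illustrated with the aspect-ratio consistency term v from the standard CIoU definition. The following is a minimal sketch and not the implementation used in this work: the function name `ciou_aspect_term` and the box sizes are illustrative assumptions, and the code does not implement Shape-IoU itself. It only shows that v is zero whenever the predicted and ground-truth boxes share an aspect ratio, even if their sizes differ.

```python
import math


def ciou_aspect_term(w_pred, h_pred, w_gt, h_gt):
    """Aspect-ratio consistency term v of the standard CIoU loss.

    v = (4 / pi^2) * (arctan(w_gt / h_gt) - arctan(w_pred / h_pred))^2
    The name of this helper is hypothetical; only the formula is standard.
    """
    diff = math.atan(w_gt / h_gt) - math.atan(w_pred / h_pred)
    return (4.0 / math.pi ** 2) * diff ** 2


# Illustrative ground-truth box: 40 x 20 (aspect ratio 2:1).
w_gt, h_gt = 40.0, 20.0

# Prediction twice as large but with the same 2:1 ratio: the term vanishes,
# so it gives no signal about the size mismatch.
same_ratio = ciou_aspect_term(80.0, 40.0, w_gt, h_gt)

# Square prediction: the ratio differs, so the term is positive.
diff_ratio = ciou_aspect_term(40.0, 40.0, w_gt, h_gt)

print("same aspect ratio, different size:", same_ratio)
print("different aspect ratio:", diff_ratio)
```

In the first case the term contributes nothing to the loss although the predicted box is four times the area of the ground truth, which is the situation that the shape and scale factors of Shape-IoU are intended to handle.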