Abstract: In industrial robot automation, existing object detection algorithms suffer from low accuracy on targets with large scale variations, poor handling of occlusion, and insufficient real-time performance. This paper proposes YOLOv11n-RLW, an improved algorithm built on the YOLOv11n baseline model. The specific improvements are as follows. First, the RepViT backbone replaces the conventional feature extraction network to strengthen feature extraction. Second, the LA-CBAM attention mechanism is introduced to compensate for the lack of spatial features in the SE module and to enhance multi-scale feature fusion. Third, the Wise-IoU loss function replaces CIoU to improve bounding-box regression accuracy. On the VisDrone2019 and KITTI datasets, the model achieves 38.4% mAP50 at 260 fps with only 2.24 M parameters. Compared with the baseline model, real-time performance improves by 6%, the recognition rate increases by 5%, and the number of parameters decreases by 13.6%. The algorithm effectively addresses multi-scale target detection, occlusion handling, and insufficient real-time performance. It meets the speed and accuracy requirements of industrial scenarios and is suitable for engineering deployment in high-precision industrial robot target detection systems.
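To illustrate the loss-function substitution, the following is a minimal sketch that contrasts CIoU with Wise-IoU for a single pair of axis-aligned boxes. The abstract does not state which Wise-IoU variant (v1, v2, or v3) is used. This sketch assumes v1 from Tong et al. (2023), which scales the IoU loss 1 − IoU by a distance-based factor exp(ρ² / (W_g² + H_g²)). Here ρ is the distance between box centers, and W_g and H_g are the width and height of the smallest enclosing box. The function names, the corner box format, and the `eps` parameter are illustrative assumptions. This is not the authors' implementation.

```python
import math
from typing import Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) corners.
Box = Tuple[float, float, float, float]


def _iou_and_geometry(pred: Box, gt: Box):
    """Return IoU, squared center distance, and enclosing-box width/height."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection and union areas
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers
    dx = (px1 + px2) / 2 - (gx1 + gx2) / 2
    dy = (py1 + py2) / 2 - (gy1 + gy2) / 2
    center_dist_sq = dx * dx + dy * dy
    # Width and height of the smallest box enclosing both boxes
    enc_w = max(px2, gx2) - min(px1, gx1)
    enc_h = max(py2, gy2) - min(py1, gy1)
    return iou, center_dist_sq, enc_w, enc_h


def ciou_loss(pred: Box, gt: Box, eps: float = 1e-7) -> float:
    """CIoU: 1 - IoU + rho^2 / c^2 + alpha * v (aspect-ratio penalty)."""
    iou, rho_sq, enc_w, enc_h = _iou_and_geometry(pred, gt)
    c_sq = enc_w ** 2 + enc_h ** 2 + eps  # squared enclosing-box diagonal
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho_sq / c_sq + alpha * v


def wiou_v1_loss(pred: Box, gt: Box, eps: float = 1e-7) -> float:
    """Wise-IoU v1 (assumed variant): exp(rho^2 / (W_g^2 + H_g^2)) * (1 - IoU).

    In the original formulation the denominator is detached from the
    gradient graph; with plain floats there is no gradient to detach.
    """
    iou, rho_sq, enc_w, enc_h = _iou_and_geometry(pred, gt)
    r_wiou = math.exp(rho_sq / (enc_w ** 2 + enc_h ** 2 + eps))
    return r_wiou * (1 - iou)


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    pred = (2.0, 0.0, 12.0, 10.0)  # same size, shifted 2 units right
    print(f"CIoU loss:      {ciou_loss(pred, gt):.4f}")
    print(f"Wise-IoU v1:    {wiou_v1_loss(pred, gt):.4f}")
```

Unlike CIoU, Wise-IoU v1 drops the aspect-ratio term. It uses the center-distance factor as a multiplicative weight on 1 − IoU rather than an additive penalty. The v3 variant additionally applies a dynamic, outlier-based focusing coefficient, which this sketch omits.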