Abstract: Rail surface defect detectors lose accuracy when defect regions closely resemble the background. To address this, this paper builds on YOLOv8n, a lightweight real-time object detection network, and proposes a multi-modal rail surface defect detection algorithm named RailBiModal-YOLO. Three improvements are made to YOLOv8n. First, a dual-stream backbone is constructed to extract multi-scale depth and RGB features in parallel. Second, a plug-and-play dual-modal feature interaction and revision fusion module is designed to suppress interference from low-quality image features and to fully exploit the complementary information in the two modalities. Third, the EVCBlock is introduced in the multi-scale feature construction stage to strengthen intra-layer information interaction within the RGB-D feature layers, thereby improving the detection of small defects. Experiments are conducted on the Northeastern University NEU-RSDDS-AUG dataset, custom-divided into four typical defect types, with mean average precision (mAP), frames per second (FPS), and parameter count as the primary evaluation metrics. The experimental results show that, compared with the original model, the proposed model maintains a high detection speed while improving mAP@50 and mAP@50:95 by 1.8% and 3.2%, respectively, and exhibits greater robustness.
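
For readers unfamiliar with the two accuracy metrics, the following is a minimal sketch of how they differ, assuming the paper follows the common COCO-style convention: mAP@50 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.50, whereas mAP@50:95 averages the average precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The function names and example boxes are hypothetical illustrations, not part of the proposed method or its evaluation code.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed convention: ten thresholds 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred, truth):
    """Thresholds at which `pred` would count as a correct detection of `truth`."""
    iou = box_iou(pred, truth)
    return [t for t in IOU_THRESHOLDS if iou >= t]


def mean_over_thresholds(ap_per_threshold):
    """Average per-threshold AP values, as done for an mAP@50:95-style score."""
    return sum(ap_per_threshold) / len(ap_per_threshold)


if __name__ == "__main__":
    truth = (0, 0, 10, 10)        # hypothetical ground-truth defect box
    loose = (0, 0, 10, 7.2)       # hypothetical prediction covering part of it
    print(box_iou(loose, truth))
    print(matched_thresholds(loose, truth))
```

Under this convention a loosely localized prediction can still count toward mAP@50 while failing the stricter thresholds, which is why mAP@50:95 is more sensitive to localization quality than mAP@50.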