Abstract: Real-time, high-precision segmentation of complex urban street scenes is crucial for autonomous driving. Existing real-time semantic segmentation networks capture spatial information and detailed features insufficiently in their high-resolution branches, and their inefficient fusion of high- and low-resolution features causes information loss, both of which limit segmentation accuracy. To address these problems, this paper proposes MPDANet, a real-time semantic segmentation network based on multi-scale partial dilated convolution and boundary-collaborative dual-attention-guided fusion. First, a Multi-Scale Partial Dilated Convolution Module (MSPDC) is designed: using partial dilated convolutions with parallel ladder-type dilation rates, it efficiently captures the detailed features and spatial information of the high-resolution branch at different scales, addressing the problem of insufficient information capture. Second, an Attention-Guided Feature Pyramid Module (AFPM) is constructed: it extracts multi-scale semantic information from the low-resolution branch through an asymmetric pooling layer and further enhances that information with a pixel attention mechanism. Finally, a Boundary Collaborative Dual Attention Fusion Module (BCDAF) is proposed: it screens key semantic and spatial information through parallel channel and spatial attention, suppresses the information loss caused by cross-resolution feature fusion, and introduces boundary attention to improve the segmentation of target boundaries. On the Cityscapes validation set, the proposed network achieves 78.6% mIoU at 295 FPS; on the CamVid test set, it achieves 77.4% mIoU at 454 FPS. Experimental results show that the proposed network achieves high-precision segmentation of complex urban street scenes while maintaining real-time performance.