Abstract: To address the limitations of the LSTR algorithm in practical applications, particularly its single-scale feature extraction and its weak capture of local lane features, this paper introduces the ViT-CoMer backbone network to lane detection for the first time and proposes the LSCoMer lane detection model. First, the model applies an MRFP module after the feature extraction network to enrich multi-scale features, thereby improving detection accuracy. Second, a CTI module is integrated at both the beginning and the end of the Transformer structure to fuse the CNN's local features with the Transformer's global features, making the latter more sensitive to local details. Experimental results show that the method achieves an accuracy of 96.68% on the TuSimple dataset, a 0.5% improvement over the original LSTR method, and significantly outperforms similar methods such as PolyLaneNet. On the CULane dataset, the method improves the F1 score by 3.02% compared with LSTR.