Abstract: Existing semantic segmentation networks for landslides in remote sensing images suffer from large parameter counts, slow training, blurred recognition of landslide boundary regions, and difficulty in classifying multi-scale semantic information. To address these problems, this paper proposes an improved lightweight semantic segmentation model based on RTformer. An atrous (dilated) convolution ASPP (atrous spatial pyramid pooling) module and a squeeze-and-excitation (SE) channel attention module are embedded between the modules at different levels of the model. The ASPP module captures semantic information at different scales, and the SE module strengthens feature representation; together they improve the feature extraction ability of the model and make it better suited to landslide recognition in remote sensing images. Comparative experiments on the Cityscapes dataset were used to select the dilation rates of the atrous convolutions and the batch size. A self-supervised training task was designed using the Bijie landslide disaster dataset as the pre-training dataset; the model was then fine-tuned, and its segmentation performance on landslide remote sensing images was tested. The resulting model achieved the best performance on both the Cityscapes dataset and the Bijie landslide disaster dataset. Compared with the original RTformer model, the mean intersection over union (mIoU) on the two datasets increased by 2.26% and 4.34%, respectively. Compared with classical semantic segmentation models such as FCN, U-Net, DeeplabV3 and SegFormer, the improved model performs the recognition task with the fewest parameters and the fastest inference speed while achieving the best segmentation results.
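
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection over union is commonly computed from a predicted label map and a ground-truth label map. It is the standard textbook definition rather than the evaluation code used in this paper; the function name `mean_iou`, the choice to skip classes absent from both maps, and the toy 2x3 label maps are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are equally sized 2D lists of integer class labels.
    Classes absent from both maps are skipped (an assumption; some
    toolkits count them differently).
    """
    inter = [0] * num_classes  # pixels where pred and target agree on class c
    union = [0] * num_classes  # pixels labelled c in pred or in target
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: class 0 = background, class 1 = landslide (hypothetical labels).
pred = [[0, 0, 1],
        [0, 1, 1]]
target = [[0, 0, 1],
          [0, 0, 1]]
score = mean_iou(pred, target, num_classes=2)
print(round(score, 4))
```

In the toy example one background pixel is mislabelled as landslide, so the per-class IoU values are 3/4 and 2/3 and their mean is 17/24.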