Abstract: In recent years, convolutional neural networks have achieved great success in building extraction from remote sensing images, but they still suffer from low overall extraction accuracy, misclassification, omission, and blurred boundaries. To address these problems, a building extraction method for remote sensing images based on VDSEC-UNet is proposed. First, VGG-16 is used as the encoder to extract building features. Second, dynamic up-sampling replaces traditional up-sampling to strengthen the model's perception of detail and thereby improve the extraction accuracy of building boundaries. Next, a multi-scale context information extraction module is embedded between the encoder and the decoder. This module accounts for the influence of objects surrounding a building and introduces context and global information under different receptive fields, which reduces the loss of spatial information and improves the extraction of buildings at different scales. Then, the ECA attention mechanism is embedded in each skip connection to increase the model's attention to building features in the image. At the same time, a joint loss function is used to alleviate the class imbalance problem. Finally, the CA-DPGHead module is constructed and added at the end of the decoder to sharpen the distinction between buildings and background, so that the model locates and identifies buildings in the image more accurately, which in turn improves the extraction accuracy of small buildings and refines building boundaries. Experimental results show that VDSEC-UNet reaches an mIoU of 82.07% and 84.35% on the Massachusetts and Inria datasets, respectively, and an F1 score of 83.34% and 86.66%, respectively, outperforming other classical methods.
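For readers unfamiliar with the two reported metrics, the following minimal sketch shows how mIoU and F1 are commonly computed for binary building masks. It rests on assumptions not stated in the abstract: mIoU is taken as the mean of the IoU of the building class and the IoU of the background class, F1 is computed for the building class only, and the function names `confusion_counts` and `miou_and_f1` are hypothetical. The evaluation protocol actually used in the experiments may differ.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over flat sequences of 0/1 labels (1 = building)."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def miou_and_f1(pred, truth):
    """Return (mIoU, F1) under the assumptions named in the lead-in."""
    tp, fp, fn, tn = confusion_counts(pred, truth)

    # IoU of the building class: overlap divided by union of predicted and true building pixels.
    building_union = tp + fp + fn
    iou_building = tp / building_union if building_union else 0.0

    # IoU of the background class, treating background as the positive class.
    background_union = tn + fp + fn
    iou_background = tn / background_union if background_union else 0.0

    # Assumption: mIoU averages the two class-wise IoU values.
    miou = (iou_building + iou_background) / 2

    # F1 for the building class: harmonic mean of precision and recall.
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return miou, f1


# Toy example: six pixels, flattened from a predicted mask and a ground-truth mask.
example_pred = [1, 1, 0, 0, 1, 0]
example_truth = [1, 0, 0, 0, 1, 1]
example_miou, example_f1 = miou_and_f1(example_pred, example_truth)
```

In practice the masks would be flattened over all test images before counting, so that large and small images contribute in proportion to their pixel counts; per-image averaging is an alternative convention and gives different numbers.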