Abstract: To address the challenges of complex backgrounds, densely distributed small targets, and widely varying target scales in remote sensing images, an improved detection algorithm based on YOLO11 is proposed. First, ghost convolution is introduced into the backbone network to substantially reduce computational load and parameter count while maintaining model performance. Second, a hybrid network module that combines three types of modules is designed in the backbone to enrich information flow and enhance feature extraction. Finally, a semantic-aware cross-layer multi-feature fusion approach is adopted, in which the original P5 layer is replaced with the P2 layer to strengthen multi-scale feature fusion; this improves detection accuracy and eases the difficulty of extracting small-target features from remote sensing images. Experimental results on the VisDrone2019, AI-TOD, and RSOD datasets show that the improved YOLO11s model raises mAP50 by 2.8%, 5.1%, and 5.3%, respectively, over the original YOLO11s while reducing the parameter count by 31%, validating the effectiveness of the proposed algorithm.
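To illustrate why ghost convolution reduces the parameter count, the following minimal sketch compares the weight count of a standard convolution with that of a ghost convolution in its usual formulation: a primary convolution produces a fraction of the output channels, and a cheap depthwise operation generates the remaining "ghost" channels. The function names, the ratio of 2, and the 5×5 depthwise kernel are illustrative assumptions and are not taken from this work; bias and normalization parameters are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in, c_out, k, ratio=2, cheap_k=5):
    """Weight count of a ghost convolution (bias ignored).

    Assumed layout (hypothetical settings, not from the paper):
      - a primary k x k convolution produces c_out / ratio intrinsic maps;
      - a depthwise cheap_k x cheap_k operation derives (ratio - 1)
        ghost maps from each intrinsic map.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    # Depthwise: one cheap_k x cheap_k filter per generated ghost map.
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


if __name__ == "__main__":
    # Example layer sizes chosen only for illustration.
    std = standard_conv_params(64, 128, 3)
    ghost = ghost_conv_params(64, 128, 3)
    print(std, ghost, round(ghost / std, 4))
```

For this example layer the ghost variant needs roughly 48% fewer weights than the standard convolution. This is a per-layer figure under the assumed settings; the overall reduction for a full network depends on which layers are replaced.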