Abstract: Small, densely packed targets in drone aerial images are prone to missed and false detections. To address this, this paper proposes UCM-YOLOv8, an improved multi-scale target detection model based on YOLOv8n, for complex backgrounds in drone aerial photography. First, a pyramid network structure that integrates aggregation and diffusion mechanisms is designed so that features at each scale capture detailed contextual information. Second, a task dynamic alignment detection head is introduced to learn interactive features from multiple convolutional layers, enhancing detection precision. Third, the convolutional additive self-attention mechanism is integrated into the C2f module to strengthen the network's feature representation capacity. Finally, the Wise-Inner loss function replaces the original CIoU loss function, suppressing harmful gradients caused by low-resolution images. The proposed model was validated through comparative and ablation experiments on the VisDrone2019 dataset. Results show a 10.8% improvement in mAP50 over the baseline model and a 9.6% reduction in parameters. These findings demonstrate the model's superior performance in detecting small targets from drone perspectives, making it well suited to drone aerial image applications.
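
The abstract does not give the formula for the Wise-Inner loss, so the following is only a minimal sketch of one plausible reading: an Inner-IoU term (IoU computed on auxiliary boxes scaled about each box center) weighted by a Wise-IoU v1 style distance factor. The function names `inner_iou` and `wise_inner_loss`, the `(cx, cy, w, h)` box format, and the default `ratio=0.75` are assumptions made for illustration and are not taken from the paper; the dynamic focusing used in later Wise-IoU versions is omitted.

```python
import math


def inner_iou(box, gt, ratio=0.75):
    """IoU of auxiliary boxes scaled by `ratio` about each box center.

    Boxes are (cx, cy, w, h). `ratio` is a hypothetical default.
    """
    (x, y, w, h), (xg, yg, wg, hg) = box, gt
    # Edges of the scaled (auxiliary) predicted box.
    l, r = x - w * ratio / 2, x + w * ratio / 2
    t, b = y - h * ratio / 2, y + h * ratio / 2
    # Edges of the scaled (auxiliary) ground-truth box.
    lg, rg = xg - wg * ratio / 2, xg + wg * ratio / 2
    tg, bg = yg - hg * ratio / 2, yg + hg * ratio / 2
    # Overlap of the two auxiliary boxes, clamped at zero.
    iw = max(0.0, min(r, rg) - max(l, lg))
    ih = max(0.0, min(b, bg) - max(t, tg))
    inter = iw * ih
    union = (w * h + wg * hg) * ratio ** 2 - inter
    return inter / union if union > 0 else 0.0


def wise_inner_loss(box, gt, ratio=0.75, eps=1e-9):
    """Assumed combination: Wise-IoU v1 distance weight times (1 - Inner-IoU)."""
    (x, y, w, h), (xg, yg, wg, hg) = box, gt
    # Width and height of the smallest box enclosing both original boxes.
    # In a differentiable implementation this term would be detached
    # from the gradient; plain Python has no gradient to detach.
    enc_w = max(x + w / 2, xg + wg / 2) - min(x - w / 2, xg - wg / 2)
    enc_h = max(y + h / 2, yg + hg / 2) - min(y - h / 2, yg - hg / 2)
    center_dist_sq = (x - xg) ** 2 + (y - yg) ** 2
    weight = math.exp(center_dist_sq / (enc_w ** 2 + enc_h ** 2 + eps))
    return weight * (1.0 - inner_iou(box, gt, ratio))
```

With `ratio` below 1 the auxiliary boxes shrink, so a partially overlapping pair scores a lower IoU than it would with the full boxes, which penalizes small localization offsets more strongly. Whether the paper uses this ratio, this weighting, or a different combination cannot be determined from the abstract alone.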