Abstract: To address the loss of complex details, insufficient multi-scale perception, low computational efficiency, and low detection accuracy of YOLO11 in classroom behavior detection, an improved algorithm, ATDW-YOLO, is proposed. Firstly, an Adaptive Polarized Feature Fusion module is constructed in the neck network to improve semantic feature fusion and better capture complex details. Secondly, a task dynamic align detection head module is designed to enhance the model's recognition of multi-scale targets. Thirdly, a dynamic group convolution shuffle transformer module is introduced into the backbone network to improve feature representation and make the network more lightweight. Finally, the Wise-IoU loss function replaces the CIoU loss function to improve bounding box fitting and detection accuracy. Experimental results demonstrate that, compared with the YOLO11n model, ATDW-YOLO improves mAP0.5 and mAP0.5:0.95 by 3.1% and 4.0%, respectively, while reducing model parameters, computational complexity, and model size by 21.6%, 7.4%, and 20.6%, respectively. The proposed algorithm thus enhances detection accuracy while yielding a more lightweight model.