Abstract: Existing behavior detection models suffer from low accuracy and poor robustness in underground settings because of complex backgrounds, large variations in the scale of miner behaviors, and frequent occlusions. To address these issues, an improved RT-DETR-based method for detecting unsafe miner behavior is proposed. First, a backbone network, CANet, is constructed with multi-path feature extraction and a dual-branch downsampling structure. By fusing deep and shallow features while preserving edge details, CANet enhances the model's ability to perceive fine-grained behavior details in complex backgrounds. Second, a Diffusion-Aware Feature Pyramid Network (DAFPN) is designed by integrating a dimension-aware selective integration module with a cross-layer diffusion strategy, forming a two-stage fusion-diffusion mechanism that strengthens semantic interaction among multi-scale behavior features and improves the model's adaptability to diverse postures and large scale variations. Third, a variable kernel convolution module (AKConv) is introduced, which dynamically adjusts its sampling positions so that the network can focus adaptively on key behavior regions under occlusion, thereby enhancing the robustness of miner behavior detection. Experimental results show that the improved RT-DETR model achieves 92.9% mAP@0.5 and 66.1% mAP@0.5:0.95, improvements of 2.9% and 1.9% over the original model, while reducing the parameter count by 18% and the computational cost by 13%. Compared with mainstream detection algorithms such as Faster R-CNN, SSD, YOLOv5m, YOLOv8m, and YOLOv10m, the proposed model demonstrates superior overall performance, validating its effectiveness and engineering applicability for unsafe behavior detection in complex coal mine environments.
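For readers unfamiliar with the two reported metrics, the sketch below illustrates the matching criterion behind mAP@0.5 and mAP@0.5:0.95. It assumes the common COCO-style convention (IoU thresholds from 0.50 to 0.95 in steps of 0.05), which the abstract does not state explicitly; the function names and the example boxes are hypothetical and are not taken from the paper's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero so that non-overlapping boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred_box, gt_box):
    """Thresholds at which a prediction counts as a true positive."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


if __name__ == "__main__":
    # Hypothetical predicted and ground-truth boxes, for illustration only.
    pred, gt = (0, 0, 10, 10), (2, 0, 12, 10)
    print(iou(pred, gt))
    print(matched_thresholds(pred, gt))
```

Under this convention, mAP@0.5 counts a detection as correct when its IoU with the ground truth is at least 0.5, whereas mAP@0.5:0.95 averages the AP over all ten thresholds, so it rewards tighter localization and is typically the lower of the two figures.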