Abstract: To address the degraded feature discriminability, poor recognition of small-scale targets, and occlusion of key body parts caused by highly dynamic human motion in cluttered scenes, an improved fall detection algorithm, ICI-YOLO, based on YOLOv10 is proposed. First, a contextual attention aggregation module replaces the partial self-attention module, capturing global contextual dependencies while producing fine-grained spatially fused representations. Second, an iterative attentional feature fusion mechanism is incorporated to restructure the C2f module of the backbone, strengthening the semantic representation of critical regions. Third, an interactive feature fusion network, integrating an interactive convolution block and a cross-scale convolutional feature fusion module, is proposed to improve multi-scale feature fusion. Experimental results demonstrate that ICI-YOLO achieves gains of 4.3% in recall and 2.2% in mAP@0.5 on the self-constructed human fall behavior detection dataset FALL, as well as improvements of 2.0% in precision and 1.5% in mAP@0.5:0.95 on the public DiverseFALL10500 dataset. Compared with mainstream real-time detection algorithms, the proposed method exhibits superior detection performance.