Abstract: The intelligent harvesting of green citrus relies on fast and accurate detection. Green citrus fruits vary widely in size, orchard environments are complex, and the fruits closely resemble the background. Together, these factors limit detection accuracy and cause missed detections. To address these issues, this study proposes RT-GCTR, a lightweight, high-precision green citrus detection model. The model uses a large-receptive-field wavelet convolution module (WCLRF_Block) to strengthen multi-scale feature extraction. It integrates a multi-scale multi-head self-attention mechanism (MSMHSA) into a multi-scale fusion module (MSMH-AIFI) for adaptive feature aggregation. In addition, it combines SPDConv and CSP-OmniKernel modules to build the SCOK-CCFF feature pyramid, which improves small-target detection accuracy. Experimental results show that RT-GCTR achieves AP50 scores of 92.0% and 92.2% on training dataset 1 and test dataset 2, respectively, outperforming other advanced models. Compared with RT-DETR-r18, it reduces the number of parameters by 26.7% and the computational cost by 25.4%, and it reaches a detection speed of 10.3 fps on an NVIDIA Jetson Orin NX. RT-GCTR thus improves accuracy and real-time performance while reducing model complexity, which makes it suitable for deployment on edge devices.
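The small-target gain attributed to SPDConv rests on one idea: instead of downsampling with a strided convolution or pooling, which discards pixels, a space-to-depth step moves each 2×2 spatial neighborhood into the channel dimension, and a stride-1 convolution then mixes those channels. The NumPy sketch below illustrates only this general mechanism. The function names, the (C, H, W) layout, and the use of a 1×1 kernel for the non-strided convolution are illustrative assumptions, not RT-GCTR's actual configuration.

```python
import numpy as np


def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) map into (C * scale**2, H // scale, W // scale).

    Every input value is kept: spatial resolution is traded for channels
    rather than discarded, as a strided convolution or pooling would do.
    """
    c, h, w = x.shape
    assert h % scale == 0 and w % scale == 0, "H and W must be divisible by scale"
    # Take the scale*scale interleaved sub-grids and stack them on the channel axis.
    parts = [x[:, i::scale, j::scale] for i in range(scale) for j in range(scale)]
    return np.concatenate(parts, axis=0)


def pointwise_conv(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Stride-1 1x1 convolution; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def spd_conv(x: np.ndarray, weight: np.ndarray, scale: int = 2) -> np.ndarray:
    """Hypothetical SPD-style block: space-to-depth, then a non-strided conv."""
    return pointwise_conv(space_to_depth(x, scale), weight)


if __name__ == "__main__":
    x = np.arange(16, dtype=float).reshape(1, 4, 4)
    y = space_to_depth(x)
    print(y.shape)  # (4, 2, 2): 4x more channels, half the height and width
    z = spd_conv(x, np.ones((3, 4)))
    print(z.shape)  # (3, 2, 2)
```

Because no pixel is dropped during downsampling, fine detail from small, distant fruit can still reach later layers. This is the usual motivation for replacing strided convolutions with space-to-depth in small-object detectors.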