Abstract: Plant pest and disease detection in modern agriculture must cope with multiscale features and complex backgrounds. To address these challenges, this paper proposes AgriSwin, a detection model designed to improve both the accuracy and the efficiency of agricultural pest and disease detection. AgriSwin is based on the Swin Transformer and integrates a dilated feature aggregation module and an adaptive spatial convolution module. The dilated feature aggregation module extracts multi-scale features through convolutional layers with different dilation rates and optimizes feature fusion using an adaptive weighting mechanism for global feature information. The adaptive spatial convolution module generates adaptive weights that dynamically reweight the feature maps, enhancing the model's ability to capture both local and global information in complex backgrounds. Experimental results show that AgriSwin achieves detection accuracies of 79.65%, 99.90%, and 95.08% on the PlantDoc, PlantVillage, and custom datasets, respectively. In addition, the model's parameter count is 25.63% lower than that of Swin Transformer-T, which reduces memory and computational requirements while maintaining high accuracy and indicates the model's potential for large-scale agricultural applications.
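The idea behind the dilated feature aggregation module can be illustrated with a small, self-contained sketch. It is a one-dimensional toy rather than the paper's implementation: the function names, the dilation rates (1, 2, 3), the fixed kernel, and the softmax over global averages used as the adaptive weighting are all assumptions made for illustration. The actual module operates on 2-D feature maps with learned convolution weights.

```python
import math

def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D convolution with a given dilation rate.

    Assumes an odd-length kernel so that it can be centred on each position.
    """
    n, k = len(signal), len(kernel)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < n:
                acc += w * signal[idx]
        out.append(acc)
    return out

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dilated_feature_aggregation(signal, kernel, dilations=(1, 2, 3)):
    """Fuse multi-dilation branches with weights derived from global statistics.

    Hypothetical weighting: each branch is summarised by its global average,
    and a softmax over these summaries gives the fusion weights.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    scores = [sum(b) / len(b) for b in branches]  # global average per branch
    weights = softmax(scores)
    fused = [
        sum(w * b[i] for w, b in zip(weights, branches))
        for i in range(len(signal))
    ]
    return fused, weights

if __name__ == "__main__":
    fused, weights = dilated_feature_aggregation([0, 0, 0, 1, 0, 0, 0], [1, 1, 1])
    print(weights)
    print(fused)
```

Each branch sees the same input at a different receptive field, and the fusion weights depend on the input itself rather than being fixed, which is the property the abstract attributes to the adaptive weighting mechanism.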