Abstract: Intelligent recognition of rice leaf diseases is of great significance in modern agricultural production. To address the problem that the traditional Vision Transformer (ViT) network lacks inductive bias and struggles to effectively capture the local detail features of an image, an improved Vision Transformer model was proposed. By introducing intrinsic inductive bias, the model's ability to capture multi-scale context as well as local and global dependencies was enhanced, while its reliance on large-scale datasets was reduced. In addition, the multi-layer perceptron module in the Vision Transformer was replaced with a Kolmogorov-Arnold Network (KAN) structure, improving the model's ability to extract complex features and its interpretability. Experimental results show that the proposed model achieved excellent performance on the rice leaf disease recognition task, with an accuracy of 98.62%, which was 6.2% higher than the original ViT model, effectively improving the recognition of rice leaf diseases.
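To make the architectural change concrete, the following is a minimal PyTorch sketch of a ViT encoder block in which the usual MLP sub-layer is swapped for a KAN-style layer. It is illustrative only and not the authors' implementation: the class names (`SimpleKANLayer`, `ViTBlockWithKAN`), the Gaussian radial-basis expansion used as a stand-in for learnable spline functions, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class SimpleKANLayer(nn.Module):
    """Simplified KAN-style layer (illustrative stand-in, not the paper's exact KAN):
    each output is a sum of learnable 1-D functions of every input feature,
    approximated here with a fixed Gaussian basis and learnable coefficients."""
    def __init__(self, in_dim, out_dim, num_basis=8):
        super().__init__()
        # Fixed grid of basis centres on [-1, 1]; the basis width is shared.
        self.register_buffer("centres", torch.linspace(-1.0, 1.0, num_basis))
        # Learnable per-edge coefficients over the basis functions.
        self.coeffs = nn.Parameter(torch.randn(out_dim, in_dim, num_basis) * 0.1)
        # Plain linear path, analogous to the base/residual branch in KAN layers.
        self.base = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        # x: (..., in_dim) -> basis responses per input feature: (..., in_dim, num_basis)
        phi = torch.exp(-((x.unsqueeze(-1) - self.centres) ** 2) / 0.1)
        # Sum the learnable 1-D functions over input features for each output unit.
        spline = torch.einsum("...ib,oib->...o", phi, self.coeffs)
        return self.base(x) + spline

class ViTBlockWithKAN(nn.Module):
    """Standard Transformer encoder block with the MLP replaced by KAN-style layers."""
    def __init__(self, dim=192, heads=3, hidden=384):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.kan = nn.Sequential(SimpleKANLayer(dim, hidden),
                                 SimpleKANLayer(hidden, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # self-attention + residual
        x = x + self.kan(self.norm2(x))                     # KAN sub-layer + residual
        return x

# Usage example: a batch of 2 images, 196 patch tokens + 1 CLS token, embed dim 192.
tokens = torch.randn(2, 197, 192)
print(ViTBlockWithKAN()(tokens).shape)  # torch.Size([2, 197, 192])
```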