Abstract: Surface electromyography (sEMG) directly reflects muscle activity, capturing muscle contraction patterns and intensity, which makes it widely used in gesture recognition. However, its sparsity, non-linearity, and susceptibility to noise pose significant challenges for feature extraction. To address these challenges, we propose RASTNet, which uses ResNet50 as the backbone and replaces the 3×3 convolution in the last block of each layer with an Atrous Spatial Pyramid Pooling (ASPP) module that captures multi-scale information through dilated convolutions. An STConv module, which incorporates a triple attention mechanism into SCConv, is added to enhance the fusion of channel and spatial features. Experiments on the NinaPro DB1 and DB5 datasets, augmented with four methods, show that RASTNet improves accuracy by 1.83% and 1.57% on average, respectively. Under simulated noise, RASTNet outperforms models such as ResNeXt, Swin Transformer, and ConvNeXt in recall, F1 score, and other metrics. Without noise, it also outperforms the latest closed-source models. Together, these results demonstrate its robustness and noise resistance in complex gesture recognition tasks. In addition, RASTNet generalizes well across datasets, which strengthens its real-world applicability.
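The ASPP component rests on dilated (atrous) convolution. A k×k kernel with dilation rate r spaces its taps r pixels apart, so its effective size grows to k + (k − 1)(r − 1) with no extra parameters, and parallel branches at several rates observe several scales at once. The following is a minimal pure-Python sketch under simplifying assumptions: a single-channel toy map (not sEMG data), one shared kernel for every branch, and hypothetical rates 1, 2, and 3. A real ASPP module uses learned per-branch kernels, usually adds a 1×1 branch and an image-pooling branch, and fuses the branches with a 1×1 convolution; none of that is shown here, and the function names are illustrative rather than taken from RASTNet.

```python
from typing import List

Matrix = List[List[float]]


def dilated_conv2d(x: Matrix, kernel: Matrix, rate: int) -> Matrix:
    """Single-channel 2D dilated cross-correlation, stride 1, zero 'same' padding.

    Assumes a square, odd-sized kernel.
    """
    h, w = len(x), len(x[0])
    k = len(kernel)
    half = (k // 2) * rate  # padding that keeps the output the same size as the input
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spaced `rate` pixels apart.
                    ii = i - half + a * rate
                    jj = j - half + b * rate
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def effective_kernel_size(k: int, rate: int) -> int:
    """Span covered by a k-tap kernel dilated by `rate`."""
    return k + (k - 1) * (rate - 1)


def aspp(x: Matrix, kernel: Matrix, rates: List[int]) -> List[Matrix]:
    """Toy ASPP: run the same kernel at several dilation rates.

    Returns one output map per branch (i.e. the branches stacked as channels);
    a real module would then fuse them with a learned 1x1 convolution.
    """
    return [dilated_conv2d(x, kernel, r) for r in rates]
```

For example, feeding a 9×9 map with a single nonzero center pixel through `aspp(x, ones_3x3, [1, 2, 3])` yields three 9×9 maps whose nonzero responses form 3×3 grids spaced 1, 2, and 3 pixels apart. That widening footprint, obtained from the same nine weights, is the multi-scale behavior the ASPP branches are meant to provide.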