Abstract: Existing fault diagnosis methods predominantly adopt a dedicated “single signal, single model” architecture, which requires an independent diagnostic model for each type of sensing signal. In practice, such approaches generalize poorly and adapt badly across signal types. To address these issues, this paper proposes an intelligent diagnostic method based on a unified deep network model applicable to both vibration and acoustic signals. First, the method uses an improved gold rush optimizer algorithm with an envelope entropy fitness function to optimize variational mode decomposition (VMD), adaptively determining the number of intrinsic mode functions (IMFs) k and the penalty factor α. Next, the average kurtosis criterion is used to screen the IMF components produced by VMD, and the selected components undergo secondary denoising and reconstruction with improved wavelet threshold denoising to enhance the fault features in the acoustic-vibration signals. Then, building on the Transformer architecture, a deep residual shrinkage network is introduced to construct local feature extraction layers, improving the model's ability to extract local features. In addition, a multi-scale linear attention mechanism is designed to replace the multi-head self-attention in the Transformer, reducing computational complexity while strengthening the model's ability to capture long-range dependencies. Finally, experiments on a self-constructed rolling bearing acoustic-vibration dataset show that the proposed method achieves a diagnostic accuracy of 90% for acoustic signals and 99.77% for vibration signals, outperforming the comparison models ResNet18, DRSN, ViT, MCSwin_T and WDCNN.
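For orientation, the two signal-level quantities named in the abstract, envelope entropy and kurtosis, can be sketched as follows. This is a minimal illustration under stated assumptions and not the paper's implementation: the function names are hypothetical, the envelope is taken as the magnitude of an FFT-based analytic signal, and the "average kurtosis criterion" is assumed to mean keeping the IMFs whose kurtosis is at least the mean kurtosis over all IMFs. The improved gold rush optimizer and VMD itself are not reproduced here.

```python
import numpy as np


def analytic_envelope(x):
    """Envelope of a real signal via an FFT-based analytic signal (Hilbert transform)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies, zero negatives.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))


def envelope_entropy(x, eps=1e-12):
    """Shannon entropy of the normalized envelope; lower values indicate sparser,
    more impulsive content. Assumed here as the fitness to minimize over (k, alpha)."""
    a = analytic_envelope(x)
    p = a / (a.sum() + eps)
    return float(-np.sum(p * np.log(p + eps)))


def kurtosis(x):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    c = np.asarray(x, dtype=float)
    c = c - c.mean()
    return float(np.mean(c ** 4) / (np.mean(c ** 2) ** 2 + 1e-12))


def select_by_average_kurtosis(imfs):
    """Assumed screening rule: keep IMFs whose kurtosis is at least the mean kurtosis."""
    k = np.array([kurtosis(m) for m in imfs])
    return k >= k.mean(), k
```

In a full pipeline under these assumptions, an optimizer would evaluate `envelope_entropy` on the IMFs produced by each candidate (k, α) pair, and `select_by_average_kurtosis` would then pick the components passed on to wavelet threshold denoising.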