Abstract: Dysarthric speech arises from neurological disorders that impair motor control of the articulatory organs, resulting in abnormal pronunciation and prosody that pose significant challenges to conventional automatic speech recognition (ASR) systems. To address these challenges, this paper proposes an algorithm that combines a multi-level representation fusion decoding strategy with hotword boosting. Built on a Transformer-based encoder-decoder architecture, the approach replaces conventional single-view decoding with multi-level representation fusion, realized through three distinct fusion strategies that enhance the model's ability to comprehend complex sentences and contextual information. To further improve recognition accuracy on dysarthric speech, hotword boosting is integrated into beam search decoding to assign higher weights to key terms. Experimental results show that the proposed method achieves a substantially lower word error rate (WER) than the baseline models. Specifically, relative to the Whisper baseline, the WER decreased from 38.31% to 27.18% on the UASpeech dataset and from 16.38% to 12.67% on the TORGO dataset, demonstrating the effectiveness of the proposed method for dysarthric speech recognition.
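The abstract does not say how the hotword boost is computed, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes an additive log-domain bonus for each hotword token in a hypothesis and applies it when re-ranking the beam; the names `boost_score`, `rerank`, and the example bonus value are hypothetical.

```python
from typing import Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens, accumulated log-probability)


def boost_score(tokens: List[str], log_prob: float,
                hotwords: Dict[str, float]) -> float:
    """Add an assumed per-occurrence bonus for every hotword in the tokens."""
    bonus = sum(hotwords.get(tok, 0.0) for tok in tokens)
    return log_prob + bonus


def rerank(beam: List[Hypothesis], hotwords: Dict[str, float],
           beam_size: int) -> List[Hypothesis]:
    """Re-score each hypothesis with the hotword bonus and keep the best ones."""
    scored = [(tokens, boost_score(tokens, lp, hotwords)) for tokens, lp in beam]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:beam_size]


# Toy beam: the acoustically preferred hypothesis misses the key term.
beam = [
    (["open", "the", "door"], -2.0),
    (["open", "the", "drawer"], -2.3),
]
hotwords = {"drawer": 0.5}  # assumed bonus, in log-probability units

top = rerank(beam, hotwords, beam_size=1)
print(top[0][0])  # the hypothesis containing the hotword now ranks first
```

In this toy case the bonus outweighs the 0.3 log-probability gap, so the hypothesis containing the key term is selected. A real decoder would typically apply the bonus at each beam search step, often with partial-match handling for multi-token hotwords, rather than only on complete hypotheses.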