Decoding strategies for dysarthric speech recognition
Affiliation:

1. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China; 2. College of Mechanical Engineering, Jiaxing University, Jiaxing 314001, China; 3. College of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China

CLC Number: TN912.34; R741

    Abstract:

    Dysarthric speech results from neurological disorders that impair motor control of the articulatory organs. The resulting abnormal pronunciation and prosody pose significant challenges to conventional automatic speech recognition (ASR) systems. To address these issues, this paper proposes an algorithm that combines a multi-level representation fusion decoding strategy with hotword boosting. Built on a Transformer-based encoder-decoder architecture, the approach extends conventional single-view decoding with multi-level representation fusion, implemented through three distinct fusion strategies that enhance the model's ability to handle complex sentences and contextual information. To further improve recognition accuracy on dysarthric speech, hotword boosting is integrated into beam search decoding to assign higher weights to key terms. The proposed method significantly reduces the word error rate (WER) compared with the other baseline models. Relative to the Whisper baseline, the WER decreased from 38.31% to 27.18% on the UASpeech dataset and from 16.38% to 12.67% on the TORGO dataset, demonstrating the effectiveness of the proposed method for dysarthric speech recognition.
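The abstract does not state exactly how hotword boosting enters beam search, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a decoder that emits whole-word tokens, stands in for it with a hypothetical `toy_decoder` whose probabilities are hand-picked, and adds a fixed log-domain `bonus` (an assumed value) whenever a hypothesis emits a word from the hotword set. Whisper-style decoders operate on subword tokens, so a real system would need prefix matching (for example, over a trie) to boost multi-token hotwords.

```python
import heapq
import math
from typing import Callable, Dict, List, Set, Tuple

EOS = "</s>"  # end-of-sentence token (assumed name)
Hyp = Tuple[float, Tuple[str, ...]]  # (cumulative score, emitted tokens)


def beam_search_with_hotwords(
    step_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    hotwords: Set[str],
    beam_size: int = 3,
    bonus: float = 2.0,
    max_len: int = 10,
) -> Tuple[Tuple[str, ...], float]:
    """Beam search whose score is decoder log-probability plus a fixed
    `bonus` for every emitted token that belongs to `hotwords`."""
    beams: List[Hyp] = [(0.0, ())]
    finished: List[Hyp] = []
    for _ in range(max_len):
        candidates: List[Hyp] = []
        for score, prefix in beams:
            for token, logp in step_log_probs(prefix).items():
                new_score = score + logp
                if token in hotwords:
                    new_score += bonus  # reward key terms in the log domain
                if token == EOS:
                    finished.append((new_score, prefix))  # hypothesis is complete
                else:
                    candidates.append((new_score, prefix + (token,)))
        if not candidates:
            break
        # Keep only the `beam_size` highest-scoring partial hypotheses.
        beams = heapq.nlargest(beam_size, candidates, key=lambda h: h[0])
    if not finished:  # no hypothesis reached EOS within max_len
        finished = beams
    best_score, best_tokens = max(finished, key=lambda h: h[0])
    return best_tokens, best_score


def toy_decoder(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical decoder: position-dependent word probabilities that
    ignore context, standing in for a real ASR decoder step."""
    steps = [
        {"i": math.log(0.9), "eye": math.log(0.1)},
        {"want": math.log(0.8), "won't": math.log(0.2)},
        {"bed": math.log(0.6), "bread": math.log(0.4)},  # ambiguous word
    ]
    if len(prefix) < len(steps):
        return steps[len(prefix)]
    return {EOS: 0.0}  # always end after three words


if __name__ == "__main__":
    print(beam_search_with_hotwords(toy_decoder, set())[0])      # ('i', 'want', 'bed')
    print(beam_search_with_hotwords(toy_decoder, {"bread"})[0])  # ('i', 'want', 'bread')
```

Without hotwords the search prefers the acoustically more likely "bed"; with "bread" registered as a hotword, the bonus outweighs its lower decoder probability. The sketch applies no length normalization, and the bonus is an illustrative value that would typically be tuned on held-out data, since too large a bonus causes hotwords to be inserted where they were not spoken.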

History
Online: May 23, 2025