Two-stage speech enhancement algorithm incorporating dual-channel convolution and improved Conformer

Affiliation:

1. School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China; 2. Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, Guilin University of Electronic Technology, Guilin 541004, China; 3. GUET-Nanning E-Tech Research Institute Co., Ltd., Nanning 530000, China

CLC number:

TN912.35

Fund Project:

Supported by the Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education (CRKL230103)

    Abstract:

    To address insufficient extraction of key speech features and an overly simple model structure, a two-stage speech enhancement method that fuses multi-scale features with an improved gated Conformer is proposed. First, to address insufficient feature extraction, a dual-channel convolution fusion module is proposed: two-dimensional convolutions with different receptive fields extract key speech information at multiple scales, and a gating mechanism strengthens the network's modeling of short-term and long-term sequence correlations, improving enhancement performance in complex environments. An improved Conformer is also proposed, which applies time attention and frequency attention to model the time domain and the frequency domain separately and uses a dilated convolution module to extract local and global context efficiently, strengthening the network's speech sequence modeling. Second, to address the overly simple model structure, a two-stage processing structure is adopted so that the complex problem is solved step by step. The first stage takes the magnitude of the noisy spectrum, produces a preliminary estimate of the clean speech magnitude, and combines it with the noisy phase to reconstruct a coarse complex spectrum. The second stage extracts finer features from this coarse spectrum to improve the detail of the enhanced speech signal. Finally, tests on the VoiceBank+DEMAND dataset show that, relative to the noisy speech, the proposed algorithm improves perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) by 50.25% and 3.26%, respectively, indicating that the network improves speech intelligibility more effectively while also improving overall speech quality, and has strong noise reduction ability.
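The two-stage data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame length, hop size, and the functions `toy_magnitude_estimator` and `toy_refiner` are hypothetical stand-ins for the trained networks, and modeling the second stage as an additive complex residual is an assumption made only for illustration. The sketch shows only how an estimated magnitude is recombined with the noisy phase to form a coarse complex spectrum that a second stage then refines.

```python
import numpy as np


def stft(signal, frame_len=512, hop=256):
    """Hann-windowed STFT; returns a complex array of shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def stage_one(noisy_spec, magnitude_estimator):
    """Estimate the clean magnitude, then reuse the noisy phase."""
    noisy_mag = np.abs(noisy_spec)
    noisy_phase = np.angle(noisy_spec)
    clean_mag = magnitude_estimator(noisy_mag)
    # Coarse complex spectrum: estimated magnitude combined with the noisy phase.
    return clean_mag * np.exp(1j * noisy_phase)


def stage_two(coarse_spec, refiner):
    """Refine the coarse spectrum; here the refiner returns a complex residual
    (an assumption, the abstract does not specify the refinement form)."""
    return coarse_spec + refiner(coarse_spec)


# Hypothetical stand-ins for the trained networks (not the paper's models).
def toy_magnitude_estimator(mag):
    return 0.5 * mag  # fixed gain in place of a learned magnitude mapping


def toy_refiner(spec):
    return np.zeros_like(spec)  # zero residual in place of a learned correction


rng = np.random.default_rng(0)
noisy = rng.standard_normal(16000)  # synthetic stand-in for 1 s of noisy speech
noisy_spec = stft(noisy)
coarse = stage_one(noisy_spec, toy_magnitude_estimator)
refined = stage_two(coarse, toy_refiner)
```

Because the first stage keeps the noisy phase unchanged, any phase error survives into the coarse spectrum; operating on the complex spectrum in the second stage is what gives the model a chance to correct it.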

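Cite this article

Xu Jiayu, Zheng Zhanheng, Zeng Qingning, Wang Jian. Two-stage speech enhancement algorithm incorporating dual-channel convolution and improved Conformer[J]. Electronic Measurement Technology, 2025, 48(4): 149-157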

History
  • Published online: 2025-04-10