基于跨模态协同感知的双流融合动作识别模型
作者: 刘罡, 李小雨, 吴烨, 郑泽林

作者单位:

1.无锡学院集成电路科学与工程学院 无锡 214105; 2.南京信息工程大学电子与信息工程学院 南京 210044; 3.无锡学院江苏省集成电路可靠性技术及检测系统工程研究中心 无锡 214105


中图分类号: TP391.41; TN914

基金项目: 国家自然科学基金(62204172)项目资助


Dual-stream fusion action recognition model based on cross-modal collaborative perception
Author: 刘罡, 李小雨, 吴烨, 郑泽林
Affiliation:

1. School of Integrated Circuit Science and Engineering, Wuxi University, Wuxi 214105, China; 2. School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China; 3. Jiangsu Province Engineering Research Center of Integrated Circuit Reliability Technology and Testing System, Wuxi University, Wuxi 214105, China

Fund Project: National Natural Science Foundation of China (62204172)

Abstract:

To address the insufficient fusion of spatio-temporal features and the under-utilization of rich skeleton information in existing action recognition algorithms, this paper proposes a dual-stream fusion action recognition model (CC-DFARM) based on cross-modal collaborative perception. First, a dual-stream fusion model is proposed that fuses the RGB video stream and the skeleton stream, capturing the global information of both modules so that their strengths complement each other. Second, a spatio-temporal interaction attention module is proposed that achieves deep collaboration and dynamic complementarity between spatial and temporal features and dynamically enhances the attention weights of the relevant spatio-temporal regions. Finally, a multimodal feature fusion module is designed that fuses and enhances the outputs of the RGB video stream and the skeleton stream; through adaptive weight allocation and cross-modal interaction, it fully exploits the complementary information between RGB visual appearance and human skeletal motion, thereby improving action recognition accuracy. Multiple sets of experiments show that the model achieves high-accuracy action recognition on the NTU RGB+D and NTU RGB+D 120 datasets, obtaining 97.2% and 92.3% accuracy respectively, improvements of 3.6% and 3.2% over the baseline method MMTM. These results indicate that the model fully extracts and exploits human skeleton information while thoroughly fusing spatio-temporal features, improving action recognition accuracy.
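The adaptive weight allocation step of the multimodal feature fusion module can be illustrated with a minimal, hypothetical sketch in plain Python. The scalar per-modality scores below (mean activation) stand in for the paper's learned gating network, whose exact form is not specified in the abstract; only the weighting scheme itself is shown.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_fuse(rgb_feat, skel_feat):
    """Fuse RGB and skeleton feature vectors with adaptive weights.

    Hypothetical stand-in for the fusion module: each modality gets a
    scalar score (mean activation here; a learned gating network in the
    actual model), softmax turns the scores into fusion weights, and the
    output is the weighted element-wise sum of the two feature vectors.
    """
    assert len(rgb_feat) == len(skel_feat)
    scores = [sum(rgb_feat) / len(rgb_feat),
              sum(skel_feat) / len(skel_feat)]
    w_rgb, w_skel = softmax(scores)
    fused = [w_rgb * r + w_skel * s for r, s in zip(rgb_feat, skel_feat)]
    return fused, (w_rgb, w_skel)
```

Because the weights come from a softmax over modality scores, whichever stream responds more strongly to a given action receives the larger fusion weight; neither modality is fixed a priori as dominant.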

Cite this article:

刘罡, 李小雨, 吴烨, 郑泽林. 基于跨模态协同感知的双流融合动作识别模型[J]. 电子测量技术, 2025, 48(21): 87-97.


Online publication date: 2025-12-25