DVT risk prediction via Mamba self-attention and multimodal fusion
Author:

Xiao Lianyu (肖连禹), Lu Xiangyan (陆向艳)

Affiliation:

1.School of Computer and Electronic Information, Guangxi University, Nanning 530004, China; 2.The First Affiliated Hospital of Army Military Medical University, Chongqing 400037, China

CLC Number:

TN919.5;TP391.4

Fund Project:

Supported by the Guangxi Soft Science Research Program (桂科AB17205002), the Open Fund of the Guangxi Key Laboratory of Multi-source Information Mining and Security (MIMS20-06), and the Chongqing Social Science Planning Doctoral and Cultivation Project (2025PY27)

Abstract:

Deep vein thrombosis (DVT) can lead to severe complications such as pulmonary embolism that threaten patients' lives, so early prediction of DVT risk is of significant clinical value. Current DVT risk prediction faces two challenges: existing methods predict from either structured text data or image data alone, with no method combining the two modalities, and few of the latest deep learning models have been applied to the task. To address these challenges, this study combines the Mamba state space model with multimodal fusion and proposes, for the first time, a DVT risk prediction method based on Mamba self-attention and multimodal fusion. The method takes the patient's ultrasound images and structured text data, such as medical history and laboratory test indicators, as multimodal input. First, a dual-channel feature encoding framework is constructed, in which a ViT encoder captures ultrasound image features and a DNN encoder extracts features from the structured clinical data. Next, a multimodal feature fusion framework based on Mamba self-attention is designed. It first concatenates the image and structured text features into joint features and applies the original Mamba to them to obtain multimodal fusion features; Mamba self-attention, a feedforward network, and a CNN then extract and fuse global and local, high-level and low-level features of the multimodal data, preserving the original multimodal features from multiple perspectives. Finally, a multi-level MLP reduces the feature dimension to produce the DVT prediction. Comparative experiments against 13 other combined models were conducted on a clinical dataset. The proposed model performs best, with an AUC of 0.912; compared with single-modality methods on structured data, AUC improves by 11.97% on average and F1 score by 13% on average. Compared with traditional single-modality image models, AUC improves by 14.7% on average, and accuracy and F1 score both improve by more than 20%. Among the multimodal comparison models, the proposed model improves accuracy, precision, recall, and F1 score by approximately 6% each over the strong ResNet and Transformer fusion model (AUC=0.871). Compared with the Transformer hybrid model of the same structure, AUC, the other four evaluation metrics, and model inference speed all improve by more than 20%. These results indicate that the proposed model provides strong support for the early prevention and prediction of DVT and has good application prospects and clinical value.
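
The data flow described in the abstract (two encoders, concatenation, a state-space fusion stage, and a prediction head) can be illustrated with a toy sketch. This is not the authors' model. Everything below is an assumption made for illustration: the function names, the dimensions, the random weights, the choice to concatenate along the sequence axis, and above all the fusion stage, which uses a fixed-parameter diagonal linear state-space recurrence in place of Mamba's selective scan. The ViT, Mamba self-attention, feedforward, and CNN components are omitted entirely.

```python
import math
import random


def linear_encoder(x, weights):
    """Dense projection followed by ReLU (stand-in for the ViT / DNN encoders)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def ssm_scan(seq, a, b, c):
    """Diagonal linear state-space recurrence applied per channel:

        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t

    Assumption: a, b, c are fixed scalars. Mamba instead makes these
    parameters input-dependent (selective), which this sketch omits.
    """
    h = [0.0] * len(seq[0])
    out = []
    for x in seq:
        h = [a * h_i + b * x_i for h_i, x_i in zip(h, x)]
        out.append([c * h_i for h_i in h])
    return out


def make_params(img_dim, clin_dim, hidden, seed=0):
    """Hypothetical random weights; a real model would learn these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_img": mat(hidden, img_dim),
        "w_clin": mat(hidden, clin_dim),
        "a": 0.9, "b": 1.0, "c": 1.0,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "b_out": 0.0,
    }


def predict_risk(image_tokens, clinical_features, params):
    # Channel 1: project each image token into the shared hidden dimension.
    img = [linear_encoder(tok, params["w_img"]) for tok in image_tokens]
    # Channel 2: encode the structured clinical record as a single token.
    clin = linear_encoder(clinical_features, params["w_clin"])
    # Joint features: concatenate along the sequence axis (an assumption).
    joint = img + [clin]
    # Fusion: run the simplified state-space recurrence over the joint sequence.
    fused = ssm_scan(joint, params["a"], params["b"], params["c"])
    # Head: mean-pool over the sequence, then a logistic output unit.
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    logit = sum(w * v for w, v in zip(params["w_out"], pooled)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


params = make_params(img_dim=4, clin_dim=3, hidden=8)
image_tokens = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2]]  # made-up values
clinical = [1.0, 0.0, 0.7]                                   # made-up values
risk = predict_risk(image_tokens, clinical, params)
print(f"predicted DVT risk: {risk:.3f}")
```

Because the recurrence carries state from one token to the next, the clinical token placed last is mixed with a decayed summary of the image tokens, which is the basic sense in which a state-space stage fuses the two modalities in this sketch.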

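Cite this article

Xiao Lianyu, Lu Xiangyan. DVT risk prediction via Mamba self-attention and multimodal fusion[J]. Electronic Measurement Technology (电子测量技术), 2026, 49(9): 143-153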

History
  • Online publication date: 2026-06-08