Stereo matching network based on fusing contextual information selectively

Affiliation:

1. School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212000, China; 2. Network Department, China Mobile Communications Group Co., Ltd., Beijing 100032, China

CLC number:

TP391; TN911

Fund project:

Supported by the National Key Research and Development Program of China (2023YFC2809700)
Abstract:

Deep-learning-based stereo matching networks achieve high accuracy, but their complex structures sharply increase computation time. To balance matching speed and accuracy, this paper proposes a stereo matching network based on selectively fusing contextual information. First, the cost volume is constructed with the correlation-layer method, and a single encoder-decoder structure is used in the aggregation module to reduce model complexity. Second, multi-scale cost volumes are fused in the encoder to capture disparity information at different levels; at the same time, a selective contextual information fusion module is designed in the decoder, which uses the context features of the reference image to guide high-quality decoding of geometric information. Finally, a multi-branch aggregation pyramid pooling module is designed to strengthen the encoder-decoder module's understanding of global context. Experimental results show that the proposed algorithm achieves a mismatch rate of 1.97% over all regions on the KITTI2015 dataset and a three-pixel error of 1.50% on the KITTI2012 dataset. Compared with other algorithms, it achieves more accurate stereo matching while meeting real-time requirements.
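The abstract names a correlation-layer cost volume but does not give its formula. As a rough illustration only, the sketch below uses the commonly used formulation in which the cost at disparity d is the channel-averaged dot product between a left feature at column x and the right feature at column x - d. The function name `correlation_cost_volume`, the `[channel][row][col]` list layout, and the zero fill for out-of-range columns are assumptions made for this example, not details taken from the paper.

```python
from typing import List

# Assumed layout for this sketch: feature[channel][row][col]
Feature = List[List[List[float]]]


def correlation_cost_volume(left: Feature, right: Feature, max_disp: int) -> List[List[List[float]]]:
    """Return cost[d][y][x] = mean over channels of left[c][y][x] * right[c][y][x - d].

    Positions where x - d falls outside the right image are left at 0.0.
    """
    channels = len(left)
    height = len(left[0])
    width = len(left[0][0])
    cost = [[[0.0] * width for _ in range(height)] for _ in range(max_disp)]
    for d in range(max_disp):
        for y in range(height):
            # Columns x < d have no valid match in the right view at this disparity.
            for x in range(d, width):
                total = 0.0
                for c in range(channels):
                    total += left[c][y][x] * right[c][y][x - d]
                cost[d][y][x] = total / channels
    return cost


# Toy input: one row, two channels; the right view is the left view shifted by one pixel.
left = [[[0.0, 1.0, 0.0, 0.0]], [[0.0, 2.0, 0.0, 0.0]]]
right = [[[1.0, 0.0, 0.0, 0.0]], [[2.0, 0.0, 0.0, 0.0]]]
volume = correlation_cost_volume(left, right, max_disp=3)

# Pick the disparity with the highest correlation for the pixel at row 0, column 1.
best_disp = max(range(3), key=lambda d: volume[d][0][1])
```

A correlation volume of this kind has one channel per disparity rather than a full feature vector per disparity, which is presumably why it suits a lightweight aggregation network; a practical implementation would use tensor operations instead of Python loops.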

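Cite this article

Ning Anqi, Yu Yuecheng, Yang Fan, Li Xiang (宁安琪, 於跃成, 杨帆, 李响). Stereo matching network based on fusing contextual information selectively[J]. Electronic Measurement Technology (电子测量技术), 2025, 48(15): 141-149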

History
  • Published online: 2025-09-29