Global voxel feature interaction-based 3D object detection

Affiliation:

School of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

CLC number:

TP391.4; TH865

Fund Project:

Supported by the National Key Research and Development Program of China (2022YFE0101000), the Chongqing Technology Innovation and Application Development Special Major Project (CSTB2023TIAD-STX0035), and the Science and Technology Research Program of the Chongqing Municipal Education Commission (KJQN202200630)

Abstract:

Most LiDAR-based 3D object detection methods cannot model long-range feature dependencies because their receptive fields are local, and the window partitioning they apply to point cloud data destroys its topological structure. To address these problems, this paper proposes a 3D point cloud object detection network based on global voxel feature interaction. First, a long-range context feature extraction module based on the Hilbert space-filling curve and Mamba is designed. It serializes the voxel space along a Hilbert curve, which preserves spatial locality among voxels, and exploits Mamba's strength in processing long sequences to extract point cloud context features with long-range dependencies, significantly improving the network's ability to model such dependencies. Second, an adaptive voxel diffusion module driven by feature-map response intensity is designed. It performs large-scale long-range feature interaction between voxels and dynamically generates diffused voxels to strengthen the semantic representation of object-center voxels. Third, a spatial feature recovery operator is proposed. It combines the local structure preservation of submanifold convolution with the global modeling capability of Mamba to jointly refine local and global feature representations, compensating for the information lost during serialization and voxel aggregation. Experiments on the KITTI dataset show that the method achieves state-of-the-art 3D object detection performance, with detection accuracy of 82.36%, 61.96%, and 66.05% on the car, pedestrian, and cyclist classes at moderate difficulty and an inference speed of 19 frames per second (FPS). Compared with the baseline model, the method maintains a good balance between accuracy and efficiency. A visual comparison with other methods in real road scenes further shows that the proposed method has strong generalization ability and potential for practical application.
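The Hilbert-curve serialization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 2D bird's-eye-view grid of side 2^order rather than the full voxel space, and the function names `hilbert_index_2d` and `serialize_voxels` are hypothetical. It only shows how sparse voxel coordinates might be ordered so that neighbors in the sequence are also neighbors in space before being passed to a sequence model such as Mamba.

```python
def hilbert_index_2d(x: int, y: int, order: int) -> int:
    """Map cell (x, y) of a 2^order x 2^order grid to its Hilbert curve index.

    Standard iterative xy-to-distance conversion; the 2D grid is an
    assumption made for illustration, not a detail taken from the paper.
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def serialize_voxels(coords, feats, order):
    """Sort sparse voxels along the Hilbert curve (hypothetical helper).

    coords: list of (x, y) integer grid positions of non-empty voxels
    feats:  list of per-voxel features, same length as coords
    Returns the reordered features and the permutation that was applied,
    so the sequence output can later be scattered back to the grid.
    """
    keys = [hilbert_index_2d(x, y, order) for x, y in coords]
    perm = sorted(range(len(coords)), key=keys.__getitem__)
    return [feats[i] for i in perm], perm


if __name__ == "__main__":
    coords = [(1, 0), (0, 0), (1, 1)]
    feats = ["a", "b", "c"]
    print(serialize_voxels(coords, feats, order=1))
```

Keeping the permutation alongside the sorted features is a design choice of this sketch: it allows the per-voxel outputs of the sequence model to be restored to their original spatial positions.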

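Cite this article

刘明杰, 魏宇, 陈俊生, 刘平, 朴昌浩. Global voxel feature interaction-based 3D object detection [J]. Chinese Journal of Scientific Instrument (仪器仪表学报), 2025, 46(9): 146-158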

History
  • Published online: 2025-12-22