Multimodal 3D object detection method based on ConvNeXt and deformable cross attention
Affiliation: 1. School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China; 2. School of Automation, Wuxi University, Wuxi 214105, China

CLC Number: TN958.98

    Abstract:

    In recent years, with the rapid development of new energy vehicles, 3D object detection has become increasingly important as a core foundation of autonomous driving. Fusing multimodal information, such as LiDAR point clouds and camera images, can significantly improve the accuracy and robustness of object detection. Inspired by BEVDet, this paper proposes an improved multimodal-fusion 3D object detection method operating in the BEV (bird's-eye view) space. The method employs a ConvNeXt backbone combined with an FPN-DCN structure to extract image features efficiently, and uses a deformable cross-attention mechanism to deeply fuse image and point-cloud features, further improving detection accuracy. Experiments on the nuScenes autonomous driving dataset demonstrate the strong performance of the model, which achieves an NDS of 64.9% on the test set, outperforming most existing detection methods.
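    The core of the fusion step is deformable cross attention: each BEV query attends to a small set of learned sampling locations in the image feature map rather than to every pixel. The following is a minimal single-query, single-head NumPy sketch of that idea, not the paper's implementation; the weight matrices `W_off` and `W_attn`, the number of sampling points `K`, and all shapes are illustrative assumptions.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly sample a feature map feat of shape (H, W, C)
    at continuous coordinates (x, y), clamped to the map borders."""
    H, W, _ = feat.shape
    x = float(np.clip(x, 0, W - 1))
    y = float(np.clip(y, 0, H - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bot = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bot

def deformable_cross_attention(query, img_feat, ref_xy, W_off, W_attn, K=4):
    """One head of deformable cross attention for a single BEV query.

    query    : (C,)      query vector from the point-cloud/BEV branch
    img_feat : (H, W, C) image feature map (e.g. an FPN level)
    ref_xy   : (x, y)    reference point of the query projected into the image
    W_off    : (C, 2K)   hypothetical projection predicting K (dx, dy) offsets
    W_attn   : (C, K)    hypothetical projection predicting K attention logits
    """
    offsets = (query @ W_off).reshape(K, 2)          # K learned sampling offsets
    logits = query @ W_attn                          # one logit per sampling point
    weights = np.exp(logits - logits.max())          # numerically stable softmax
    weights /= weights.sum()
    sampled = np.stack([
        bilinear_sample(img_feat, ref_xy[0] + dx, ref_xy[1] + dy)
        for dx, dy in offsets
    ])                                               # (K, C) sampled image features
    return (weights[:, None] * sampled).sum(axis=0)  # (C,) fused feature
```

Because each query touches only K sampled locations instead of the full H x W image grid, the cost per query is O(K) rather than O(HW), which is what makes this style of cross-modal attention practical at BEV resolution.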

History
  • Online: July 28, 2025