Multimodal 3D Object Detection Method Based on ConvNeXt and Deformable Cross Attention

CLC Number: TN958.98

Abstract:

In recent years, with the rapid development of new energy vehicles, 3D object detection, a core component of autonomous driving technology, has become increasingly important. Strategies that fuse multimodal information, such as radar point clouds and images, can significantly improve the accuracy and robustness of object detection. Inspired by BEVDet, this paper proposes an improved multimodal-fusion 3D object detection method based on the BEV (Bird's Eye View) representation. The method employs a ConvNeXt backbone combined with an FPN-DCN structure to efficiently extract image features, and uses a deformable cross-attention mechanism to deeply fuse image and point cloud features, further improving detection accuracy. Experiments on the nuScenes autonomous driving dataset demonstrate the model's strong performance: it achieves an NDS of 64.9% on the test set, significantly outperforming most existing detection methods.
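To make the fusion step concrete, the following is a minimal NumPy sketch of generic deformable cross-attention for a single query: the query predicts a small set of 2D sampling offsets around a reference point, the image feature map is bilinearly sampled at those offset locations, and the samples are combined with softmax attention weights. This is an illustrative sketch of the general mechanism only, not the paper's implementation; all names, shapes, and the single-query/single-head simplification are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bilinear_sample(feat, x, y):
    """Bilinearly sample feat of shape (H, W, C) at fractional coords (x, y)."""
    H, W, _ = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    # Clamp the four corner indices to the feature-map border.
    x0c, x1c = np.clip([x0, x0 + 1], 0, W - 1)
    y0c, y1c = np.clip([y0, y0 + 1], 0, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wy) * ((1 - wx) * feat[y0c, x0c] + wx * feat[y0c, x1c])
            + wy * ((1 - wx) * feat[y1c, x0c] + wx * feat[y1c, x1c]))

def deformable_cross_attention(query, ref_xy, img_feat, W_off, W_attn, n_points=4):
    """One deformable cross-attention step for a single query (hypothetical names).

    query    : (C,)  query vector (e.g. a BEV feature from the point-cloud branch)
    ref_xy   : (2,)  reference point on the image feature map
    img_feat : (H, W, C) image feature map (e.g. from a ConvNeXt + FPN backbone)
    W_off    : (C, 2 * n_points) learned projection predicting sampling offsets
    W_attn   : (C, n_points)     learned projection predicting attention logits
    """
    offsets = (query @ W_off).reshape(n_points, 2)   # per-point (dx, dy) offsets
    logits = query @ W_attn
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                         # softmax over sample points
    sampled = np.stack([bilinear_sample(img_feat, ref_xy[0] + dx, ref_xy[1] + dy)
                        for dx, dy in offsets])      # (n_points, C)
    return weights @ sampled                         # (C,) fused feature

# Toy usage with random weights (illustrative only).
C, H, W, P = 8, 16, 16, 4
query = rng.standard_normal(C)
img_feat = rng.standard_normal((H, W, C))
out = deformable_cross_attention(query, np.array([7.5, 7.5]), img_feat,
                                 rng.standard_normal((C, 2 * P)) * 0.1,
                                 rng.standard_normal((C, P)))
print(out.shape)  # (8,)
```

In a full model this step would run per BEV query and per attention head, with the offsets and weights produced by trained projections rather than random matrices; sampling only a handful of points per query is what keeps deformable attention cheap compared with dense cross-attention over the whole image.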

History
  • Received: December 02, 2024
  • Revised: February 18, 2025
  • Accepted: March 03, 2025