Abstract: To address the low detection accuracy for pedestrians and cyclists in 3D object detection, this paper proposes a 3D object detection algorithm based on a residual attention network and hybrid pooling, using Voxel-RCNN as the baseline. First, a new 2D backbone network is designed that integrates a residual network with an attention mechanism: the residual structure improves the model's adaptability to objects of different sizes, while the attention mechanism focuses on key regions and strengthens feature representation. Second, two pooling methods are introduced: a new MLP pooling method and an attention pooling method combined with convolution. Together they retain the local geometric details of small objects and strengthen the expression of global semantic features, further improving the ability to capture diverse objects in complex scenes. Experiments on the public KITTI dataset show that the mean average precision (mAP3D) for the Pedestrian and Cyclist categories reaches 54.06% and 76.85%, respectively, which is 3.43% and 3.03% higher than the baseline, demonstrating the effectiveness of the proposed method.
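
The abstract does not specify how the pooling operators are defined, so the following is only a minimal, generic sketch of what attention pooling over a set of point features can look like, contrasted with plain max pooling. It is not the paper's method: the convolution and MLP components are omitted, and the names `attention_pool`, `max_pool`, and `score_weights` (a stand-in for learned parameters) are hypothetical.

```python
import math


def max_pool(features):
    """Element-wise maximum over a set of feature vectors."""
    return [max(column) for column in zip(*features)]


def attention_pool(features, score_weights):
    """Softmax-weighted sum of feature vectors.

    ASSUMPTION: each vector is scored by a dot product with
    `score_weights`, a hypothetical stand-in for learned parameters.
    """
    scores = [sum(f * w for f, w in zip(vec, score_weights)) for vec in features]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Three toy point features inside one hypothetical region of interest.
points = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
pooled, weights = attention_pool(points, score_weights=[0.5, 0.5])
print("attention weights:", weights)
print("attention-pooled feature:", pooled)
print("max-pooled feature:", max_pool(points))
```

Max pooling keeps only the strongest response per channel, whereas the softmax weighting lets every point contribute in proportion to its score; with all-zero scoring weights the attention pooling above reduces to a plain average.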