Abstract: Multimodal vehicle trajectory prediction bridges perception and decision planning and plays an important role in autonomous driving systems. Existing methods fuse features insufficiently and struggle to balance prediction accuracy against efficiency. To address these problems, a multimodal vehicle trajectory prediction model based on Hierarchical Feature Fusion and End-point Induction (HFF-EI) is proposed. First, one-dimensional residual convolution and a feature pyramid network (FPN) are used to encode the historical vehicle trajectories, thereby fully extracting the relevant features. A hierarchical feature fusion structure is then constructed: local feature fusion is carried out for the vehicles and the map, followed by global feature fusion, achieving efficient and comprehensive fusion of scene features across all elements. Second, a multi-layer perceptron (MLP) based on a dynamic weight model is introduced for trajectory endpoint prediction, improving the model's adaptability to different traffic scenes. Finally, an endpoint refinement module based on endpoint information interaction is proposed, which uses an attention mechanism to exchange trajectory information over longer spatial and temporal ranges. Ablation and comparative experiments were conducted on the public Argoverse 1 dataset. The ablation results show that the three modules of the HFF-EI model effectively improve trajectory prediction performance, reducing the minimum average displacement error, minimum final displacement error, miss rate, and minimum final displacement error with penalty by 8.87%, 13.52%, 31.07%, and 8.93%, respectively.
On the test set, the model achieves a minimum final displacement error of 1.134 m, a minimum final displacement error with penalty of 1.773 m, and an inference time of 10.22 ms, demonstrating overall performance advantages over the 10 benchmark models.
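
For readers unfamiliar with the displacement metrics reported above, the following is a minimal sketch of how they are commonly computed for one scenario with K predicted modes. It is not the evaluation code used in this work. The function names, the 2.0 m miss threshold, and the penalty form (1 - p)^2 added to the final displacement error are assumptions that follow the usual Argoverse 1 conventions, and the "best" mode is taken here to be the one with the smallest endpoint error.

```python
import math

def scenario_metrics(preds, probs, gt, miss_threshold=2.0):
    """Displacement metrics for one scenario (illustrative sketch only).

    preds: K predicted trajectories, each a list of (x, y) points in metres
    probs: K mode probabilities
    gt:    ground-truth trajectory, a list of (x, y) points of the same length
    miss_threshold: assumed miss threshold in metres
    """
    # Final displacement error of every mode: distance between endpoints.
    fde = [math.dist(traj[-1], gt[-1]) for traj in preds]
    # Best mode = the one whose endpoint is closest to the ground truth (assumption).
    best = min(range(len(preds)), key=lambda k: fde[k])
    # Average displacement error of the best mode over all time steps.
    ade = sum(math.dist(p, g) for p, g in zip(preds[best], gt)) / len(gt)
    return {
        "minADE": ade,
        "minFDE": fde[best],
        # Penalised variant: low confidence in the best mode is punished.
        "brier_minFDE": fde[best] + (1.0 - probs[best]) ** 2,
        # A scenario is a miss when even the best endpoint is too far away.
        "missed": fde[best] > miss_threshold,
    }

def miss_rate(per_scenario):
    """Fraction of scenarios flagged as missed."""
    return sum(m["missed"] for m in per_scenario) / len(per_scenario)

# Hypothetical toy data: two modes in the first scenario, one in the second.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
hit = scenario_metrics(
    preds=[[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
           [(0.0, 1.0), (1.0, 1.0), (2.0, 4.0)]],
    probs=[0.75, 0.25],
    gt=gt,
)
miss = scenario_metrics(
    preds=[[(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)]],
    probs=[1.0],
    gt=gt,
)
print(hit, miss, miss_rate([hit, miss]))
```

Dataset-level values such as those quoted above would then be averages of the per-scenario quantities over the whole test set, with the miss rate being the fraction of missed scenarios.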