Abstract: To address the recognition difficulties caused by complex backgrounds and highly similar commodity packaging in vending machine scenes, a commodity recognition method combining a multi-scale attention mechanism with metric learning is proposed. First, multi-head self-attention is introduced into the ResNet hierarchical structure to exploit both the multi-scale feature extraction of convolutional neural networks (CNNs) and the global modeling ability of the Transformer, and a new multi-scale dilated attention is designed so that the model attends to local features of similar packaging, such as trademark shape and local texture, as well as to global contextual features. Second, a down-sampling multi-scale feature fusion strategy is designed to improve the multi-scale feature representation of the algorithm. Finally, the ArcFace loss function is used to strengthen the discriminative ability of the model. To verify the effectiveness of the proposed method, a commodity dataset is constructed from a real scene, with images collected by the top-view camera of a vending cabinet. Experimental results show that the method reaches an mAP@1 of 87.4% on the Commodity 553 dataset, outperforming current mainstream recognition methods.
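For readers unfamiliar with the ArcFace loss mentioned above, the following is a minimal sketch of the standard additive angular margin formulation in plain Python; it is not the authors' implementation. The function names, the scale `scale=30.0`, and the margin `margin=0.5` are illustrative assumptions, and clamping the shifted angle at pi is a simplification of the guards used in practical implementations.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding, class_weights, label, scale=30.0, margin=0.5):
    """Return scaled cosine logits with an angular margin on the true class.

    scale and margin are illustrative defaults, not values from the paper.
    """
    e = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        w = l2_normalize(w)
        # Cosine of the angle between the embedding and the class centre.
        cos_theta = sum(a * b for a, b in zip(e, w))
        cos_theta = max(-1.0, min(1.0, cos_theta))
        if j == label:
            # Add the margin to the angle of the true class only.
            # Clamping at pi keeps the cosine monotonic (a simplification).
            theta = math.acos(cos_theta)
            cos_theta = math.cos(min(theta + margin, math.pi))
        logits.append(scale * cos_theta)
    return logits


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def arcface_loss(embedding, class_weights, label, scale=30.0, margin=0.5):
    """ArcFace loss for one sample: margin logits followed by cross-entropy."""
    logits = arcface_logits(embedding, class_weights, label, scale, margin)
    return cross_entropy(logits, label)


if __name__ == "__main__":
    # Toy 2-D example with two hypothetical class centres.
    emb = [1.0, 0.2]
    centres = [[1.0, 0.0], [0.0, 1.0]]
    print(arcface_loss(emb, centres, label=0, margin=0.0))
    print(arcface_loss(emb, centres, label=0, margin=0.5))
```

Because the margin lowers the true-class logit, the loss for the same embedding is higher with a margin than without one, which pushes embeddings of the same commodity closer to their class centre during training.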