Abstract: Manual interpretation of dynamometer cards for beam pumping units is inefficient and makes it difficult to characterize the evolution of operating conditions quantitatively. Traditional data-driven models achieve high recognition accuracy, but they offer weak physical interpretability and are difficult to deploy at the edge. To address these issues, this paper proposes a physics-informed, lightweight spatiotemporal visual perception model for efficient, interpretable, and deployable recognition of liquid supply capacity in low-yield oil wells. First, the time-series signals of suspension point displacement and load are converted into normalized grayscale images. Combined with wave equation simulation and iterative intelligent annotation, these images form Low-YieldD, a dedicated dynamometer card image dataset for low-yield oil wells. On this basis, a cascaded "temporal forecasting–spatial recognition" architecture is designed. A dual-stream coupled LSTM network models the dynamic evolution of dynamometer card sequences to accurately predict future operating conditions. A physics-informed spatial attention (PISA) mechanism is then proposed: it encodes the "fluid pound delay" mechanism into a differentiable Gaussian spatial mask that guides the lightweight convolutional neural network to focus on critical regions such as the unloading zone, giving the visual feature extraction process explicit physical interpretability. Experimental results show that the proposed model has only about 1.3% of the parameters of standard AlexNet, yet achieves 99.1% accuracy in liquid supply capacity image recognition with a physical plausibility score of 0.93, significantly outperforming mainstream lightweight networks such as MobileNetV2.
Industrial deployment validation shows that system response delay is reduced by 87.5%, monthly ineffective pumping by 59.7%, and electricity consumption per ton of liquid by 22.0%, substantially improving both energy efficiency and operation and maintenance efficiency. This study provides a feasible pathway for visual inspection of industrial equipment that integrates physical mechanisms with lightweight deep learning, offering high accuracy, strong interpretability, and suitability for edge deployment.
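To make two of the steps named in the abstract concrete, the following minimal Python sketch rasterizes one stroke of displacement and load samples into a normalized grayscale grid, then weights that grid with a Gaussian spatial mask. It is an illustration under stated assumptions, not the paper's implementation: the function names (`card_to_image`, `gaussian_mask`, `apply_mask`), the image size, the binary pixel values, and the mask center and width are all hypothetical. The abstract does not specify how the "fluid pound delay" mechanism sets the mask parameters, so here they are simply passed in by hand. The trained model would presumably compute the mask with tensor operations so that it stays differentiable.

```python
import math


def card_to_image(displacement, load, size=32):
    """Rasterize (displacement, load) samples into a size x size grid.

    Both axes are min-max normalized to [0, 1]; pixels touched by the
    card outline are set to 1.0 and all others stay 0.0 (an assumed,
    simplified grayscale encoding).
    """
    d_min, d_max = min(displacement), max(displacement)
    l_min, l_max = min(load), max(load)
    d_span = (d_max - d_min) or 1.0  # guard against a flat signal
    l_span = (l_max - l_min) or 1.0
    image = [[0.0] * size for _ in range(size)]
    for d, f in zip(displacement, load):
        col = round((d - d_min) / d_span * (size - 1))
        # Row 0 is the top of the image, so high load maps to small row index.
        row = (size - 1) - round((f - l_min) / l_span * (size - 1))
        image[row][col] = 1.0
    return image


def gaussian_mask(size, center_row, center_col, sigma):
    """Return a size x size Gaussian weight map that equals 1.0 at the center."""
    return [
        [
            math.exp(-((r - center_row) ** 2 + (c - center_col) ** 2)
                     / (2.0 * sigma ** 2))
            for c in range(size)
        ]
        for r in range(size)
    ]


def apply_mask(image, mask):
    """Element-wise product of an image and a mask of the same shape."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


if __name__ == "__main__":
    # Synthetic closed loop standing in for one pump stroke (not field data).
    n = 200
    theta = [2.0 * math.pi * k / n for k in range(n)]
    displacement = [0.5 * (1.0 - math.cos(t)) for t in theta]
    load = [0.5 + 0.5 * math.sin(t) for t in theta]

    image = card_to_image(displacement, load, size=32)
    # Assumed mask location: lower-right part of the card, chosen by hand.
    mask = gaussian_mask(32, center_row=24, center_col=24, sigma=5.0)
    attended = apply_mask(image, mask)
    print(sum(map(sum, image)), sum(map(sum, attended)))
```

Because the mask is a smooth function of its center and width, those parameters could in principle be learned or tied to a physical quantity rather than fixed, which is the property the abstract relies on when it calls the mask differentiable.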