Abstract: Maximum power point tracking (MPPT) in photovoltaic systems is hampered by multiple power peaks and power fluctuations when the operating environment involves partial shading, rapid irradiance changes, and temperature variations. To address these challenges, a deep reinforcement learning algorithm, termed DDPG-LSTM, is proposed. The algorithm combines the continuous action space optimization of the Deep Deterministic Policy Gradient (DDPG) with the temporal feature extraction of Long Short-Term Memory (LSTM) networks. Hierarchical reward mechanisms are designed to achieve multi-objective collaborative optimization, balancing power tracking, action smoothness, and system stability. A simulation model of the photovoltaic system is built on the MATLAB/Simulink platform. Simulation results show that, under multi-peak shading and dynamic environmental conditions, the DDPG-LSTM algorithm reliably escapes local optima, exhibits negligible oscillation near the maximum power point, and achieves an average tracking efficiency exceeding 98%. These results validate the robustness and adaptability of the proposed method in dynamic environments and provide theoretical support for the intelligent control of photovoltaic systems and the efficient utilization of renewable energy.
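As a rough illustration of the multi-objective reward described above, the following minimal sketch combines the three stated objectives (power tracking, action smoothness, and system stability) into one scalar. The function names, the weights, the normalization by a reference power `p_ref`, and the energy-ratio definition of tracking efficiency are all assumptions made for illustration; they are not taken from the proposed method.

```python
def composite_reward(power, prev_power, duty, prev_duty, p_ref,
                     w_track=1.0, w_smooth=0.1, w_stable=0.1):
    """Hypothetical weighted reward; all weights are illustrative assumptions."""
    # Power tracking: reward the fraction of the reference power extracted.
    tracking = power / p_ref
    # Action smoothness: penalize large changes of the duty cycle (the action).
    smoothness = -abs(duty - prev_duty)
    # System stability: penalize step-to-step power fluctuation.
    stability = -abs(power - prev_power) / p_ref
    return w_track * tracking + w_smooth * smoothness + w_stable * stability


def tracking_efficiency(p_out, p_mpp):
    """Assumed definition: extracted energy over available energy at the MPP."""
    return sum(p_out) / sum(p_mpp)
```

Under these assumptions, an agent holding the duty cycle steady at the reference power receives the maximum reward of `w_track`, while any jump in duty cycle or output power lowers it.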