Abstract: Unmanned aerial vehicle (UAV)-assisted wireless power supply for the Internet of Things (IoT) is a network architecture in which UAVs act as energy-transmission intermediaries, relieving the power-supply limitations of IoT devices. To address the challenge of learning multi-objective control policies in this setting, this study proposes a Multi-Objective Twin-Delay Deep Deterministic Policy Gradient (MOTD3) algorithm based on deep reinforcement learning. MOTD3 jointly optimizes multiple objectives, namely maximizing the total data rate and the total harvested energy while minimizing energy consumption and hover time, subject to constraints on yaw angle, flight speed, and transmission power. It also adapts the UAV to dynamically changing demand through online path planning. Simulation results show that the proposed algorithm outperforms Deep Deterministic Policy Gradient (DDPG), Advantage Actor-Critic (A2C), and other control strategies in multi-objective optimization performance, convergence, and stability. It also generalizes well, which makes it suitable for a range of practical communication scenarios.
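The abstract names the building blocks of MOTD3 but not how the objectives are combined. The sketch below shows one plausible reading under stated assumptions. The four objectives are folded into a single reward by a weighted sum (an assumption; the source does not give the scalarization or the weights). That reward feeds the standard TD3 critic target, which uses clipped double-Q learning and target policy smoothing, and actor updates are delayed. All function names, the scalar action, and the default hyperparameters are hypothetical and for illustration only.

```python
import random


def scalarize_reward(data_rate, harvested_energy, energy_used, hover_time, weights):
    """Hypothetical weighted-sum scalarization of the four objectives.

    Rewards the quantities to maximize (data rate, harvested energy) and
    penalizes the quantities to minimize (energy consumption, hover time).
    """
    w_rate, w_harv, w_energy, w_hover = weights
    return (w_rate * data_rate + w_harv * harvested_energy
            - w_energy * energy_used - w_hover * hover_time)


def clip(x, lo, hi):
    """Clamp x into [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done, actor_target, critic1_target, critic2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    """Standard TD3 critic target for a scalar action (illustrative sketch)."""
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = clip(random.gauss(0.0, policy_noise), -noise_clip, noise_clip)
    next_action = clip(actor_target(next_state) + noise, -max_action, max_action)
    # Clipped double-Q: use the smaller of the two target critics to curb overestimation.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))
    # Bootstrap only when the episode has not terminated.
    return reward + gamma * (1.0 - done) * q_next


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: refresh the actor once every `policy_delay` critic steps."""
    return step % policy_delay == 0


if __name__ == "__main__":
    r = scalarize_reward(10.0, 5.0, 2.0, 1.0, weights=(1.0, 1.0, 1.0, 1.0))
    y = td3_target(r, next_state=0.0, done=0.0,
                   actor_target=lambda s: 0.0,
                   critic1_target=lambda s, a: 3.0,
                   critic2_target=lambda s, a: 2.0,
                   gamma=0.5)
    print(r, y)
```

With the toy constant critics in the usage example, the target is `r + 0.5 * min(3, 2)`. In a real agent the actor and critics would be neural networks and the UAV action would be a vector (for example yaw angle, speed, and transmit power), each component clipped to its constraint range.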