Abstract: To address the autonomous decision-making challenges of a spacecraft manipulator performing a rotating-target capture task in a complex space environment, this article proposes an improved distributed deep deterministic policy gradient (D4PG) decision-making method to further enhance the autonomous decision-making capability of the capture task. The capturing spacecraft is equipped with a three-degree-of-freedom manipulator, while the target spacecraft is fixed in position and rotates at a constant angular velocity. To improve the exploration capability of the space capture system in complex environments, this article designs an intrinsic-reward exploration mechanism based on state entropy maximization. The mechanism computes the Euclidean distance between the current state and each state in a minibatch, selects the minimum distance, and converts it into an intrinsic reward through an entropy estimate; this reward is then linearly superimposed on the external reward to form the final total reward, thereby improving the algorithm's convergence speed. Furthermore, this article constructs a dual-network architecture: two value networks evaluate candidate actions in parallel, and two policy networks select and execute the action with the highest value. A reward-reshaping function is introduced to reshape the reward signal, reducing estimation bias and improving sample efficiency. Finally, simulations and comparisons with several mainstream reinforcement learning algorithms demonstrate the effectiveness and superiority of the proposed method. Experimental results show that the improved D4PG algorithm increases the reward value by 32.25% and the convergence speed by 3.08%, significantly improving the autonomous decision-making ability of the spacecraft manipulator in space capture missions.
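The intrinsic-reward step described above (minimum Euclidean distance to the minibatch states, converted to an entropy-based bonus and added linearly to the external reward) can be sketched as follows. The `log(1 + d_min)` form is a common particle-based entropy proxy and the weight `beta` is a hypothetical hyperparameter; the abstract does not specify either, so both are assumptions for illustration only.

```python
import numpy as np

def intrinsic_reward(state, minibatch_states):
    """k=1 nearest-neighbour state-entropy estimate (assumed form).

    Computes the Euclidean distance from the current state to every
    state in the minibatch, takes the minimum, and maps it to an
    intrinsic reward via log(1 + d_min), a standard particle-based
    entropy proxy (larger distance -> less-visited state -> bonus).
    """
    dists = np.linalg.norm(minibatch_states - state, axis=1)
    d_min = dists.min()
    return float(np.log(d_min + 1.0))

def total_reward(r_ext, state, minibatch_states, beta=0.1):
    """Linear superposition of external and intrinsic rewards.

    `beta` is a hypothetical mixing weight; the abstract states only
    that the two rewards are combined linearly.
    """
    return r_ext + beta * intrinsic_reward(state, minibatch_states)
```

In use, `minibatch_states` would be a batch sampled from the replay buffer at each update, so states in sparsely visited regions of the state space receive a larger exploration bonus.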