Autonomous decision-making for spatial capture based on deep reinforcement learning

Affiliation:

School of Automation, Harbin University of Science and Technology, Harbin 150080, China

CLC number:

TH166

Fund Project:

Supported by the National Natural Science Foundation of China (52102455) and the Natural Science Foundation of Heilongjiang Province (LH2023F032)

    Abstract:

    To address autonomous decision-making for a spacecraft manipulator capturing a rotating target in a complex space environment, this article proposes a decision-making method based on an improved distributed distributional deep deterministic policy gradient (D4PG) algorithm. The capture spacecraft is equipped with a three-degree-of-freedom manipulator that performs the capture, while the target spacecraft stays at a fixed position and rotates at a constant angular velocity. To improve the exploration capability of the space capture system in complex environments, an internal reward exploration mechanism based on state entropy maximization is designed. The mechanism computes the Euclidean distance between the current state and each state in the minibatch, selects the minimum distance, and converts it into an internal reward through an entropy calculation. This internal reward is then added linearly to the external reward to form the total reward, which speeds up convergence. In addition, a dual-network architecture is constructed: two value networks evaluate candidate actions in parallel, and two policy networks select and output the action with the higher value for execution. A reward reshaping function is also introduced to reshape the reward signal, reducing estimation bias and improving sample efficiency. Finally, the effectiveness and superiority of the proposed method are verified through simulation comparisons with several mainstream reinforcement learning algorithms. The experimental results show that the improved D4PG algorithm raises the reward value by 32.25% and the convergence speed by 3.08%, significantly improving the autonomous decision-making capability of the spacecraft manipulator in space capture tasks.
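The internal reward described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the entropy formula, so the `log(1 + d)` form below is borrowed from common nearest-neighbour entropy estimators, and the names `internal_reward`, `total_reward`, and the weight `beta` are hypothetical.

```python
import math


def internal_reward(state, batch_states):
    """Exploration bonus from the nearest neighbour of `state` in a minibatch."""
    # Euclidean distance from the current state to every state in the minibatch.
    distances = [math.dist(state, other) for other in batch_states]
    # Keep only the smallest distance, as the abstract describes.
    d_min = min(distances)
    # ASSUMPTION: log(1 + d) stands in for the unspecified entropy calculation.
    return math.log(1.0 + d_min)


def total_reward(external, state, batch_states, beta=0.1):
    """Linear combination of the external and internal rewards."""
    # ASSUMPTION: `beta` is a hypothetical weighting coefficient.
    return external + beta * internal_reward(state, batch_states)
```

With this form, a state far from everything in the minibatch earns a larger bonus, while a state that already appears in the minibatch earns none, which is what pushes the agent toward less-visited regions of the state space.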

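Cite this article

黄成, 殷振凯, 邢爱佳, 于智龙. Autonomous decision-making for spatial capture based on deep reinforcement learning[J]. 仪器仪表学报 (Chinese Journal of Scientific Instrument), 2025, 46(9): 198-211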

History
  • Published online: 2025-12-22