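A Twin Delayed Deep Deterministic Policy Gradient Method for Collision Avoidance of Autonomous Ships

LIU Zhao, ZHOU Zhuangzhuang, ZHANG Mingyang, LIU Jingxian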

Citation: LIU Zhao, ZHOU Zhuangzhuang, ZHANG Mingyang, LIU Jingxian. A Twin Delayed Deep Deterministic Policy Gradient Method for Collision Avoidance of Autonomous Ships[J]. Journal of Transport Information and Safety, 2022, 40(3): 60-74. doi: 10.3963/j.jssn.1674-4861.2022.03.007

doi: 10.3963/j.jssn.1674-4861.2022.03.007
Funding: National Natural Science Foundation of China (52171351)

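Details
    About the authors:

    LIU Zhao (born 1986), Ph.D., associate professor. Research interests: ship-group intelligence mining and application, intelligent organization and scheduling of ships, and ship risk calculation and autonomous navigation. E-mail: zhaoliu@whut.edu.cn

    Corresponding author:

    LIU Jingxian (born 1967), Ph.D., professor. Research interests: traffic environment and safety assurance. E-mail: ljxteacher@sohu.com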

  • CLC number: U675.96

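  • Abstract: To meet the development needs of autonomous navigation for intelligent ships, and to address the low learning efficiency, weak generalization, and poor robustness in complex encounter scenarios of reinforcement-learning-based ship collision avoidance decision methods, an autonomous ship collision avoidance method based on the twin delayed deep deterministic policy gradient (TD3) algorithm is studied. The method targets the high dimensionality of collision avoidance decision information and the continuity of actions, and takes the rationality and real-time performance of decisions into account. Based on the relative motion information between ships and the collision risk information, a state space containing target-ship information over multiple consecutive time steps is constructed from a global perspective. A continuous decision action space is designed according to ship maneuverability. A reward function for ship motion is designed by jointly considering goal guidance, course keeping, collision risk, the International Regulations for Preventing Collisions at Sea, 1972 (COLREGs), and good seamanship. Based on the TD3 algorithm and the structure of the state space, an autonomous collision avoidance network model is designed with an Actor-Critic structure that combines a long short-term memory (LSTM) network and a one-dimensional convolutional network. Network training is stabilized by twin value network learning, target policy smoothing, and delayed updates of the policy network, and is accelerated by frame skipping and by dynamically increasing the batch size and the number of update iterations. To address the weak generalization of the model, a TD3-based training procedure with random ship encounter scenarios is proposed, which allows the autonomous collision avoidance model to transfer across multiple scenarios. The trained model is verified by simulation and compared with an improved artificial potential field (APF) algorithm. The results show that the proposed method learns efficiently and converges quickly and stably. The trained model lets the ship pass at a safe distance in both two-ship and multi-ship encounter scenarios, and achieves a higher collision avoidance success rate than the improved APF algorithm in complex encounter scenarios: 99.233% when avoiding 2 to 4 target ships, 97.600% for 5 to 7 target ships, and 94.166% for 8 to 10 target ships. The proposed method copes effectively with uncoordinated actions of approaching ships, makes safe and reasonable decisions in real time, and produces fast, stable course changes with little oscillation and smooth avoidance paths, outperforming the improved APF method.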
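The three stabilizing mechanisms named in the abstract (twin value networks, target policy smoothing, delayed policy updates) can be illustrated with a minimal scalar sketch of the generic TD3 update rules. This is not the authors' network code: the function names, the normalized action range [-1, 1], and the noise clip of 0.5 are assumptions made for illustration. The discount factor 0.94, soft update rate 0.005, smoothing noise variance 1, and delay of 4 are taken from Table 2, reading "delayed update frequency" as one actor update per four critic updates.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def smoothed_target_action(action, sigma, noise_clip, a_min, a_max, rng=random):
    """Target policy smoothing: perturb the target action with clipped Gaussian noise."""
    noise = clip(rng.gauss(0.0, sigma), -noise_clip, noise_clip)
    return clip(action + noise, a_min, a_max)


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.94, sigma=1.0, noise_clip=0.5, a_min=-1.0, a_max=1.0,
               rng=random):
    """Twin value learning: bootstrap from the smaller of the two target critics."""
    # noise_clip and the action bounds are assumed values, not from the paper.
    a_next = smoothed_target_action(target_actor(next_state), sigma, noise_clip,
                                    a_min, a_max, rng)
    q_min = min(target_q1(next_state, a_next), target_q2(next_state, a_next))
    return reward + (0.0 if done else gamma * q_min)


def soft_update(target_params, online_params, tau=0.005):
    """Move each target-network parameter a fraction tau toward the online one."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def actor_update_due(step, policy_delay=4):
    """Delayed policy update: refresh the actor once every policy_delay critic steps."""
    return step % policy_delay == 0
```

Taking the minimum of the two critics counters the value overestimation that a single critic tends to accumulate, which is the main reason TD3 trains more stably than DDPG.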

     

  • Figure 1. Framework of autonomous ship collision avoidance
    Figure 2. Basic principle of reinforcement learning
    Figure 3. Collision risk judgment diagram
    Figure 4. Collision avoidance strategies in ship encounter situations
    Figure 5. Actor network structure for autonomous ship collision avoidance
    Figure 6. Training process of the autonomous ship collision avoidance algorithm
    Figure 7. Cumulative reward curve
    Figure 8. Ship trajectories in the overtaking situation
    Figure 9. Course change curve of own ship in the overtaking situation
    Figure 10. Distance between ships in the overtaking situation
    Figure 11. Simulation results of the left-turn scenario in the overtaking situation
    Figure 12. Ship trajectories in the head-on situation
    Figure 13. Course change curve of own ship in the head-on situation
    Figure 14. Course change curve of the target ship in the head-on situation
    Figure 15. Distance between ships in the head-on situation
    Figure 16. Ship trajectories in the crossing situation
    Figure 17. Course change curve of own ship in the crossing situation
    Figure 18. Distance between ships in the crossing situation
    Figure 19. Ship trajectories in the multi-ship encounter scenario
    Figure 20. Course change curve of own ship in the multi-ship encounter scenario
    Figure 21. Distance between own ship and target ships under the TD3 collision avoidance algorithm
    Figure 22. Distance between own ship and target ships under the APF collision avoidance algorithm

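    Table 1. Experimental environment

    Category   Item                                Specification
    Hardware   Processor (CPU)                     AMD Ryzen 9 5900X
    Hardware   Graphics card (GPU)                 NVIDIA GeForce RTX 3080 Ti 12 GB
    Hardware   Memory                              G.Skill 32 GB / 3600 MHz
    Software   Operating system                    Windows 10 (64-bit)
    Software   Programming language                Python 3.9.7
    Software   Deep learning framework             TensorFlow 2.6.2
    Software   Reinforcement learning environment  OpenAI Gym 0.19.0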

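    Table 2. Training parameters of the ship collision avoidance algorithm

    Parameter                     Value        Parameter                    Value
    Number of iterations          1.5 × 10^6   Discount factor              0.94
    Replay buffer capacity        1 × 10^6     Exploration noise variance   0.5
    Network learning rate         0.0003       Smoothing noise variance     1
    Batch size                    128~256      Delayed update frequency     4
    Number of update iterations   1~2          Soft update rate             0.005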
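The ranges given in Table 2 for batch size (128~256) and update iterations (1~2) reflect the dynamic increase mentioned in the abstract. The source does not state the shape of that schedule, so the sketch below assumes a simple linear ramp over the 1.5 × 10^6 training iterations. The dictionary keys and function names are hypothetical.

```python
# Values copied from Table 2; key names are illustrative, not from the paper.
TRAINING_PARAMS = {
    "total_iterations": 1_500_000,
    "replay_buffer_capacity": 1_000_000,
    "learning_rate": 3e-4,
    "batch_size_range": (128, 256),
    "updates_per_step_range": (1, 2),
    "discount_factor": 0.94,
    "exploration_noise_variance": 0.5,
    "smoothing_noise_variance": 1.0,
    "policy_update_delay": 4,
    "soft_update_rate": 0.005,
}


def linear_ramp(step, total_steps, start, end):
    """Interpolate from start to end as step goes from 0 to total_steps (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def batch_size_at(step, params=TRAINING_PARAMS):
    """Batch size at a given training iteration under the assumed linear ramp."""
    lo, hi = params["batch_size_range"]
    return int(round(linear_ramp(step, params["total_iterations"], lo, hi)))


def updates_per_step_at(step, params=TRAINING_PARAMS):
    """Number of gradient updates per environment step under the assumed linear ramp."""
    lo, hi = params["updates_per_step_range"]
    return int(round(linear_ramp(step, params["total_iterations"], lo, hi)))
```

Starting with small batches and few updates keeps early training cheap while the replay buffer is still sparse. A stepwise schedule would fit the same ranges equally well.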

    Table 3. Initial settings of target ships in the multi-ship encounter scenario

    Ship                   Initial position   Course/(°)   Speed/(n mile/h)
    Target ship 1 (TS1)    (7, 0)             315          12
    Target ship 2 (TS2)    (7.5, 7)           225          12
    Target ship 3 (TS3)    (6, 8)             180          5
    Target ship 4 (TS4)    (4, 8)             120          4
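To reproduce the scenario in Table 3, each target ship's course and speed must be converted into velocity components. The sketch below assumes positions are in nautical miles with x pointing east and y pointing north, and that course is measured clockwise from north, as is usual in navigation. The source does not state the axis convention, so treat it as an assumption. The function names are hypothetical.

```python
import math

# Initial settings copied from Table 3 (speed in n mile/h, i.e. knots).
TARGET_SHIPS = {
    "TS1": {"position": (7.0, 0.0), "course_deg": 315.0, "speed_kn": 12.0},
    "TS2": {"position": (7.5, 7.0), "course_deg": 225.0, "speed_kn": 12.0},
    "TS3": {"position": (6.0, 8.0), "course_deg": 180.0, "speed_kn": 5.0},
    "TS4": {"position": (4.0, 8.0), "course_deg": 120.0, "speed_kn": 4.0},
}


def velocity_components(course_deg, speed_kn):
    """Return (east, north) velocity in n mile/h for a course measured clockwise from north."""
    theta = math.radians(course_deg)
    return speed_kn * math.sin(theta), speed_kn * math.cos(theta)


def position_after(ship, hours):
    """Dead-reckoned position after the given time, assuming constant course and speed."""
    vx, vy = velocity_components(ship["course_deg"], ship["speed_kn"])
    x, y = ship["position"]
    return x + vx * hours, y + vy * hours
```

Under these assumptions TS3, heading due south at 5 n mile/h, moves from (6, 8) to roughly (6, 3) after one hour.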

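    Table 4. Results of the comparison experiments

    Method                                     Number of target ships   Success rate/%   Average path length/n mile
    TD3 collision avoidance algorithm          2~4                      99.233           15.823
    TD3 collision avoidance algorithm          5~7                      97.600           17.951
    TD3 collision avoidance algorithm          8~10                     94.166           19.894
    Improved APF collision avoidance algorithm 2~4                      97.700           14.978
    Improved APF collision avoidance algorithm 5~7                      93.766           16.163
    Improved APF collision avoidance algorithm 8~10                     89.033           17.394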
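The trade-off in Table 4 is easier to read as differences between the two methods. The short sketch below only does arithmetic on the published figures. The data layout and function name are illustrative.

```python
# (success rate in %, average path length in n mile), copied from Table 4.
RESULTS = {
    "TD3": {"2-4": (99.233, 15.823), "5-7": (97.600, 17.951), "8-10": (94.166, 19.894)},
    "Improved APF": {"2-4": (97.700, 14.978), "5-7": (93.766, 16.163), "8-10": (89.033, 17.394)},
}


def compare(results, a="TD3", b="Improved APF"):
    """Return per-group (success-rate gap in points, path-length gap in n mile) of a minus b."""
    gaps = {}
    for group, (rate_a, length_a) in results[a].items():
        rate_b, length_b = results[b][group]
        gaps[group] = (round(rate_a - rate_b, 3), round(length_a - length_b, 3))
    return gaps


for group, (rate_gap, length_gap) in compare(RESULTS).items():
    print(f"{group} target ships: +{rate_gap} points success, +{length_gap} n mile path")
```

The success-rate advantage of TD3 grows with the number of target ships (1.533, 3.834, and 5.133 percentage points), while its average path is longer by 0.845, 1.788, and 2.500 n mile respectively.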
Publication history
  • Received: 2022-02-16
  • Published online: 2022-07-25
