A Twin Delayed Deep Deterministic Policy Gradient Method for Collision Avoidance of Autonomous Ships

LIU Zhao; ZHOU Zhuangzhuang; ZHANG Mingyang; LIU Jingxian

doi:10.3963/j.jssn.1674-4861.2022.03.007

Volume 40 Issue 3

Jun. 2022



LIU Zhao, ZHOU Zhuangzhuang, ZHANG Mingyang, LIU Jingxian. A Twin Delayed Deep Deterministic Policy Gradient Method for Collision Avoidance of Autonomous Ships[J]. Journal of Transport Information and Safety, 2022, 40(3): 60-74. doi: 10.3963/j.jssn.1674-4861.2022.03.007


LIU Zhao^{1, 2, 3}, ZHOU Zhuangzhuang^{1, 2}, ZHANG Mingyang^{4}, LIU Jingxian^{1, 2, 3}

1. School of Navigation, Wuhan University of Technology, Wuhan 430063, China
2. Hubei Key Laboratory of Inland Shipping Technology, Wuhan University of Technology, Wuhan 430063, China
3. National Engineering Research Center for Water Transport Safety, Wuhan University of Technology, Wuhan 430063, China
4. School of Engineering, Department of Mechanical Engineering, Aalto University, Espoo 20110, Finland

Received Date: 2022-02-16
Available Online: 2022-07-25

Abstract


To support the development of autonomous navigation for intelligent ships and to address the low learning efficiency, weak generalization, and poor robustness of reinforcement-learning-based decision-making methods for collision avoidance, an autonomous collision avoidance method based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is proposed. The method accounts for the high dimensionality of the information processed during collision avoidance and the continuous nature of ship maneuvers, as well as the rationality and real-time performance of decision-making. The collision risk of a given ship is calculated from the geographical location of that ship and of the other ships nearby. The state space of the intelligent collision avoidance model for autonomous ships is constructed from a global perspective. The continuous decision-making and action space of the ship is designed according to the maneuvering characteristics of ships in encounter situations. An intelligent collision avoidance model is developed that considers factors such as the orientation of the target ship, course keeping, collision risk, the COLREGs, and good seamanship. Based on the TD3 algorithm and the structure of the state space, the network model for autonomous ship collision avoidance combines Long Short-Term Memory (LSTM) networks with one-dimensional convolutional networks in an Actor-Critic structure. Network training is stabilized by clipped double Q-learning, target policy smoothing, and delayed policy updates. The collision avoidance model is trained and updated on random scenarios using frame skipping and dynamic increases in batch size and in the number of update iterations. To address the model's weak generalization, a TD3-based training process over random ship encounter scenarios is proposed so that the model can be transferred across multiple scenarios.
The model is verified in simulation and compared with the modified Artificial Potential Field (APF) method. The results show that the proposed method learns efficiently and converges quickly and stably. The trained model enables ships to pass at a safe distance in both two-ship and multi-ship encounter scenarios. In complex encounter scenarios, the success rate of collision avoidance reaches 99.233% with 2 to 4 target ships, 97.600% with 5 to 7 target ships, and 94.166% with 8 to 10 target ships, which is higher than that of the modified APF algorithm. The proposed method responds effectively to uncoordinated actions by approaching ships, with real-time performance and safe, reasonable decision-making. Course changes are fast and stable with little oscillation, and the collision avoidance path is smooth, so the method outperforms the modified APF method.
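
The three stabilization techniques named in the abstract (clipped double Q-learning, target policy smoothing, and delayed policy updates) can be illustrated with a minimal NumPy sketch of the generic TD3 target computation. This is not the authors' implementation: the function names, the callable stand-ins for the target networks, and the hyperparameter values (`gamma`, `noise_std`, `noise_clip`, `policy_delay`) are assumptions chosen for illustration, and the abstract does not state the values used in the paper.

```python
import numpy as np


def td3_target(reward, done, next_state, actor_target, critic1_target,
               critic2_target, rng, gamma=0.99, noise_std=0.2,
               noise_clip=0.5, max_action=1.0):
    """Bootstrapped TD3 target for a batch of transitions (illustrative only).

    actor_target, critic1_target and critic2_target are hypothetical
    callables standing in for the target networks.
    """
    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise, then clip back into the valid action range.
    action = actor_target(next_state)
    noise = np.clip(rng.normal(0.0, noise_std, size=action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(action + noise, -max_action, max_action)

    # Clipped double Q-learning: use the smaller of the two target critics'
    # estimates to limit overestimation of the action value.
    q_min = np.minimum(critic1_target(next_state, next_action),
                       critic2_target(next_state, next_action))

    # Terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_min


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: refresh the actor only every few critic steps."""
    return step % policy_delay == 0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins: a 2-dimensional state and a 1-dimensional action
    # (for example, a normalized rudder or course command).
    actor = lambda s: np.tanh(s[:, :1])
    critic1 = lambda s, a: s.sum(axis=1, keepdims=True) + a
    critic2 = lambda s, a: s.sum(axis=1, keepdims=True) - a

    next_state = rng.normal(size=(4, 2))
    reward = np.ones((4, 1))
    done = np.zeros((4, 1))
    y = td3_target(reward, done, next_state, actor, critic1, critic2, rng)
    print(y.shape)
```

In a full training loop both critics would be regressed toward this target at every step, while the actor and the target networks would be updated only on the steps where `should_update_actor` returns `True`.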




Figures(22) / Tables(4)
