Lane-change Control for Unmanned Vehicle Based on REINFORCE Algorithm and Neural Network

YAN Hao, LIU Xiaozhu, SHI Ying

Citation: YAN Hao, LIU Xiaozhu, SHI Ying. Lane-change Control for Unmanned Vehicle Based on REINFORCE Algorithm and Neural Network[J]. Journal of Transport Information and Safety, 2021, 39(1): 164-172. doi: 10.3963/j.jssn.1674-4861.2021.01.0019

doi: 10.3963/j.jssn.1674-4861.2021.01.0019
Funding:

National Natural Science Foundation of China, grant 51805388

Major Technological Innovation Project of Hubei Province, grant 2019AAA025

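Details
    About the author:

    YAN Hao (b. 1996), master's student. Research interests: reinforcement learning, autonomous driving. E-mail: 675462459@qq.com

    Corresponding author:

    SHI Ying (b. 1975), Ph.D., professor. Research interests: deep learning, autonomous driving, big data. E-mail: a_laly@163.com

  • CLC number: U461.1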

Lane-change Control for Unmanned Vehicle Based on REINFORCE Algorithm and Neural Network

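  • Abstract: For the lane-change overtaking scenario of unmanned vehicles, this paper studies a lane-change control strategy based on the REINFORCE algorithm and a neural network. The feedback quantities, control quantity, and output limits are determined from a vehicle dynamics model. The structure of the neural-network controller is designed, and a training scheme for the controller is built on the REINFORCE algorithm. The problem of excessively large values and variance in the experience-pool data is analyzed, and a preprocessing method for the experience-pool data is proposed to improve the training scheme. Taking the operating scenario of unmanned vehicles into account, the sparse-reward problem that arises during reinforcement learning is analyzed, and a reward-shaping solution based on a logarithmic function is proposed. The strategy is then validated in comparative experiments against a PID controller and an LQR controller. The results show that, compared with PID, the proposed strategy has a smaller maximum error and a safer lane-change process; compared with LQR, its performance is close, which supports its feasibility for the lane-change control task. In addition, the execution time of the strategy on different platforms is recorded to demonstrate its real-time capability and the feasibility of running it on lightweight platforms.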
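The abstract names two training aids, preprocessing of the experience-pool data and a logarithm-based reward shaping, without giving their formulas on this page. The sketch below is only one plausible reading, not the authors' method: it computes the discounted returns that REINFORCE weights its gradient with, standardizes them to zero mean and unit variance (a common way to tame large values and variance), and defines a dense log-shaped reward of the lateral error. The function names, `gamma`, `eps`, and `scale` are all assumptions introduced for illustration.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def standardize(values, eps=1e-8):
    """Rescale values to zero mean and unit variance.

    Assumed stand-in for the paper's experience-pool preprocessing.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    return [(v - mean) / (std + eps) for v in values]


def log_shaped_reward(lateral_error, scale=1.0):
    """Hypothetical dense reward: 0 at zero error, more negative as |error| grows."""
    return -math.log(1.0 + scale * abs(lateral_error))


# Example: one short episode with a shaped reward at every step.
errors = [1.0, 0.5, 0.1]
rewards = [log_shaped_reward(e) for e in errors]
weights = standardize(discounted_returns(rewards, gamma=0.9))
```

Because the shaped reward is nonzero at every step, each transition carries a learning signal, which is the usual motivation for shaping a sparse reward.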

     

  • Figure 1. Single-track model of the vehicle

    Figure 2. Structure of the vehicle lane-change control system

    Figure 3. Process of reinforcement learning

    Figure 4. "0-1" setting, example 1

    Figure 5. "0-1" setting, example 2

    Figure 6. Experimental results compared with PID at a vehicle speed of 10 m/s

    Figure 7. Experimental results compared with PID at a vehicle speed of 15 m/s

    Figure 8. Experimental results compared with PID at a vehicle speed of 20 m/s

    Figure 9. Experimental results compared with PID at a vehicle speed of 25 m/s

    Figure 10. Experimental results compared with LQR at a vehicle speed of 10 m/s

    Figure 11. Experimental results compared with LQR at a vehicle speed of 15 m/s

    Figure 12. Experimental results compared with LQR at a vehicle speed of 20 m/s

    Figure 13. Experimental results compared with LQR at a vehicle speed of 25 m/s

    Table 1. Fixed parameters of the vehicle

    Parameter    Value
    sf           0.2
    sr           0.2
    a            1.232
    b            1.468
    Ccf          66 900
    Ccr          62 700
    Clf          66 900
    Clr          62 700
    m            1 723
    Iz           4 175

    Table 2. Parameters of the neural network

                           Layer 1    Layer 2
    Input dimension        5          200
    Output dimension       200        51
    Activation function    tanh
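Table 2 describes a two-layer network that maps 5 inputs to 200 hidden units and then to 51 outputs, with a tanh activation. A minimal forward-pass sketch of a network with those sizes follows, using only the Python standard library. Several details are assumptions not stated on this page: that tanh is applied to the hidden layer, that the 51 outputs are turned into a probability distribution over discrete actions with a softmax, how the weights are initialized, and which five feedback signals make up the state.

```python
import math
import random

IN_DIM, HIDDEN_DIM, OUT_DIM = 5, 200, 51  # layer sizes from Table 2


def init_layer(n_in, n_out, rng):
    """Small uniform random weights and zero biases (assumed initialization)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(state, params):
    """Action probabilities: tanh hidden layer, softmax output (assumed)."""
    (w1, b1), (w2, b2) = params
    hidden = [math.tanh(v) for v in dense(state, w1, b1)]
    return softmax(dense(hidden, w2, b2))


rng = random.Random(0)
params = (init_layer(IN_DIM, HIDDEN_DIM, rng),
          init_layer(HIDDEN_DIM, OUT_DIM, rng))

# Hypothetical state vector of five feedback values.
probs = policy([0.0, 0.1, -0.2, 0.05, 0.0], params)

# REINFORCE samples an action from the policy rather than taking the argmax.
action_index = rng.choices(range(OUT_DIM), weights=probs)[0]
```

Under this reading, `action_index` would select one of 51 discretized control commands within the output limits; the actual mapping from index to command is not given here.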

    Table 3. Error after lane change and maximum error during lane change

    Speed     Controller    Error after lane change/m    Maximum error during lane change/m
    10 m/s    REINFORCE     0.02                         0.06
    10 m/s    PID           0                            0.17
    10 m/s    LQR           0                            0.02
    15 m/s    REINFORCE     0.04                         0.07
    15 m/s    PID           0                            0.17
    15 m/s    LQR           0                            0.05
    20 m/s    REINFORCE     0.06                         0.07
    20 m/s    PID           0                            0.17
    20 m/s    LQR           0                            0.12
    25 m/s    REINFORCE     0.08                         0.10
    25 m/s    PID           0                            0.17
    25 m/s    LQR           0                            0.19
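The comparison drawn in the abstract can be read straight off the maximum-error column of Table 3. As a small worked check, the snippet below copies those values and computes, for each speed, how much smaller the REINFORCE controller's maximum error is than PID's, and at which speeds it is also below LQR's. The variable names are illustrative only.

```python
# Maximum error during lane change (m), copied from Table 3.
max_error = {
    10: {"REINFORCE": 0.06, "PID": 0.17, "LQR": 0.02},
    15: {"REINFORCE": 0.07, "PID": 0.17, "LQR": 0.05},
    20: {"REINFORCE": 0.07, "PID": 0.17, "LQR": 0.12},
    25: {"REINFORCE": 0.10, "PID": 0.17, "LQR": 0.19},
}

# Relative reduction of the maximum error versus PID at each speed.
reduction_vs_pid = {
    speed: (row["PID"] - row["REINFORCE"]) / row["PID"]
    for speed, row in max_error.items()
}

# Speeds (m/s) at which REINFORCE also has a smaller maximum error than LQR.
beats_lqr = [speed for speed, row in max_error.items()
             if row["REINFORCE"] < row["LQR"]]

for speed, frac in reduction_vs_pid.items():
    print(f"{speed} m/s: maximum error {frac:.0%} lower than PID")
print("Smaller maximum error than LQR at (m/s):", beats_lqr)
```

The same table also shows the trade-off: PID and LQR settle to zero error after the lane change, while the REINFORCE controller leaves a residual error of 0.02 to 0.08 m.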

    Table 4. Running time of the neural-network controller

    Platform       Total simulation time/s    Total simulation steps    Average time per step/s
    Computer       2.834 99                   1 202                     0.002 36
    TX2            3.898 25                   1 202                     0.003 24
    Jetson Nano    4.859 62                   1 202                     0.004 04
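The last column of Table 4 is simply the total simulation time divided by the number of steps. The short check below reproduces that arithmetic from the table's own figures; the dictionary keys are just labels.

```python
# Total simulation time (s) per platform and the shared step count, from Table 4.
total_time_s = {"Computer": 2.83499, "TX2": 3.89825, "Jetson Nano": 4.85962}
STEPS = 1202

# Average time per control step on each platform.
per_step_s = {name: total / STEPS for name, total in total_time_s.items()}

for name, t in per_step_s.items():
    print(f"{name}: {t * 1000:.2f} ms per step")
```

Rounded to five decimal places, the results agree with the tabulated 0.002 36 s, 0.003 24 s, and 0.004 04 s, that is, roughly 2.4 ms to 4.0 ms per control step.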
Publication history
  • Received: 2020-09-25
  • Published: 2021-02-28
