Real-time Forecast Models for Traffic Accidents on Expressways Using Stability Coefficients of Traffic Flow
Abstract: Real-time forecasting of traffic accident risk relies on a large number of predictor variables. The data are hard to collect, which hinders model building, and the resulting overfitting lowers the reliability of the predictions. To reduce the number of predictors, make the models easier to deploy, and limit overfitting, two interpretable stability coefficients of traffic flow are constructed to simplify the predictor set: a longitudinal stability coefficient and a lateral stability coefficient. Historical traffic accident and traffic flow data are collected from the G3001 expressway in Xi'an, and support vector machine (SVM), random forest (RF), and logistic regression (LR) are each used to build a real-time accident risk forecast model. The improved Gini index (GI) is used to evaluate the significance of the two coefficients and thereby check their validity. Their effect on overfitting is assessed through the differences in prediction accuracy and AUC between training and test data for each predictor set, and computational efficiency is assessed through training time, to check the reliability of the new method. The results show that the improved GI values of the longitudinal and lateral stability coefficients are 0.952 and 0.922, respectively, markedly higher than those of the other tested predictors, which indicates that both coefficients are significantly correlated with real-time accident risk. Across the three models, the simplified predictor set containing the two coefficients achieves prediction accuracies of 91.1% on training data and 90.5% on test data, close to those of the full predictor set. The average difference in prediction accuracy between training and test data is 0.69% for the simplified set and 4.87% for the full set; the corresponding average differences in AUC are 1.61% and 5.87%. The average training time decreases by 15.2%. The two stability coefficients therefore improve both the reliability and the computational efficiency of the forecast models.
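The overfitting check described in the abstract compares a metric on training data with the same metric on test data. A minimal sketch of that comparison follows, assuming the gap is taken as the absolute difference in percentage points (the abstract does not state whether the reported differences are absolute or relative). The function names `auc` and `gap` and the sample labels and scores are illustrative and do not come from the paper.

```python
def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


def gap(train_value, test_value):
    """Assumed overfitting measure: absolute train/test difference in percentage points."""
    return abs(train_value - test_value) * 100.0


# Hypothetical accident labels (1 = accident) and predicted risk scores.
train_labels = [0, 0, 1, 1]
train_scores = [0.10, 0.40, 0.35, 0.80]
test_labels = [0, 1, 0, 1]
test_scores = [0.30, 0.20, 0.60, 0.90]

train_auc = auc(train_labels, train_scores)
test_auc = auc(test_labels, test_scores)
print(f"train AUC={train_auc:.2f}, test AUC={test_auc:.2f}, "
      f"gap={gap(train_auc, test_auc):.1f} points")
```

A small gap suggests the model generalizes about as well as it fits; a predictor set with a consistently larger gap across models is the one more affected by overfitting.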
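Table 1. Predictor set for real-time risk forecasting of expressway traffic accidents

| Main category | Subcategory | Basic indicator | Code |
| --- | --- | --- | --- |
| Traffic state | Traffic volume (VC) | Upstream mean traffic volume (veh/h) | VCup |
| | | Upstream standard deviation of traffic volume (veh/h) | Std. VCup |
| | | Upstream mean difference in traffic volume between adjacent lanes (veh/h) | Dif. VCup |
| | | Downstream mean traffic volume (veh/h) | VCdo |
| | | Downstream standard deviation of traffic volume (veh/h) | Std. VCdo |
| | | Downstream mean difference in traffic volume between adjacent lanes (veh/h) | Dif. VCdo |
| | | Difference between upstream and downstream mean traffic volume (veh/h) | Dif. VCup-do |
| | Occupancy (OCC) | Upstream mean occupancy (%) | OCCup |
| | | Upstream standard deviation of occupancy (%) | Std. OCCup |
| | | Upstream mean difference in occupancy between adjacent lanes (%) | Dif. OCCup |
| | | Downstream mean occupancy (%) | OCCdo |
| | | Downstream standard deviation of occupancy (%) | Std. OCCdo |
| | | Downstream mean difference in occupancy between adjacent lanes (%) | Dif. OCCdo |
| | | Difference between upstream and downstream mean occupancy (%) | Dif. OCCup-do |
| | Speed (S) | Upstream mean speed (km/h) | Sup |
| | | Upstream standard deviation of speed (km/h) | Std. Sup |
| | | Upstream mean difference in speed between adjacent lanes (km/h) | Dif. Sup |
| | | Downstream mean speed (km/h) | Sdo |
| | | Downstream standard deviation of speed (km/h) | Std. Sdo |
| | | Downstream mean difference in speed between adjacent lanes (km/h) | Dif. Sdo |
| | | Difference between upstream and downstream mean speed (km/h) | Dif. Sup-do |
| Road geometry | Mainline | Segment length (m) | SL |
| | | Number of lanes | NL |
| | | Road surface width (m) | RSW |
| | | Lane width (m) | LW |
| | | Inner shoulder width (m) | ISW |
| | | Outer shoulder width (m) | OSW |
| | | Median width (m) | MW |
| | Ramp | Share of total segment length occupied by the merging area (%) | MA |
| | | Share of total segment length occupied by the diverging area (%) | DA |
| | | Distance between diverging and merging areas (m) | DMR |
| Environment | | Weather condition | WC |
| | | Time | TD |
| | | Peak period | PP |
| | | Speed limit (km/h) | VL |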
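Table 2. Simplified predictor set for real-time risk forecasting of expressway traffic accidents, based on the traffic flow stability coefficients

| Basic indicator | Code |
| --- | --- |
| Longitudinal stability coefficient of traffic flow | Dif. DEup-do |
| Lateral stability coefficient of traffic flow | Dif. DEdo |
| Heavy-vehicle mixing rate (%) | PT |
| Share of total segment length occupied by the merging area (%) | MA |
| Share of total segment length occupied by the diverging area (%) | DA |
| Weather condition | WC |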
Table 3. Improved Gini index (GI) of the tested predictors in each road section

| Code | 1-2# | 2-3# | 3-4# | 4-5# | 5-6# | 6-7# | 7-8# | 8-9# | 9-10# | 10-11# | 11-12# | 12-13# | 13-14# | 14-1# | Mean | Std. dev. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dif. DEup-do | 0.968 | 0.934 | 0.967 | 0.948 | 0.976 | 0.978 | 0.947 | 0.941 | 0.956 | 0.934 | 0.945 | 0.970 | 0.938 | 0.932 | 0.952 | 0.016 |
| Dif. DEdo | 0.932 | 0.887 | 0.956 | 0.939 | 0.919 | 0.931 | 0.883 | 0.935 | 0.938 | 0.921 | 0.885 | 0.957 | 0.895 | 0.930 | 0.922 | 0.025 |
| Sdo | 0.886 | 0.838 | 0.907 | 0.890 | 0.894 | 0.845 | 0.877 | 0.867 | 0.905 | 0.899 | 0.872 | 0.846 | 0.834 | 0.887 | 0.875 | 0.025 |
| OCCdo | 0.791 | 0.756 | 0.805 | 0.820 | 0.825 | 0.770 | 0.761 | 0.781 | 0.775 | 0.821 | 0.791 | 0.798 | 0.823 | 0.786 | 0.793 | 0.023 |
| Sup | 0.724 | 0.678 | 0.749 | 0.708 | 0.752 | 0.688 | 0.683 | 0.672 | 0.703 | 0.685 | 0.680 | 0.770 | 0.761 | 0.690 | 0.710 | 0.034 |
| DA | 0.685 | 0.638 | 0.741 | 0.640 | 0.642 | 0.737 | 0.683 | 0.668 | 0.725 | 0.688 | 0.679 | 0.640 | 0.723 | 0.730 | 0.687 | 0.039 |
| MA | 0.667 | 0.625 | 0.621 | 0.642 | 0.691 | 0.638 | 0.652 | 0.668 | 0.651 | 0.653 | 0.683 | 0.633 | 0.680 | 0.674 | 0.655 | 0.022 |
| WC | 0.653 | 0.614 | 0.640 | 0.655 | 0.652 | 0.667 | 0.643 | 0.629 | 0.655 | 0.635 | 0.639 | 0.626 | 0.661 | 0.625 | 0.643 | 0.016 |
| PT | 0.619 | 0.585 | 0.602 | 0.588 | 0.631 | 0.589 | 0.622 | 0.584 | 0.635 | 0.598 | 0.643 | 0.611 | 0.635 | 0.599 | 0.610 | 0.021 |
| VCdo | 0.598 | 0.565 | 0.563 | 0.620 | 0.615 | 0.571 | 0.580 | 0.579 | 0.591 | 0.612 | 0.588 | 0.597 | 0.613 | 0.593 | 0.592 | 0.019 |
| OCCup | 0.553 | 0.523 | 0.558 | 0.549 | 0.578 | 0.543 | 0.536 | 0.549 | 0.571 | 0.547 | 0.564 | 0.561 | 0.555 | 0.557 | 0.553 | 0.014 |
| SL | 0.514 | 0.455 | 0.502 | 0.518 | 0.509 | 0.530 | 0.497 | 0.519 | 0.492 | 0.480 | 0.474 | 0.537 | 0.504 | 0.510 | 0.503 | 0.022 |
| VCup | 0.419 | 0.383 | 0.381 | 0.429 | 0.418 | 0.435 | 0.391 | 0.417 | 0.456 | 0.390 | 0.402 | 0.413 | 0.443 | 0.428 | 0.415 | 0.023 |
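This section does not give the formula behind the improved GI values in Table 3. As a rough illustration only, the sketch below assumes the form proposed by Shang et al. for text categorization, GI(W) = Σ_c P(W|c)² · P(c|W)², applied to a binary feature W. How the paper maps continuous predictors such as the stability coefficients onto W is not stated here, so the thresholding implied by the boolean `present` list is an assumption, as are the function name `improved_gini` and the sample data.

```python
def improved_gini(present, labels):
    """Assumed improved Gini index: sum over classes c of P(W|c)^2 * P(c|W)^2.

    present: booleans, True where the (binarized) feature W occurs in a sample.
    labels:  class label of each sample, e.g. 1 = accident, 0 = no accident.
    """
    if len(present) != len(labels):
        raise ValueError("present and labels must have the same length")
    n_present = sum(1 for w in present if w)
    if n_present == 0:
        return 0.0  # a feature that never occurs carries no information
    score = 0.0
    for c in set(labels):
        n_class = sum(1 for y in labels if y == c)
        n_both = sum(1 for w, y in zip(present, labels) if w and y == c)
        p_w_given_c = n_both / n_class    # P(W | c)
        p_c_given_w = n_both / n_present  # P(c | W)
        score += (p_w_given_c ** 2) * (p_c_given_w ** 2)
    return score


# Hypothetical samples: three accident cases followed by three non-accident cases.
labels = [1, 1, 1, 0, 0, 0]
perfect = [True, True, True, False, False, False]  # occurs only with accidents
weak = [True, False, True, False, True, False]     # weakly related to accidents

print(improved_gini(perfect, labels))
print(improved_gini(weak, labels))
```

Under this assumed form the index lies between 0 and 1, and a feature that occurs in every sample of one class and in no sample of the other scores 1, which matches the reading of Table 3 that values closer to 1 indicate a stronger association with accident risk.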
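Table 4. Training time of each test group (s)

| Road section | SVM, simplified | SVM, full | RF, simplified | RF, full | LR, simplified | LR, full |
| --- | --- | --- | --- | --- | --- | --- |
| 1-2# | 2.45 | 3.05 | 2.35 | 2.83 | 2.27 | 2.58 |
| 2-3# | 2.51 | 3.03 | 2.40 | 2.85 | 2.25 | 2.62 |
| 3-4# | 2.62 | 3.05 | 2.37 | 2.91 | 2.25 | 2.60 |
| 4-5# | 2.70 | 3.15 | 2.44 | 2.97 | 2.35 | 2.71 |
| 5-6# | 2.68 | 3.16 | 2.46 | 2.97 | 2.40 | 2.69 |
| 6-7# | 2.73 | 3.15 | 2.46 | 2.99 | 2.39 | 2.70 |
| 7-8# | 2.73 | 3.15 | 2.44 | 2.97 | 2.39 | 2.70 |
| 8-9# | 2.70 | 3.19 | 2.48 | 3.05 | 2.37 | 2.68 |
| 9-10# | 2.67 | 3.19 | 2.44 | 3.01 | 2.41 | 2.71 |
| 10-11# | 2.73 | 3.21 | 2.43 | 2.97 | 2.35 | 2.75 |
| 11-12# | 2.65 | 3.07 | 2.38 | 2.92 | 2.17 | 2.58 |
| 12-13# | 2.63 | 3.05 | 2.38 | 2.93 | 2.20 | 2.60 |
| 13-14# | 2.63 | 3.05 | 2.40 | 2.94 | 2.25 | 2.55 |
| 14-1# | 2.64 | 3.08 | 2.41 | 2.94 | 2.29 | 2.59 |
| Mean training time (s) | 2.65 | 3.11 | 2.42 | 2.95 | 2.31 | 2.65 |
| Difference in training time (%) | 14.79 | | 17.97 | | 12.83 | |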
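The percentage row of Table 4 can be reproduced from the mean training times, assuming the difference is the reduction relative to the full predictor set, (full − simplified) / full. The dictionary and function names below are illustrative.

```python
# Mean training times in seconds from Table 4: (simplified set, full set).
mean_time_s = {
    "SVM": (2.65, 3.11),
    "RF": (2.42, 2.95),
    "LR": (2.31, 2.65),
}


def reduction_pct(simplified, full):
    """Training-time reduction of the simplified set relative to the full set, in %."""
    return (full - simplified) / full * 100.0


# Per-model reduction, rounded to two decimals as in the table.
reductions = {
    model: round(reduction_pct(simplified, full), 2)
    for model, (simplified, full) in mean_time_s.items()
}

# Average over the three models, rounded to one decimal as in the abstract.
average_reduction = round(
    sum(reduction_pct(s, f) for s, f in mean_time_s.values()) / len(mean_time_s), 1
)

print(reductions)
print(average_reduction)
```

Under this assumption the per-model values match the table (14.79, 17.97, 12.83), and their average matches the 15.2% reported in the abstract.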