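Crowd Count Neural Network Based on Attention Mechanism in Traffic Scenes

WANG Liyuan; YAO Yuntao; JIA Yang; XIAO Jinsheng; LI Bijun

doi: 10.3963/j.jssn.1674-4861.2023.06.012

1. CCCC Second Highway Consultants Co., Ltd., Wuhan 430056, China
2. School of Electronic Information, Wuhan University, Wuhan 430072, China
3. Sichuan Highway Planning, Survey, Design and Research Institute Co., Ltd., Chengdu 610041, China
4. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
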
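Funding:

Key Research and Development Program of Hubei Province (2023BAB022)

Science and Technology Research and Development Project of China Communications Construction Group Co., Ltd. (2019-ZJKJ-ZDZX02)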


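About the author:
WANG Liyuan (born 1980), M.S., professor-level senior engineer. Research interests: image and video processing; intelligent transportation technology. Email: 13397123890@126.com

Corresponding author:
XIAO Jinsheng (born 1975), Ph.D., associate professor. Research interests: image and video processing. Email: xiaojs@whu.edu.cn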

CLC number: TP29
Publication history
- Received: 2023-08-18
- Published online: 2024-04-03


Abstract

Crowd counting is an important task in computer vision. In traffic scenes, it plays a significant role in maintaining public travel safety and making transportation more intelligent. However, pedestrian occlusion and complex backgrounds, which are common in public traffic scenes, make crowd counting difficult. To achieve high-accuracy crowd counting, an attention-based crowd density estimation network is proposed. The network consists of three parts. A feature extraction module generates multi-scale feature maps, which strengthens the network's feature representation and improves its robustness to variation in pedestrian scale. An attention module suppresses background noise responses, strengthens crowd feature responses, and generates a probability distribution of crowd regions over the feature map, improving the network's ability to separate crowd regions from the background. A density estimation module, constrained by the attention mechanism, guides the network to regress a high-resolution crowd density map, making the network more sensitive to crowd regions. In addition, a background-aware structure loss function is designed to reduce false recognitions and improve counting accuracy, and a multi-level supervision mechanism guides training, which aids gradient back-propagation, reduces overfitting, and further improves counting accuracy. Experiments on the public ShanghaiTech dataset show that, compared with state-of-the-art algorithms (the strongest baseline in Table 1 being S-DCNet [21]), the mean absolute error (MAE) is reduced by 2.4% and 1.5% and the mean square error (MSE) by 3.3% and 0.9% on ShanghaiTechA and ShanghaiTechB, respectively, demonstrating better accuracy and robustness in both crowded and sparse scenes. Experiments on a real-scene dataset give MAE = 7.7 and MSE = 12.6, demonstrating the practical applicability of the proposed algorithm.

Keywords: traffic safety; crowd counting; attention mechanism; background-aware structure loss; multi-level supervision
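The abstract describes an attention module that outputs a per-pixel crowd probability, a density head that regresses a density map under that attention constraint, and training with a background-aware loss plus multi-level supervision. The NumPy sketch below illustrates one plausible reading of these ideas under stated assumptions; it is not the authors' implementation. The 1x1 projections `w_att` and `w_den`, the threshold `bg_thresh`, the weight `lam`, the binary cross-entropy term used as a stand-in for the background-aware structure loss, and the per-level predictions passed to `multi_level_loss` are all hypothetical. As in density-based counting generally, the estimated count is the sum of the density map.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_gated_density(features, w_att, w_den):
    """Hypothetical attention-gated density head.

    features: (C, H, W) feature map from an assumed backbone.
    w_att, w_den: (C,) weights of two assumed 1x1 convolutions.
    Returns the attention map (H, W) and the density map (H, W).
    """
    # Attention branch: 1x1 conv + sigmoid gives a per-pixel crowd probability in (0, 1).
    att = sigmoid(np.tensordot(w_att, features, axes=1))
    # Gate every channel with the attention map so low-probability (background) pixels are damped.
    gated = features * att[None, :, :]
    # Density branch: 1x1 conv + ReLU keeps the predicted density non-negative.
    density = np.maximum(np.tensordot(w_den, gated, axes=1), 0.0)
    return att, density


def sum_pool(x, k):
    """Downsample by summing k x k blocks; the count is preserved when H and W are multiples of k."""
    h, w = (x.shape[0] // k) * k, (x.shape[1] // k) * k
    return x[:h, :w].reshape(h // k, k, w // k, k).sum(axis=(1, 3))


def background_aware_loss(pred, gt, att, bg_thresh=1e-3, lam=0.1, eps=1e-7):
    """Assumed stand-in: density MSE plus BCE pulling the attention map toward a foreground mask."""
    mse = np.mean((pred - gt) ** 2)
    fg = (gt >= bg_thresh).astype(float)  # 1 where the ground truth contains crowd density
    bce = -np.mean(fg * np.log(att + eps) + (1.0 - fg) * np.log(1.0 - att + eps))
    return mse + lam * bce


def multi_level_loss(preds, gt, atts, factors=(1, 2, 4)):
    """Sum the loss over hypothetical predictions at several resolutions (downsampling factors)."""
    total = 0.0
    for pred, att, k in zip(preds, atts, factors):
        total += background_aware_loss(pred, sum_pool(gt, k), att)
    return total


# Usage: the estimated crowd count is the integral (sum) of the density map.
rng = np.random.default_rng(0)
feats = rng.random((8, 16, 16))
att_map, den_map = attention_gated_density(feats, rng.normal(size=8), rng.random(8))
print(round(float(den_map.sum()), 2))
```

Multiplicative gating, rather than appending the attention map as an extra channel, lets small attention values push background features toward zero before density regression, which matches the role the abstract assigns to the attention module.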

Figure 1. Architecture of the proposed method

Figure 2. Structure of the attention module

Figure 3. Structure of the density map estimation module

Figure 4. Visualization of the estimated density maps on ShanghaiTech and UCF-QNRF

Figure 5. Experimental results in real scenes

Figure 6. Visualization of the attention maps

Figure 7. Visualization of the estimated density maps with and without the attention module

Table 1. Performance comparison on ShanghaiTech and UCF-QNRF

Method	ShanghaiTechA MAE	ShanghaiTechA MSE	ShanghaiTechB MAE	ShanghaiTechB MSE	UCF-QNRF MAE	UCF-QNRF MSE
MCNN [15]	110.2	173.2	26.4	41.3	277.0	426.0
CSRNet [16]	68.2	115.0	10.6	16.0	-	-
CAN [17]	62.3	100.0	7.8	12.2	107.0	183.0
S-DCNet [21]	58.3	95.0	6.7	10.7	104.4	176.1
Bayesian [22]	62.8	101.8	7.7	12.7	88.7	154.8
Proposed method	56.9	91.8	6.6	10.6	90.8	155.1
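Table 1 reports the usual crowd-counting metrics. The sketch below assumes the convention introduced with MCNN [15], in which MAE is the mean absolute count error over test images and the column labelled MSE is the square root of the mean squared count error; the paper's exact definition is not stated here, so treat this as an assumption. It also recomputes the relative reductions over S-DCNet [21] that the abstract quotes, using only the numbers in Table 1.

```python
import numpy as np


def mae_mse(pred_counts, gt_counts):
    """MAE and (root) MSE over per-image counts, following the usual crowd-counting convention."""
    err = np.asarray(pred_counts, dtype=float) - np.asarray(gt_counts, dtype=float)
    mae = np.mean(np.abs(err))
    # Assumption: "MSE" here is the root of the mean squared error, as in MCNN [15].
    mse = np.sqrt(np.mean(err ** 2))
    return mae, mse


def relative_reduction(baseline, ours):
    """Percentage by which an error metric drops relative to a baseline method."""
    return 100.0 * (baseline - ours) / baseline


# Per-image counts would be obtained by summing each predicted density map.
print(mae_mse([10.0, 22.0, 31.0], [12.0, 20.0, 30.0]))

# Gains over S-DCNet [21], taken from Table 1.
for name, base, ours in [("SHA MAE", 58.3, 56.9), ("SHA MSE", 95.0, 91.8),
                         ("SHB MAE", 6.7, 6.6), ("SHB MSE", 10.7, 10.6)]:
    print(name, round(relative_reduction(base, ours), 2))
```

The loop yields reductions of about 2.40%, 3.37%, 1.49% and 0.93%; rounded to one decimal place these are 2.4%, 3.4%, 1.5% and 0.9%.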

Table 2. Ablation results on ShanghaiTechB

Network	MAE	MSE
Without attention module	7.8	12.7
With attention module	6.8	10.6
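The ablation in Table 2 can be expressed as the relative error reduction obtained by adding the attention module. The worked arithmetic below uses only the numbers in the table.

```latex
\Delta_{\mathrm{MAE}} = \frac{7.8 - 6.8}{7.8} \times 100\% \approx 12.8\%, \qquad
\Delta_{\mathrm{MSE}} = \frac{12.7 - 10.6}{12.7} \times 100\% \approx 16.5\%
```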

References

[1]	ZHANG Y Q, LI G H, LEI J, et al. FF-CAM: crowd counting based on frontend-backend fusion through channel-attention mechanism[J]. Chinese Journal of Computers, 2021, 44(2): 304-317. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-JSJX202102004.htm
[2]	DU P D, YAN H. Crowd counting network based on multi-scale spatial attention feature fusion[J]. Computer Applications, 2021, 41(2): 537-543. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-JSJY202102035.htm
[3]	WANG Z, CHEN J, HOI S. Deep learning for image super-resolution: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3365-3387. doi: 10.1109/TPAMI.2020.2982166
[4]	LEIBE B, SEEMANN E, SCHIELE B. Pedestrian detection in crowded scenes[C]. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA. IEEE, 2005.
[5]	LI M, ZHANG Z, HUANG K, et al. Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection[C]. 2008 19th International Conference on Pattern Recognition, Tampa, FL, USA. IEEE, 2008.
[6]	CHEN K, LOY C C, GONG S, et al. Feature mining for localised crowd counting[C]. British Machine Vision Conference, Guildford, Surrey, UK. 2012, 1(2): 3.
[7]	LOWE D G. Object recognition from local scale-invariant features[C]. Proceedings of the 7th IEEE International Conference on Computer Vision, Kerkyra, Greece. IEEE, 1999.
[8]	OJALA T, PIETIKAINEN M, MAENPAA T. Gray-scale and rotation invariant texture classification with local binary patterns[C]. Computer Vision-ECCV 2000: 6th European Conference on Computer Vision, Dublin, Ireland. Springer, 2000.
[9]	DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA. IEEE, 2005.
[10]	PARAGIOS N, RAMESH V. A MRF-based approach for real-time subway monitoring[C]. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001. IEEE, 2001.
[11]	TIAN Y, SIGAL L, BADINO H, et al. Latent Gaussian mixture regression for human pose estimation[C]. Asian Conference on Computer Vision, Berlin, Heidelberg: Springer, 2010.
[12]	LEMPITSKY V, ZISSERMAN A. Learning to count objects in images[OL]. (2010-12-06)[2023-05-15]. https://www.robots.ox.ac.uk/~vgg/publications/2010/Lempitsky10b/lempitsky10b.pdf
[13]	PHAM V Q, KOZAKAYA T, YAMAGUCHI O, et al. Count forest: Co-voting uncertain number of targets using random forest for crowd density estimation[C]. IEEE International Conference on Computer Vision, Santiago, Chile: IEEE, 2015.
[14]	XIAO J S, SHEN M Y, JIANG M J, et al. Abnormal behavior detection algorithm with video-bag attention mechanism in surveillance video[J]. Acta Automatica Sinica, 2022, 48(12): 2953-2961. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-MOTO202212007.htm
[15]	ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network[C]. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA: IEEE, 2016.
[16]	LI Y, ZHANG X, CHEN D. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes[C]. IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA: IEEE, 2018.
[17]	LIU W, SALZMANN M, FUA P. Context-aware crowd counting[C]. Conference on Computer Vision and Pattern Recognition, Long Beach, USA: IEEE, 2019.
[18]	RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]. Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015): 18th International Conference, Munich, Germany: Springer, 2015.
[19]	RONG L, LI C. Coarse- and fine-grained attention network with background-aware loss for crowd density map estimation[C]. Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA: IEEE, 2021.
[20]	IDREES H, TAYYAB M, ATHREY K, et al. Composition loss for counting, density map estimation and localization in dense crowds[C]. European Conference on Computer Vision (ECCV), Munich, Germany: Springer, 2018.
[21]	XIONG H, LU H, LIU C, et al. From open set to closed set: counting objects by spatial divide-and-conquer[C]. International Conference on Computer Vision (ICCV), Seoul, Korea (South): IEEE, 2019.
[22]	MA Z, WEI X, HONG X, et al. Bayesian loss for crowd count estimation with point supervision[C]. International Conference on Computer Vision (ICCV), Seoul, Korea (South): IEEE, 2019.