Detection and Analysis of Redundant RFID Traffic Data
Abstract: Data quality control is one of the key technologies for building intelligent transportation system applications. Based on an analysis of the characteristics of Radio Frequency Identification (RFID) data, redundant RFID data are divided into two types: duplicate data and similar data. Both types are detected by analyzing the time interval between adjacent passing records of the same vehicle. For similar data, a redundancy-rate curve and redundant time points are defined, which makes it possible to identify redundant records in RFID traffic data. Because the two types have different characteristics, their redundancy rates are computed separately. The rates are then analyzed from two perspectives: by RFID station and by the trend of the redundancy-rate curve. A cleansing method for redundant data is also proposed. The methods are validated on raw data collected by 21 RFID stations on major urban arterials in Nanjing. The results show that the average redundancy rate across the 21 stations is 0.0062% for duplicate data and 0.92% for similar data, indicating that data collected with RFID technology are highly reliable. At every station, similar records far outnumber duplicate records. In terms of curve shape, stations whose redundancy-rate curves level off or rise at the tail have higher redundancy rates, whereas stations whose curves rise in a straight line have lower redundancy rates. Based on these findings, corresponding quality control measures are proposed to limit the generation of redundant RFID data.
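The detection idea in the abstract (compare adjacent passing times of the same vehicle, then compute separate redundancy rates) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' algorithm. The record layout (`Record`), the helper names, and the `threshold` parameter are all hypothetical. A record is assumed to be a *duplicate* when its time gap to the previous read of the same tag at the same station is zero, and *similar* when that gap is positive but shorter than `threshold`. Rates use all records at a station as the denominator. The abstract does not give the exact definitions of the redundancy-rate curve or the redundant time point, so the curve is assumed here to be the similar-data rate as a function of the threshold.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence


class Record(NamedTuple):
    station: str  # RFID station ID (hypothetical field name)
    tag: str      # vehicle RFID tag ID (hypothetical field name)
    t: float      # passing time in seconds


def classify(records: Sequence[Record], threshold: float) -> List[str]:
    """Label each record 'normal', 'duplicate' or 'similar' by comparing it
    with the previous read of the same tag at the same station."""
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for i, r in enumerate(records):
        groups[(r.station, r.tag)].append(i)
    labels = ["normal"] * len(records)
    for idxs in groups.values():
        idxs.sort(key=lambda i: records[i].t)  # stable sort: ties keep input order
        for prev, cur in zip(idxs, idxs[1:]):
            gap = records[cur].t - records[prev].t
            if gap == 0:
                labels[cur] = "duplicate"  # same passing time: repeated read
            elif gap < threshold:
                labels[cur] = "similar"    # re-read shortly after the previous one
    return labels


def redundancy_rates(records: Sequence[Record], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-station percentage of duplicate and similar records."""
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"duplicate": 0, "similar": 0})
    for r, lab in zip(records, labels):
        totals[r.station] += 1
        if lab != "normal":
            counts[r.station][lab] += 1
    return {s: {k: 100.0 * v / totals[s] for k, v in counts[s].items()} for s in totals}


def redundancy_curve(records: Sequence[Record], thresholds: Sequence[float]) -> List[float]:
    """Assumed curve: similar-data rate (percent of all records) per threshold."""
    return [100.0 * classify(records, th).count("similar") / len(records) for th in thresholds]


def cleanse(records: Sequence[Record], labels: List[str]) -> List[Record]:
    """Drop duplicate and similar records, keeping the first read of each pass."""
    return [r for r, lab in zip(records, labels) if lab == "normal"]


data = [
    Record("S01", "A", 0.0),
    Record("S01", "A", 0.0),    # duplicate of the read above
    Record("S01", "A", 20.0),   # similar: 20 s after the previous read
    Record("S01", "A", 600.0),  # a new pass
    Record("S01", "B", 5.0),
]
labels = classify(data, threshold=60.0)
print(labels)                                      # ['normal', 'duplicate', 'similar', 'normal', 'normal']
print(redundancy_rates(data, labels))              # {'S01': {'duplicate': 20.0, 'similar': 20.0}}
print(redundancy_curve(data, [10.0, 30.0, 1000.0]))  # [0.0, 20.0, 40.0]
print(len(cleanse(data, labels)))                  # 3
```

Comparing each read only with its immediate predecessor means a burst of closely spaced reads is collapsed onto its first record. Sweeping the threshold shows where the similar-data rate stops growing, which is one plausible way to read a "leveling off" curve shape.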