Abstract: In our submission to the NVIDIA AI City Challenge, we address speed measurement of vehicles and vehicle re-identification. For both tasks, we use a calibration method based on extracted vanishing points. We detect and track vehicles with a CNN-based detector and construct 3D bounding boxes for all vehicles. For the speed measurement task, we estimate the speed from the movement of the bounding box in 3D space using the calibration. Our approach to vehicle re-identification is based on extraction of visual features from “unpacked” images of the vehicles. The features are aggregated in the temporal domain to obtain a single feature descriptor for the whole track. Furthermore, we utilize a validation network to improve the re-identification accuracy.
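The speed estimation step described above can be illustrated with a minimal sketch. It assumes the calibration has already projected a reference point of each 3D bounding box onto the road plane in metres; the function name `track_speed_kmh` and the choice of a median over per-frame speeds are illustrative assumptions, not details taken from the paper.

```python
import math
from statistics import median


def track_speed_kmh(positions_m, timestamps_s):
    """Estimate the speed of one track from road-plane positions.

    positions_m  -- sequence of (x, y) points in metres (assumed to come
                    from the calibrated 3D bounding box; hypothetical input)
    timestamps_s -- matching sequence of timestamps in seconds
    """
    if len(positions_m) != len(timestamps_s):
        raise ValueError("positions and timestamps must have equal length")

    speeds = []
    samples = list(zip(positions_m, timestamps_s))
    for (p0, t0), (p1, t1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order frames
        speeds.append(math.dist(p0, p1) / dt)  # metres per second

    if not speeds:
        raise ValueError("track needs at least two distinct timestamps")

    # The median is an assumed choice that damps single-frame outliers.
    return 3.6 * median(speeds)  # convert m/s to km/h


# Usage: a vehicle moving 1 m per frame at 25 frames per second.
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
timestamps = [0.00, 0.04, 0.08, 0.12]
print(f"{track_speed_kmh(positions, timestamps):.1f} km/h")
```

The accuracy of such an estimate depends on the quality of the scene scale in the calibration, since any scale error enters the measured distances directly.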
Holistic Recognition of Low Quality License Plates by CNN using Track Annotated Data [IWT4S-AVSS 2017]
Abstract: This work focuses on recognition of license plates in low-resolution and low-quality images. We present a methodology for collecting a real-world (non-synthetic) dataset of low-quality license plate images with ground truth transcriptions. Our approach to license plate recognition is based on a Convolutional Neural Network which holistically processes the whole image, avoiding segmentation of the license plate characters. Evaluation results on multiple datasets show that our method significantly outperforms other free and commercial solutions to license plate recognition on low-quality data. To enable further research on low-quality license plate recognition, we make the datasets publicly available.
BoxCars: Improving Fine-Grained Recognition of Vehicles using 3D Bounding Boxes in Traffic Surveillance [IEEE T-ITS 2018]
Abstract: In this paper, we focus on fine-grained recognition of vehicles, mainly in traffic surveillance applications. We propose an approach orthogonal to recent advances in fine-grained recognition (automatic part discovery, bilinear pooling). Also, in contrast to other methods focused on fine-grained recognition of vehicles, we do not limit ourselves to frontal/rear viewpoints but allow the vehicles to be seen from any viewpoint. Our approach is based on 3D bounding boxes built around the vehicles. The bounding box can be automatically constructed from traffic surveillance data. For scenarios where it is not possible to use the precise construction, we propose a method for estimation of the 3D bounding box. The 3D bounding box is used to normalize the image viewpoint by unpacking the image into a plane. We also propose to randomly alter the color of the image and add a rectangle with random noise at a random position in the image during the training of Convolutional Neural Networks. We have collected a large fine-grained vehicle dataset, BoxCars116k, with 116k images of vehicles from various viewpoints taken by numerous surveillance cameras. We performed a number of experiments which show that our proposed method significantly improves CNN classification accuracy (the accuracy is increased by up to 12 percentage points and the error is reduced by up to 50% compared to CNNs without the proposed modifications). We also show that our method outperforms state-of-the-art methods for fine-grained recognition.
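The two training-time augmentations mentioned in the abstract (color alteration and a rectangle of random noise) might look roughly like the sketch below. It is only an illustration: the image is a plain list of rows of RGB tuples, the color alteration is assumed to be a per-channel additive shift, and the names `augment_training_image`, `max_shift` and `max_rect_frac` are hypothetical rather than taken from the paper.

```python
import random


def augment_training_image(image, rng, max_shift=30, max_rect_frac=0.3):
    """Return an augmented copy of `image` (list of rows of (r, g, b) tuples).

    max_shift     -- assumed bound on the random per-channel color shift
    max_rect_frac -- assumed bound on the noise rectangle size, as a
                     fraction of the image height and width
    """
    height, width = len(image), len(image[0])

    # 1) Random color alteration: one additive shift per channel,
    #    clamped to the valid 0..255 range.
    shift = [rng.randint(-max_shift, max_shift) for _ in range(3)]
    out = [
        [tuple(min(255, max(0, c + s)) for c, s in zip(pixel, shift))
         for pixel in row]
        for row in image
    ]

    # 2) Rectangle of random noise at a random position.
    rect_h = rng.randint(1, max(1, int(height * max_rect_frac)))
    rect_w = rng.randint(1, max(1, int(width * max_rect_frac)))
    top = rng.randint(0, height - rect_h)
    left = rng.randint(0, width - rect_w)
    for y in range(top, top + rect_h):
        for x in range(left, left + rect_w):
            out[y][x] = tuple(rng.randint(0, 255) for _ in range(3))

    return out


# Usage: augment a uniform grey 10x10 image with a seeded generator.
grey = [[(128, 128, 128)] * 10 for _ in range(10)]
augmented = augment_training_image(grey, random.Random(0))
print(len(augmented), len(augmented[0]))
```

The noise rectangle hides part of the vehicle, which plausibly pushes the network to rely on more than one local region of the image.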
Traffic Surveillance Camera Calibration by 3D Model Bounding Box Alignment for Accurate Vehicle Speed Measurement [CVIU]
Abstract: In this paper, we focus on fully automatic traffic surveillance camera calibration, which we use for speed measurement of passing vehicles. We improve over a recent state-of-the-art camera calibration method for traffic surveillance based on two detected vanishing points. More importantly, we propose a novel automatic scene scale inference based on matching bounding boxes of rendered 3D models of vehicles with detected bounding boxes in the image. The proposed method can be used from an arbitrary viewpoint and has no constraints on camera placement. We evaluate our method on BrnoCompSpeed, a recent comprehensive dataset for speed measurement. Experiments show that our automatic camera calibration from two detected vanishing points reduces the error by 50% compared to the previous state-of-the-art method. We also show that our scene scale inference method is much more precise (mean speed measurement error 1.10 km/h), outperforming both the state-of-the-art automatic calibration method (error reduction by 86%, mean error 7.98 km/h) and manual calibration (error reduction by 19%, mean error 1.35 km/h). We also present qualitative results of the automatic camera calibration method on video sequences obtained from real surveillance cameras at various places and under different lighting conditions (night, dawn, day).
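The relative error reductions quoted in this abstract follow from the three mean errors it reports. The short check below recomputes them; the helper name `relative_reduction` is introduced here only for illustration.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


proposed_kmh = 1.10          # mean error of the proposed scale inference
previous_auto_kmh = 7.98     # mean error of the earlier automatic method
manual_kmh = 1.35            # mean error with manual calibration

print(f"{relative_reduction(previous_auto_kmh, proposed_kmh):.0%}")  # 86%
print(f"{relative_reduction(manual_kmh, proposed_kmh):.0%}")         # 19%
```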
Abstract: In this paper, we focus on traffic camera calibration and visual speed measurement from a single monocular camera, which is an important task of visual traffic surveillance. Existing methods addressing this problem are difficult to compare due to the lack of a common data set with reliable ground truth. Therefore, it is not clear how the methods compare in various aspects and what factors affect their performance. We captured a new data set of 18 full-HD videos, each around 1 hour long, at six different locations. Vehicles in the videos (20865 instances in total) are annotated with precise speed measurements from optical gates using LiDAR and verified with several reference GPS tracks. We made the data set available for download; it contains the videos and metadata (calibration, lengths of features in the image, annotations, and so on) for future comparison and evaluation. Camera calibration is the most crucial part of the speed measurement; therefore, we provide a brief overview of the methods, analyze a recently published method for fully automatic camera calibration and vehicle speed measurement, and report the results on this data set in detail.
Abstract: This paper proposes an approach to the vehicle re-identification problem in a multiple-camera system. We focus on the re-identification itself, assuming that the vehicle detection problem is already solved, including extraction of a full-fledged 3D bounding box. The re-identification problem is solved by a linear regressor applied to color histograms and histograms of oriented gradients. The features are used in separate models in order to get the best results in the shortest CPU computation time. The proposed method works with high accuracy (60% of true positives retrieved at a 10% false positive rate on a challenging subset of the test data) in 85 milliseconds of CPU (Core i7) computation time per vehicle re-identification, assuming full-HD resolution video input. The applications of this work include finding important parameters such as travel time, traffic flow, or traffic information in a distributed traffic surveillance and monitoring system.
Abstract: We deal with the problem of fine-grained vehicle make and model recognition and verification. Our contribution is showing that extracting additional data from the video stream, besides the vehicle image itself, and feeding it into the deep convolutional neural network boosts the recognition performance considerably. This additional information includes: the 3D vehicle bounding box used for "unpacking" the vehicle image, its rasterized low-resolution shape, and information about the 3D vehicle orientation. Experiments show that adding such information decreases classification error by 26% (the accuracy is improved from 0.772 to 0.832) and boosts verification average precision to 208% of the baseline (0.378 to 0.785) compared to a baseline pure CNN without any input modifications. Also, the pure baseline CNN outperforms the recent state-of-the-art solution by 0.081. We provide an annotated set "BoxCars" of surveillance vehicle images augmented by various automatically extracted auxiliary information. Our approach and the dataset can considerably improve the performance of traffic surveillance systems.
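The relative figures in this abstract are easier to read once accuracy is converted to error. The worked equation below does this with the reported numbers; the symbols e and AP with subscripts "base" and "ours" are notation introduced here, and the `align*` environment assumes the amsmath package.

```latex
\begin{align*}
e_{\text{base}} &= 1 - 0.772 = 0.228, &
e_{\text{ours}} &= 1 - 0.832 = 0.168, \\
\frac{e_{\text{base}} - e_{\text{ours}}}{e_{\text{base}}}
  &= \frac{0.060}{0.228} \approx 0.26, &
\frac{\mathrm{AP}_{\text{ours}}}{\mathrm{AP}_{\text{base}}}
  &= \frac{0.785}{0.378} \approx 2.08 .
\end{align*}
```

A gain of 6 percentage points in accuracy therefore corresponds to removing roughly a quarter of the classification errors, and the verification average precision slightly more than doubles.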
Unsupervised Processing of Vehicle Appearance for Automatic Understanding in Traffic Surveillance [DICTA 2015]
Abstract: This paper deals with unsupervised collection of information from traffic surveillance video streams. Deployment of usable traffic surveillance systems requires minimizing the effort per installed camera; our goal is to enroll a new view of the street without any human operator input. We propose a method for automatically collecting vehicle samples from surveillance cameras, analyzing their appearance, and fully automatically collecting a fine-grained dataset. This dataset can be used in multiple ways; we explicitly showcase the following ones: fine-grained recognition of vehicles and camera calibration including the scale. The experiments show that, based on the automatically collected data, make and model vehicle recognition in the wild can be done accurately: average precision 0.890. The camera scale calibration (directly enabling automatic speed and size measurement) is twice as precise as the previous method. Our work leads to automatic collection of traffic statistics without the costly need for manual calibration or make and model annotation of vehicle samples. Unlike most previous approaches, our method is not limited to a small range of viewpoints (such as eye-level camera shots), which is crucial for surveillance applications.
Abstract: We propose a method for fully automatic calibration of traffic surveillance cameras. This method allows for calibration of the camera, including scale, without any user input, only from several minutes of input surveillance video. The targeted applications include speed measurement, measurement of vehicle dimensions, vehicle classification, etc. The achieved mean error of speed and distance measurement is below 2%. Our efficient C++ implementation runs in real time on a low-end processor (Core i3) with a safe margin even for full-HD videos.
Abstract: This paper deals with automatic calibration of roadside surveillance cameras. We focus on parameters necessary for measurements in traffic surveillance applications. In contrast to existing solutions, our approach requires no a priori knowledge and works with a very wide variety of road settings (number of lanes, occlusion, quality of ground marking) and with practically unlimited viewing angles. The main contribution is that our solution works fully automatically, without any per-camera or per-video manual settings or input whatsoever, and it is computationally cheap. Our approach uses tracking of local feature points and analyzes the trajectories in a manner based on the Cascaded Hough Transform and parallel coordinates. An important assumption for the vehicle movement is that at least a part of the vehicle motion is approximately straight; we discuss the impact of this assumption on the applicability of our approach and show experimentally that it does not severely limit the usability of our approach.
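To make the geometric idea concrete: straight trajectories of vehicles moving in the same direction meet, in the image, at a vanishing point. The sketch below estimates that point as the least-squares intersection of line segments. This is a deliberately simpler stand-in, not the Cascaded Hough Transform with parallel coordinates used in the paper, and the function name and input format are assumptions.

```python
import math


def least_squares_vanishing_point(segments):
    """Point minimising the squared distance to the lines through `segments`.

    segments -- list of ((x0, y0), (x1, y1)) image-space line segments,
                e.g. straight parts of tracked feature trajectories
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x0, y0), (x1, y1) in segments:
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue  # a degenerate segment defines no line
        # Unit normal of the line; the line is n . p = c.
        nx, ny = -dy / length, dx / length
        c = nx * x0 + ny * y0
        # Accumulate the 2x2 normal equations (sum n n^T) p = sum n c.
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c

    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; no finite intersection")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Usage: three segments whose lines all pass through (100, 50).
segments = [((0, 0), (50, 25)), ((0, 100), (50, 75)), ((200, 0), (150, 25))]
print(least_squares_vanishing_point(segments))
```

A plain least-squares fit is sensitive to outlier trajectories and cannot represent vanishing points at infinity, which is one motivation for voting-based schemes such as the Hough transform.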
Abstract: Detection of vehicles in traffic surveillance needs good and large training datasets in order to achieve competitive detection rates. We present an approach to automatic synthesis of custom datasets, simulating various major influences: viewpoint, camera parameters, sunlight, surrounding environment, etc. Our goal is to create a competitive vehicle detector which “has not seen a real car before.” We use Blender as the modeling and rendering engine. We created a suitable scene graph accompanied by a set of scripts that allows simple configuration of the synthesized dataset. The generator is also capable of storing a rich set of metadata that are used as annotations of the synthesized images. We synthesized several experimental datasets and evaluated their statistical properties in comparison with real-life datasets. Most importantly, we trained a detector on the synthetic data. Its detection performance is comparable to a detector trained on a state-of-the-art real-life dataset. Synthesis of a dataset of 10,000 images takes only several hours, which is much more efficient than manual annotation, leaving aside the possibility of human error in annotation.
3rd Best Paper Award
Abstract: This paper presents a fully automated system for traffic surveillance which is able to count passing cars, determine their direction, and the lane which they are taking. The system works without any manual input whatsoever and it is able to automatically calibrate the camera by detecting vanishing points in the video sequence. The proposed system is able to work in real time and therefore it is ready for deployment in real traffic surveillance applications. The system uses motion detection and tracking with the Kalman filter. The lane detection is based on clustering of trajectories of vehicles. The main contribution is a set of filters which a track has to pass in order to be treated as a vehicle and the full automation of the system.
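As an illustration of the tracking component, the sketch below implements a one-dimensional constant-velocity Kalman filter in plain Python. The abstract does not specify the state model or noise parameters, so the constant-velocity assumption, the class name `ConstantVelocityKalman1D` and all default values are hypothetical.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and position measurements."""

    def __init__(self, position, velocity=0.0,
                 process_var=1.0, measurement_var=4.0):
        self.x = [position, velocity]
        # Large initial covariance: the starting state is only a rough guess.
        self.P = [[100.0, 0.0], [0.0, 100.0]]
        self.q = process_var        # assumed process noise (velocity only)
        self.r = measurement_var    # assumed measurement noise variance

    def predict(self, dt=1.0):
        """Propagate the state by dt using x' = F x, P' = F P F^T + Q."""
        p, v = self.x
        self.x = [p + v * dt, v]
        (p00, p01), (p10, p11) = self.P
        self.P = [
            [p00 + dt * (p10 + p01) + dt * dt * p11, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]

    def update(self, measured_position):
        """Correct the state with a position measurement (H = [1, 0])."""
        (p00, p01), (p10, p11) = self.P
        innovation = measured_position - self.x[0]
        s = p00 + self.r                      # innovation variance
        k0, k1 = p00 / s, p10 / s             # Kalman gain
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        # P = (I - K H) P
        self.P = [
            [(1.0 - k0) * p00, (1.0 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


# Usage: track an object moving 5 units per frame.
kf = ConstantVelocityKalman1D(position=0.0)
for frame in range(1, 31):
    kf.predict()
    kf.update(5.0 * frame)
print(f"position {kf.x[0]:.1f}, velocity {kf.x[1]:.1f}")
```

In a tracker, the predicted position is what lets a detection in the next frame be associated with an existing track; a 2D image-space version would keep one such state per coordinate or a joint four-dimensional state.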
Abstract: A system for traffic analysis was designed and implemented as part of this thesis. The system is able to detect, track and classify vehicles. It can also detect lanes and determine whether a vehicle is traveling the wrong way. The speed of observed vehicles is also measured. The system does not require any manual input or calibration whatsoever, as the video camera is calibrated fully automatically from detected vanishing points. The accuracy of the detection, tracking and classification is high, and the speed of vehicles is measured with a low error. The system runs in real time and is currently used for continuous monitoring of traffic. The main contribution of the thesis is the fully automated speed measurement of passing vehicles.