On the Fly Vehicle Modeling and Tracking with 2D-LiDAR Detector and Infrared Camera

Kazuhiko Sumi¹ᵃ, Kazunari Takagi¹, Tatsuya Oshiro², Takuya Matsumoto², Kazuyoshi Kitajima³, Yoshifumi Hayakawa³ and Masayuki Yamamoto³

¹Aoyama Gakuin University, Fuchinobe Honmachi, Sagamihara, Japan
²Sohatsu System Laboratory, Kobe, Japan
³Mitsubishi Heavy Industry Machinery Systems LTD, Kobe, Japan
ᵃ https://orcid.org/0000-0002-9165-5912
Keywords:
Vehicle Detection, Vehicle Tracking, Localization, Infra-Red, Lidar.
Abstract:
We propose a vehicle detection and tracking system that tracks vehicles from the rear using a 10 µm band infrared (IR) surveillance camera installed along the expressway. The main reason for using an infrared camera is to suppress the strong reflections of vehicle head and tail lights on rainy nights. However, due to the lack of large IR traffic video datasets covering all types of vehicles, we cannot take advantage of recent advances in machine learning. We therefore propose a rather straightforward approach: vehicles are first detected by a pair of 2D-LiDARs, and an image model of each vehicle to be tracked is then generated on the fly. We prototyped the system and evaluated it with normal traffic video taken on a highway. We achieved a 94% tracking success rate at distances of 20 m to 70 m from the camera, with a mean localization error of less than 2 m at 70 m.
1 INTRODUCTION
In Japan, to support the merging of autonomous vehicles onto highways, a service is planned in which the road infrastructure recognizes the position and speed of vehicles traveling on the main line and provides this information to an autonomous vehicle before those vehicles enter its own sensor field (see Figure 1). Several methods are conceivable for realizing the vehicle localization in this system. The first choice is the sensor: (a) camera, (b) LiDAR, or (c) millimeter-wave radar. The second choice is the sensor placement: (e) measuring a vehicle from the front or (f) tracking it from the rear. The third is the algorithm and object model. For example, if a visible-light camera is chosen, various algorithms and object models have been proposed (Al-Smadi et al., 2016).
Regarding sensor selection, LiDAR has the advantage that the distance to each scanned point can be measured directly, and 3D-LiDAR is the most popular sensor for autonomous driving. For this application, however, LiDAR has difficulty measuring a vehicle hidden behind another one because of its sparse spatial sampling.
Figure 1: Concept of autonomous vehicle support system at
expressway junction.
A camera, on the other hand, is suitable because it offers the best spatial resolution, which makes it possible to recognize each vehicle in a row of consecutive cars. The next question is the detection algorithm. Successful algorithms so far include background subtraction (Seki et al., 2003), optical flow (Chen and Wu, 2015), HOG-like features (Wei et al., 2019), and deep-learning-based approaches (Scheidegger et al., 2018).
Currently, most surveillance cameras installed on roads are RGB cameras, which make it easier for humans to observe the traffic. However, RGB camera images are extremely hard for computers to segment correctly (Bahnsen and Moeslund, 2018).
Figure 2: Comparison of day and night images from an RGB camera and an IR camera. From left to right: RGB daytime, RGB night, IR daytime, IR night. Note that the appearance of the RGB image changes drastically between day and night; vehicle segmentation becomes extremely hard on rainy nights.
We therefore decided to use a 10 µm band IR camera. Using an IR camera suppresses the direct light from the vehicle's head and tail lights as well as the strong reflections of those lights on the wet road surface. However, the following technical issues then arise.
• Compared to RGB images, IR images also capture vehicle heat radiated onto the road surface, so an afterglow is observed around each vehicle. This makes it difficult to separate the vehicle alone by background subtraction.
• IR cameras are rarely installed on roads, so IR image training datasets cannot be obtained. Methods based on learned HOG features (which require thousands to tens of thousands of annotated images) and deep learning methods (which require tens of thousands to millions of annotated images) therefore cannot be trained.
• On actual roads, there are cases that standard datasets cannot cover, such as variations in truck loading, towed trailers, and rarely used special vehicles.
• Compared to RGB images, IR images are dominated by low spatial frequency components, so there are fewer keypoints for image feature descriptors such as SIFT and ORB. Keypoint-based tracking is therefore difficult.
• An IR image is a single-band image, so a color histogram cannot be created. It is therefore difficult to apply Mean-shift or MCMC, which are tracking methods built on color histogram features.
In order to overcome these problems, we make the following proposals:
1. Vehicle tracking is performed using IR images. The input is a single-band IR image, and the vehicle model is an image model obtained by cutting out the rear-end region of the vehicle from an IR image taken on the spot.
2. Tracking across successive images is done by searching for the point at which the normalized cross-correlation with the vehicle model image is maximized.
3. To accurately determine the rear end of the vehicle, its width, height, length, and speed are measured with two 2D-LiDARs. After the vehicle passes the 2D-LiDARs, we predict in which frame and at which image position its rear end will appear (a sketch of this prediction follows Figure 3).
Figure 3: Concept of our vehicle detection and tracking system — arrangement of sensors.
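To make proposal 3 concrete, the following minimal sketch (our own simplification, not the authors' implementation) predicts the camera frame in which the rear end of a detected vehicle should first appear and where it should appear in the image. The frame rate `FPS`, the gap distance `GAP_M` between the LiDAR plane and the camera field of view, and the map-to-camera projection matrix `P_map_to_image` are assumed placeholder values.

```python
import numpy as np

FPS = 30.0     # assumed camera frame rate
GAP_M = 5.0    # assumed distance (m) from the LiDAR plane to the camera field of view

# Assumed 3x4 map-to-camera projection matrix (in practice it would come from calibration).
P_map_to_image = np.array([[1000.0, 0.0, 320.0, 0.0],
                           [0.0, 1000.0, 240.0, 0.0],
                           [0.0,    0.0,   1.0, 0.0]])

def predict_rear_end_appearance(t_pass_s, v_mps, rear_corner_map_xyz):
    """Predict the frame index and image position of the rear-end corner.

    t_pass_s: time (s) at which the rear end cleared the second LiDAR
    v_mps: vehicle speed along the road (m/s), i.e. v_yi = D / T_di
    rear_corner_map_xyz: 3D rear-end lower corner in road (map) coordinates
    """
    # Frame in which the rear end should enter the camera field of view.
    frame_idx = int(round((t_pass_s + GAP_M / v_mps) * FPS))

    # Project the 3D corner into the image with the assumed projection matrix.
    X = np.append(np.asarray(rear_corner_map_xyz, dtype=float), 1.0)  # homogeneous coords
    u, v, w = P_map_to_image @ X
    return frame_idx, (u / w, v / w)
```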
2 VEHICLE DETECTION
The geometrical layout of the 2D-LiDARs, the road, and the vehicles is shown in Figure 3. Each LiDAR acquires the cross-sectional shape of the road and the vehicle with a laser beam that rotates and scans from the center of the sensor, as shown in Figure 4. In the figure, the reference position of vehicle $i$ passing at time $t = t_1$ is the coordinate of the lower-left corner of the rear end of the $i$-th vehicle. The vehicle width $W_i$ and height $H_i$ are obtained from a single scan waveform, as shown in the middle of Figure 4. The vehicle speed along the Y axis is obtained as $v_{yi} = D / T_{di}$, where $D$ is the distance between the two 2D-LiDARs and $T_{di}$ is the time difference between the detections of vehicle $i$ by the two LiDARs. The vehicle length is obtained as $L_i = D \, T_{Li} / T_{di}$, where $T_{Li}$ is the time during which vehicle $i$ occupies one LiDAR scan plane. In practice, a vehicle is not rectangular but has an uneven height profile, so the cross-sectional shape $W_i$ and $H_i$ is measured at the rear end of the vehicle.
Figure 4: Principle of vehicle detection and measurement.
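The following is a minimal sketch of the measurements above, under our own assumptions about the data layout (each LiDAR scan is given as (y, z) points in a road-aligned cross-section, with z measured upward from the road surface): width and height come from a single cross-sectional scan at the rear end, speed from the delay $T_{di}$ between the two LiDARs separated by $D$, and length from the occupancy time $T_{Li}$.

```python
import numpy as np

def cross_section_width_height(scan_yz, ground_z=0.05):
    """Width W_i and height H_i from one LiDAR cross-sectional scan.

    scan_yz: (N, 2) array of (y, z) points in the lane cross-section (meters),
             with z measured upward from the road surface (assumed layout).
    """
    pts = scan_yz[scan_yz[:, 1] > ground_z]     # keep points above the road surface
    if len(pts) == 0:
        return 0.0, 0.0
    width = pts[:, 0].max() - pts[:, 0].min()   # W_i: lateral extent of the vehicle
    height = pts[:, 1].max()                    # H_i: highest point above the road
    return width, height

def speed_and_length(D, T_di, T_Li):
    """Speed v_yi = D / T_di and length L_i = D * T_Li / T_di (Section 2)."""
    v = D / T_di    # D traveled in the delay between the two LiDAR detections
    L = v * T_Li    # occupancy time of one LiDAR beam times the speed
    return v, L
```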
3 MODEL CREATION AND
TRACKING
Following the detection of a vehicle, the vehicle appears in the IR camera view, so a vehicle model for tracking it is generated on the spot. In our proposed system, tracking uses the cross-correlation peak between the vehicle model image and the image being tracked, so the vehicle model is the image region surrounding the vehicle in the video. It should be noted that the appearance of the vehicle changes during tracking. The causes of this apparent change are geometric changes due to the rotation of the vehicle and the perspective projection of the camera, and changes of the luminance pattern due to changes in the optical environment.
Our tracking target is assumed to be on a straight highway, so the geometric change due to vehicle rotation is small; however, to track a vehicle from 20 m to 140 m away from the camera, the geometric change due to perspective projection must be considered. In the case of an IR camera, the pixel value is not an absolute amount of heat radiation but a value relative to the surrounding pixels, so the gain and offset of the luminance pattern are not constant. For these reasons, the vehicle model uses the pattern on the rear-end surface of the vehicle, which is nearly perpendicular to the camera optical axis and changes little in shape under perspective projection from near to far. Using zero-mean normalized cross-correlation has the advantage that changes in gain and offset do not affect the correlation value. Note that the vehicle model (template image) must be shrunk as the vehicle moves away.
$$
q(u,v) = \frac{\sum_{x,y} T_i(x,y)\,\bigl(I(u+x,\,v+y) - \bar{I}\bigr)}
{\sqrt{\sum_{x,y} T_i(x,y)^2 \;\sum_{x,y} \bigl(I(u+x,\,v+y) - \bar{I}\bigr)^2}}
\qquad (1)
$$
where $q(u,v)$, $I(u,v)$, $\bar{I}$, $T_i(x,y)$ and $\bar{T}$ are the similarity score at image location $(u,v)$, the image value at location $(u,v)$, the mean of $I$ over the template window, the model (template) value, and the mean of $T_i(x,y)$, respectively. Note that $T_i$ is adjusted so that $\bar{T} = 0$.
A new tracking location in the $i$-th image is given by the following equation:

$$
(\hat{u}, \hat{v}) = \arg\max_{u,v}\, q(u,v) \qquad (2)
$$
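A minimal sketch of the search in Eqs. (1) and (2), assuming the IR frame and the template are available as 8-bit NumPy arrays: OpenCV's `cv2.matchTemplate` with the `cv2.TM_CCOEFF_NORMED` method computes this zero-mean normalized cross-correlation. Restricting the search to a window around the previous location is our own simplification, not something specified in the paper.

```python
import cv2
import numpy as np

def track_once(frame, template, prev_xy, search_half=40):
    """One tracking step: maximize zero-mean NCC (Eq. 1) inside a local search window.

    frame:     full IR frame (2D uint8 array)
    template:  current vehicle model T_i (2D uint8 array), already rescaled for distance
    prev_xy:   (x, y) of the model's top-left corner in the previous frame
    """
    h, w = template.shape
    px, py = prev_xy
    # Restrict the search to a window around the previous location (assumed heuristic).
    x0 = max(0, px - search_half)
    y0 = max(0, py - search_half)
    x1 = min(frame.shape[1], px + w + search_half)
    y1 = min(frame.shape[0], py + h + search_half)
    roi = frame[y0:y1, x0:x1]

    # Zero-mean normalized cross-correlation map q(u, v) over the window.
    q = cv2.matchTemplate(roi, template, cv2.TM_CCOEFF_NORMED)
    _, q_max, _, max_loc = cv2.minMaxLoc(q)

    # Eq. (2): the new location is the argmax of q.
    new_xy = (x0 + max_loc[0], y0 + max_loc[1])
    return new_xy, q_max
```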
4 EXPERIMENTS AND
DISCUSSION
The proposed system was prototyped and its performance was evaluated on a highway. The system was set up over a four-lane highway in the Minatojima area of Kobe City: the sensors and cameras were temporarily placed on a pedestrian bridge crossing the road, the data were acquired and saved to a file system, and the evaluation was done off-line. The tracking success rate and the localization accuracy were verified. The IR camera was a traffic surveillance camera made by FLIR (image resolution 640 x 480 pixels, wavelength 10 µm), and the 2D-LiDARs were made by SICK. Figure 5 shows the correspondence between an aerial photograph of the test site and the image taken with the IR camera. From a few tens of minutes of video taken at this site, 100 vehicles were selected randomly, and the rear-end position of each vehicle in each frame was manually marked as the ground truth.
Figure 5: Captured image of the test site and the corresponding aerial photo (taken from Google Maps).
Figure 8 shows the tracking success rate with respect to the distance from the camera. We distinguish the apparent success rate (red line), in which the algorithm simply continues tracking, from the true success rate (blue line), in which the tracked position stays within a certain error range of the ground truth. A tracking failure of several percent was observed once the distance from the camera exceeded about 70 m, increasing to about 20% at 150 m, where tracking ends. From this, it can be concluded that the system can track vehicles with high reliability from the vicinity of the camera up to a distance of 70 m. There is also a roughly 3% tracking failure rate between the camera and 70 m. In those cases the initial model (template) was generated almost correctly, but the search failed in the next frame. We consider this to be because the prediction of the template magnification for the next frame failed, and we estimate that there is an algorithmic problem in the part that predicts the template magnification from the velocity measurement and the map-to-camera coordinate conversion matrix.
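The template magnification mentioned here can be illustrated with a simple pinhole-camera argument: the apparent size of the rear end scales roughly inversely with its distance along the optical axis, so the template can be rescaled by the ratio of the previous and predicted distances before the next correlation search. The sketch below is our own illustration of this idea, not the authors' exact map-to-camera conversion.

```python
import cv2

def rescale_template(template, dist_prev_m, dist_pred_m):
    """Shrink (or grow) the vehicle model as the vehicle moves away (pinhole model).

    dist_prev_m: camera-to-vehicle distance when the template was last valid
    dist_pred_m: predicted distance in the next frame (from v_yi and the frame period)
    """
    scale = dist_prev_m / dist_pred_m   # apparent size is roughly proportional to 1 / distance
    h, w = template.shape[:2]
    new_size = (max(1, int(round(w * scale))), max(1, int(round(h * scale))))
    return cv2.resize(template, new_size, interpolation=cv2.INTER_AREA)

# Example: a vehicle at 40 m predicted to be at 42 m in the next frame
# (speed from the LiDARs times the frame period) shrinks the template by about 5%.
```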
5 CONCLUSION AND FUTURE
WORK
Our proposal demonstrates the effectiveness of on-the-fly vehicle model generation combined with vehicle detection using LiDAR. So far, no dataset has been developed that would enable tracking of vehicles in IR images from the images alone, so this method is considered the best available at present.
Figure 6: Error analysis in horizontal axis.
Figure 7: Error analysis in distance axis.
Figure 8: Successful tracking rate.
However, there is an initial tracking failure rate of 3%, the tracking position accuracy drops from around 70 m, and at 150 m, 17% of the cases in which initial tracking succeeded are considered to be mis-tracked. Better methods must be developed before practical use. Analysis of the failure cases reveals that the major cause of mis-tracking is that the vehicle being tracked is concealed by a following vehicle. We plan to develop more sophisticated algorithms, such as a tracking method that predicts the vehicle position while it is completely concealed and restores tracking when it reappears. At the same time, it is important to install this system in a real environment as soon as possible and to experience many cases; problems that occur only infrequently and are not yet understood should be discovered and the algorithms improved. Furthermore, this system can contribute to the creation of machine learning datasets: if LiDAR is installed at multiple locations, correct labels can be assigned automatically during image tracking, and large-scale datasets can be constructed from them. In the near future, we expect that image features suitable for IR vehicle detection and tracking can be learned, so that a camera alone will suffice without LiDAR vehicle detection.
REFERENCES
Al-Smadi, M., Abdulrahim, K., and Abdul Salam, R.
(2016). Traffic surveillance: A review of vision based
vehicle detection, recognition and tracking. 11:713–
726.
Bahnsen, C. H. and Moeslund, T. B. (2018). Rain removal
in traffic surveillance: Does it matter? IEEE Trans-
actions on Intelligent Transportation Systems, pages
1–18.
Chen, Y. and Wu, Q. (2015). Moving vehicle detection
based on optical flow estimation of edge. In 11th
International Conference on Natural Computation,
ICNC 2015, Zhangjiajie, China, August 15-17, 2015,
pages 754–758.
Scheidegger, S., Benjaminsson, J., Rosenberg, E., Krishnan, A., and Granström, K. (2018). Mono-camera 3d multi-object tracking using deep learning detections and PMBM filtering. In 2018 IEEE Intelligent Vehicles Symposium, IV 2018, Changshu, Suzhou, China, June 26-30, 2018, pages 433–440.
Seki, M., Wada, T., Fujiwara, H., and Sumi, K. (2003). Background subtraction based on cooccurrence of image variations. volume 2, pages II–65.
Wei, Y., Tian, Q., Guo, J., Huang, W., and Cao, J. (2019). Multi-vehicle detection algorithm through combining Harr and HOG features. Mathematics and Computers in Simulation, 155:130–145. International Conference on Mathematical Modeling and Computational Methods in Science and Engineering.