Authors:
K. S. Chidanand Kumar ¹ and Samir Al-Stouhi ²
Affiliations:
¹ Great Wall of Motors, Whitefield, Bangalore, Karnataka, India; ² American Haval Motors, Michigan, U.S.A.
Keyword(s):
Bird’s-Eye-View (BEV), Convolutional Neural Network (CNN), Non-Local Context Network (NLCN), YOLO, Convolutional LSTM (CLSTM), Spatial-Temporal Context Network (STCN).
Abstract:
This paper proposes a real-time spatial-temporal context approach for BEV object detection and classification using LiDAR point clouds. Current state-of-the-art BEV object-detection approaches focus mainly on single-frame point clouds, and the temporal dimension is rarely exploited. In our approach, we aggregate 3D LiDAR point clouds over time to produce a 4D tensor, which is then fed to a one-shot fully convolutional detector that predicts oriented 3D object bounding boxes along with the object class. Four different techniques are evaluated to incorporate the temporal dimension: a) joint training, b) CLSTM, c) non-local context network (NLCN), and d) spatial-temporal context network (STCN). Experiments are conducted on the large-scale Argoverse dataset, and the results show that NLCN and STCN increase mAP by a large margin over a single-frame 3D object detector and YOLO4D 3D object detection, with our approach running at 28 fps.
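The following is a minimal sketch, not the authors' code, of the kind of temporal aggregation the abstract describes: rasterizing several consecutive LiDAR sweeps into a BEV occupancy grid and stacking them into a 4D tensor of shape (T, Z, H, W). The grid extents, resolutions, and function names are illustrative assumptions, not values taken from the paper.

```python
# Sketch: aggregate T LiDAR sweeps into a (T, Z, H, W) BEV tensor.
# All ranges/resolutions below are assumed for illustration only.
import numpy as np

X_RANGE, Y_RANGE, Z_RANGE = (0.0, 70.0), (-40.0, 40.0), (-3.0, 1.0)
RES_XY, RES_Z = 0.1, 0.5                      # metres per BEV cell / height slab
H = int((X_RANGE[1] - X_RANGE[0]) / RES_XY)   # BEV rows
W = int((Y_RANGE[1] - Y_RANGE[0]) / RES_XY)   # BEV columns
Z = int((Z_RANGE[1] - Z_RANGE[0]) / RES_Z)    # height channels

def voxelize_bev(points: np.ndarray) -> np.ndarray:
    """Rasterize one (N, 3) ego-frame point cloud into a (Z, H, W) occupancy grid."""
    grid = np.zeros((Z, H, W), dtype=np.float32)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = ((x >= X_RANGE[0]) & (x < X_RANGE[1]) &
            (y >= Y_RANGE[0]) & (y < Y_RANGE[1]) &
            (z >= Z_RANGE[0]) & (z < Z_RANGE[1]))
    xi = ((x[keep] - X_RANGE[0]) / RES_XY).astype(np.int64)
    yi = ((y[keep] - Y_RANGE[0]) / RES_XY).astype(np.int64)
    zi = ((z[keep] - Z_RANGE[0]) / RES_Z).astype(np.int64)
    grid[zi, xi, yi] = 1.0                    # binary occupancy per voxel
    return grid

def aggregate_sweeps(sweeps: list) -> np.ndarray:
    """Stack T consecutive sweeps (assumed already motion-compensated into the
    current ego frame) into a (T, Z, H, W) tensor for a temporal detector."""
    return np.stack([voxelize_bev(p) for p in sweeps], axis=0)

# Example: five synthetic sweeps of 100k points each -> tensor of shape (5, Z, H, W)
sweeps = [np.random.uniform([-10, -50, -4], [80, 50, 2], size=(100_000, 3))
          for _ in range(5)]
tensor_4d = aggregate_sweeps(sweeps)
print(tensor_4d.shape)
```

Such a tensor could then be consumed frame-by-frame by a recurrent head (e.g., CLSTM) or as a whole by an attention-style context network, which is the distinction the four evaluated variants explore.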