5 CONCLUSIONS
As a crucial component of environmental perception, object
detection directly determines how safely and reliably an
intelligent driving system understands the vehicle's
surroundings and makes decisions.
This article reviews the development history and
principles of object detection algorithms for intelligent
driving, from two-stage detectors to single-stage
detectors and on to Transformer-based methods built on
the self-attention mechanism, together with lightweight
approaches that have emerged in recent years. It also
compares and analyzes the core ideas, performance, and
applicable scenarios of these techniques.
In summary, with continuous advances in algorithmic
theory and hardware computing power, object detection
technology has made remarkable progress in intelligent
driving. Nevertheless, deeper research is still needed to
achieve the accuracy, robustness, and real-time
performance demanded by increasingly complex scenarios.
Promising directions such as multimodal perception,
lightweight network architectures, and few-shot learning
are already reshaping the field and may yield significant
breakthroughs in the years ahead.
REFERENCES
Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020).
YOLOv4: Optimal speed and accuracy of object
detection. arXiv preprint arXiv:2004.10934.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn,
D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020).
An image is worth 16x16 words: Transformers for
image recognition at scale. arXiv preprint
arXiv:2010.11929.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014).
Rich feature hierarchies for accurate object detection
and semantic segmentation. In Proceedings of the IEEE
conference on computer vision and pattern recognition
(pp. 580–587).
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017).
Mask R-CNN. In Proceedings of the IEEE international
conference on computer vision (pp. 2961–2969).
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang,
W., Weyand, T., ... & Adam, H. (2017). MobileNets:
Efficient convolutional neural networks for mobile
vision applications. arXiv preprint arXiv:1704.04861.
Li, Z., Wang, W., Li, H., Xie, E., Sima, C., Lu, T., ... & Dai,
J. (2024). BEVFormer: Learning bird's-eye-view
representation from lidar-camera via spatiotemporal
transformers. IEEE Transactions on Pattern Analysis
and Machine Intelligence.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.,
Fu, C. Y., & Berg, A. C. (2016). SSD: Single shot
multibox detector. In Computer Vision–ECCV 2016
(pp. 21–37). Springer International Publishing.
Ma, L., Chen, Y., & Zhang, J. (2021, May). Vehicle and
pedestrian detection based on improved YOLOv4-tiny
model. In Journal of Physics: Conference Series (Vol.
1920, No. 1, p. 012034). IOP Publishing.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016).
You only look once: Unified, real-time object detection.
In Proceedings of the IEEE conference on computer
vision and pattern recognition (pp. 779–788).
Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-
CNN: Towards real-time object detection with region
proposal networks. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 39(6), 1137–1149.
Sun, P., Cao, J., Jiang, Y., Zhang, R., Xie, E., Yuan, Z., ...
& Luo, P. (2020). TransTrack: Multiple object tracking
with transformer. arXiv preprint arXiv:2012.15460.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention
is all you need. Advances in Neural Information
Processing Systems, 30.
Wang, C. Y., Bochkovskiy, A., & Liao, H. Y. M. (2023).
YOLOv7: Trainable bag-of-freebies sets new state-of-
the-art for real-time object detectors. In Proceedings of
the IEEE/CVF conference on computer vision and
pattern recognition (pp. 7464–7475).