The HALO-YOLO architecture extends a conventional convolutional neural network (CNN) implementation by learning to perform computation dynamically, adaptively adjusting the activity of neurons within a given layer, while FPGA performance is improved through hardware-aware optimizations such as quantization, pruning, and parallel multiply-accumulate (MAC) operations.
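The hardware-aware optimizations named above can be made concrete with a short sketch. The Python/NumPy fragment below illustrates symmetric per-tensor INT8 quantization, magnitude-based pruning, and the integer MAC that those steps enable; the scaling rule, the 50% sparsity target, and the vector size are illustrative assumptions, not values taken from the HALO-YOLO implementation.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization (illustrative)."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-8)  # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def prune_by_magnitude(w, sparsity=0.5):
    """Zero the smallest-magnitude weights (unstructured pruning)."""
    k = int(sparsity * w.size)
    if k == 0:
        return w
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w).astype(w.dtype)

def int_mac(x_q, w_q):
    """INT8 x INT8 -> INT32 dot product; the FPGA's MAC array
    evaluates many such lanes in parallel (vectorized here)."""
    return int(np.dot(x_q.astype(np.int32), w_q.astype(np.int32)))

rng = np.random.default_rng(0)
w = prune_by_magnitude(rng.normal(size=256).astype(np.float32))
x = rng.normal(size=256).astype(np.float32)
w_q, w_s = quantize_int8(w)
x_q, x_s = quantize_int8(x)
y = int_mac(x_q, w_q) * w_s * x_s   # dequantize the INT32 accumulator
print(f"float reference {np.dot(x, w):.3f} vs INT8 MAC {y:.3f}")
```

In integer-only deployments the combined scale factor is typically folded into a single fixed-point requantization multiplier per output, which is what keeps the INT8 datapath cheap in hardware.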
Experimental validation showed that HALO-YOLO not only preserved high detection accuracy (mAP@0.5 = 83.7%) in comparison with traditional GPU-based models, but also achieved this with roughly 95% less power consumption and real-time processing speed (2× faster inference than YOLOv5).
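The dynamic neuron-activity adjustment described at the start of this section can likewise be sketched. Below, a hypothetical per-channel gate skips channels whose activation energy falls below a threshold for the current input; the mean-absolute-value criterion and the threshold tau are assumptions made for illustration, not the paper's exact gating rule.

```python
import numpy as np

def gate_channels(feat, tau=0.1):
    """feat: (C, H, W) activation map. Channels whose mean absolute
    activation falls below tau are zeroed so their downstream MACs
    can be skipped for this input."""
    energy = np.abs(feat).mean(axis=(1, 2))       # per-channel activity
    mask = (energy >= tau).astype(feat.dtype)     # 1 = compute, 0 = skip
    return feat * mask[:, None, None]

feat = np.random.default_rng(1).normal(scale=0.13, size=(16, 8, 8))
gated = gate_channels(feat)
active = int((np.abs(gated).sum(axis=(1, 2)) > 0).sum())
print(f"{active} of {feat.shape[0]} channels remain active")
```

Because the gate is input-dependent, the fraction of skipped MACs varies per frame, which is the source of the dynamic computation savings claimed above.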
This FPGA implementation demonstrates the viability of low-power AI accelerators in settings such as routine robotic tasks, industrial applications, and embedded vision systems, paving the way for further development of hardware-aware deep learning techniques.