Authors:
Andrei-Stelian Stan; Dan Popescu and Loretta Ichim
Affiliation:
National University of Science and Technology POLITEHNICA Bucharest, Bucharest, Romania
Keyword(s):
Neural Networks, Person Detection, Unmanned Aerial Vehicles, Detection Transformer, Vision Transformer.
Abstract:
The study introduces a novel object detection system that combines the strengths of two advanced deep learning models, the Detection Transformer (DETR) and the Vision Transformer (ViT), to enhance detection accuracy and robustness in unmanned aerial vehicle (UAV) applications. Both models were independently fine-tuned on the VisDrone dataset and then deployed in parallel, each processing the same input so that their complementary strengths can be combined. DETR provides precise localization, which is particularly effective in crowded urban settings, while ViT excels at identifying objects at various scales and under partial occlusion, which is crucial for detecting distant objects. Their outputs are merged by a dynamic fusion algorithm that adjusts confidence scores based on contextual analysis and the characteristics of the detected objects, yielding a combined detection system that outperforms the individual models. The fused model significantly improved overall accuracy, achieving up to 90%, with a mean Average Precision (mAP50) of 85% and a recall of 80%. These results underline the potential of integrating multiple transformer-based models to handle the complexities of UAV-based detection tasks, offering a robust solution that adapts to diverse operational scenarios and environmental conditions.
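The abstract does not spell out the fusion rule, so the following is only a minimal illustrative sketch of how the outputs of two parallel detectors might be merged with confidence adjustment. It is not the authors' algorithm. The names `Detection`, `iou`, and `fuse`, the thresholds, the fixed score boost, and the use of box area as a stand-in for "contextual analysis" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Detection:
    box: Box
    score: float   # detector confidence in [0, 1]
    source: str    # "detr", "vit", or "both" (hypothetical labels)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(detr_dets: List[Detection], vit_dets: List[Detection],
         iou_thr: float = 0.5, small_area: float = 32 * 32,
         boost: float = 0.1, keep_thr: float = 0.3) -> List[Detection]:
    """Merge two detection lists; every numeric default here is an assumption."""
    fused: List[Detection] = []
    unmatched_vit = list(vit_dets)
    for d in detr_dets:
        # Find the ViT detection that overlaps this DETR detection the most.
        best = max(unmatched_vit, key=lambda v: iou(d.box, v.box), default=None)
        if best is not None and iou(d.box, best.box) >= iou_thr:
            unmatched_vit.remove(best)
            # Both models agree: keep the DETR box (assumed to be better
            # localized) and raise the confidence, capped at 1.0.
            score = min(1.0, max(d.score, best.score) + boost)
            fused.append(Detection(d.box, score, "both"))
        else:
            fused.append(d)
    for v in unmatched_vit:
        area = (v.box[2] - v.box[0]) * (v.box[3] - v.box[1])
        # Assumed heuristic: trust ViT slightly more on small (distant) objects.
        score = min(1.0, v.score + boost) if area < small_area else v.score
        fused.append(Detection(v.box, score, v.source))
    # Drop detections whose adjusted confidence is still below the threshold.
    return [f for f in fused if f.score >= keep_thr]


detr_out = [Detection((10, 10, 50, 90), 0.6, "detr")]
vit_out = [Detection((12, 11, 51, 88), 0.7, "vit"),
           Detection((200, 200, 210, 220), 0.25, "vit")]
fused_out = fuse(detr_out, vit_out)
```

In this toy input, the first pair overlaps strongly and is merged into a single detection, while the small ViT-only box is kept only because the assumed small-object boost lifts it above the keep threshold. A real system would likely replace the fixed boost with learned or scene-dependent weights.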