
Figure 4 presents five examples of mask images ($I_{\text{masked}}$) under varying exposure settings, environmental conditions (background), and camera configurations, along with the corresponding input image ($I$) and overlay image ($I_{\text{overlay}}$). Our approach successfully extracted the UAV frame across all scenarios tested.
As illustrated in Fig. 4e, mask generation failed in the absence of the artificial light source, highlighting the sensitivity of the method to scene illumination. However, as shown in Fig. 4d, the algorithm successfully extracted the UAV frame using the same camera and setup, the only difference being the position of the light source. This underscores the importance of maintaining consistent and adequate lighting conditions during mask generation to ensure reliable results.
4.3 Limitations
The generated segmentation masks are generally of good quality, with relatively few false positives and false negatives, although some noise or artifacts may still be present. To further improve robustness, the algorithm should be evaluated across a wider range of frame types and color variations. In low-light conditions, detection of the UAV frame can become more challenging, which can occasionally lead to missed detections.
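As an illustration of how such residual noise or artifacts could be suppressed in post-processing, the sketch below applies standard morphological filtering to a binary mask with OpenCV. This is not part of the proposed pipeline; the function name and kernel size are illustrative assumptions.

import cv2
import numpy as np

def clean_mask(mask: np.ndarray, kernel_size: int = 3) -> np.ndarray:
    """Suppress small artifacts in a binary UAV-frame mask (values 0/255)."""
    kernel = cv2.getStructuringElement(
        cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    # Opening removes isolated false-positive specks.
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # Closing fills small false-negative holes inside the frame region.
    return cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)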
4.4 Future Work
The proposed algorithm successfully generated masks for the UAV frame in all evaluated scenarios. For future work, we plan to explore the use of an unsupervised Vision Transformer (ViT) to learn the structural characteristics of different UAV frames and enable automatic mask generation. Specifically, our goal is to employ a self-supervised learning framework based on a student-teacher architecture, which can learn robust representations of the frame structure without the need for labeled data. This would further improve the adaptability and scalability of the masking process across varying UAV configurations.
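To make the envisioned student-teacher scheme concrete, the following is a minimal sketch of a DINO-style self-supervised update, in which the teacher is an exponential moving average of the student and the loss distills the teacher's sharpened output distribution into the student. The models, temperatures, and momentum value are illustrative assumptions, not part of the presented method.

import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    # Teacher parameters track an exponential moving average of the student.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)

def distillation_loss(student_logits, teacher_logits,
                      temp_s=0.1, temp_t=0.04):
    # Cross-entropy between the sharpened teacher distribution and the
    # student distribution, computed over two views of the same image.
    t = F.softmax(teacher_logits.detach() / temp_t, dim=-1)
    s = F.log_softmax(student_logits / temp_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()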
5 CONCLUSIONS
This work introduced a novel method for automatically detecting and masking the frame of a UAV in images captured by its onboard camera. By excluding the UAV's own structure from the onboard camera view, the method reduces the risk of misclassification, preventing parts of the UAV from being interpreted as obstacles or as other agents in multi-UAV systems. This automation significantly enhances the scalability and efficiency of swarm deployment, addressing a task that is otherwise labour-intensive and does not scale when done manually. Leveraging camera-specific geometric heuristics and a k-d tree structure, the proposed algorithm achieves accurate and efficient frame detection across varying UAV designs and camera configurations. Quantitative and qualitative results across multiple platforms confirm the adaptability and robustness of the method under various operating conditions. Overall, the proposed approach improves the safety of swarm navigation and onboard vision systems, while also streamlining the preparation and deployment of multi-UAV systems.
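The text does not detail how the k-d tree is queried; as a hedged sketch of one plausible use, the snippet below groups candidate pixels flagged by geometric heuristics via radius queries with SciPy's cKDTree, discarding isolated detections as noise. The candidate coordinates and radius are hypothetical.

import numpy as np
from scipy.spatial import cKDTree

# Hypothetical (x, y) pixel candidates flagged by geometric heuristics.
candidates = np.array([[12, 40], [13, 41], [14, 41], [300, 220]])

tree = cKDTree(candidates)
# Keep only candidates with at least one neighbour within r pixels;
# isolated points are treated as spurious and discarded.
pairs = tree.query_pairs(r=3.0)
keep = sorted({i for pair in pairs for i in pair})
frame_pixels = candidates[keep]  # [[12, 40], [13, 41], [14, 41]]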
ACKNOWLEDGEMENTS
This work was funded by CTU grant no. SGS23/177/OHK3/3T/13, by the Czech Science Foundation (GAČR) under research project no. 23-07517S, and by the European Union under the project Robotics and advanced industrial production (reg. no. CZ.02.01.01/00/22_008/0004590).