Authors:
Nawel Slimani¹; Imen Jdey² and Monji Kherallah³
Affiliations:
¹ National School of Electronics and Telecommunications, Sfax University, Sfax, Tunisia; ² University of Sfax, ReGIM-Lab: REsearch Groups in Intelligent Machines (LR11ES48), Sfax, Tunisia; ³ Faculty of Sciences of Sfax, Sfax University, Tunisia
Keyword(s):
Deep Learning, Classification, Remote Sensing, Computer Vision, Vision Transformer, Self-Attention Mechanism, Satellite Image.
Abstract:
This study introduces a transformative approach to satellite image classification using the Vision Transformer (ViT), a deep learning model. Unlike conventional convolutional methods, ViT divides each image into patches and employs self-attention mechanisms to capture spatial dependencies among them, enabling the discernment of nuanced patterns at the patch level. This key innovation yields remarkable classification accuracy, surpassing 98% on the SAT-4 and SAT-6 datasets. The study’s findings hold substantial promise for diverse applications, including urban planning, agriculture, disaster response, and environmental conservation. By providing a nuanced understanding of ViT’s impact on satellite imagery analysis, this work not only contributes insights into ViT’s architecture and training process but also establishes a robust foundation for advancing the field and promoting sustainable resource management through informed decision-making.
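As an illustrative sketch (not the authors' implementation), the two steps the abstract highlights — splitting an image into patches and relating the patches with self-attention — can be outlined in NumPy. The patch size, embedding width, and random weights below are assumptions for demonstration; only the 28×28, 4-band image shape reflects the actual SAT-4/SAT-6 format.

```python
import numpy as np

def image_to_patches(img, patch):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    H, W, C = img.shape
    rows = [img[i:i + patch, j:j + patch].reshape(-1)
            for i in range(0, H, patch)
            for j in range(0, W, patch)]
    return np.stack(rows)  # (num_patches, patch*patch*C)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over patch tokens."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise patch affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key patches
    return weights @ V                               # attention-mixed patch features

rng = np.random.default_rng(0)
img = rng.standard_normal((28, 28, 4))    # SAT-4/SAT-6: 28x28 pixels, 4 bands (RGB + NIR)
tokens = image_to_patches(img, patch=7)   # 16 patches, each 7*7*4 = 196 values
d = 32                                    # assumed embedding width
Wq, Wk, Wv = (rng.standard_normal((196, d)) for _ in range(3))  # toy random projections
out = self_attention(tokens, Wq, Wk, Wv)
print(tokens.shape, out.shape)            # → (16, 196) (16, 32)
```

Each output row is a weighted mixture of features from every patch, which is how self-attention captures dependencies between distant image regions that a fixed convolutional receptive field would miss.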