Authors: Nawel Slimani 1; Imen Jdey 2 and Monji Kherallah 3
        
Affiliations:
1 National School of Electronics and Telecommunications, Sfax University, Sfax, Tunisia
2 University of Sfax, ReGIM-Lab. REsearch Groups in Intelligent Machines (LR11ES48), Sfax, Tunisia
3 Faculty of Sciences of Sfax, Sfax University, Tunisia
        
Keywords: Deep Learning, Classification, Remote Sensing, Computer Vision, Vision Transformer, Self-Attention Mechanism, Satellite Image.
            
                Abstract: 
                This study applies the Vision Transformer (ViT), a deep learning model based on self-attention, to satellite image classification. Unlike conventional methods, ViT divides each image into patches and uses self-attention to capture spatial dependencies between them, which allows it to discern fine-grained patterns at the patch level. With this approach, classification accuracy exceeds 98% on the SAT4 and SAT6 datasets. The findings are relevant to applications such as urban planning, agriculture, disaster response, and environmental conservation. Beyond reporting results, the work examines ViT's architecture and training process and provides a foundation for further research on satellite imagery analysis in support of sustainable resource management and informed decision-making.
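
The patch-and-attend idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 28x28x4 input shape, the patch size of 7, the embedding width of 32, and the random weights are all assumptions chosen for illustration, and the sketch uses a single attention head with no positional embeddings, class token, or training.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


rng = np.random.default_rng(0)

# Hypothetical input: one 28x28 image with 4 spectral bands (assumed shape).
image = rng.random((28, 28, 4))

patches = patchify(image, patch_size=7)  # 16 patches, 7 * 7 * 4 = 196 values each

d_model = 32  # illustrative embedding width, not taken from the paper
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], d_model))
tokens = patches @ w_embed  # linear patch embedding

w_q, w_k, w_v = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)

print(patches.shape, out.shape, attn.shape)
```

Each row of `attn` holds the weights one patch assigns to every patch in the image, which is how dependencies between distant regions can be modeled in a single layer.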