Deep Learning Analysis for Big Remote Sensing Image Classification

Imen Chebbi¹,², Nedra Mellouli¹, Myriam Lamolle¹ and Imed Riadh Farah²
¹LIASD Laboratory, University of Paris 8, Paris, France
²RIADI Laboratory, University of Manouba, Manouba, Tunisia
Keywords:
Big Data, Deep Learning, Remote Sensing, Classification, Spark, TensorFlow.
Abstract:
Big remote sensing data has several distinctive characteristics: it is multi-source, multi-scale, large-scale, dynamic and non-linear. The collections are so large and complex that they become difficult to process with available database management tools or traditional data processing applications. Moreover, traditional techniques have various limitations when processing massive volumes of data: analyzing big data requires sophisticated algorithms, based on machine learning and deep learning, that process the data in real time with high accuracy and efficiency. Deep learning (DL) methods are therefore used in various domains such as speech recognition, image classification and natural language processing, and recent research has merged different DL techniques with hybrid learning-training mechanisms to process data at high speed. In this paper we propose a hybrid approach for remote sensing (RS) image classification that combines a deep learning algorithm with an explanatory classification algorithm, and we show how deep learning techniques can benefit big remote sensing. Through deep learning we extract relevant features from images via a DL architecture; these features then become the inputs of an MLlib classification algorithm that learns the correlations that may exist between features and classes. The architecture combines Spark RDD image encoding to account for an image's local regions, pre-trained VGGNet and U-net models for feature extraction and image segmentation, and Spark machine learning algorithms such as Random Forest and KNN to perform the labeling task.
1 INTRODUCTION
Earth observation is a source of information for many fields of application such as remote sensing, cartography and aeronautics. Over the years, the number of sensors orbiting in space has grown, and with it the number of captured images. The various spaceborne and airborne sensors deliver large amounts of earth observation data every day, letting us observe the planet from many different angles (M.Chi, 2016). These data are the main constituent of big remote sensing data (BRSD) and exhibit at least the classic 4Vs: volume, velocity, veracity and variety (I. Chebbi, 2015).
Currently, the most active research area in machine learning (ML) is deep learning (DL). With increased processing power, advances in processors, and the ever-growing quantity of remote sensing data, we are starting to talk about DL for BRSD. DL plays a major role in providing big remote sensing data analytics for classification and clustering: it uses deep architectures to capture the complex relationships between the input data and the class labels. DL and ensemble-based architectures are among the most popular and efficient approaches for multi-source and multi-temporal land cover classification; they outperform traditional machine learning methods on both optical and radar images. DL algorithms have shown better performance on hyperspectral and multispectral imagery for tasks such as extracting land cover types (N.Kussul, 2017), pixel-based classification, semantic segmentation and target recognition. They learn features from the data: at the bottom level, low-level features are extracted from texture and spectral information, while the output features are represented at the top level. DL offers many well-established deep architectures such as the Deep Belief Network (DBN), the Recurrent Neural Network (RNN) and the Convolutional Neural Network (CNN). In image processing the most used architecture is the CNN, a multi-layer network composed of two stages: a feature extractor and a classifier. Many experiments have shown that the performance of remote sensing (RS) image scene classification has improved significantly thanks to the powerful feature representations learnt through
different DL architectures (X.Chen, 2014).
The two frameworks most useful for deep learning are TensorFlow and Apache Spark. TensorFlow is an open-source framework designed for ML and especially for DL. It has built-in support for DL and neural networks (NN), which makes it easy to assemble a network, assign parameters and run the training process. TensorFlow also provides a collection of trainable mathematical functions that are useful for NNs, and its large set of flexible tools makes it compatible with many variants of ML. In addition, TensorFlow (P.Goldsborough, 2016) uses both CPUs and GPUs for computation, which speeds up training. Apache Spark is an open-source parallel computing framework with advantages over MapReduce (I.Chebbi, 2018); it delivers the flexibility, scalability and speed needed to meet the challenges of big data. Spark integrates two main libraries: SQL, for querying large structured data, and MLlib, which implements the main learning algorithms and statistical methods (A.Gupta, 2017). MLlib is Spark's open-source ML library and includes several efficient training functionalities. It supports different languages and provides a high-level API that enriches Spark's ecosystem and facilitates the development of ML pipelines (X.Meng, 2016).
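As a minimal illustration of how such an MLlib pipeline is assembled (a sketch only: the column names and the tiny in-memory DataFrame are hypothetical, not data from this work):

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Hypothetical rows with numeric feature columns f0..f2 and a double-typed label.
df = spark.createDataFrame(
    [(0.1, 0.7, 0.3, 0.0), (0.9, 0.2, 0.8, 1.0)],
    ["f0", "f1", "f2", "label"])

# Assemble the feature columns into a vector, then train a classifier, as one pipeline.
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f0", "f1", "f2"], outputCol="features"),
    RandomForestClassifier(labelCol="label", featuresCol="features"),
])
model = pipeline.fit(df)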
The purpose of our work is to combine the performance of two CNN models for analyzing and classifying heterogeneous multi-source remote sensing data. We use four different datasets, which differ in image acquisition, resolution and encoding, in order to evaluate the improvement brought by these two architectures.
The rest of this paper is organized as follows. Section II introduces the related work. Section III presents the proposed approach. Section IV presents our results. A brief conclusion with recommendations for future studies is given in Section V.
2 RELATED WORK
Deep learning is taking off in remote sensing, and many papers have been published on its applications there. Since DL is a deep feature learning architecture, it can learn semantically discriminative features and reach better classification performance than mid-level approaches. In (N.Kussul, 2017) a multilevel deep learning architecture is proposed for multitemporal images acquired by the Landsat-8 and Sentinel-1A satellites. The architecture has four levels: preprocessing (level 1), supervised classification (level 2), post-processing (level 3) and final geospatial analysis (level 4). 1-D and 2-D CNN architectures are proposed to explore spectral and spatial features. At the preprocessing level, self-organizing Kohonen maps (SOMs) are chosen for optical image segmentation and restoration of missing data. The SOMs are trained for each spectral band separately; missing values are restored by substituting a missing input component with the corresponding neuron's weight coefficient, and the restored pixels are then masked. In the second level (supervised classification with CNNs), two different CNN architectures are compared: a 1-D CNN whose convolutions operate in the spectral domain and a 2-D CNN whose convolutions operate in the spatial domain. Both architectures are composed of two convolutional layers, a max pooling layer and two fully connected layers, but they use different trained filters and different numbers of neurons in the hidden layers. A combination of AdaGrad and RMSProp is used for moment estimation; it converges faster than gradient descent or stochastic gradient descent. In the final level (post-processing and geospatial analysis), several filtering algorithms, based on the information quality of the input data and on field boundaries, are applied to the pixel-based classification map. Finally, data fusion allows the interpretation of the classification results. Another
work, proposed by Nguyen et al. (T.Nguyen, 2013), considers a method for satellite image classification using a CNN architecture. The input image is first converted to grayscale and resized. The network uses three convolutional layers and two sampling layers. The first convolutional layer performs the convolution operation with five 5*5 kernels and five biases to produce five maps. The next layer is a sampling layer that computes spatial subsampling for each map of the first convolutional layer; the number of maps in the sampling layer equals the number of maps in the convolutional layer, and five weights and five biases are used to produce the five maps. In the second convolutional layer, two maps of the second sampling layer are convolved to compute a single map, so that 10 maps are used in the third convolutional layer; in the same way, the second sampling layer has 10 maps. The final convolutional layer uses 10*10 kernels, where each map is convolved with the previous sampling layer's maps to obtain 100 maps of size 1*1. This layer is fully connected to the output layer, which provides a vector whose dimension equals the number of classes (T.Nguyen, 2013).
In (G.Cheng, 2018) features are used directly
from a fully convolutional network (FCN) as the classifier inputs. In contrast, Duan et al. (Y.Duan, 2017) proposed an approach in which the CNN pooling layer is replaced with a wavelet-constrained pooling layer, used in conjunction with a Markov Random Field and superpixels to produce a segmentation map. In (J. Geng and Chen, 2015), Geng et al. used deep convolutional autoencoders (DCAE) for feature extraction and automatic classification of high-resolution single-polarization TerraSAR-X images. The DCAE architecture contains a hand-crafted first convolutional layer of kernels and a hand-crafted second scale-transformation layer that integrates correlated neighboring pixels; the remaining layers are trained as a stacked autoencoder (SAE). Xiaorui Ma et al. (X.Ma, 2017) proposed a classification approach based on three decisions. The first is a local decision: a hyperspectral image is sampled and each test sample is assessed from its neighborhood by computing Euclidean distances. The second is a global decision based on supervised classification, which computes the Euclidean distance between the sample and the classes. The final decision is a self-decision based on the class label, involving spectral and spatial features. The first two decisions are applied to the unlabeled samples of the training set; the deep network is then trained on the new training set to extract features, and a classification map is generated from the self-decision. The most common challenge of RS applications is RS image classification: RS images can look similar while belonging to different classes. In recent years, DL approaches have emerged as a solution to this challenge, proving efficient on hyperspectral and multispectral BRSD imagery for land cover tasks such as extracting forests, buildings and roads.
As DL approaches take off in big data and remote sensing, in this paper we use DL for BRSD classification: we identify and classify objects in satellite images by executing DL algorithms based on two CNN models (VGGNet and U-net).
From the state of the art, the VGGNet architecture stands out as a network well suited to feature extraction tasks. The 16-layer version reaches an error of about 8.5% on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), about 1% more than the 19-layer version, but we prefer it for easier handling and computation speed. VGGNet was chosen over AlexNet and other architectures for its simplicity, its uniform 3x3 convolutions and its depth, which allow it to extract more general features. It was also chosen over ResNet, conceding about 3.5% on the ILSVRC, once again in the interest of simplicity and computational flexibility. The VGGNet network was trained (by its original authors) on ImageNet's well-known ILSVRC-2012 dataset (J. Deng, 2010), consisting of 1.3 million images distributed over 1000 classes, which makes it a good feature extractor.
According to the state of the art, the DeepUNet and U-net models are the most suitable for processing multispectral satellite images, so we chose U-net for multispectral image segmentation and classification.
We recall that our aim in this study is to characterize image objects given their labels; we treat this task as object extraction from satellite image datasets.
Furthermore, we work with four databases composed of heterogeneous satellite images (RGB images, multispectral images and multi-band images with different sizes and resolutions) in order to address two dimensions of big data: volume and variety.
3 PROPOSED APPROACH
Big remote sensing data (BRSD) represents a challenge for DL. Big data involves large numbers of samples as inputs, large varieties of classes as outputs and very high dimensionality as attributes, which leads to long running times and high model complexity. For all these reasons, algorithms running on distributed frameworks and parallelized machines are required.

From Hubel and Wiesel's early work on the cat's visual cortex (D.Hubel, 1968), we know that the visual cortex contains a complex arrangement of cells. These cells are sensitive to small local regions of the visual field, called receptive fields, and the local regions are tiled to cover the whole visual field. These cells act as local filters and are well suited to exploiting the strong spatially local correlation present in images.
Based on this assessment, we propose an approach in three steps, as shown in Figure 1. First, we use Spark's Resilient Distributed Dataset (RDD) structure, in which each image is a dataset element: each image is represented by a vector of RDD records, each record reflecting a local region of the source image. Once the RDD vectors are established, they are forwarded to the deep neural network inputs, where the number of networks equals the vector dimension. In the second step, we perform local object identification and image segmentation using a pre-trained VGGNet. In the third step, we pipeline the VGGNet outputs to Random Forest and KNN classifiers from the MLlib package, which aggregate the results to reach the final image class.

Figure 1: The Proposed Approach.
3.1 Loading Images into a Spark Data Frame (cf. Figure 1)
The first step consists in loading millions of RS images into a Spark Resilient Distributed Dataset (RDD). The data is then decoded on a distributed platform dedicated to large-scale manipulation, using TensorFlow as the DL library on each distributed worker, both to test the model's hyper-parameters on the RDD and to speed up this time-intensive task with Spark.

Spark addresses data distribution through the RDD concept, so each partition remains in memory on its server: RS data is incorporated in stripes into the RDDs. HDFS key/value pairs are generated for each image partition to overcome data heterogeneity; HDFS thus provides BRSD storage with high read throughput.
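The following sketch shows how such a load-and-decode step can look in PySpark (assumptions: the HDFS path is hypothetical, and Pillow is used here as the image decoder; our actual decoding pipeline may differ):

import io
import numpy as np
from PIL import Image  # assumed image-decoding library
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rs-image-rdd").getOrCreate()
sc = spark.sparkContext

# Read every image under an HDFS directory as (path, raw bytes) records.
raw = sc.binaryFiles("hdfs:///rs_images/*.png")  # hypothetical location

def decode(path_and_bytes):
    path, data = path_and_bytes
    arr = np.asarray(Image.open(io.BytesIO(data)))  # H x W x C pixel array
    return path, arr

# One RDD record per image; local regions can then be tiled from each array.
images = raw.map(decode)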
3.2 Feature Extraction and Image Segmentation (cf. Figure 1)
The second step involves two stages, object identification and image segmentation, both based on DL.

After studying multiple works, we chose VGGNet as the most efficient model for RGB remote sensing data and U-net as the most efficient model for multispectral data.
3.2.1 Object Identification
For object identification we used a pre-trained VGGNet-16 and fine-tuned it to our needs. Fine-tuning is one of the most important techniques for building a large-scale model: it takes an already trained network and redefines some of its components (weights, layers, outputs) for the new dataset. The network is composed of 22 layers, 16 of which have learnable weights (13 convolutional layers and 3 fully connected layers). We adapted the input configuration to accept RGB satellite images of any resolution (instead of 224x224). To obtain an output adapted to our problem, we re-trained only the 3 dense output layers; since we use VGGNet-16 as a feature extractor, the layers we freeze are the first 22 layers.
After the fine-tuning phase, we retrieve the features from the last layer, "block5_pool", and feed them into dense layers trained on our dataset. Whereas the output layer of the pre-trained VGGNet-16 is a softmax activation over 1000 classes, we adapted it to a 10-class softmax layer (Buildings, Cars, Crops, Fast H2O, Roads, Slow H2O, Structures, Tracks, Trees and Trucks). We then simply trained the weights of these layers to identify the objects.
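A minimal Keras-style sketch of this fine-tuning scheme is given below (assumptions: tensorflow.keras is available, which is newer than the TensorFlow 1.3 used in our experiments; the fixed 224x224 input, the hidden layer size and the optimizer are illustrative choices, not the exact configuration of our network):

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

NUM_CLASSES = 10  # Buildings, Cars, Crops, Fast H2O, Roads, Slow H2O, Structures, Tracks, Trees, Trucks

# Pre-trained convolutional base, without the original 1000-class head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:  # freeze the feature extractor
    layer.trainable = False

# Re-trained dense head ending in a 10-class softmax, as described above.
x = Flatten()(base.output)  # features from the last "block5_pool" layer
x = Dense(256, activation="relu")(x)  # illustrative hidden size
out = Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(base.input, out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])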
A U-net architecture is like a convolutional autoencoder, but it also has skip-like connections to the feature maps located before the bottleneck (compressed embedding) layer, so that in the decoder part some information coming from the earlier layers bypasses the compressive bottleneck. The output is a segmentation map that is used in the segmentation part (next step).

The U-net architecture is based on an encoder and a decoder. The encoder gradually reduces the spatial dimension with pooling layers, and the decoder gradually recovers the object details and spatial dimension; shortcut connections from encoder to decoder help the decoder recover the object details better.
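To make the skip-connection idea concrete, here is a minimal one-level U-net sketch (the 144*144, 20-band input follows the setup described below; the depth and filter counts are illustrative, not our full model):

from tensorflow.keras import layers, Model

def tiny_unet(input_shape=(144, 144, 20), num_classes=10):
    inp = layers.Input(input_shape)
    # Encoder: convolution, then spatial downsampling.
    e = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    e = layers.BatchNormalization()(e)
    p = layers.MaxPooling2D(2)(e)
    # Bottleneck (compressed embedding).
    b = layers.Conv2D(64, 3, padding="same", activation="relu")(p)
    # Decoder: upsample, then concatenate the encoder map (the skip connection).
    u = layers.UpSampling2D(2)(b)
    u = layers.Concatenate()([u, e])
    d = layers.Conv2D(32, 3, padding="same", activation="relu")(u)
    # Per-pixel class scores form the segmentation map.
    out = layers.Conv2D(num_classes, 1, activation="softmax")(d)
    return Model(inp, out)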
3.2.2 Image Segmentation
Image segmentation is a machine learning task in which we must not only classify what we see in an image, but do so at the pixel level. The aim of the classification task is to obtain a cartographic representation of a satellite image in which the elements are automatically grouped by their values.

This step of the process is applied to multispectral images. Over the years, many techniques based on convolutional neural networks (CNNs) have been developed for image segmentation.
A general semantic segmentation architecture can be broadly thought of as an encoder network followed by a decoder network. The encoder is usually a pre-trained classification network, such as a pre-trained VGGNet-16; the task of the decoder is to semantically project the discriminative features (at lower resolution) learnt by the encoder back into the pixel space (at higher resolution) to obtain a dense classification.
U-net is such an encoder-decoder architecture, and we use it for the segmentation step; the pre-trained U-net architecture is adapted to handle multispectral images.

A U-net with batch normalization, developed in TensorFlow, serves as the segmentation model. The model was trained for 9,000 batches, each batch containing 60 image patches, where each patch is a 144*144 crop of the original images. As in the original U-net setup, the loss is computed only on the 80*80 center region, because the edge pixels receive only partial information.
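A center-cropped loss of this kind can be sketched as follows (a sketch under the sizes stated above; the exact loss used in training may differ):

import tensorflow as tf

def center_crop_loss(y_true, y_pred, patch=144, center=80):
    # Cross-entropy evaluated only on the 80*80 central region,
    # ignoring edge pixels that receive only partial context.
    off = (patch - center) // 2  # 32-pixel border on each side
    y_true_c = y_true[:, off:off + center, off:off + center, :]
    y_pred_c = y_pred[:, off:off + center, off:off + center, :]
    return tf.reduce_mean(
        tf.keras.losses.categorical_crossentropy(y_true_c, y_pred_c))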
The model was applied to satellite images with 20 bands rather than 3. Apart from the information provided by the RGB bands, the other bands contain much more information for the neural network to exploit, thus improving learning and the quality of the predictions: the network has access to visual characteristics that humans cannot perceive. The additional available bands are called the P, M and A bands.
3.3 Image Classification (cf. Figure 1)
After identifying the objects in the given images, we perform the classification step using two algorithms from Spark's machine learning library (MLlib): Random Forest and KNN (K-Nearest Neighbors). The outputs of VGGNet and U-net are pipelined to the input of this step.
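The Random Forest part of this step can be sketched as follows (the two in-memory rows stand in for the CNN feature vectors and are hypothetical; note that Random Forest ships with core MLlib, while a KNN stage would typically come from an external Spark package or a custom implementation plugged in the same way):

from pyspark.sql import SparkSession
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

# Hypothetical (CNN feature vector, class label) rows pipelined from the previous step.
train = spark.createDataFrame(
    [(Vectors.dense([0.2, 0.9, 0.1]), 0.0),
     (Vectors.dense([0.8, 0.1, 0.7]), 1.0)],
    ["features", "label"])

rf = RandomForestClassifier(numTrees=100, labelCol="label", featuresCol="features")
model = rf.fit(train)
predictions = model.transform(train)  # adds a "prediction" column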
4 EXPERIMENTAL RESULTS
In this section, we present our experimental setup and results.
4.1 Experimental Setup
4.1.1 Hardware and Software Description
Our algorithm runs on two machines: one with the Ubuntu 16.04 operating system and an NVIDIA GeForce GTX 950M graphics device with 8 GB of graphics memory, and an AWS p2 server instance (the p2 family provides up to 16 NVIDIA K80 GPUs, 64 vCPUs and 732 GiB of host memory, with a combined 192 GB of GPU memory, roughly 40 thousand parallel processing cores, 70 teraflops of single-precision floating-point performance and over 23 teraflops of double-precision floating-point performance).

The algorithm is implemented using Python 2.7, TensorFlow 1.3 and Apache Spark 2.3.0 with Hadoop 2.7.
4.1.2 Data Description
We used four datasets to evaluate our approach: SIRI-WHU, AID, a multispectral dataset, and a dataset that we prepared ourselves, composed of satellite images captured by different sensors such as SPOT4, SPOT5 and LANDSAT8.
We reorganized the SIRI-WHU and AID datasets in order to obtain coherent datasets with 10 classes (Buildings, Cars, Crops, Fast H2O, Roads, Slow H2O, Structures, Tracks, Trees and Trucks).

SIRI-WHU Dataset. A 12-class Google image dataset from SIRI-WHU meant for research purposes. There are 200 images per class, each measuring 200*200 pixels with a 2-m spatial resolution (Siri-Whu, ).

AID Dataset. AID is a large-scale aerial image dataset built by collecting sample images from Google Earth imagery. It covers 30 aerial scene types, and all images are labeled by specialists in remote sensing image interpretation. In total, the AID dataset contains 10,000 images across 30 classes (Inria, ).

Our Dataset. Our dataset contains 3,974 images across 10 classes, with various resolutions: 600*600, 989*989, 753*753 and 1024*1024.

Multispectral Dataset. This dataset contains 1 km x 1 km satellite images in both 3-band and 16-band formats. The 3-band images are traditional RGB natural color images, while the 16-band images add spectral information captured on wider wavelength channels: this multi-band imagery covers the multispectral (400-1040 nm) and short-wave infrared (SWIR, 1195-2365 nm) ranges.
4.1.3 Evaluation Metrics
The metrics used to evaluate the performance of our work are precision, recall and F1-score:

$\mathrm{Precision} = \frac{TP}{TP+FP}$, $\quad \mathrm{Recall} = \frac{TP}{TP+FN}$, $\quad F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$

where TP (true positives) denotes the number of correctly detected objects, FN (false negatives) the number of non-detected objects, and FP (false positives) the number of incorrectly detected objects.
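These formulas translate directly into code, for instance (illustrative counts, not values from our experiments):

def metrics(tp, fp, fn):
    # Precision, recall and F1-score from detection counts, per the formulas above.
    precision = tp / float(tp + fp)
    recall = tp / float(tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 81 correct detections, 15 false alarms, 19 missed objects.
print(metrics(81, 15, 19))  # approximately (0.844, 0.810, 0.827)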
4.2 Results
Our datasets are split into two parts: 70% for training and 30% for testing.
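With Spark, such a split can be obtained with randomSplit (dataset_df is a hypothetical DataFrame of feature/label rows; the seed is only for reproducibility):

# 70/30 random split of a hypothetical DataFrame built in the previous steps.
train_df, test_df = dataset_df.randomSplit([0.7, 0.3], seed=42)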
4.2.1 VGGNet

The overall results of VGGNet on our three RGB datasets are reported in Table 1 below.

Table 1: Results obtained on 3 datasets with the VGGNet model; the first table gives the overall results per dataset and the second the per-class results, reported as SIRI / AID / our database.

Datasets     | Accuracy | Recall | F-score
SIRI         | 0.73     | 0.68   | 0.70
AID          | 0.61     | 0.50   | 0.55
Our database | 0.84     | 0.81   | 0.82

Labels     | Accuracy           | Recall             | F-score
Buildings  | 0.24 / 0.56 / 0.33 | 0.52 / 0.79 / 0.63 | 0.90 / 0.69 / 0.78
Cars       | 0.50 / 0.77 / 0.61 | 0.53 / 0.53 / 0.53 | 0.53 / 0.73 / 0.62
Crops      | 0.80 / 0.50 / 0.62 | 0.81 / 0.87 / 0.84 | 0.89 / 0.94 / 0.91
Fast H2O   | 0.89 / 0.40 / 0.55 | 0.77 / 0.50 / 0.61 | 0.89 / 0.67 / 0.76
Roads      | 0.69 / 0.73 / 0.71 | 0.83 / 0.77 / 0.80 | 1.00 / 0.78 / 0.93
Slow H2O   | 0.47 / 0.56 / 0.51 | 0.86 / 0.80 / 0.83 | 1.00 / 0.79 / 0.88
Structures | 0.60 / 0.30 / 0.57 | 0.75 / 0.75 / 0.75 | 0.64 / 0.88 / 0.78
Tracks     | 0.37 / 0.50 / 0.42 | 0.62 / 0.67 / 0.64 | 0.76 / 0.57 / 0.67
Trees      | 0.16 / 0.06 / 0.12 | 0.80 / 0.50 / 0.62 | 0.80 / 0.57 / 0.67
Trucks     | 0.69 / 0.71 / 0.73 | 0.58 / 0.85 / 0.69 | 0.88 / 0.79 / 0.83
4.2.2 U-net
The second type of worker is dedicated to the multispectral images. In the training phase the accuracy was 0.93; the test results are reported in Table 2 below.

Table 2: Results obtained with U-net; the first table gives the overall results per dataset and the second the per-class accuracy on the multispectral dataset.

Datasets              | Accuracy | Recall | F-score
Multispectral dataset | 0.92     | 0.57   | 0.70
AID                   | 0.71     | 0.68   | 0.69
SIRI                  | 0.80     | 0.79   | 0.79

Labels     | Accuracy (multispectral dataset)
Buildings  | 0.86
Cars       | 0.61
Crops      | 0.81
Fast H2O   | 0.38
Roads      | 0.16
Slow H2O   | 0.46
Structures | 0.94
Tracks     | 0.96
Trees      | 0.98
Trucks     | 0.93
4.2.3 Discussion
In this work, a DL implementation for identifying and classifying objects in satellite images is presented. The implementation is based on two frameworks: Spark for storing, distributing and parallelizing big remote sensing data, and TensorFlow for implementing and executing DL algorithms based on two CNN models (VGGNet and U-net). We used four image databases from different sources and applied the two DL algorithms to them according to the type, number and resolution of the bands. We applied VGGNet to the first three databases and obtained encouraging results, in particular with SIRI and our proprietary dataset, whose images are RGB (3-band) images. Although the AID dataset has a better resolution than the other two datasets and exactly the same classes, it proved very difficult to characterize in terms of object identification. Meanwhile, U-net, applied to the fourth dataset, is more efficient at identifying objects in high-resolution imagery, as is the case with the multispectral dataset.

These results suggest that some DL algorithms are sensitive to image resolution and others to image size. The joint use of two models within our system therefore makes sense given the heterogeneity of the input source images.
5 CONCLUSIONS
Big data and DL are both considered a big deal for researchers. The concept of DL is to dig into a massive data volume in order to identify patterns and extract features from complex unsupervised data without human intervention, which makes it an important
tool for big data. This paper presents a technological architecture for better classifying remote sensing images. The work includes a study of several machine learning tools for image classification; specifically, Spark machine learning implementations combined with two pre-trained CNNs (transfer learning from VGGNet and U-net) were used to categorize samples from four datasets, and a benchmark dataset of remote sensing images was created for evaluation. In future work, we will use other multispectral and hyperspectral datasets to validate the U-net results, and we will develop other DL architectures in order to adapt our approach to different RS data types.
REFERENCES
A. Gupta, H. Thakur, et al. (2017). A big data analysis framework using Apache Spark and deep learning. In ICDM Workshops, pages 9-16.

D. Hubel and T. Wiesel (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology (London), 195:215-243.

G. Cheng, C. Yang, et al. (2018). When deep learning meets metric learning: remote sensing image scene classification via learning discriminative CNNs. IEEE Transactions on Geoscience and Remote Sensing, 56(5):2811-2821.

I. Chebbi, W. Boulila, and I. R. Farah (2015). Big data: Concepts, challenges and applications. In ICCCI (2) 2015, pages 638-647. Springer.

I. Chebbi, W. Boulila, N. Mellouli, et al. (2018). A comparison of big remote sensing data processing with Hadoop MapReduce and Spark. In ATSIP, pages 1-4.

Inria, D.

J. Deng, R. Socher, et al. (2010). ImageNet: a large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

J. Geng, J. Fan, H. Wang, X. Ma, B. Li, and F. Chen (2015). High-resolution SAR image classification via deep convolutional autoencoders. IEEE Geoscience and Remote Sensing Letters, vol. 12:2351-2355.

M. Chi, A. Plaza, et al. (2016). Big data for remote sensing: Challenges and opportunities. Proceedings of the IEEE, 104(11):2207-2219.

N. Kussul, M. Lavreniuk, et al. (2017). Deep learning classification of land cover and crop types using remote sensing data. IEEE Geoscience and Remote Sensing Letters, 14(5):778-782.

P. Goldsborough (2016). A tour of TensorFlow. CoRR abs/1610.01178.

Siri-Whu, D.

T. Nguyen, J. Han, et al. (2013). Satellite image classification using convolutional learning. In AIP Conference Proceedings, (1558):2237-2240.

X. Chen, et al. (2014). Big data deep learning: Challenges and perspectives. IEEE Access, 2:514-525.

X. Ma, H. Wang, et al. (2017). Spectral-spatial classification of hyperspectral image based on deep auto-encoder. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

X. Meng, J. K. Bradley, et al. (2016). MLlib: Machine learning in Apache Spark. Journal of Machine Learning Research, 17:34:1-34:7.

Y. Duan, F. Liu, et al. (2017). SAR image segmentation based on convolutional-wavelet neural network and Markov random field. Pattern Recognition, 64:255-267.