Two Stage SVM Classification for Hyperspectral Data
Michal Cholewa and Przemyslaw Glomb
Institute of Theoretical and Applied Informatics, Baltycka 5, Gliwice, Poland
Keywords:
Hyperspectral Imaging, SVM, Two-stage, Classification.
Abstract:
In this article, we present a method of enhancing the SVM classification of hyperspectral data with the use of three supporting classifiers. This is done by applying the fully trained classifiers to the learning set to obtain a pattern of their behavior, which can then be used to refine the construction of the classifier. The second stage is either a straightforward translation of the first stage, if the first-stage classifiers agree on the result, or it consists of a retrained SVM classifier using only the learning data selected with the help of the first stage. The scheme shares some features with the committee-of-experts fusion scheme, yet it clearly distinguishes a lead classifier, using the supporting ones only to refine its construction. We present the construction of the two-stage scheme, then test it on the well-known Indian Pines HSI dataset against the straightforward use of an SVM classifier, over which our method achieves a noticeable improvement.
1 INTRODUCTION
In this article we present a two-stage classification scheme for hyperspectral images, based on an SVM classifier and refinement of the learning dataset.
The problem of HSI data analysis is becoming ever more present in research and practical applications with the increased availability of hyperspectral imaging devices. Hyperspectral cameras extend the input available from RGB cameras: they allow analysis not only of the shape and color of objects, but also of the structure of the reflected light, which, in turn, helps in determining the material the object consists of.
The additional information provided by hyperspectral imaging has led to many applications of such data. Bhaskaran et al. (Bhaskaran et al., 2004) describe the utilization of hyperspectral imaging in post-disaster management, Pu et al. (Pu et al., 2015) discuss its uses in quality control of food products, and Ellis (Ellis, 2003) focuses on hyperspectral analysis of oil-influenced soil.
It is therefore important for classification schemes to be able to handle such data, which has made hyperspectral classification a widely researched topic. Xu and Li (Xu and Li, 2014) use a sparse probabilistic representation enhanced spatially by Markov Random Fields, while Melgani and Bruzzone (Melgani and Bruzzone, 2004) achieve very good results using Support Vector Machines. Chen et al. (Chen et al., 2013) achieve class separability by projecting the samples into a high-dimensional feature space and kernelizing the sparse representation vectors of the training set. Bioucas-Dias et al. (Bioucas-Dias et al., 2012) discuss hyperspectral unmixing methods, based on the assumption that each pixel in the image in fact consists of several materials and that distinguishing them leads to better classification, while in (Fang et al., 2015) the neighborhood relations are strongly utilized through the analysis of superpixels.
Our aim is to enhance the results of the well-known SVM classifier, which already achieves very good results on hyperspectral data, by constructing a two-stage scheme that uses specialized second-stage SVM classifiers, retrained according to each initial classification.
The idea of fusing the outputs of more than one classifier is a well-established branch of research and includes many methods of combining the individual classification results. A broad survey of such methods can be found in (Ruta and Gabrys, 2000) by Ruta and Gabrys, in the experimental comparison by Kuncheva et al. (Kuncheva et al., 2001), and in the theoretical study by Kuncheva (Kuncheva, 2002).
In this paper we propose a two-stage scheme for refinement of the SVM first-stage classification. It is related to the Committee of Experts (introduced in (Perrone and Cooper, 1993)) and to the Stacked Generalization method, first proposed by Wolpert in (Wolpert, 1992), yet it is not a fusion method per se, as it in fact only reuses the same classifier with a modified learning set.
In the first stage we use the SVM classifier as well
as several classifiers added as a Committee of Experts. We then apply the constructed committee to the learning set to gain knowledge of the class structure associated with each combination of votes. In the second stage, we classify each hyperspectral vector with every committee member classifier and obtain the classification according to the similarity of the vote structure to that observed on the learning set. If we cannot draw a conclusion, we use the base SVM classifier again, but we train it only with learning vectors from the classes present in the first-stage votes.
In this work we do not focus on spatial enhancement of the classification results. We decided against it because, while it improves the results on our test dataset, where the regions are more or less homogeneous, this is not always the case with hyperspectral data. Spatial enhancement can still be added for further refinement of the results.
This article is organized as follows: in Section 2 we discuss the classification schemes used and present our two-stage approach. In Section 3 we describe the dataset used for testing, present the conducted experiment, and discuss its results; we conclude our work in Section 4.
2 METHOD
In this section we discuss the classification method that we use. First, we give a short introduction to the known classification algorithms we utilize. Then we present the construction of our two-stage classification scheme.
2.1 Used Classifiers
In our experiment we use one main classifier (we selected the well-researched Support Vector Machine, SVM, as defined in (Chapelle et al., 1999), as it gives very good results on the dataset, cf. (Rojas et al., 2010)) and three supporting classifiers for refinement purposes. These three are the Bayesian Network (as in (Denoyer and Gallinari, 2004)), the Decision Tree (Friedl and Brodley, 1997), and standard K-Nearest Neighbors (KNN).
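As an illustration, a minimal sketch of such a classifier set could look as follows. The use of scikit-learn, the default hyperparameters, and GaussianNB as a stand-in for the Bayesian Network classifier are our assumptions for illustration, not part of the original setup.

from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

def build_classifiers(kernel="rbf"):
    # The main classifier Cl_1 followed by the three supporting ones.
    return [
        SVC(kernel=kernel),        # Cl_1: main SVM classifier
        KNeighborsClassifier(),    # Cl_2: K-Nearest Neighbors
        DecisionTreeClassifier(),  # Cl_3: Decision Tree
        GaussianNB(),              # Cl_4: naive Bayes stand-in for the Bayesian Network
    ]

All classifiers are assumed to be fitted on the same training set before the scheme described below is applied.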
2.2 Classification Scheme
Let the dataset $D = D_l \cup D_t$, where $D_l$ is the training data and $D_t$ is the testing data. Let also $c = 1, \ldots, C$ be the classes of the vectors in $D$.

Let then $Cl_i$, $i = 1, \ldots, n$, be the partial classifiers, with $Cl_1$ being the main classifier (in our case the SVM). For each $v \in D$ we denote the classification of vector $v$ by classifier $Cl_i$ as $Cl_i(v)$ and the true class of vector $v$ as $c(v)$.
2.2.1 Building the Voting Base $VB$

In this phase we build the voting base $VB$ to be used in the second stage of the scheme. The idea is to associate each voting vector with the true classes of the data vectors that produced it,

$VB : C^n \to C^{\leq |D_l|},$

that is, each combination of votes that occurred in the stage-one classification of $D_l$ is associated with the true classes of the data vectors that produced that combination. This of course means that for each combination the result is a list no longer than the number of data vectors in the learning set, $|D_l|$, and

$\sum_{v \in C^n} |VB(v)| = |D_l|.$

For that we obtain, for each $v \in D_l$, the vote vector $V(v) = [Cl_1(v), \ldots, Cl_n(v)]$, that is, the vector of classifications of the data vector $v$ obtained by each partial classifier $Cl_i$, $i = 1, \ldots, n$.

We construct the base $VB$ by assigning the true classes to each $V(v)$, $v \in D_l$; we of course know the true class $c(v)$, since $v \in D_l$. For each $V(v)$, $VB$ holds the information about the actual classes of the data vectors $v \in D_l$ for which the vote sequence $V(v)$ occurred. In other words, for a voting vector $X \in C^n$,

$VB(X) = \{ c(v) : v \in D_l \wedge V(v) = X \}.$
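A minimal sketch of the voting-base construction in Python could look as follows; it assumes scikit-learn-style classifiers that are already fitted on the training set, integer class labels, and function names of our own choosing.

from collections import defaultdict

def vote_vector(classifiers, x):
    # V(x): the tuple of class labels assigned to sample x by Cl_1, ..., Cl_n.
    return tuple(int(clf.predict([x])[0]) for clf in classifiers)

def build_voting_base(classifiers, X_train, y_train):
    # VB: maps each vote vector observed on the training set to the list of true classes.
    vb = defaultdict(list)
    for x, true_class in zip(X_train, y_train):
        vb[vote_vector(classifiers, x)].append(int(true_class))
    return vb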
2.2.2 First Stage
The first stage of the scheme consists of classifying each vector $v \in D_t$ with each of the classifiers $Cl_i$, $i = 1, \ldots, n$, and thus obtaining the vote vectors $V(v)$, $v \in D_t$.

The first stage therefore produces, for each $v \in D_t$, the voting vector $V(v)$. We can expect $Cl_1$, the main classifier, to achieve significantly higher accuracy than the $Cl_j$, $j = 2, \ldots, n$ (cf. (Melgani and Bruzzone, 2004)). This is frequently not the case (cf. Table 1, first part), yet the experiment design assumes that the main classifier is the one with the best overall performance.

The result of the first stage of classification for $v \in D_t$ is therefore $V(v) = [Cl_1(v), \ldots, Cl_n(v)]$.

We also assume that the class assigned to a vector $v \in D_t$ by the first stage of the algorithm is the classification result of the main classifier,

$Cl'(v) = Cl_1(v).$
2.2.3 Second Stage
The second-stage classification for a vector $v \in D_t$, having obtained $V(v)$, proceeds as follows:

i. If $V(v)$ exists within the voting base $VB$ (i.e., $VB(V(v))$ is a sequence of non-zero length) and there exists a single most frequent element of $VB(V(v))$, then we take that element as the classification result $Cl''(v)$:

$Cl''(v) = \arg\max_c \gamma_{V(v)}(c), \quad \text{where} \quad \gamma_{V(v)}(c) = \sum_{x \in VB(V(v))} \mathbf{1}_{x=c}.$

In other words, we take $Cl''(v)$ to be the most frequent class associated with the vote vector $V(v)$ in the training set.

ii. If $V(v)$ exists within the voting base $VB$, but two or more elements $c_1, c_2, \ldots, c_k$, $k \leq C$, have an equal (maximal) number of instances in $VB(V(v))$, then we construct a second-stage classifier $Cl'_1$, training it with $\{ x \in D_l : c(x) \in \{ c_1, c_2, \ldots, c_k \} \}$. Then $Cl''(v) = Cl'_1(v)$.

iii. If $V(v)$ did not occur in $VB$, we construct a second-stage classifier $Cl'_1$, training it with $\{ x \in D_l : c(x) \in V(v) \}$. Then $Cl''(v) = Cl'_1(v)$.
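The decision logic of the second stage can be sketched as follows; the helper below assumes the vote_vector and build_voting_base functions from the earlier sketch, integer labels, a NumPy feature matrix, and an SVM retrained from scratch for cases ii and iii, all of which are our illustrative assumptions rather than the original implementation.

from collections import Counter
import numpy as np
from sklearn.svm import SVC

def second_stage(v, vote, vb, X_train, y_train, svm_params=None):
    # Second-stage label for a test vector v whose first-stage vote vector is `vote`.
    seen = vb.get(vote, [])
    if seen:                                   # V(v) occurred in the voting base
        counts = Counter(seen).most_common()
        top_class, top_count = counts[0]
        if len(counts) == 1 or counts[1][1] < top_count:
            return top_class                   # case i: unique most frequent class in VB
        allowed = {c for c, n in counts if n == top_count}   # case ii: tie in VB
    else:
        allowed = set(vote)                    # case iii: V(v) unseen in VB
    if len(allowed) == 1:                      # only one candidate class, nothing to retrain
        return next(iter(allowed))
    # Cases ii and iii: retrain the main SVM on training vectors of the allowed classes only.
    mask = np.isin(y_train, list(allowed))
    clf = SVC(**(svm_params or {})).fit(X_train[mask], y_train[mask])
    return int(clf.predict([v])[0])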
3 RESULTS
To test the presented scheme we compare the two-stage classification presented in Section 2.2 with the results achieved by single-stage classification using the same base classifier.
3.1 Dataset
As the dataset we use the Indian Pines hyperspectral image, a well-researched set for testing hyperspectral image analysis. The scene was gathered by the AVIRIS sensor over the Indian Pines test site in north-western Indiana and consists of $145 \times 145$ pixels and 224 spectral reflectance bands in the wavelength range $0.4$-$2.5 \cdot 10^{-6}$ meters. It contains two-thirds agriculture and one-third forest or other natural perennial vegetation. There are two major dual-lane highways, a rail line, as well as some low-density housing, other built structures, and smaller roads. The available ground truth is designated into sixteen classes.
3.2 Experiment
The experiment proceeds as follows.

As the base classifier we use SVM with three different kernels: linear, polynomial, and radial basis function (RBF). As supporting classifiers we use Bayesian Networks, Decision Trees, and K-Nearest Neighbors. The parameters of the SVM classifiers are determined by grid search, as we want to obtain the peak performance of each kernel.
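A minimal sketch of such a kernel-wise parameter search with scikit-learn is given below; the parameter grids themselves are our assumptions, as the searched ranges are not listed in the paper.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical parameter grids, for illustration only.
param_grids = {
    "linear": {"kernel": ["linear"], "C": [0.1, 1, 10, 100]},
    "poly":   {"kernel": ["poly"], "C": [0.1, 1, 10, 100], "degree": [2, 3, 4]},
    "rbf":    {"kernel": ["rbf"], "C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]},
}

def tune_svm(X_train, y_train, kernel):
    # Return an SVM with grid-searched parameters for the given kernel.
    search = GridSearchCV(SVC(), param_grids[kernel], cv=3)
    search.fit(X_train, y_train)
    return search.best_estimator_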
That setting gives us three classifier sets (named after the kernels of their respective SVM classifiers). For each of these three sets we perform three classifications:

1. Single-stage classification using the SVM classifier $Cl_1$.

2. The two-stage classification scheme, denoted as second stage A.

3. The two-stage classification without referring to the voting base $VB$; in other words, the two-stage scheme under the assumption that for any $v \in D_t$, $VB(V(v))$ is an empty sequence, thus applying only point iii of the second stage. We denote this approach as second stage B.
For validation of the results we use 10-fold cross-validation with 10% of the data used as the learning dataset and the remaining 90% used as test data.
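One way to realize this validation in scikit-learn, assuming the ten folds are ten repeated random 10%/90% partitions (our reading of the description above), is sketched here.

from sklearn.model_selection import ShuffleSplit

# Ten repeated random partitions: 10% of the labeled pixels for training, 90% for testing.
splitter = ShuffleSplit(n_splits=10, train_size=0.1, test_size=0.9, random_state=0)

def folds(X, y):
    # Yield (X_train, y_train, X_test, y_test) for each of the ten repetitions.
    for train_idx, test_idx in splitter.split(X):
        yield X[train_idx], y[train_idx], X[test_idx], y[test_idx]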
This experiment is aimed at deciding whether the second stage offers an actual improvement of the results. It also evaluates how much the two-stage scheme differs between the cases with and without the use of the voting base $VB$, i.e., between second stage A and second stage B.

The results of this part of the experiment also allow us to draw conclusions as to how representative the learning set is of the whole data set: they determine in how many cases the training and test vectors are so similar that they produce the same voting vector.
The main results of the experiment are presented in Table 1.
We are also able to analyze the peak possible accuracy of the presented scheme. We do that by counting the number of cases where $c(v) \in V(v)$, which means that at least one of the classifiers $\{Cl_i\}_{i=1}^{n}$ classified the vector $v$ correctly. If this does not happen, the second stage cannot improve the result, as the training vectors of the correct class are not even considered in the construction of the second-stage classifier.
The results of this analysis are presented in Table 2.
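As a small illustration, this upper bound can be estimated from the first-stage votes as sketched below; the snippet reuses the vote_vector helper from the earlier sketch and shares its assumptions.

def peak_accuracy(classifiers, X_test, y_test):
    # Fraction of test vectors for which at least one first-stage vote is correct,
    # i.e., an upper bound on what the second stage can achieve.
    hits = sum(int(t) in vote_vector(classifiers, x) for x, t in zip(X_test, y_test))
    return hits / len(y_test)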
3.3 Discussion
Figure 1: The Indian Pines image (a) and the ground truth for the classification (b). From (Manian and Jimenez, 2007).

The results presented in Table 1 show that adding the second stage noticeably increases the accuracy of the SVM classifier, regardless of the kernel. Moreover, the RBF kernel, which yields the best results as a standalone classifier (in our experiment as well as in the experiments presented in (Melgani and Bruzzone, 2004)), is the one improved the most.
By comparing classification scheme A, with the $VB$ analysis, and scheme B, with only the secondary training on a limited class set, we can see that second stage A achieves an advantage over second stage B, though only a minimal one. This suggests that the small training sample reflects the whole body of the data very well: when the voting sequence is the same, the vector class is usually also the same.
There is also at least one correct vote in over 90 percent of cases (as seen in Table 2), which means that only in about 10% of cases the second stage has no chance of improving the result, since in those cases the correct class $c(v)$ is not included in the second-stage learning set. This suggests that a scheme for choosing the right classifier from the first-stage pool could be a promising concept.
The main downside of the presented scheme appears to be that it is time-consuming: most of the second-stage classifications require a new SVM classifier trained with a pool of vectors limited by the first-stage analysis. The possible need for retraining (although it certainly decreases with a greater number of training vectors) also requires the learning dataset to remain available, which may be space-consuming should the dataset be really large.
4 CONCLUSION
The addition of the second stage to the classification scheme, together with several supporting classifiers (of significantly lower accuracy) used for refinement purposes, noticeably improves the results of the SVM classifier with all three analyzed kernels.
Table 1: The accuracy achieved in the experiment: the first line shows single-stage classification using SVM with three different kernels, the next three lines show the results of the supporting classifiers, and the last two lines show the accuracy of the two-stage classification scheme.
Linear Polynomial RBF
Base SVM 0.6824 0.7701 0.7740
KNN 0.6776 0.6762 0.6845
DTree 0.6109 0.6100 0.6070
BN 0.5093 0.5041 0.4874
Sec. stage A 0.7142 0.7916 0.8136
Sec. stage B 0.7021 0.7887 0.8080
Table 2: The percentage of cases in which the correct classification was present in $V(v)$.
Linear Polynomial RBF
one vote 0.8136 0.84 0.9210
two votes 0.8080 0.7916 0.8312
While in some cases the second-stage result cannot take the correct classification into consideration (as it did not appear in any vote of the first stage), these situations constitute less than 10 percent of cases. It can also be observed that a small learning set reflects the whole body of hyperspectral vectors well.

It is also important to note that the presented two-stage scheme is not spatially enhanced (in the way presented in (Xu and Li, 2014)). Thus, the achieved results should not be compared to those of spatially enhanced classifiers, but only to their first stage. Further refinement of the initial classification in this manner is of course possible.
ACKNOWLEDGEMENT
The work on this article has been supported by the project ‘Representation of dynamic 3D scenes using the Atomic Shapes Network model’ financed by the National Science Centre, decision DEC-2011/03/D/ST6/03753.
REFERENCES
Bhaskaran, S., Datt, B., Forster, T., T., N., and Brown, M.
(2004). Integrating imaging spectroscopy (445-2543
nm) and geographic information systems for post-
disaster management: A case of hailstorm damage
in Sydney. International Journal of Remote Sensing,
25(13):2625–2639.
Bioucas-Dias, J. M., Plaza, A., Dobigeon, N., Parente, M.,
Du, Q., Gader, P., and Chanussot, J. (2012). Hy-
perspectral unmixing overview: Geometrical, statis-
tical, and sparse regression-based approaches. Se-
lected Topics in Applied Earth Observations and Re-
mote Sensing, IEEE Journal of, 5(2):354–379.
Chapelle, O., Haffner, P., and Vapnik, V. (1999). Sup-
port vector machines for histogram-based image clas-
sification. Neural Networks, IEEE Transactions on,
10(5):1055–1064.
Chen, Y., Nasrabadi, N. M., and Tran, T. D. (2013). Hy-
perspectral image classification via kernel sparse rep-
resentation. Geoscience and Remote Sensing, IEEE
Transactions on, 51(1):217–231.
Denoyer, L. and Gallinari, P. (2004). Bayesian network
model for semi-structured document classification. In-
formation Processing & Management, 40(5):807–827.
Ellis, J. (2003). Hyperspectral imaging technologies key for
oil seep/oil-impacted soil detection and environmental
baselines. Environmental Science and Engineering.
Retrieved on February, 23:2004.
Fang, L., Li, S., Kang, X., and Benediktsson, J. A. (2015).
Spectral–spatial classification of hyperspectral images
with a superpixel-based discriminative sparse model.
Geoscience and Remote Sensing, IEEE Transactions
on, 53(8):4186–4201.
Friedl, M. and Brodley, C. (1997). Decision tree classifica-
tion of land cover from remotely sensed data. Remote
Sensing of Environment, 61(3):399 – 409.
Kuncheva, L. (2002). A theoretical study on six classifier
fusion strategies. Pattern Analysis and Machine Intel-
ligence, IEEE Transactions on, 24(2):281–286.
Kuncheva, L., Bezdek, J., and Duin, R. (2001). Decision
templates for multiple classifier fusion: an experimen-
tal comparison. Pattern Recognition, 34:299–314.
Manian, V. and Jimenez, L. O. (2007). Land cover and ben-
thic habitat classification using texture features from
hyperspectral and multispectral images. Journal of
Electronic Imaging, 16(2):023011–023011–12.
Melgani, F. and Bruzzone, L. (2004). Classification of hy-
perspectral remote sensing images with support vec-
tor machines. Geoscience and Remote Sensing, IEEE
Transactions on, 42(8):1778–1790.
Perrone, M. P. and Cooper, L. (1993). When networks dis-
agree: Ensemble methods for hybrid neural networks.
pages 126–142. Chapman and Hall.
Pu, Y.-Y., Feng, Y.-Z., and Sun, D.-W. (2015). Re-
cent progress of hyperspectral imaging on quality and
safety inspection of fruits and vegetables: A review.
Comprehensive Reviews in Food Science and Food
Safety, 14(2):176–188.
Rojas, M., Dópido, I., Plaza, A., and Gamba, P. (2010).
Comparison of support vector machine-based pro-
cessing chains for hyperspectral image classification.
In SPIE Optical Engineering+ Applications, pages
78100B–78100B. International Society for Optics and
Photonics.
Ruta, D. and Gabrys, B. (2000). An overview of classifier
fusion methods.
Wolpert, D. (1992). Stacked generalization. Neural Net-
works, 5:241–259.
Xu, L. and Li, J. (2014). Bayesian classification of hyper-
spectral imagery based on probabilistic sparse repre-
sentation and markov random field. Geoscience and
Remote Sensing Letters, IEEE, 11(4):823–827.