Sparse Least Squares Twin Support Vector Machines with
Manifold-preserving Graph Reduction
Xijiong Xie
The School of Information Science and Engineering,
Ningbo University, Zhejiang 315211, China
Keywords:
Non-parallel Hyperplane Classifier, Least Squares Twin Support Vector Machines, Manifold-preserving
Graph Reduction.
Abstract:
Least squares twin support vector machines (LSTSVM) are a new non-parallel hyperplane classifier in which the primal optimization problems of twin support vector machines are modified in a least squares sense and the inequality constraints are replaced by equality constraints. In classification problems, enhancing the robustness of LSTSVM and reducing the cost of kernel function evaluations when inferring the label of a new example are both very important. In this paper, we propose a new sparse least squares twin support vector machine based on manifold-preserving graph reduction, an efficient graph reduction algorithm built on the manifold assumption. The method first selects informative examples from the positive examples and the negative examples, respectively, and then uses them for classification. Experimental results confirm the feasibility and effectiveness of our proposed method.
1 INTRODUCTION
Support vector machines (SVMs) are a very effi-
cient classification algorithm (Shawe-Taylor and Sun,
2011; Vapnik, 1995; Cristianini and Shawe-Taylor,
2002; Ripley, 2002), which are based on the princi-
pled idea of structural risk minimization in statistical
learning theory. Compared with other machine learning algorithms, SVMs can obtain better generalization. They are well known for their robustness, good generalization ability, and unique global optimal solution for convex problems. Recent years have witnessed the emergence of many successful non-parallel hyperplane classifiers. Twin support vector machines
(TSVM) (Jayadeva et al., 2007) are a representative
non-parallel hyperplane classifier which aims to gen-
erate two non-parallel hyperplanes such that one of
the hyperplanes is closer to one class and as far as
possible from the other class. Twin bounded SVM
(TBSVM) (Shao et al., 2011) is an improved version
of TSVM whose optimization problems are changed
slightly by adding a regularization term with the idea
of maximizing the margin. TSVM has been extended to various learning frameworks such as multi-task learning (Xie and Sun, 2015b), multi-view learning (Xie and Sun, 2015a; Xie and Sun, 2014), semi-supervised learning (Qi et al., 2012), multi-label learning (Chen et al., 2016) and regression (Peng, 2010).
The two non-parallel hyperplanes of TSVM are ob-
tained by solving a pair of quadratic programming
problems (QPPs). Thus the time complexity is relatively high. Least squares twin support vector machines (LSTSVM) (Kumar and Gopal, 2009) were proposed to reduce the time complexity by changing the inequality constraints to equality constraints, which leads to a pair of systems of linear equations, so LSTSVM can easily handle large datasets. Many improved variants of LSTSVM have been proposed, such as knowledge-based LSTSVM (Kumar et al., 2010), Laplacian LSTSVM for semi-supervised classification (Chen et al., 2014), and weighted LSTSVM (Mu et al., 2014).
However, enhancing the robustness of LSTSVM and reducing the cost of kernel function evaluations when inferring the label of a new example remain very important.
One class of sparse methods uses only a subset of the
data and focuses on the strategies of selecting the
representative examples to form the subset. These
methods lead to a significant reduction of the time
complexity. Although some methods such as ran-
dom sampling or k-means clustering can be used
to reduce the size of the graph, they have no guar-
antees of preserving the manifold structure or ef-
fectively removing outliers and noisy examples. In
particular, the k-means method is sensitive to out-
liers, and time-consuming when the number of clus-
ters is large. Manifold-preserving graph reduction
(Sun et al., 2014) is a graph reduction algorithm
which can effectively eliminate outliers and noisy ex-
amples. In this paper, a novel LSTSVM algorithm
based on manifold-preserving graph reduction is pro-
posed. The experimental results on four datasets validate the feasibility and effectiveness of the proposed method.
The remainder of this paper proceeds as follows.
Section 2 reviews related work about LSTSVM and
MPGR. Section 3 introduces our proposed sparse LSTSVM (SLSTSVM) in detail. After reporting experimental results in
Section 4, we give conclusions and future work in
Section 5.
2 RELATED WORK
In this section, we briefly review LSTSVM and
MPGR.
2.1 LSTSVM
Given a training dataset containing $m$ examples, the examples belonging to classes $+1$ and $-1$ are represented by matrices $A_+$ and $B_-$, whose sizes are $(m_1 \times d)$ and $(m_2 \times d)$, respectively. Define two matrices $A$, $B$ and four vectors $v_1$, $v_2$, $e_1$, $e_2$, where $e_1$ and $e_2$ are vectors of ones of appropriate dimensions and

$$A = (A_+, e_1), \quad B = (B_-, e_2), \quad v_1 = \begin{pmatrix} w_1 \\ b_1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} w_2 \\ b_2 \end{pmatrix}. \qquad (1)$$
The central idea of LSTSVM (Kumar and Gopal,
2009) is to seek two nonparallel hyperplanes
$$w_1^\top x + b_1 = 0 \quad \text{and} \quad w_2^\top x + b_2 = 0 \qquad (2)$$
around which the examples of the corresponding class
get clustered. The classifier is given by solving the
following QPPs separately.
(LSTSVM1)
$$\min_{v_1, q_1} \; \frac{1}{2}(Av_1)^\top(Av_1) + \frac{c_1}{2} q_1^\top q_1 \quad \text{s.t.} \; -Bv_1 + q_1 = e_2, \qquad (3)$$
(LSTSVM2)
$$\min_{v_2, q_2} \; \frac{1}{2}(Bv_2)^\top(Bv_2) + \frac{d_1}{2} q_2^\top q_2 \quad \text{s.t.} \; Av_2 + q_2 = e_1, \qquad (4)$$
where $c_1$ and $d_1$ are nonnegative parameters and $q_1$, $q_2$ are slack vectors of appropriate dimensions. Each of the above two QPPs can be converted to an explicit unconstrained form by substituting its equality constraint into the objective:
(LSTSVM1)
$$\min_{v_1} \; \frac{1}{2}(Av_1)^\top(Av_1) + \frac{c_1}{2}(e_2 + Bv_1)^\top(e_2 + Bv_1), \qquad (5)$$
(LSTSVM2)
$$\min_{v_2} \; \frac{1}{2}(Bv_2)^\top(Bv_2) + \frac{d_1}{2}(e_1 - Av_2)^\top(e_1 - Av_2). \qquad (6)$$
The two nonparallel hyperplanes are obtained by
solving the following two systems of linear equations:
$$v_1 = -\Big(B^\top B + \frac{1}{c_1} A^\top A\Big)^{-1} B^\top e_2, \qquad v_2 = \Big(A^\top A + \frac{1}{d_1} B^\top B\Big)^{-1} A^\top e_1, \qquad (7)$$
which follow from setting the gradients of (5) and (6) with respect to $v_1$ and $v_2$ to zero.
The label of a new example $x$ is determined by the minimum of $|x^\top w_r + b_r|$ $(r = 1, 2)$, which are the perpendicular distances of $x$ to the two hyperplanes given in (2).
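To make the linear case concrete, the following is a minimal Python/NumPy sketch of solving (7) and applying the decision rule. It is an illustrative implementation, not the authors' code, and the small ridge term `eps` added before solving is an assumption for numerical stability.

```python
import numpy as np

def fit_linear_lstsvm(X_pos, X_neg, c1=1.0, d1=1.0, eps=1e-8):
    """Solve the two linear systems in (7) for the augmented vectors v1, v2.

    X_pos, X_neg: arrays of shape (m1, d) and (m2, d).
    eps is a small ridge term for numerical stability (an assumption,
    not part of the paper's formulation)."""
    A = np.hstack([X_pos, np.ones((X_pos.shape[0], 1))])   # A = (A+, e1)
    B = np.hstack([X_neg, np.ones((X_neg.shape[0], 1))])   # B = (B-, e2)
    e1 = np.ones(A.shape[0])
    e2 = np.ones(B.shape[0])
    I = np.eye(A.shape[1])
    # v1 = -(B'B + (1/c1) A'A)^{-1} B' e2
    v1 = -np.linalg.solve(B.T @ B + (1.0 / c1) * (A.T @ A) + eps * I, B.T @ e2)
    # v2 =  (A'A + (1/d1) B'B)^{-1} A' e1
    v2 = np.linalg.solve(A.T @ A + (1.0 / d1) * (B.T @ B) + eps * I, A.T @ e1)
    return v1, v2

def predict_lstsvm(X, v1, v2):
    """Assign each row of X to class +1 if |x^T v1| <= |x^T v2|, else -1."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(np.abs(Xa @ v1) <= np.abs(Xa @ v2), 1, -1)
```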
2.2 MPGR
In this section, we briefly introduce the manifold-
preserving graph reduction algorithm (Sun et al.,
2014).
MPGR is an efficient graph reduction algorithm
based on the manifold assumption. A sparse graph with manifold-preserving properties means that a point outside the sparse graph should have high connectivity with some retained point. Suppose there is a graph $G$ composed of all unlabeled examples; the manifold-preserving sparse graphs are those sparse graph candidates which have a high space connectivity with $G$. The space connectivity is defined as
$$\frac{1}{m - s} \sum_{i=s+1}^{m} \Big( \max_{j=1,\dots,s} W_{ij} \Big), \qquad (8)$$
where m is the number of all vertices, s is the number
of vertices to be retained, and W is the weight matrix.
For subset selection of all the unlabeled examples, a
point which is closer to surrounding points should be
selected since it contains more important information.
This conforms to MPGR in which the examples with
a large degree will be preferred. The degree $d(p)$ is defined as
$$d(p) = \sum_{p \sim q} w_{pq}, \qquad (9)$$
where $p \sim q$ means that example $p$ is connected with example $q$ and $w_{pq}$ is their corresponding weight. If
two examples are not linked, their weight is zero. Due to its simplicity, $d(p)$ is generally adopted as the criterion to construct sparse graphs. A larger $d(p)$ means that example $p$ contains more information; namely, example $p$ is more likely to be selected into the sparse graph. In short, the subset constructed by MPGR is highly representative and maintains a good global manifold structure of the original data distribution. This eliminates outliers and noisy examples and enhances the robustness of the algorithm.
Algorithm 1: Manifold-preserving Graph Reduction Algorithm.
1: Input: Graph $G(V, E, W)$ with $m$ vertices;
2: $s$ is the number of vertices in the desired sparse graph.
3: for $z = 1, 2, \dots, s$
4:   compute the degree $d(i)$ $(i = 1, 2, \dots, m - z + 1)$ of each remaining vertex.
5:   select the vertex $v$ with the maximum degree.
6:   remove $v$ and its associated edges from $G$; add $v$ to $G_s$.
7: end for
8: Output: Manifold-preserving sparse graph $G_s$ with $s$ vertices.
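As an illustration, here is a minimal Python/NumPy sketch of Algorithm 1. The Gaussian-weighted k-nearest-neighbour construction of the weight matrix $W$ is an assumption made for the example; MPGR itself only requires a symmetric weight matrix.

```python
import numpy as np

def build_weight_matrix(X, k=5, sigma=1.0):
    """Gaussian-weighted k-NN graph (an assumed construction for illustration)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = np.zeros_like(dist)
    for i in range(X.shape[0]):
        nn = np.argsort(dist[i])[1:k + 1]                 # k nearest neighbours, excluding i
        W[i, nn] = np.exp(-dist[i, nn] ** 2 / (2 * sigma ** 2))
    return np.maximum(W, W.T)                             # symmetrize

def mpgr_select(W, s):
    """Greedy manifold-preserving graph reduction (Algorithm 1).

    In each of the s iterations, pick the remaining vertex with the largest
    degree, add it to the sparse graph, and remove it (with its edges) from G."""
    remaining = list(range(W.shape[0]))
    selected = []
    for _ in range(s):
        degrees = W[np.ix_(remaining, remaining)].sum(axis=1)
        best = remaining[int(np.argmax(degrees))]
        selected.append(best)
        remaining.remove(best)
    return selected
```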
3 SLSTSVM
As mentioned earlier, LSTSVM generates two non-
parallel hyperplanes such that each hyperplane is
close to one class and as far as possible from the other.
Take the positive hyperplane as an example: outliers or noisy examples among the positive examples may adversely affect the optimal positive hyperplane obtained. However, MPGR can effectively remove such outliers and noisy examples. Reducing the number of training examples also speeds up LSTSVM training and testing.
MPGR constructs a graph from the positive examples. Initially, the candidate set contains all positive examples, while the sought sparse set is empty. For each example in the candidate set, MPGR calculates the degree of the corresponding vertex in the graph. It selects the vertex with the maximum degree in the graph built from the positive examples. Then we include the data point associated with the chosen vertex in the sought sparse set and remove it from the candidate set. This step implements the representativeness criterion. Due to its high spatial connectivity, the resulting subset is highly representative and preserves the global structure of the original training distribution. The sparse set of negative examples is selected in the same way. Overall, inspired by the manifold-preserving principle, SLSTSVM not only enhances the robustness of the algorithm but also reduces training and testing time.
Algorithm 2: Sparse Least Squares Twin Support Vector Machines.
1: Input: Positive examples $A$ and negative examples $B$, model parameters $(c_1, d_1)$.
2: Apply MPGR to the positive examples and to the negative examples to obtain the sparse subsets $T_1$ and $T_2$ corresponding to the positive and negative examples, respectively, according to the retained percentage $r$.
3: Feed the two sparse subsets $T_1$ and $T_2$ into the optimization of LSTSVM.
4: Determine the parameters of the two hyperplanes by solving the linear equations (7).
5: Output: For a test example $x = (\bar{x}^\top, 1)^\top$, if $|x^\top v_1| \le |x^\top v_2|$, it is classified to class $+1$, otherwise to class $-1$.
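A minimal sketch of Algorithm 2 could look as follows; it reuses the `build_weight_matrix`, `mpgr_select`, `fit_linear_lstsvm` and `predict_lstsvm` helpers sketched above, so it is an illustration under those assumptions rather than the exact implementation.

```python
import numpy as np

def train_slstsvm(X_pos, X_neg, r=0.3, c1=1.0, d1=1.0):
    """Sparse LSTSVM (Algorithm 2): MPGR on each class, then linear LSTSVM."""
    # Step 2: select the sparse subsets T1 and T2 with MPGR, class by class.
    s_pos = max(1, int(round(r * X_pos.shape[0])))
    s_neg = max(1, int(round(r * X_neg.shape[0])))
    T1 = X_pos[mpgr_select(build_weight_matrix(X_pos), s_pos)]
    T2 = X_neg[mpgr_select(build_weight_matrix(X_neg), s_neg)]
    # Steps 3-4: feed T1, T2 into the LSTSVM linear systems (7).
    return fit_linear_lstsvm(T1, T2, c1=c1, d1=d1)

# Step 5: labels for test examples are obtained with predict_lstsvm(X_test, v1, v2).
```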
The computation time of LSTSVM with the kernel method is about $O(m^3/4)$, which is the time of the matrix inversion operations, while the computation time of SLSTSVM is reduced to roughly $r^3$ times that of LSTSVM.
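For example, retaining $r = 0.3$ of the training examples reduces the matrix-inversion cost to roughly $0.3^3 \approx 2.7\%$ of that of the full LSTSVM.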
4 EXPERIMENTAL RESULTS
In this section, we evaluate our proposed SLSTSVM
on four real-world datasets. The four datasets come
from UCI Machine Learning Repository: ionosphere
classification, handwritten digit classification, pima
and sonar. Specific information about ionosphere and
handwritten digits is listed in Table 1.
Table 1: Datasets.
Name Attributes Instances Classes
Ionosphere 34 351 2
Handwritten digits 649 2000 10
4.1 Ionosphere
The ionosphere dataset was collected by a system in
Goose Bay, Labrador, that contains a phased array of
16 high-frequency antennas with a total transmitted
power on the order of 6.4 kilowatts. The targets were
free electrons in the ionosphere. “Good” radar returns
are those showing evidence of some type of structure
in the ionosphere. "Bad" returns are those that do not; their signals pass through the ionosphere. The dataset includes 351 examples in total, which are divided into
225 “Good” (positive) examples and 126 “Bad” (neg-
ative) examples.
In our experiments, we capture 99% of the data variance while reducing the dimensionality from 34 to 21 with PCA. We use ten-fold cross-validation to select the best parameters for all involved methods in the region $[2^{-10}, 2^{10}]$ with exponential growth 1, and report the average classification accuracy rates obtained by running the algorithms five times. We use 300 examples for training and the rest for testing. We set the number of examples retained by MPGR to 10%, 20%, 30%, 40%, and 100% of the 300 examples. A linear kernel is chosen for this dataset. LSTSVM with random sampling is used for comparison. From the experimental results in Table 2, we can see that our method SLSTSVM performs better than LSTSVM. When the percentage is 10%, the performance of SLSTSVM is already the same as that with 100% of the examples. When the percentage is 30%, the performance is best. We conclude that SLSTSVM improves robustness compared with LSTSVM.
Table 2: Classification accuracies and standard deviations
(%) on Ionosphere.
Per (%)  LSTSVM         SLSTSVM
10       76.08 (10.78)  82.35 (5.72)
20       80.78 (5.61)   83.53 (4.72)
30       80.39 (6.04)   85.49 (4.51)
40       78.43 (10.28)  81.57 (5.65)
100      82.35 (7.96)   82.35 (7.96)
Table 3: Classification accuracies and standard deviations
(%) on Handwritten digits.
Digit pair  LSTSVM        SLSTSVM
(0,8)       95.20 (3.09)  96.90 (1.29)
(3,9)       97.90 (1.47)  98.10 (1.08)
(3,5)       96.30 (1.92)  96.90 (1.39)
(2,8)       96.60 (1.47)  96.70 (1.82)
4.2 Handwritten Digits
This dataset contains features of handwritten digits (0–9) extracted from a collection of Dutch utility maps. It contains 2000 examples (200 examples per class) with five views. We use the view of 64 Karhunen-Loève coefficients of each example image. Because TSVMs are designed for binary classification while the handwritten digits dataset contains 10 classes, we choose four digit pairs (3,5), (2,8), (0,8) and (3,9) to evaluate all involved methods. A linear kernel is chosen for this dataset. We use 200 examples for training and 200 examples for testing. We use ten-fold cross-validation to select the best parameters for all involved methods in the region $[2^{-10}, 2^{10}]$ with exponential growth 1. We set the number of examples retained by MPGR to 10%, 90%, and 100% of the 200 examples. From the experimental results in Table 3, we can conclude that the performance of SLSTSVM is superior to that of LSTSVM.
4.3 Pima and Sonar
Pima is a dataset for predicting diabetes in Pima Indians from medical records collected over 5 years. It consists of 768 examples and 8 attributes. Sonar is a dataset for predicting whether an object is a rock or a mine from the strength of sonar returns at different angles. It contains 208 examples and 60 attributes. From the experimental results in Tables 4 and 5, we can conclude that SLSTSVM is superior to LSTSVM. When the percentage is 10%, the performance of SLSTSVM exceeds that with 100% of the examples. We conclude that SLSTSVM improves robustness.
Table 4: Classification accuracies and standard deviations
(%) on Pima.
Per (%)  LSTSVM        SLSTSVM
10       54.55 (5.89)  59.85 (7.18)
90       55.73 (5.27)  57.31 (6.86)
100      55.67 (5.22)  55.67 (5.22)
Table 5: Classification accuracies and standard deviations
(%) on Sonar.
Per (%)  LSTSVM        SLSTSVM
20       57.78 (4.75)  60.56 (10.26)
90       62.96 (6.14)  63.89 (4.72)
100      62.22 (3.90)  62.22 (3.90)
5 CONCLUSION AND FUTURE
WORK
In this paper, we have proposed a novel sparse least squares twin support vector machine based on manifold-preserving graph reduction. Experimental results on multiple real-world datasets indicate that SLSTSVM is superior to LSTSVM with random sampling. It would be interesting for future work to extend the
approach of selecting informative and representative examples from unlabeled examples to multi-view semi-supervised learning.
ACKNOWLEDGEMENTS
This work is supported by Ningbo University talent
project 421703670 as well as programs sponsored by
K.C. Wong Magna Fund in Ningbo University. It is
also supported by the Zhejiang Provincial Department of Education under Project 801700472.
REFERENCES
Chen, W., Shao, Y., and Deng, N. (2014). Laplacian
least squares twin support vector machine for semi-
supervised classification. Neurocomputing, 145:465–
476.
Chen, W., Shao, Y., Li, C., and Deng, N. (2016). MLTSVM:
A novel twin support vector machine to multi-label
learning. Pattern Recognition, 52:61–74.
Cristianini, N. and Shawe-Taylor, J. (2002). An introduc-
tion to support vector machines. Cambridge Univer-
sity Press, Cambridge.
Jayadeva, Khemchandani, R., and Chandra, S. (2007). Twin
support vector machines for pattern classification.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 29:905–910.
Kumar, M. and Gopal, M. (2009). Least squares twin sup-
port vector machines for pattern classification. Expert
Systems with Applications, 36:7535–7543.
Kumar, M., Khemchandani, R., and Gopal, M. (2010).
Knowledge based least squares twin support vector
machines. Information Sciences, 180:4606–4618.
Mu, X., Li, J., and Chen, L. (2014). Classification with
noise via weighted least squares twin support vector
machine. Computer Simulation, 31:288–292.
Peng, X. (2010). TSVR: An efficient twin support vector ma-
chine for regression. Neural Networks, 23:365–372.
Qi, Z., Tian, Y., and Shi, Y. (2012). Laplacian twin sup-
port vector machine for semi-supervised classifica-
tion. Neural Networks, 35:46–53.
Ripley, B. (2002). Pattern recognition and neural networks.
Cambridge University Press, Cambridge.
Shao, Y., Zhang, C., Wang, X., and Deng, N. (2011). Im-
provements on twin support vector machines. IEEE
Transactions on Neural Networks, 22:962–968.
Shawe-Taylor, J. and Sun, S. (2011). A review of optimiza-
tion methodologies in support vector machines. Neu-
rocomputing.
Sun, S., Hussain, Z., and Shawe-Taylor, J. (2014).
Manifold-preserving graph reduction for sparse semi-
supervised learning. Neurocomputing, 124:13–21.
Vapnik, V. (1995). The nature of statistical learning theory.
Springer-Verlag, New York.
Xie, X. and Sun, S. (2014). Multi-view Laplacian twin sup-
port vector machines. Applied Intelligence, 41:1059–
1068.
Xie, X. and Sun, S. (2015a). Multi-view twin support vector
machines. Intelligent Data Analysis, 19:701–712.
Xie, X. and Sun, S. (2015b). Multitask centroid twin sup-
port vector machines. Neurocomputing, 149:1085–
1091.