Sparse Least Squares Twin Support Vector Machines with
Manifold-preserving Graph Reduction
Xijiong Xie
The School of Information Science and Engineering,
Ningbo University, Zhejiang 315211, China
Keywords:
Non-parallel Hyperplane Classifier, Least Squares Twin Support Vector Machines, Manifold-preserving
Graph Reduction.
Abstract:
Least squares twin support vector machines (LSTSVM) are a new non-parallel hyperplane classifier in which the primal optimization problems of twin support vector machines are modified in a least squares sense and the inequality constraints are replaced by equality constraints. In classification problems, enhancing the robustness of LSTSVM and reducing the cost of kernel function evaluations when inferring the label of a new example are both very important. In this paper, we propose a new sparse least squares twin support vector machine based on manifold-preserving graph reduction, an efficient graph reduction algorithm built on the manifold assumption. The method first selects informative examples from the positive examples and the negative examples, respectively, and then uses them for classification. Experimental results confirm the feasibility and effectiveness of our proposed method.
1 INTRODUCTION
Support vector machines (SVMs) are a very effi-
cient classification algorithm (Shawe-Taylor and Sun,
2011; Vapnik, 1995; Cristianini and Shawe-Taylor,
2002; Ripley, 2002), which are based on the princi-
pled idea of structural risk minimization in statistical
learning theory. Compared with other machine learning algorithms, SVMs can obtain better generalization. They are well known for their robustness, good generalization ability, and unique global optimal solution for convex problems. Recent years have witnessed the emergence of many successful non-parallel hyperplane classifiers. Twin support vector machines
(TSVM) (Jayadeva et al., 2007) are a representative
non-parallel hyperplane classifier which aims to gen-
erate two non-parallel hyperplanes such that one of
the hyperplanes is closer to one class and as far as
possible from the other class. Twin bounded SVM
(TBSVM) (Shao et al., 2011) is an improved version
of TSVM whose optimization problems are changed
slightly by adding a regularization term with the idea
of maximizing the margin. TSVM has been extended to various learning frameworks such as multi-task learning (Xie and Sun, 2015b), multi-view learning (Xie and Sun, 2015a; Xie and Sun, 2014), semi-supervised learning (Qi et al., 2012), multi-label learning (Chen et al., 2016) and regression (Peng, 2010).
The two non-parallel hyperplanes of TSVM are ob-
tained by solving a pair of quadratic programming
problems (QPPs). Thus the time complexity is relatively high. Least squares twin support vector machines (LSTSVM) (Kumar and Gopal, 2009) were proposed to reduce the time complexity by changing the inequality constraints to equality constraints, which leads to a pair of systems of linear equations, so LSTSVM can easily handle large datasets. Many improved variants of LSTSVM have been proposed, such as knowledge-based LSTSVM (Kumar et al., 2010), Laplacian LSTSVM for semi-supervised classification (Chen et al., 2014), and weighted LSTSVM (Mu et al., 2014).
However, enhancing the robustness of LSTSVM and reducing the cost of kernel function evaluations when inferring the label of a new example remain very important.
One class of sparse methods uses only a subset of the
data and focuses on the strategies of selecting the
representative examples to form the subset. These
methods lead to a significant reduction of the time
complexity. Although some methods such as ran-
dom sampling or k-means clustering can be used
to reduce the size of the graph, they have no guar-
antees of preserving the manifold structure or ef-
fectively removing outliers and noisy examples. In
particular, the k-means method is sensitive to out-
liers, and time-consuming when the number of clus-
ters is large. Manifold-preserving graph reduction
(Sun et al., 2014) is a graph reduction algorithm
which can effectively eliminate outliers and noisy ex-
amples. In this paper, a novel LSTSVM algorithm
based on manifold-preserving graph reduction is pro-
posed. The experimental results on four datasets validate the feasibility and effectiveness of the proposed method.
The remainder of this paper proceeds as follows.
Section 2 reviews related work about LSTSVM and
MPGR. Section 3 introduces our proposed sparse LSTSVM (SLSTSVM) in detail. After reporting experimental results in
Section 4, we give conclusions and future work in
Section 5.
2 RELATED WORK
In this section, we briefly review LSTSVM and
MPGR.
2.1 LSTSVM
Given a training dataset containing $m$ examples, the examples belonging to classes $+1$ and $-1$ are represented by matrices $A_+$ and $B_-$, whose sizes are $(m_1 \times d)$ and $(m_2 \times d)$, respectively. Define two matrices $A$, $B$ and four vectors $v_1$, $v_2$, $e_1$, $e_2$, where $e_1$ and $e_2$ are vectors of ones of appropriate dimensions and

$$A = (A_+, e_1), \quad B = (B_-, e_2), \quad v_1 = \begin{pmatrix} w_1 \\ b_1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} w_2 \\ b_2 \end{pmatrix}. \qquad (1)$$
The central idea of LSTSVM (Kumar and Gopal,
2009) is to seek two nonparallel hyperplanes
$$w_1^\top x + b_1 = 0 \quad \text{and} \quad w_2^\top x + b_2 = 0 \qquad (2)$$
around which the examples of the corresponding class
get clustered. The classifier is given by solving the
following QPPs separately.
(LSTSVM1)
$$\min_{v_1, q_1} \; \frac{1}{2}(Av_1)^\top(Av_1) + \frac{c_1}{2} q_1^\top q_1 \quad \text{s.t.} \; -Bv_1 + q_1 = e_2, \qquad (3)$$
(LSTSVM2)
$$\min_{v_2, q_2} \; \frac{1}{2}(Bv_2)^\top(Bv_2) + \frac{d_1}{2} q_2^\top q_2 \quad \text{s.t.} \; Av_2 + q_2 = e_1, \qquad (4)$$
where $c_1$ and $d_1$ are nonnegative parameters and $q_1$, $q_2$ are slack vectors of appropriate dimensions. Each of the above two QPPs can be converted to an explicit unconstrained form by substituting its equality constraint into the objective:
(LSTSVM1)
$$\min_{v_1} \; \frac{1}{2}(Av_1)^\top(Av_1) + \frac{c_1}{2}(e_2 + Bv_1)^\top(e_2 + Bv_1), \qquad (5)$$
(LSTSVM2)
$$\min_{v_2} \; \frac{1}{2}(Bv_2)^\top(Bv_2) + \frac{d_1}{2}(e_1 - Av_2)^\top(e_1 - Av_2). \qquad (6)$$
The two nonparallel hyperplanes are obtained by
solving the following two systems of linear equations:
$$v_1 = -\Big(B^\top B + \frac{1}{c_1} A^\top A\Big)^{-1} B^\top e_2, \qquad v_2 = \Big(A^\top A + \frac{1}{d_1} B^\top B\Big)^{-1} A^\top e_1, \qquad (7)$$
which follow from setting the gradients of (5) and (6) with respect to $v_1$ and $v_2$ to zero.
The label of a new example $x$ is determined by the minimum of $|x^\top w_r + b_r|$ $(r = 1, 2)$, which are the perpendicular distances of $x$ to the two hyperplanes given in (2).
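To make the linear case concrete, the following is a minimal Python/NumPy sketch of solving (7) and applying the decision rule. It is an illustrative implementation, not the authors' code, and the small ridge term `eps` added before solving is an assumption for numerical stability.

```python
import numpy as np

def fit_linear_lstsvm(X_pos, X_neg, c1=1.0, d1=1.0, eps=1e-8):
    """Solve the two linear systems in (7) for the augmented vectors v1, v2.

    X_pos, X_neg: arrays of shape (m1, d) and (m2, d).
    eps is a small ridge term for numerical stability (an assumption,
    not part of the paper's formulation)."""
    A = np.hstack([X_pos, np.ones((X_pos.shape[0], 1))])   # A = (A+, e1)
    B = np.hstack([X_neg, np.ones((X_neg.shape[0], 1))])   # B = (B-, e2)
    e1 = np.ones(A.shape[0])
    e2 = np.ones(B.shape[0])
    I = np.eye(A.shape[1])
    # v1 = -(B'B + (1/c1) A'A)^{-1} B' e2
    v1 = -np.linalg.solve(B.T @ B + (1.0 / c1) * (A.T @ A) + eps * I, B.T @ e2)
    # v2 =  (A'A + (1/d1) B'B)^{-1} A' e1
    v2 = np.linalg.solve(A.T @ A + (1.0 / d1) * (B.T @ B) + eps * I, A.T @ e1)
    return v1, v2

def predict_lstsvm(X, v1, v2):
    """Assign each row of X to class +1 if |x^T v1| <= |x^T v2|, else -1."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(np.abs(Xa @ v1) <= np.abs(Xa @ v2), 1, -1)
```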
2.2 MPGR
In this section, we briefly introduce the manifold-
preserving graph reduction algorithm (Sun et al.,
2014).
MPGR is an efficient graph reduction algorithm
based on the manifold assumption. A sparse graph with manifold-preserving properties means that a point outside the sparse graph should have high connectivity with some retained point. Suppose there is a graph $G$ composed of all unlabeled examples; the manifold-preserving sparse graphs are those sparse graph candidates which have a high space connectivity with $G$. The space connectivity is defined as
$$\frac{1}{m - s} \sum_{i=s+1}^{m} \Big( \max_{j=1,\dots,s} W_{ij} \Big), \qquad (8)$$
where m is the number of all vertices, s is the number
of vertices to be retained, and W is the weight matrix.
For subset selection of all the unlabeled examples, a
point which is closer to surrounding points should be
selected since it contains more important information.
This conforms to MPGR in which the examples with
a large degree will be preferred. The degree $d(p)$ is defined as
$$d(p) = \sum_{p \sim q} w_{pq}, \qquad (9)$$
where $p \sim q$ means that example $p$ is connected with example $q$ and $w_{pq}$ is their corresponding weight. If
two examples are not linked, their weight is zero. Due to its simplicity, $d(p)$ is generally adopted as the criterion to construct sparse graphs. A larger $d(p)$ means that example $p$ contains more information; namely, example $p$ is more likely to be selected into the sparse graph. In short, the subset constructed by MPGR is highly representative and maintains a good global manifold structure of the original data distribution. This eliminates outliers and noisy examples and enhances the robustness of the algorithm.
Algorithm 1: Manifold-preserving Graph Reduction Algorithm.
1: Input: Graph $G(V, E, W)$ with $m$ vertices;
2: $s$ is the number of vertices in the desired sparse graph.
3: for $z = 1, 2, \dots, s$
4:   compute the degree $d(i)$ $(i = 1, 2, \dots, m - z + 1)$ of each remaining vertex.
5:   select the vertex $v$ with the maximum degree.
6:   remove $v$ and its associated edges from $G$; add $v$ to $G_s$.
7: end for
8: Output: Manifold-preserving sparse graph $G_s$ with $s$ vertices.
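As an illustration, here is a minimal Python/NumPy sketch of Algorithm 1. The Gaussian-weighted k-nearest-neighbour construction of the weight matrix $W$ is an assumption made for the example; MPGR itself only requires a symmetric weight matrix.

```python
import numpy as np

def build_weight_matrix(X, k=5, sigma=1.0):
    """Gaussian-weighted k-NN graph (an assumed construction for illustration)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = np.zeros_like(dist)
    for i in range(X.shape[0]):
        nn = np.argsort(dist[i])[1:k + 1]                 # k nearest neighbours, excluding i
        W[i, nn] = np.exp(-dist[i, nn] ** 2 / (2 * sigma ** 2))
    return np.maximum(W, W.T)                             # symmetrize

def mpgr_select(W, s):
    """Greedy manifold-preserving graph reduction (Algorithm 1).

    In each of the s iterations, pick the remaining vertex with the largest
    degree, add it to the sparse graph, and remove it (with its edges) from G."""
    remaining = list(range(W.shape[0]))
    selected = []
    for _ in range(s):
        degrees = W[np.ix_(remaining, remaining)].sum(axis=1)
        best = remaining[int(np.argmax(degrees))]
        selected.append(best)
        remaining.remove(best)
    return selected
```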
3 SLSTSVM
As mentioned earlier, LSTSVM generates two non-
parallel hyperplanes such that each hyperplane is
close to one class and as far as possible from the other.
Take the positive hyperplane as an example: outliers or noisy examples among the positive examples may adversely affect the optimal positive hyperplane obtained. However, MPGR can effectively remove such outliers and noisy examples. Reducing the number of training examples also speeds up LSTSVM training and testing.
MPGR constructs a graph from the positive examples. Initially, the candidate set contains all positive examples, while the sought sparse set is empty. For each example in the candidate set, MPGR calculates the degree of the corresponding vertex in the graph. It selects the vertex with the maximum degree in the graph built from the positive examples. Then we include the data point associated with the chosen vertex in the sought sparse set and remove it from the candidate set. This step implements the representativeness criterion. Due to its high spatial connectivity, the resulting subset is highly representative and preserves the global structure of the original training distribution. The sparse set of negative examples is selected in the same way. Overall, inspired by the manifold-preserving principle, SLSTSVM not only enhances the robustness of the algorithm but also reduces training and testing time.
Algorithm 2: Sparse Least Squares Twin Support Vector Machines.
1: Input: Positive examples $A$ and negative examples $B$, model parameters $(c_1, d_1)$.
2: Apply MPGR to the positive examples and to the negative examples to obtain the sparse subsets $T_1$ and $T_2$ corresponding to the positive and negative examples, respectively, according to the retained percentage $r$.
3: Feed the two sparse subsets $T_1$ and $T_2$ into the optimization of LSTSVM.
4: Determine the parameters of the two hyperplanes by solving the linear equations (7).
5: Output: For a test example $x = (\bar{x}^\top, 1)^\top$, if $|x^\top v_1| \le |x^\top v_2|$, it is classified to class $+1$, otherwise to class $-1$.
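A minimal sketch of Algorithm 2 could look as follows; it reuses the `build_weight_matrix`, `mpgr_select`, `fit_linear_lstsvm` and `predict_lstsvm` helpers sketched above, so it is an illustration under those assumptions rather than the exact implementation.

```python
import numpy as np

def train_slstsvm(X_pos, X_neg, r=0.3, c1=1.0, d1=1.0):
    """Sparse LSTSVM (Algorithm 2): MPGR on each class, then linear LSTSVM."""
    # Step 2: select the sparse subsets T1 and T2 with MPGR, class by class.
    s_pos = max(1, int(round(r * X_pos.shape[0])))
    s_neg = max(1, int(round(r * X_neg.shape[0])))
    T1 = X_pos[mpgr_select(build_weight_matrix(X_pos), s_pos)]
    T2 = X_neg[mpgr_select(build_weight_matrix(X_neg), s_neg)]
    # Steps 3-4: feed T1, T2 into the LSTSVM linear systems (7).
    return fit_linear_lstsvm(T1, T2, c1=c1, d1=d1)

# Step 5: labels for test examples are obtained with predict_lstsvm(X_test, v1, v2).
```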
The computation time of LSTSVM with the kernel method is about $O(m^3/4)$, which is the time of the matrix inversion operations, while the computation time of SLSTSVM is reduced to roughly $r^3$ times that of LSTSVM.
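For example, retaining $r = 0.3$ of the training examples reduces the matrix-inversion cost to roughly $0.3^3 \approx 2.7\%$ of that of the full LSTSVM.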
4 EXPERIMENTAL RESULTS
In this section, we evaluate our proposed SLSTSVM
on four real-world datasets. The four datasets come
from UCI Machine Learning Repository: ionosphere
classification, handwritten digit classification, pima
and sonar. Specific information about ionosphere and
handwritten digits is listed in Table 1.
Table 1: Datasets.
Name Attributes Instances Classes
Ionosphere 34 351 2
Handwritten digits 649 2000 10
4.1 Ionosphere
The ionosphere dataset was collected by a system in
Goose Bay, Labrador, that contains a phased array of
16 high-frequency antennas with a total transmitted
power on the order of 6.4 kilowatts. The targets were
free electrons in the ionosphere. “Good” radar returns
are those showing evidence of some type of structure
in the ionosphere. "Bad" returns are those that do not; their signals pass through the ionosphere. The dataset includes 351 examples in total, which are divided into
225 “Good” (positive) examples and 126 “Bad” (neg-
ative) examples.
In our experiments, we capture 99% of the data variance while reducing the dimensionality from 34 to 21 with PCA. We use ten-fold cross-validation to select the best parameters for all involved methods in the region $[2^{-10}, 2^{10}]$ with exponential growth 1, and report the average classification accuracy rates obtained by running the algorithms five times. We use 300 examples for training and the rest for testing. We set the number of examples retained by MPGR to 10%, 20%, 30%, 40%, and 100% of the 300 examples. A linear kernel is chosen for this dataset. LSTSVM with random sampling is used for comparison. From the experimental results in Table 2, we can see that our method SLSTSVM performs better than LSTSVM. When the percentage is 10%, the performance of SLSTSVM is already the same as that with 100% of the examples. When the percentage is 30%, the performance is best. We conclude that SLSTSVM improves robustness compared with LSTSVM.
Table 2: Classification accuracies and standard deviations
(%) on Ionosphere.
Per (%)  LSTSVM         SLSTSVM
10       76.08 (10.78)  82.35 (5.72)
20       80.78 (5.61)   83.53 (4.72)
30       80.39 (6.04)   85.49 (4.51)
40       78.43 (10.28)  81.57 (5.65)
100      82.35 (7.96)   82.35 (7.96)
Table 3: Classification accuracies and standard deviations
(%) on Handwritten digits.
Digit pair  LSTSVM        SLSTSVM
(0,8)       95.20 (3.09)  96.90 (1.29)
(3,9)       97.90 (1.47)  98.10 (1.08)
(3,5)       96.30 (1.92)  96.90 (1.39)
(2,8)       96.60 (1.47)  96.70 (1.82)
4.2 Handwritten Digits
This dataset contains features of handwritten digits (0–9) extracted from a collection of Dutch utility maps. It contains 2000 examples (200 examples per class) with five views. We use the view of 64 Karhunen-Loève coefficients of each example image. Because TSVMs are designed for binary classification while the handwritten digits dataset contains 10 classes, we choose four digit pairs (3,5), (2,8), (0,8) and (3,9) to evaluate all involved methods. A linear kernel is chosen for this dataset. We use 200 examples for training and 200 examples for testing. We use ten-fold cross-validation to select the best parameters for all involved methods in the region $[2^{-10}, 2^{10}]$ with exponential growth 1. We set the number of examples retained by MPGR to 10%, 90%, and 100% of the 200 examples. From the experimental results in Table 3, we can conclude that the performance of SLSTSVM is superior to that of LSTSVM.
4.3 Pima and Sonar
Pima is a dataset for predicting diabetes in Pima Indians from medical records collected over 5 years. It consists of 768 examples and 8 attributes. Sonar is a dataset for predicting whether an object is a rock or a mine from the strength of sonar returns at different angles. It contains 208 examples and 60 attributes. From the experimental results in Tables 4 and 5, we can conclude that SLSTSVM is superior to LSTSVM. When the percentage is 10%, the performance of SLSTSVM exceeds that with 100% of the examples. We conclude that SLSTSVM improves robustness.
Table 4: Classification accuracies and standard deviations
(%) on Pima.
Per (%)  LSTSVM        SLSTSVM
10       54.55 (5.89)  59.85 (7.18)
90       55.73 (5.27)  57.31 (6.86)
100      55.67 (5.22)  55.67 (5.22)
Table 5: Classification accuracies and standard deviations
(%) on Sonar.
Per (%)  LSTSVM        SLSTSVM
20       57.78 (4.75)  60.56 (10.26)
90       62.96 (6.14)  63.89 (4.72)
100      62.22 (3.90)  62.22 (3.90)
5 CONCLUSION AND FUTURE
WORK
In this paper, we have proposed a novel sparse least squares twin support vector machine based on manifold-preserving graph reduction. Experimental results on multiple real-world datasets indicate that SLSTSVM is superior to LSTSVM with random sampling. It would be interesting for future work to extend the
approach of selecting informative and representative examples from unlabeled examples to multi-view semi-supervised learning.
ACKNOWLEDGEMENTS
This work is supported by Ningbo University talent
project 421703670 as well as programs sponsored by
K.C. Wong Magna Fund in Ningbo University. It is
also supported by the Zhejiang Provincial Department of Education under Project 801700472.
REFERENCES
Chen, W., Shao, Y., and Deng, N. (2014). Laplacian
least squares twin support vector machine for semi-
supervised classification. Neurocomputing, 145:465–
476.
Chen, W., Shao, Y., Li, C., and Deng, N. (2016). MLTSVM:
A novel twin support vector machine to multi-label
learning. Pattern Recognition, 52:61–74.
Cristianini, N. and Shawe-Taylor, J. (2002). An introduc-
tion to support vector machines. Cambridge Univer-
sity Press, Cambridge.
Jayadeva, Khemchandani, R., and Chandra, S. (2007). Twin
support vector machines for pattern classification.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 29:905–910.
Kumar, M. and Gopal, M. (2009). Least squares twin sup-
port vector machines for pattern classification. Expert
Systems with Applications, 36:7535–7543.
Kumar, M., Khemchandani, R., and Gopal, M. (2010).
Knowledge based least squares twin support vector
machines. Information Sciences, 180:4606–4618.
Mu, X., Li, J., and Chen, L. (2014). Classification with
noise via weighted least squares twin support vector
machine. Computer Simulation, 31:288–292.
Peng, X. (2010). TSVR: An efficient twin support vector ma-
chine for regression. Neural Networks, 23:365–372.
Qi, Z., Tian, Y., and Shi, Y. (2012). Laplacian twin sup-
port vector machine for semi-supervised classifica-
tion. Neural Networks, 35:46–53.
Ripley, B. (2002). Pattern recognition and neural networks.
Cambridge University Press, Cambridge.
Shao, Y., Zhang, C., Wang, X., and Deng, N. (2011). Im-
provements on twin support vector machines. IEEE
Transactions on Neural Networks, 22:962–968.
Shawe-Taylor, J. and Sun, S. (2011). A review of optimiza-
tion methodologies in support vector machines. Neu-
rocomputing.
Sun, S., Hussain, Z., and Shawe-Taylor, J. (2014).
Manifold-preserving graph reduction for sparse semi-
supervised learning. Neurocomputing, 124:13–21.
Vapnik, V. (1995). The nature of statistical learning theory.
Springer-Verlag, New York.
Xie, X. and Sun, S. (2014). Multi-view Laplacian twin sup-
port vector machines. Applied Intelligence, 41:1059–
1068.
Xie, X. and Sun, S. (2015a). Multi-view twin support vector
machines. Intelligent Data Analysis, 19:701–712.
Xie, X. and Sun, S. (2015b). Multitask centroid twin sup-
port vector machines. Neurocomputing, 149:1085–
1091.