Margin-based Refinement for Support-Vector-Machine Classification
Helene Dörksen and Volker Lohweg
inIT – Institute Industrial IT, Ostwestfalen-Lippe University of Applied Sciences, Lemgo, Germany
{helene.doerksen, volker.lohweg}@hs-owl.de
Keywords: Refinement of Classification, Robust Classification, Classification within Small/Incomplete Samples.
Abstract: In real-world scenarios it is not always possible to generate an appropriate number of measured objects for machine learning tasks. At the learning stage, it is nonetheless often possible to obtain high accuracies for several arbitrarily chosen classifiers on small/incomplete datasets. The fact is that many classifiers might perform accurately, yet their decision boundaries might be inadequate. In this situation, a decision supported by margin-like characteristics for the discrimination of classes should be taken into account; accuracy as an exclusive measure is often not sufficient. To contribute to the solution of this problem, we present a margin-based approach that originates from an existing refinement procedure. In our method, the margin value serves as the optimisation criterion for the refinement of SVM models. The performance of the approach is evaluated on a real-world application dataset for Motor Drive Diagnosis, coming from the field of intelligent autonomous systems in the context of the Industry 4.0 paradigm, as well as on several UCI Repository samples with different numbers of features and objects.
1 INTRODUCTION
Machine Learning (ML) in the context of robust classification becomes increasingly important for data and signal processing in modern complex application environments. Examples of such environments are networked Cyber-Physical Systems (CPS) and multi-sensor or information-fusion systems installed for machine analysis and diagnosis in Industry 4.0 (Niggemann and Lohweg, 2015), cognitive radio (Ahmad et al., 2010), mobile health (Yi et al., 2014), etc. In addition to the robust classification problem, many real-world scenarios face the following difficulty: a sufficient amount of training data – depending on an adequate model – is not available. That is, classification tasks have to be operated even when it is not possible to run enough measurements to generate objects for an appropriate classifier training.
Under these constraints, established signal processing algorithms and systems reach their limits, if they are not outright inapplicable, e.g., in scenarios with resource limitations such as embedded systems. The findings presented here utilize a recently published robust classifier optimisation methodology (Dörksen and Lohweg, 2014; Dörksen et al., 2014). In (Dörksen et al., 2014) an approach is presented in which a classifier has to deliver trustworthy results under the opposite constraint, i.e., a large number of objects in the samples.
The topics of small/incomplete samples as well as optimisation in ML are of particular interest and not new in the academic area (cf. (Chapelle et al., 2002a) and (Sra et al., 2012)). As a contribution to these topics, we rely on SVM-based classifier concepts (Boser et al., 1992; Cortes and Vapnik, 1995; Vapnik, 1995) and present a new margin-based approach for classification. In our method, the margin value serves as the optimisation criterion for the refinement of SVM models. To increase the margin of the separation, we employ a methodology called ComRef (Dörksen and Lohweg, 2014). ComRef is a combinatorial refinement method (optimisation) for classification tasks. From the time-complexity point of view, the margin-based method proposed here requires two SVM computations plus an additional calculation related to the min/max rule explained below, which depends linearly on the number of features. Furthermore, in the experimental part of our paper, we show that for many samples the margin-based refinement possesses a higher generalisation ability than the initial SVM discrimination. Moreover, the results demonstrate that the technique is especially suitable for small/incomplete datasets, i.e., sets with a scarce number of objects for the description of classes.
As a supplement to the topic discussed above: if a dataset is small/incomplete, it is nonetheless often possible to obtain a high accuracy in terms
of classification rates, and not only for one single classifier. For small samples many classifiers might perform accurately on training and even test data; however, the decision boundary might be inadequate. Especially when real-world data processing is required and unknown data appear, a classifier chosen by accuracy tests alone might fail. It is clear that in this case a discrimination supported by margin-optimised characteristics is more reliable than the decision accuracy alone.
We demonstrate our approach in experimental results for the scenario of small datasets. These are modelled for the dataset from the real-world industrial application Motor Drive Diagnosis (Bayer et al., 2013) and for UCI samples. To model different grades of incompleteness in the data, several K-fold cross-validation tests (Alpaydin, 2010) are performed.
The paper is organised as follows. In Sec. 2 we present related work covering multiple classification, margin-initiated classification and dimensionality reduction. The theoretical aspects of the approach are described in Sec. 3. The experimental validation is given in Sec. 4. Finally, the conclusions and future work are discussed in Sec. 5.
2 RELATED WORK
As mentioned in the Introduction, our approach originates from (Dörksen and Lohweg, 2014). The shortcoming of the original method is its combinatorial structure, i.e., it is hardly applicable to samples with a large number of features. We overcome this problem by defining rules for the refinement, which are presented in Sec. 3.
Our method belongs to the class of multiple classifiers, such as, e.g., boosting methods (Freund and Schapire, 1996) or neural networks (Hagan et al., 1996) with more than one layer. The relation is the following: by the initial feature combination and the down-streamed classification in the low-dimensional space, we are able to combine two or more classifiers. In general, however, the principal idea of our approach differs from that of multiple learners.
Furthermore, the state of the art is to compose multiple learners (bag-of-classifiers, bag-of-features concepts, ensemble learners) that complement each other to obtain higher classification accuracy. A survey on combining classifiers is given in (Kuncheva, 2004); some of the latest developments are found in (Zhou et al., 2013).
Since SVM is a margin-based classifier, the main body of scientific developments that originated in (Boser et al., 1992; Cortes and Vapnik, 1995; Vapnik, 1995) and derive from SVM foundations might be considered related to our work. In this sense, some recent investigations focusing on advantages and applications of SVM are presented, e.g., in the book (Ma and Guo, 2014).
State-of-the-art developments of margin-based techniques include those originating from Multiple Kernel Learning (Chapelle et al., 2002b) or, e.g., the radius-margin-based methods presented in (Do et al., 2009). Further, there is a number of publications describing Large-Margin Nearest-Neighbour classification (cf. (Weinberger and Saul, 2009)).
Dimensionality reduction (DR) methods are also related to our work. In general, the ComRef method, which we extend to the margin-based refinement, originates from DR. From the ML point of view, DR is often motivated by an increase of the generalisation ability (Guyon et al., 2006). DR addresses the problem of selecting subsets of the most useful, i.e., relevant and informative, features and ignoring the rest. Other motivations for DR are, e.g., data reduction and data understanding. A profound survey of the DR domain can be found in (Benner et al., 2005); nonlinear dimensionality reduction methods are covered in (Lee and Verleysen, 2007); some recently published studies are presented in (Pei et al., 2013).
Among the variety of DR methods, the ones most closely related to our approach are the feature weighting methods. A survey on weighting methods is found in (Blum and Langley, 1997).
3 APPROACH
In the framework of this contribution, we restrict our considerations to classifiers whose decision rules can be represented as weighted feature combinations. Due to the margin-based nature of the proposed method, we concentrate on the SVM model. In general, however, we expect our approach to be applicable to other classification models, e.g., LDA or PCA, given a suitable definition of a margin.
The basic idea of the SVM model is to maximise the margin, which is the distance from the hyperplane to the objects closest to it on either side (Alpaydin, 2010; Cortes and Vapnik, 1995). Thus, the margin plays a crucial role in the design of support-vector-based (SV) learning algorithms (Schölkopf and Smola, 2002).
The idea of our proposal is to increase the margin without increasing the complexity of the model (VC dimension (Vapnik, 1995)). In this context, the complexity of the model is determined by the profile of the classification boundary with respect to the number of features or the type of the kernel, i.e., a higher number of features implies higher complexity, and, e.g., a quadratic-kernel classifier possesses a higher complexity than a linear-kernel one. In many cases, lower complexity implies higher robustness of the classification, which can lead to a better generalisation ability. From this point of view, our proposal improves classification rates while staying robust.
A classification problem of two classes $T^+$ and $T^-$ with corresponding objects $x^+ \in T^+$ and $x^- \in T^-$ is considered. Objects are vectors $x \in \mathbb{R}^d$ with $x = (x_1, \dots, x_d)$ in the $d$-dimensional feature space $X \subseteq \mathbb{R}^d$. We assume that a linear combination $h$ of features is given:

$$h(x) = \langle a, x \rangle = \sum_{i=1}^{d} a_i x_i. \qquad (1)$$
With some scalar $c \in \mathbb{R}$, the rule for the linear classification w.r.t. Eq. (1) is the following:

$$x \in T^+ \ \text{if} \ h(x) \ge c \quad \text{and} \quad x \in T^- \ \text{if} \ h(x) < c. \qquad (2)$$
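To make the decision rule concrete, here is a minimal Python sketch, assuming the weighting vector $a$ and the scalar $c$ are already given (names and toy values are illustrative only):

    import numpy as np

    def linear_classify(x, a, c):
        """Eq. (2): assign x to T+ if h(x) = <a, x> >= c, else to T-."""
        h = float(np.dot(a, x))          # h(x) of Eq. (1)
        return "T+" if h >= c else "T-"

    a = np.array([0.5, -1.0, 2.0])       # hypothetical feature weights, d = 3
    print(linear_classify(np.array([1.0, 0.2, 0.4]), a, c=0.0))   # -> T+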
Within our approach we do not distinguish between the linearly separable and non-separable cases. We concentrate only on the parameters provided by the weighting vector $a$. Let $h(x)$ be an SVM classifier. Thus, for the separable case in canonical form the so-named margin $\rho := 2/\|a\|$ is maximised. With labels $y = 1$ for all $x^+$ and $y = -1$ for all $x^-$, this is equivalent to the solution of the problem:

$$\min \frac{1}{2}\|a\|^2 \quad \text{subject to} \quad y_j (\langle a, x_j \rangle - c) \ge 1 \quad \text{with} \ x_j \in \{x^+, x^-\}, \ \forall j.$$
For the non-separable case, slack variables $\xi_j \ge 0$, $\forall j$, are defined. Slack variables store the deviation from the margin $\rho$ in order to relax the constraints. A soft-margin classifier for the non-separable case is the solution of the problem:

$$\min \frac{1}{2}\|a\|^2 + C \sum_j \xi_j \quad \text{subject to} \quad y_j (\langle a, x_j \rangle - c) \ge 1 - \xi_j \quad \text{with} \ x_j \in \{x^+, x^-\}, \ \xi_j \ge 0, \ \forall j,$$

where the constant $C > 0$ determines the trade-off between margin maximisation and training-error minimisation. As in the separable case, for simplicity, the margin here is defined as $\rho := 2/\|a\|$.
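As a hedged illustration of how the quantities above can be obtained in practice, the following sketch trains a linear soft-margin SVM with scikit-learn and reads off the weighting vector $a$ and the margin $\rho = 2/\|a\|$; the toy data and the choice $C = 1.0$ are our assumptions, not part of the method:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(+1.0, 0.5, size=(40, 2)),   # objects x+ of class T+
                   rng.normal(-1.0, 0.5, size=(40, 2))])  # objects x- of class T-
    y = np.hstack([np.ones(40), -np.ones(40)])            # labels y = +1 / -1

    clf = SVC(kernel="linear", C=1.0).fit(X, y)           # soft-margin problem above
    a = clf.coef_.ravel()                                 # weighting vector a
    rho = 2.0 / np.linalg.norm(a)                         # margin rho = 2 / ||a||
    print(a, rho)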
In the sense of classical SVM fundamentals, the functional $h(x)$ represents a hyperplane having the largest separation, or margin $\rho$, between the two classes. The hyperplane has the property that the distance to the nearest data point on each side is maximised. Our refinement approach is based on a weakening of this property: the distance from the hyperplane to some appropriate data is maximised. This idea originates in the Combinatorial Refinement (ComRef) approach (Dörksen and Lohweg, 2014), where it was shown that for many samples a refinement of the hyperplane is able to increase the generalisation ability of the classification. We build our method on the well-known fact that an increase of the generalisation ability might rely on the regularisation property of the SVM hyperplane, i.e., on the property of finding the largest separation $\rho$.
Let us consider an SVM classifier $h(x) = \langle a, x \rangle$ with the initial margin $\rho$. In the frame of this paper, without loss of generality, we consider the $\ell_2$-norm, i.e.:

$$\rho = \frac{2}{\|a\|} = \frac{2}{\sqrt{a_1^2 + \dots + a_d^2}}. \qquad (3)$$
Knowing $h(x)$, we are interested in the computation of a new hyperplane $g(u) = \langle b, u \rangle$, which will be called a refinement of $h$. The margin $\rho_{ref}$ of $g$ has to be larger than $\rho$, i.e., $\rho_{ref} > \rho$. Clearly, since $x$ and $u$ belong to dissimilar feature spaces, the margins provided by $h$ and $g$ have to be made comparable. We will discuss this topic later.
Given all assumptions and descriptions above, the fusion of summands of $h$ for some indices $I \subseteq \{1, \dots, d\}$ is defined as:

$$u_I := \sum_{i \in I} a_i x_i.$$
For $I = \{1, \dots, d\}$, the fusion of summands is $h$ itself. Otherwise, $h$ can be represented by $I$ together with the fusion of summands for the indices $\bar{I} = \{1, \dots, d\} \setminus I$, which are complementary to $I$. More generally, $h$ can be represented by fusions of summands (for ease of use read: $u_{I_k} = u_k$, $k \in \mathbb{N}$):

$$h(x) = \sum_{i \in I_1} a_i x_i + \dots + \sum_{i \in I_j} a_i x_i = u_1 + \dots + u_j, \qquad (4)$$

where for $k = 1, \dots, j$ holds $I_k \subseteq \{1, \dots, d\}$ and all $I_k$ are non-empty disjoint subsets of indices with the property that:

$$\bigcup_{k=1}^{j} I_k = \{1, \dots, d\}.$$
For some parameters $b_1, \dots, b_j \in \mathbb{R}$, the refinement of a linear classifier resp. feature weighting of Eq. (1) with fusions of summands as in Eq. (4) is defined as:

$$g(u) = \langle b, u \rangle = \sum_{i=1}^{j} b_i u_i. \qquad (5)$$
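A minimal sketch of the mapping from $x \in \mathbb{R}^d$ to the fused features $u \in \mathbb{R}^j$ of Eq. (4); the helper name fuse_summands and the example partition are illustrative assumptions:

    import numpy as np

    def fuse_summands(X, a, index_sets):
        """u_k = sum_{i in I_k} a_i * x_i for each index set I_k (Eq. (4))."""
        # X: (n, d) objects; a: (d,) initial weights; index_sets: disjoint
        # index arrays I_1, ..., I_j (0-based) covering all feature indices
        return np.column_stack([(X[:, I] * a[I]).sum(axis=1) for I in index_sets])

    a = np.array([0.3, -0.1, 0.8, 1.2])     # hypothetical initial SVM weights
    X = np.array([[1.0, 2.0, 0.5, -1.0]])   # one object, d = 4
    U = fuse_summands(X, a, [np.array([0, 2]), np.array([1]), np.array([3])])
    print(U)                                 # shape (1, 3): refinement space, j = 3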
From this definition it is clear that the refinement is performed in the low-dimensional space $U \subseteq \mathbb{R}^j$
with $j < d$. With some scalar $\tilde{c} \in \mathbb{R}$, the rule for the linear classification resp. refinement of Eq. (5) is now:

$$u \in T^+ \ \text{if} \ g(u) \ge \tilde{c} \quad \text{and} \quad u \in T^- \ \text{if} \ g(u) < \tilde{c}. \qquad (6)$$
In the framework of the original ComRef, SVM (linear and quadratic) and AdaBoost (Freund and Schapire, 1996) classifiers for the initial feature combination w.r.t. Eq. (1) were analysed. For many datasets it was shown that a fusion of summands for which the classification accuracy with respect to Eq. (6) is higher than in Eq. (2) might lead to a higher generalisation ability of Eq. (6) than of Eq. (2).
The main disadvantage of ComRef is its combinatorial nature in terms of the selection of fusions of summands, i.e., for samples with a large number of features the algorithm is hardly applicable. A technique for quickly finding a suitable fusion of summands is required. In the frame of the present work, we solve this problem by applying the margin value as the optimisation criterion. Thus, fusions of summands leading to a larger margin in the refinement of Eq. (5) will be considered.
Let $g(u)$ in Eq. (5) be an SVM classifier and a refinement of $h$. Hence, the margin of $g$ in the low-dimensional space is equal to $2/\|b\|$. It is clear that for the construction of a margin-based optimisation criterion we are not able to compare both margins directly, since they live in dissimilar spaces. To illustrate this effect, consider the hyperplane $\langle b, u \rangle - \tilde{c} = 0$ in the space $\mathbb{R}^j$. It corresponds to the following representation in the initial space $\mathbb{R}^d$:

$$b_1 \sum_{i \in I_1} a_i x_i + \dots + b_j \sum_{i \in I_j} a_i x_i - \tilde{c} = 0. \qquad (7)$$
Thus, under the assumption that $b_1 = 1, \dots, b_j = 1$ and $c = \tilde{c}$, the hyperplanes in $\mathbb{R}^d$ and $\mathbb{R}^j$ are equivalent. However, the margin $\tilde{\rho} = 2/\sqrt{j}$ is, in general, not equal to $\rho = 2/\|a\|$. Thus, based on the representation of Eq. (7), we define the $\rho_{ref}$ to be compared with the initial margin $\rho = 2/\|a\|$ in the optimisation criterion as follows:

$$\rho_{ref} = \frac{2}{\sqrt{b_1^2 \sum_{i \in I_1} a_i^2 + \dots + b_j^2 \sum_{i \in I_j} a_i^2}}. \qquad (8)$$
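Eq. (8) translates directly into a few lines of Python; a minimal sketch, where a is the initial weighting vector, b the refined one, and index_sets the partition $I_1, \dots, I_j$ (all names and values hypothetical):

    import numpy as np

    def rho_ref(a, b, index_sets):
        """Comparable refined margin of Eq. (8)."""
        denom_sq = sum(b[k] ** 2 * np.sum(a[I] ** 2)
                       for k, I in enumerate(index_sets))
        return 2.0 / np.sqrt(denom_sq)

    a = np.array([0.3, -0.1, 0.8, 1.2])
    b = np.array([0.5, 1.0, 1.0])            # refined weights in R^j, j = 3
    print(rho_ref(a, b, [np.array([0, 2]), np.array([1]), np.array([3])]))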
Proposition. Let $h(x)$ in Eq. (1) be an SVM classifier with margin $\rho$. Assume fusions of summands are given w.r.t. Eq. (4). Further, let $g(u)$ in Eq. (5) be a refinement of $h$ and an SVM classifier with margin $\rho_{ref}$. If in Eq. (8) $b_k^2 \ge 1$ holds for each $k = 1, \dots, j$, then $\rho_{ref} \le \rho$.
Proof. It can easily be seen that under the above assumptions the following holds:

$$\sqrt{a_1^2 + \dots + a_d^2} \le \sqrt{b_1^2 \sum_{i \in I_1} a_i^2 + \dots + b_j^2 \sum_{i \in I_j} a_i^2}.$$

Thus, $\rho_{ref} \le \rho$.
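A quick numeric illustration of the proposition (values chosen freely): with all $b_k^2 \ge 1$ the denominator of Eq. (8) is at least $\|a\|$, so $\rho_{ref} \le \rho$; conversely, $b_k^2 < 1$ on a fused summand can push $\rho_{ref}$ above $\rho$, which is exactly what the refinement exploits.

    import numpy as np

    def rho_ref(a, b, sets):
        return 2.0 / np.sqrt(sum(b[k] ** 2 * np.sum(a[I] ** 2)
                                 for k, I in enumerate(sets)))

    a = np.array([1.0, 1.0, 1.0])                  # rho = 2/sqrt(3) ~ 1.155
    sets = [np.array([0, 1]), np.array([2])]       # one fusion of two summands
    print(rho_ref(a, np.array([1.2, 1.0]), sets))  # b_k^2 >= 1: ~1.02 <= rho
    print(rho_ref(a, np.array([0.5, 1.0]), sets))  # b_1^2 < 1:  ~1.63 >  rho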
Several techniques can be considered for the construction of a fusion of summands, e.g., ones initiated by feature extraction or weighting methods. In the frame of our work, we discuss the simplest situation: one single fusion of two summands. It leads to the min/max rule defined below and can, e.g., be extended or integrated into the refinement process iteratively. With one single fusion of two summands, the refinement is based on the following representation of the initial SVM:

$$h(x) = a_1 x_1 + \dots + \sum_{i \in I_k} a_i x_i + \dots + a_d x_d, \qquad (9)$$

where $I_k$ is a set of two indices from $\{1, \dots, d\}$.
It is clear that the refinement then occurs in the $(d-1)$-dimensional space. Our explanation for paying attention to one single fusion of two summands is the following. With a selection $\sum_{i \in I_k} a_i^2$ of only two summands, we may expect that the refinement of the $a_i$'s with $i \notin I_k$ is marginal, i.e., $b_j \approx 1$ for $j \neq k$. In that case the margin value $\rho_{ref}$ depends mainly on $\sum_{i \in I_k} a_i^2$ and $b_k^2$. Within the optimisation of $\sum_{i \in I_k} a_i^2$ and $b_k^2$, the margin $\rho_{ref}$ represented in Eq. (8) becomes the larger, the smaller $\sum_{i \in I_k} a_i^2$ and $b_k^2$ are. We deduce that $|a_i|$'s with small values might enlarge the margin more strongly than $|a_i|$'s with large values. On the other hand, since we are interested in margin optimisation, we can also try to degrade the contribution of the $|a_i|$'s with large values and in this way increase the margin.
Following the above discussion, the min/max rules (referred to as MIN or MAX in the tables of Section 4) of one single fusion of two summands for margin-based refinement are the following:
Min/Max Rules for Margin-based Refinement
i. Compute the initial SVM hyperplane $h(x)$ as in Eq. (1).
ii. Find the set $I_k$ of two indices from $\{1, \dots, d\}$ such that $\sum_{i \in I_k} a_i^2$ is minimal/maximal, i.e., choose from $|a_1|, \dots, |a_d|$ the two with minimal/maximal values.
iii. Recalculate the SVM refinement $g(u)$ for $u = \left(a_1 x_1, \dots, \sum_{i \in I_k} a_i x_i, \dots, a_d x_d\right)$.
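The rules above can be turned into a short end-to-end sketch; a minimal, non-authoritative implementation of the min rule (steps i-iii) using scikit-learn's linear SVC, where the toy data handling, $C = 1.0$ and the helper name refine_min_rule are our assumptions and the standardisation details of Section 4 are omitted:

    import numpy as np
    from sklearn.svm import SVC

    def refine_min_rule(X, y, C=1.0):
        """Min rule: fuse the two summands with smallest |a_i|, then retrain."""
        svm = SVC(kernel="linear", C=C).fit(X, y)     # step i: initial h(x)
        a = svm.coef_.ravel()
        i1, i2 = np.argsort(np.abs(a))[:2]            # step ii: two minimal |a_i|
        rest = [i for i in range(X.shape[1]) if i not in (i1, i2)]
        # step iii: u = (a_i x_i for i not in I_k, fused summand for I_k)
        U = np.column_stack([X[:, rest] * a[rest],
                             X[:, i1] * a[i1] + X[:, i2] * a[i2]])
        refined = SVC(kernel="linear", C=C).fit(U, y)
        return svm, refined, (i1, i2)

The max rule is obtained analogously by fusing the two indices with maximal $|a_i|$, i.e., by taking the last two entries of the argsort instead of the first two.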
Table 1: Results of the K-fold cv paired t test for SVM and the proposed margin-based refinement approach (indicated as Ref MIN or Ref MAX w.r.t. the min/max rules) for the datasets M-I and M-II. In the rows, overall accuracies (in %), $t_{K-1}$-statistics as well as the margin $\rho_{ref}$ for benchmarking are given. Here, the original margin is $\rho = 2$ for all samples.

    K-fold          SVM    Ref MIN  t_{K-1} MIN  rho_ref MIN  Ref MAX  t_{K-1} MAX  rho_ref MAX
    M-I  (K = 30)   92.26  93.88    7.69         2.80         93.76    6.93         2.68
    M-I  (K = 100)  88.77  90.37    8.85         2.67         90.04    6.95         2.54
    M-II (K = 30)   96.00  97.08    15.25        2.84         97.04    15.00        2.76
    M-II (K = 100)  94.39  95.45    10.24        2.83         95.37    9.51         2.70
Table 2: Results of the K-fold cv paired t test (K = 10) for SVM and the proposed margin-based refinement approach (here, only Ref MIN w.r.t. the min rule) for UCI datasets. The remaining table caption corresponds to that of Table 1. Samples marked with † are considered for further iterations of the refinement, presented in Table 3 below.

    dataset                      # features  # objects  SVM    Ref MIN  t_{K-1} MIN  rho_ref MIN
    CNAE (classes 6 vs. 7) †     299         240        79.35  86.52    2.89         1.89
    CNAE (classes 7 vs. 9)       333         240        86.75  92.96    2.92         1.70
    Ecoli †                      6           220        95.22  96.47    2.52         2.17
    Heart                        13          270        74.89  77.44    3.91         2.49
    Promoters                    57          212        74.64  78.32    5.45         2.30
    Seeds                        7           140        91.66  93.49    5.81         2.54
    Splice (classes E vs. I) †   60          1535       84.66  87.13    7.04         2.16
    Splice (classes I vs. N) †   60          2423       81.72  84.54    6.25         2.09
4 EXPERIMENTAL RESULTS
4.1 Motor Drive Diagnosis
We show the results for the real-world industrial application Motor Drive Diagnosis (Bayer et al., 2013). The application relates to an intelligent, autonomous synchronous motor drive which is used in several applications, e.g., in transport systems at airports, conveyor belts, etc.
We consider two classes, intact and anomaly, for the motor condition, which is a conclusive result given the use of adequate features (Bator et al., 2012). The range of typical defects in drive-train applications includes, e.g., ball bearings, axle displacement or inclination of gear-wheels. For our test, datasets called Motor I (M-I) and Motor II (M-II) with 52 features and a total of 5,318 objects in M-I resp. 72 features and 10,638 objects in M-II are analysed. The number of objects in M-II might appear large; however, for this type of application it is still not complete. In general, it is almost impossible to collect complete and well-balanced samples for motor diagnosis. Due to the different environmental conditions (e.g., stable/unstable position of the motor, air temperature/humidity, running time, etc.), the real completeness/balance of the sample is assumed to be unknown. To show that our approach is able to contribute to the solution of this problem, we perform the K-fold cv paired t test with K = 30 and K = 100, i.e., with one thirtieth (circa 3.3 %) resp. 1 % of the information about the sample available for training. We present the classification rates in terms of accuracies (in %) and $t_{K-1}$-statistics (Alpaydin, 2010) for comparing the two classification algorithms. The larger the value of the $t_{K-1}$-statistic, the more likely one algorithm is performing better/worse than the other. For $t_9$ it holds: if $|t_9| > 2.26$, we reject the hypothesis that the algorithms have the same error rate with 97.5 % confidence. For $|t_{29}|$ the threshold is 2.05, and for $|t_{99}|$ it is lower than 2.05.
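The $t_{K-1}$-statistic can be sketched as follows, assuming acc_a and acc_b hold the K per-fold accuracies (or error rates) of the two classifiers being compared; this follows the standard K-fold cv paired t test formulation of (Alpaydin, 2010):

    import numpy as np

    def paired_t_statistic(acc_a, acc_b):
        """t-statistic with K - 1 degrees of freedom on per-fold differences."""
        p = np.asarray(acc_a) - np.asarray(acc_b)   # paired differences per fold
        K = p.size
        return np.sqrt(K) * p.mean() / p.std(ddof=1)

    # e.g. with K = 10: |t_9| > 2.26 rejects equal error rates (cf. text above)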
Without loss of generality, the SVM separating hyperplane is computed using Sequential Minimal Optimization (Schölkopf and Smola, 2002). Furthermore, all datasets are standardised to have margin $\rho = 2$ in the initial SVM; features are standardised to have equal variances $\sigma^2 = 1$; the positions of objects in each set are randomised before selecting training and testing subsets. The results, presented in Table 1, illustrate the performance. We remark that the accuracies in the learning stage were close to 100 % for both the initial SVM and the refinement. In contrast to the accuracies, the margin values of the refinements were larger than those of the original SVM. Due to the increase of the margin, the overall accuracy improvement of the refinement is between 1 % and 2 %, which is a good achievement for industrial environments, taking into account huge networked Cyber-Physical Systems and the financial profitability of this kind of optimisation.
Table 3: Results of the K-fold cv paired t test (K = 10) for two successive iterations of the min rule.

    dataset                    SVM    Iter. Ref MIN  t_{K-1} MIN  rho_ref MIN
    CNAE (classes 6 vs. 7)     79.35  90.04          4.64         1.95
    Ecoli                      95.22  97.23          3.49         2.20
    Splice (classes E vs. I)   84.66  88.18          10.47        2.27
    Splice (classes I vs. N)   81.72  86.23          6.34         2.14
Table 4: F-measures based on true positives and false negatives of each class for the K-fold cv paired t test (K = 10).

    dataset       # features  # objects  balance  F_SVM (T+ / T-)  F_RefMIN (T+ / T-)
    Fertility     9           100        7.3      0.12 / 0.82      0.16 / 0.83
    Hepatitis     16          80         5.2      0.32 / 0.84      0.35 / 0.85
    Heart         13          150        4.0      0.47 / 0.84      0.51 / 0.84
    Mammography   5           628        4.6      0.67 / 0.91      0.69 / 0.92
    Pima          8           331        4.3      0.45 / 0.83      0.49 / 0.84
    Promoters     57          120        7.6      0.04 / 0.93      0.14 / 0.93
    Vertebral     6           244        6.2      0.40 / 0.86      0.44 / 0.86
4.2 UCI Repository Test Samples
In this section we present the experimental results of our proposed margin-based learning approach on several UCI Repository samples (http://mlr.cs.umass.edu/ml/datasets.html). The following datasets are considered: CNAE (class labels 6 vs. 7, and 7 vs. 9), Ecoli, Heart, Promoters, Seeds and Splice (class labels E vs. I, and I vs. N). To show the potential of the method, we present results for samples with different numbers of features and objects, where the margin-based refinement increases the generalisation ability of the classification. Since it is often not known whether a sample is complete or not, there might be data where the refinement does not work. We perform the K-fold cv paired t test for generalisation (Alpaydin, 2010) with K = 10, i.e., in each fold 10 % of the objects are considered for the training and the rest for the validation. We demonstrate only results for the min rule; for many samples the max rule also leads to an improvement, however, as our tests show, for the considered samples the min rule is slightly stronger than the max rule. We remark that here as well, for many of the considered samples, the accuracies in the learning stage were close to 100 % for both the initial SVM and the refinement. Table 2 presents the results. They show that the refinement rules often lead to an increase of the margin. Due to the increase of the margin, the accuracy increases in the refinement as well.
In the CNAE samples, the margin slightly decreases; however, the refinement still leads to a higher accuracy.
In addition, we show some results for two successive iterations of the min rule in Table 3.
Finally, we test our approach on several samples which are not well balanced, meaning for our examples that the numbers of objects in the two classes differ by a factor of four or more. Here as well, accuracy as an exclusive measure is not sufficient. Scores based on the true positives and false negatives of each class are suitable for such problems. The following UCI samples are considered: Fertility, Hepatitis, Heart, Mammography, Pima, Promoters and Vertebral. The samples Fertility and Hepatitis are not well balanced in the original UCI submission. The remaining samples we scale artificially by removing objects from one of the classes, such that they become non-balanced. We define the balance of a sample as the number of objects of one class divided by the number of objects of the other. F-measures are calculated as scores based on true positives and false negatives for each class, where:

$$F(\text{class}) = \frac{2\,TP(\text{class})}{2\,TP(\text{class}) + FP(\text{class}) + FN(\text{class})},$$

and TP, FP, FN are resp. the true positives, false positives and false negatives. Table 4 presents the results.
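The per-class score can be computed directly from this definition; a small sketch where y_true, y_pred and the class labels are assumptions:

    import numpy as np

    def f_measure(y_true, y_pred, cls):
        """F(class) = 2 TP / (2 TP + FP + FN) for the given class label."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        return 2 * tp / (2 * tp + fp + fn)

    # evaluate both classes separately, as in Table 4:
    # f_measure(y_true, y_pred, "T+"), f_measure(y_true, y_pred, "T-")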
5 CONCLUSIONS
In this paper we presented a fast approach for the margin-based refinement of SVM classifiers. We defined min/max rules for the refinement procedure and illustrated the performance on several samples with different numbers of features and objects. In our future work we will investigate which characteristics make fusions suitable for this kind of refinement. We will examine additional rules for fast refinement. Furthermore, combinations of rules will be studied. We will perform a theoretical and empirical analysis of the refinement methodology in the context of feature selection and weighting.
In addition, a comparison with other existing weighted SVM models will be evaluated. For efficiency reasons, possibilities for incorporating the refinement recalculations into one single step will be studied.
ACKNOWLEDGEMENTS
We would like to thank Martyna Bator of the Institute Industrial IT for helpful support regarding the Motor Drive Diagnosis dataset. This work was partly funded by the German Federal Ministry of Education and Research (BMBF) within the Leading-Edge Cluster "Intelligent Technical Systems OstWestfalenLippe" (its OWL).
REFERENCES
Ahmad, K., Meier, U., and Kwasnicka, H. (2010). Fuzzy
logic based signal classification with cognitive ra-
dios for standard wireless technologies. In Cognitive
Radio Oriented Wireless Networks Communications
(CROWNCOM), 2010 Proceedings of the Fifth Inter-
national Conference on, pages 1–5.
Alpaydin, E. (2010). Introduction to Machine Learning.
The MIT Press, Cambridge, 2 edition.
Bator, M., Dicks, A., Mönks, U., and Lohweg, V. (2012). Feature Extraction and Reduction Applied to Sensorless Drive Diagnosis. In Hoffmann, F. and Hüllermeier, E., editors, 22. Workshop Computational Intelligence, volume 45 of Schriftenreihe des Instituts für Angewandte Informatik - Automatisierungstechnik am Karlsruher Institut für Technologie, pages 163–178. KIT Scientific Publishing.
Bayer, C., Bator, M., Mönks, U., Dicks, A., Enge-Rosenblatt, O., and Lohweg, V. (2013). Sensorless Drive Diagnosis Using Automated Feature Extraction, Significance Ranking and Reduction. In Seatzu, C. and Zurawski, R., editors, ETFA 2013, pages 1–4.
Benner, P., Mehrmann, V., and Sorensen, D. C., editors (2005). Dimension Reduction of Large-Scale Systems: Proceedings of a Workshop held in Oberwolfach, Germany, October 19-25, 2003. Lecture Notes in Computational Science and Engineering. Springer.
Blum, A. L. and Langley, P. (1997). Selection of relevant
features and examples in machine learning. Artificial
Intelligence, 97:245–271.
Boser, B. E., Guyon, I. M., and Vapnik, V. N. (1992). A
training algorithm for optimal margin classifiers. In
Proceedings of the Fifth Annual Workshop on Compu-
tational Learning Theory, COLT ’92, pages 144–152,
New York, NY, USA. ACM.
Chapelle, O., Vapnik, V., and Bengio, Y. (2002a). Model se-
lection for small sample regression. Machine Learn-
ing, 48(1-3):9–23.
Chapelle, O., Vapnik, V., Bousquet, O., and Mukherjee, S.
(2002b). Choosing multiple parameters for support
vector machines. Mach. Learn., 46(1-3).
Cortes, C. and Vapnik, V. (1995). Support-vector networks.
Machine Learning, 20(3):273–297.
Do, H., Kalousis, A., Woznica, A., and Hilario, M. (2009). Margin and radius based multiple kernel learning. In Buntine, W., Grobelnik, M., Mladenić, D., and Shawe-Taylor, J., editors, Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, Proceedings, Part I. Springer, Berlin, Heidelberg.
Dörksen, H. and Lohweg, V. (2014). Combinatorial refinement of feature weighting for linear classification. In Emerging Technology and Factory Automation (ETFA), 2014 IEEE, pages 1–7.
Dörksen, H., Mönks, U., and Lohweg, V. (2014). Fast classification in industrial Big Data environments. In Emerging Technology and Factory Automation (ETFA), 2014 IEEE, pages 1–7.
Freund, Y. and Schapire, R. E. (1996). Experiments with
a New Boosting Algorithm. Thirteenth International
Conference on Machine Learning.
Guyon, I., Gunn, S., Nikravesh, M., and Zadeh, L. A.
(2006). Feature Extraction: Foundations and Appli-
cations (Studies in Fuzziness and Soft Computing).
Springer-Verlag New York, Inc, Secaucus and NJ and
USA.
Hagan, M. T., Demuth, H. B., and Beale, M. (1996). Neural
Network Design. PWS Publishing Co., Boston, MA,
USA.
Kuncheva, L. I. (2004). Combining Pattern Classifiers:
Methods and Algorithms. Wiley-Interscience.
Lee, J. A. and Verleysen, M. (2007). Nonlinear Dimen-
sionality Reduction. Springer Publishing Company,
Incorporated, 1st edition.
Ma, Y. and Guo, G. (2014). Support vector machines appli-
cations. Springer.
Niggemann, O. and Lohweg, V. (2015). On the diagnosis
of cyber-physical production systems. In Proceedings
of the Twenty-Ninth AAAI Conference on Artificial In-
telligence, January 25-30, 2015, Austin, Texas, USA.,
pages 4119–4126.
Pei, J., Tseng, V. S., Cao, L., Motoda, H., and Xu, G., ed-
itors (2013). Advances in Knowledge Discovery and
Data Mining, 17th Pacific-Asia Conference, PAKDD
2013, Gold Coast, Australia, April 14-17, 2013, Pro-
ceedings, Part I-III, Lecture Notes in Computer Sci-
ence. Springer.
Schölkopf, B. and Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Adaptive Computation and Machine Learning. MIT Press, Cambridge, Mass.
Sra, S., Nowozin, S., and Wright, S. J. (2012). Optimization
for Machine Learning. Neural information processing
series. MIT Press.
Vapnik, V. N. (1995). Statistical Learning Theory. Wiley,
London, 1 edition.
Weinberger, K. Q. and Saul, L. K. (2009). Distance metric
learning for large margin nearest neighbor classifica-
tion. J. Mach. Learn. Res., 10:207–244.
Yi, W.-J., Sarkar, O., Mathavan, S., and Saniie, J. (2014).
Wearable sensor data fusion for remote health assess-
ment and fall detection. In Electro/Information Tech-
nology (EIT), 2014 IEEE International Conference
on, pages 303–307.
Zhou, Z. H., Roli, F., and Kittler, J. (2013). Multiple Clas-
sifier Systems: 11th International Workshop, MCS
2013, Nanjing, China, May 15-17, 2013. Proceed-
ings. Lecture Notes in Computer Science, vol. 7872.
Springer London, Limited.