Kernel Generations for a Diagnosis Model with GP
Jongseong Kim and Hoo-Gon Choi*
Department of Systems Management Engineering, Sungkyunkwan University, Suwon, Republic of Korea
*Corresponding author
Keywords: Diagnosis Model, Genetic Programming, Absorbing Evolution, Accuracy and Recall Rate, Computing Time.
Abstract: An accurate diagnosis model is required to diagnose medical subjects with high accuracy and a high recall rate. Laboratory test data are collected from 953 latent subjects with type 2 diabetes mellitus. The results are classified into a patient group and a normal group using support vector machine kernels optimized through genetic programming. Genetic programming is applied to the input data twice with absorbing evolution, which is a new approach. The results show that the new approach creates a kernel with 80% accuracy, a 0.794 recall rate, and a 28% reduction in computing time compared with other typical methods. Moreover, the suggested kernel can be easily utilized by users with little or no experience with large data.
1 INTRODUCTION
The number of latent subjects with type 2 diabetes mellitus in Korea has rapidly increased over the past three decades. In general, a laboratory test is administered to a latent subject to assess the severity of the disease. Many specialized diabetes clinics in Korea use the fasting glucose level as the main parameter for diagnosing type 2 diabetes mellitus, although it varies daily and is correlated with other test parameters. If the fasting glucose level is above 120 mg/dL, the latent subject is diagnosed with type 2 diabetes mellitus. However, because the fasting glucose level is correlated with other laboratory test parameters, accurate diagnosis requires characterizing the relationship between the fasting glucose level and the various test parameters. If an accurate diagnosis model built on good kernel functions is developed for type 2 diabetes mellitus, it helps clinics diagnose the disease with low misdiagnosis error and a high recall rate. The diagnosis model classifies subjects into either a patient group or a normal group, and the relationship between the principal test parameters and the subject groups can be specified. The purpose of this study is to generate an excellent kernel for an accurate diagnosis model of type 2 diabetes mellitus with a high recall rate, by which the test results are accurately grouped or classified. For
this purpose, a new method, “absorbing evolution (AE)”, is developed in this study by optimizing support vector machine (SVM) kernels through genetic programming (GP).
When SVM is used to classify the input data into two classes with a linear function in a hyperplane, two conditions should be established: setting the allowed amount of classification error via the cost parameter, and defining the kernel function. The cost parameter determines classification accuracy and fitness, and the kernel function decides whether the test data become linearly separable. Each kernel function has its own cost parameter. In this study, the accuracy of the kernel function is the main concern, with the cost parameter held fixed.
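As a concrete illustration of these two conditions, the sketch below fixes a cost parameter and a kernel for an SVM classifier. It uses scikit-learn's SVC, which wraps LIBSVM (the library used later in this study); the data arrays are random placeholders, not the clinical data.

```python
# Minimal sketch of the two SVM conditions discussed above: the cost
# parameter C (penalty on classification errors) and the kernel function
# (which determines linear separability in the mapped space).
# scikit-learn's SVC wraps LIBSVM; the data here are random placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((100, 32))            # 32 test parameters per subject
y = rng.integers(0, 2, 100)          # 1 = patient group, 0 = normal group

clf = SVC(C=1.0, kernel="rbf")       # fixed cost parameter, RBF kernel
scores = cross_val_score(clf, X, y, cv=10)   # 10-fold validation, as in Sec. 4
print(f"mean accuracy: {scores.mean():.3f}")
```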
GP evolves programs with a complex tree structure consisting of function nodes and terminal nodes, each program being one object. The tree structure is expressed in the S-expression format of the Lisp language. Compared with GA, GP uses a different representation scheme for genes. GP operations, including crossover, mutation, inversion or permutation, editing, encapsulation, and elimination, act on each tree or program. To obtain the optimal solution for a given problem, five components should be determined: a set of terminals, a set of functions, a fitness function, the algorithm parameters, and a terminating condition. This study focuses on a new method for constructing the initial population, on defining a fitness function for
clinical data, and on fast GP evolution for generating an SVM kernel.
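To make the five components concrete, the sketch below sets up a GP problem with the DEAP library. DEAP is our assumption for illustration; the paper does not name its GP implementation, and the primitive choices are placeholders.

```python
# Sketch of the five GP components using DEAP (an assumption; the paper
# does not state which GP framework was used). A program is a tree whose
# leaves are terminals and whose internal nodes are functions.
import operator
from deap import base, creator, gp

# (1) terminals and (2) functions: two arguments plus arithmetic operators
pset = gp.PrimitiveSet("KERNEL", 2)
pset.renameArguments(ARG0="X", ARG1="Y")
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addEphemeralConstant("rand", lambda: 1.0)   # random scalar terminal

# (3) fitness: a single objective to be minimized (matching Eq. (1) later)
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)

# (4) algorithm parameters (population size, crossover/mutation rates) and
# (5) the terminating condition are supplied to the evolutionary loop;
# the values actually used in this study appear in Table 3.
```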
2 PRIOR RESEARCH ON KERNELS AND PARAMETERS
Well-known evolutionary algorithms (EAs) such as GP, the genetic algorithm (GA), particle swarm optimization (PSO), and evolution strategies (ES) have been combined with SVM to evolve kernel functions or kernel parameters. Howley and Madden (2005) and Gagne et al. (2006) evolved kernel functions using GP. Their methods produced good results for practical cases, but there was no guarantee that the final genetic kernel would be positive semidefinite (PSD). Sullivan and Luke (2007) proposed combinatory, kernel-bounded operations to develop new complex kernels from basic predefined kernels. Methasate and Theeramunkong (2007) proposed a weighted tree in which the weight of an edge becomes a parameter of the children connected to their parent nodes; the weights are adjusted by gradient descent within GP. Simian (2008) introduced a multiple kernel based on simple polynomial kernels using GP. Friedrichs and Igel (2005) proposed a covariance matrix adaptation evolution strategy (CMA-ES) to extend the radial basis function (RBF) kernel with scaling and rotation, realizing invariance to linear transformation within the space of SVM parameters; a mechanism similar to reverse singular value decomposition (SVD) was used to guarantee that the final kernel is positive semidefinite. Phienthrakul and Kijsirikul (2005 and 2008) used ES to learn the weights in a weighted linear combination of Gaussian radial basis functions. Souza et al. (2006) used PSO to obtain optimal parameters for multi-class classification with a Gaussian kernel function. Similarly, Huang and Wang (2006) and Lessmann et al. (2006) used GA for classification. Runarsson and Sigurdsson (2004) used a parallel ES to generate optimal parameters in a Gaussian kernel. Mierswa (2006) combined PSO with ES to solve the constrained optimization problem underlying SVM. Keerthi et al. (2007) investigated a method to tune parameters in SVM models by minimizing a smooth performance validation function. Simian and Stoica (2009) proposed a stationary tree structure combining known kernels, in which each kernel's parameters were encoded in a single chromosome and then optimized using GA. Kernel functions should be PSD and typically must satisfy Mercer's conditions (see Section 4.1).
Previous studies have not considered the computing time needed to select or generate kernels. However, faster selection and generation of kernels are necessary because both data sizes and dimensions have grown large. This study proposes the AE algorithm to generate kernels quickly in GP. The new approach is applied to laboratory test data collected from latent subjects with type 2 diabetes mellitus.
3 TARGET DATA DESCRIPTIONS
In this study, laboratory test results were collected from a specialized diabetes mellitus clinic in Seoul, Korea, in 2009. The total number of data items was 953. The laboratory test includes 47 different parameters related to liver function, hematology, urinalysis, blood sugar, kidney profile, and lipid profile. Parameters having either identical values or many missing values are removed, regardless of gender. A total of 32 parameters, including gender type, are selected for analysis. In addition, test values lying outside the upper and lower limits are removed from each parameter, regardless of subject, to reduce possible measurement errors. In this study, LIBSVM (Chang and Lin, 2011) is used to separate the subjects into a normal group (false) and a patient group (true) in SVM learning.
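A hedged sketch of this cleaning pipeline in pandas is shown below; the file name, missing-value threshold, and clinical limits are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of the cleaning steps above in pandas. The file name,
# missing-value threshold, and clinical limits are illustrative assumptions.
import pandas as pd

df = pd.read_csv("lab_results.csv")             # 953 subjects, 47 raw parameters

# Drop parameters with identical values or many missing values (any gender).
df = df.loc[:, df.nunique() > 1]
df = df.loc[:, df.isna().mean() < 0.5]          # "many missing" threshold assumed

# Remove test values outside per-parameter upper/lower limits to reduce
# measurement error; 'limits' would hold the clinically defined bounds.
limits = {"fasting_glucose": (50.0, 500.0)}     # hypothetical bounds
for col, (lo, hi) in limits.items():
    df = df[df[col].between(lo, hi)]

# Label used by SVM learning: 1 = patient group (true), 0 = normal (false).
y = (df["fasting_glucose"] > 120).astype(int)
```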
4 SVM KERNEL OPTIMIZATION
USING GP
In this study, SVM is used to classify the laboratory test results collected from 953 latent subjects. For this purpose, either linear or nonlinear classification may be appropriate for such large data. Maximum-margin SVM is available for linear classification, while soft-margin SVM handles it with penalties for misclassification. When both SVMs are still inappropriate, the kernel trick is used, by which the data can be linearly classified after mapping the original data into a high-dimensional space. Nonlinear classification adopts both the misclassification penalty and the kernel trick. The accuracy of a kernel affects the performance of nonlinear classification; therefore, it is important to select better kernels. There are two ways to select SVM kernels: one relies on expert experience, and the other uses popular methods such as grid search,
DATA2012-InternationalConferenceonDataTechnologiesandApplications
58
gradient search, GA, and PSO. In this study, GP is adopted as a new approach. GP requires substantial computing time to obtain a kernel, and no standard procedure is available; however, GP does not require expert experience or knowledge to generate kernels. This study proposes a new approach to reduce the computing time spent searching for kernels.
4.1 Mercer’s Theorem and Initial
Population
A kernel should be positive-semidefinite (PSD) to satisfy Mercer's theorem. If every kernel (Gram) matrix generated by a kernel is symmetric and has only nonnegative eigenvalues, the kernel is PSD. All evolved programs generated by GP must satisfy the theorem in order to be kernels. However, only a few programs satisfy the theorem at the early evolution stage, and sometimes even when the evolutionary process for the given data terminates. A typical way to resolve this problem is to increase the number of programs and generations, but this requires more computing time. To reduce the time, in this study the initial population is composed of programs produced by a preliminary GP run instead of a random population. This generates more programs satisfying Mercer's theorem, as shown in Table 1. Once the initial population is obtained, GP proceeds again to generate the optimized SVM kernel. In other words, a highly accurate SVM kernel defining the diagnosis model is found by using GP twice.
Table 1: Percentage of programs satisfying Mercer's theorem, by population type.

Population Type       Programs satisfying Mercer's theorem (%)
Random                0.5
Initial population    5
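The Mercer screening behind Table 1 can be expressed as a simple matrix test, sketched below with numpy; 'kernel' stands for any candidate program evolved by GP, and the data are placeholders.

```python
# Mercer screening as a matrix test: a candidate program qualifies as a
# kernel if the Gram matrices it produces are symmetric with eigenvalues
# >= 0. 'kernel' stands for any evolved program; data are placeholders.
import numpy as np

def satisfies_mercer(kernel, X, tol=1e-8):
    n = len(X)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    if not np.allclose(K, K.T, atol=tol):        # symmetry
        return False
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))  # nonnegative spectrum

X = np.random.default_rng(0).random((20, 32))
print(satisfies_mercer(lambda a, b: a @ b, X))   # linear kernel -> True
```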
4.2 Selected Primitives in GP
In a GP structure, program trees need primitives, or operators. Each kernel produces scalar outputs from input vectors, and this property is used when combining programs in the evolutionary process. The selection of appropriate primitives is important for GP in terms of speed and simplification of the process. In addition to the four basic operators (+, -, *, /), several operators such as power and log can speed up evolution. Other primitives such as the p-norm and mhnorm can make it easier to characterize the raw data in the space mapped by the kernels. Primitives such as the L2 norm and p-norm can also be used to determine the features of a multi-dimensional space. Table 2 shows the primitives adopted in this study with their argument and return types. Primitives are combined via the Automatically Defined Function (ADF) method, which allows only mathematically proven primitive operations.
Table 2: Selected primitives used for GP.

Name                                   Argument and return types     Description
ssadd, sssub, ssmul, ssdiv, sspow      (scalar, scalar) -> scalar    arithmetic operations
vvadd, vvsub, dot                      (vector, vector) -> vector    arithmetic operations
vsmul, svmul, vsdiv, vspow             (vector, scalar) -> vector    arithmetic operations
ssin, scos, stan                       (scalar) -> scalar            trigonometric functions
vsin, vcos, vtan                       (vector) -> vector            trigonometric functions
sexp, slog, sneg, sabs, ssqrt          (scalar) -> scalar            exponential, log, negative, absolute, square root
vexp, vlog, vneg, vabs, vsqrt          (vector) -> vector            exponential, log, negative, absolute, square root
p_norm, norm2                          (vector, vector) -> scalar    p-norm, L2 norm
p_normdist, norm2dist, mhdist          (vector, vector) -> scalar    p-norm distance, L2 norm distance, Mahalanobis distance
random scalar                          -> scalar                     random terminal
random vector                          -> vector                     random terminal
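For illustration, a few of the Table 2 primitives could be implemented as below in numpy; the exact definitions (e.g., whether p_norm acts on a difference, and how mhdist obtains its covariance matrix) are our assumptions.

```python
# Possible numpy definitions for a few Table 2 primitives. The exact forms
# (p-norm of a difference; covariance source for mhdist) are assumptions.
import numpy as np

def p_norm(x, y, p=3.0):                 # p-norm distance between vectors
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

def mhdist(x, y, inv_cov):               # Mahalanobis distance; inv_cov would be
    d = x - y                            # precomputed from the training data
    return float(np.sqrt(d @ inv_cov @ d))

def scos(s):                             # scalar cosine (used in Table 4 kernels)
    return float(np.cos(s))
```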
4.3 Fitness Function
The fitness function is the major criterion for selecting better kernels. An appropriate fitness function should be established to ensure that kernel functions meet the required conditions. In this study, laboratory test results are analysed to develop an accurate diagnosis model that classifies the results into a patient group and a normal group. The conditions required of the kernel are overall classification precision and recall rate. If a kernel correctly classifies real patients as patients and normal subjects as normal, the model has a high recall rate; large costs would be paid if the model misclassified real patients as normal subjects or vice versa. In addition to precision and recall rate, the number of support vectors should be low in the fitness function, since the data mapped by the kernel function are better linearly separable when the number of support vectors is low. Howley and Madden (2005) show that overfitting risk is reduced with a low number of support vectors. In this study, the developed fitness function includes the number of
KernelGenerationsforaDiagnosisModelwithGP
59
support vectors, as shown in Eq. (1).
\[
\text{Fitness} = \mathrm{AVG}\!\left(\frac{SV \times R}{AC \times (PC \times 100 \times \alpha + RC \times 100 \times \beta)}\right) \tag{1}
\]
where SV is the number of support vectors, R is the radius of the smallest enclosing hypersphere (Cristianini and Shawe-Taylor, 2000), AC is the accuracy, PC is the precision rate, RC is the recall rate in SVM training/validation, and α and β are the weights given to precision and recall rate. In this study, α and β are set to 0.2 and 0.8, respectively, since the recall rate is regarded as the more important measurement in developing diagnosis models.
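Under our reconstruction of Eq. (1), fitness is minimized: few support vectors, a small hypersphere radius, and high accuracy, precision, and recall all reduce its value. The sketch below transcribes that reading directly; the averaging over cross-validation folds and all input values are assumptions.

```python
# Direct transcription of Eq. (1) as reconstructed above (lower is better).
# Each fold contributes (SV, R, AC, PC, RC); AVG is taken over folds,
# which is our reading of the paper, not a stated detail.
def kernel_fitness(folds, alpha=0.2, beta=0.8):
    vals = [sv * r / (ac * (pc * 100 * alpha + rc * 100 * beta))
            for sv, r, ac, pc, rc in folds]
    return sum(vals) / len(vals)

# Hypothetical per-fold support-vector counts and hypersphere radii.
print(kernel_fitness([(210, 0.9, 0.80, 1.0, 0.79),
                      (220, 1.1, 0.79, 1.0, 0.80)]))
```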
4.4 Absorbing Evolution (AE)
Algorithm
Two major problems arise when using GP to obtain optimized SVM kernels: high computing time and a lack of PSD programs. The latter is more serious in terms of evolutionary speed and the risk of falling into local optima. A parallel evolutionary algorithm (PEA) can address these problems: the original population is divided into partial populations, each partial population evolves independently, and results are shared by migrating objects between populations with specific operators or primitives under given conditions (Kim, 2011). However, this algorithm may generate too few programs during the evolution process. In this study, the island algorithm, a typical PEA, is modified to minimize this problem; the modified island algorithm is named the “absorbing evolution (AE)” algorithm. The AE algorithm composes a target population consisting of the desired objects and defines it as the initial partial population. The target population then absorbs new objects from the other populations. In the island algorithm, if some objects migrate out of a partial population, the same number of new objects migrates into it, so the population size is always the same. In the AE algorithm, however, the size of the target population increases through the migration of objects selected from the other partial populations.
The procedure of the AE algorithm is as follows:
Step 1: Initialization
Compose the target population with the desired objects; the remaining objects are grouped into the other partial populations
Step 2: Termination
Check the terminating conditions
Step 3: Migration
Step 3.1: Selection of migrating objects
Select migrating objects based on both the migration rate defined for each partial population and the fitness function values, and copy them
Step 3.2: Transfer of migrating objects to neighbouring populations
Step 3.3: Receipt of migrated objects
Step 3.4: Exchange of existing objects in the current population with the new migrating objects, to maintain the same population size
Step 4: Absorption
Step 4.1: Selection of absorbing objects, excluding migrating objects
Select absorbing objects based on the absorbing conditions
Step 4.2: The target population absorbs the selected objects
Step 5: Deportation and elimination
The target population deports and removes objects on the basis of the deportation criteria
Step 6: Execute standard GP and return to Step 2.
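A runnable structural sketch of this procedure follows. It is not the authors' code: objects are reduced to dicts carrying a fitness value and a Mercer flag, the GP operators of Step 6 are a placeholder, and only the population bookkeeping of Steps 3 to 5 is modeled.

```python
import random

def gp_step(pop):
    pass  # placeholder for the standard GP operators of Step 6

def absorbing_evolution(target, partials, rounds=10,
                        migration_rate=0.1, elimination_rate=0.05):
    for _ in range(rounds):                                    # Step 2
        for i, pop in enumerate(partials):                     # Step 3
            k = max(1, int(migration_rate * len(pop)))
            migrants = sorted(pop, key=lambda o: o["fit"])[:k] # Step 3.1: best first
            neighbour = partials[(i + 1) % len(partials)]      # Step 3.2
            for m in migrants:                                 # Steps 3.3-3.4: replace,
                neighbour[random.randrange(len(neighbour))] = dict(m)  # size unchanged
        for pop in partials:                                   # Step 4: absorb objects
            target.extend(dict(o) for o in pop if o["mercer"]) #   -> the target GROWS
        target.sort(key=lambda o: o["fit"])                    # Step 5: deport worst
        del target[max(1, int(len(target) * (1 - elimination_rate))):]
        for pop in [target] + partials:                        # Step 6
            gp_step(pop)
    return target

# Toy run: the target grows by absorbing Mercer-satisfying objects.
make = lambda: {"fit": random.random(), "mercer": random.random() < 0.05}
target = [make() for _ in range(10)]
partials = [[make() for _ in range(50)] for _ in range(4)]
print(len(absorbing_evolution(target, partials)))
```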
4.5 Setting Algorithm Parameters in
GP
In this study, the initial population is made of programs produced through a first GP run (GP-1) to reduce computing time; this generates more programs satisfying Mercer's theorem. Once the initial population is obtained, a second GP run (GP-2) with AE proceeds to generate the optimized SVM kernel. In other words, the highly accurate SVM kernel defining the diagnosis model is found by using GP twice.
Table 3: Algorithm parameter settings.

Parameter                GP-1                 GP-2
Number of iterations     30                   10
Population size          1000                 5000 (including GP-1's best results)
Number of generations    1000                 1000
Crossover probability    0.8                  0.8
Mutation probability     0.4                  0.1
Selection algorithm      Tournament, t = 3    Tournament, t = 3
SVM cross-validation     10-fold              10-fold
Terminating condition    None                 None
Additional conditions    Mercer's theorem     Migration rate and absorbing condition (Mercer's theorem not required)
Table 3 shows the GP algorithm parameters and
DATA2012-InternationalConferenceonDataTechnologiesandApplications
60
terminating conditions. GP-1 focuses on quickly finding kernel functions that satisfy Mercer's theorem, so the mutation probability is set high. In GP-2, all parameters are typical except the migration rate, the absorbing condition, and the elimination rate: the migration rate is 0.1, the absorbing condition is whether a function satisfies Mercer's theorem, and the elimination rate is 0.05.
5 RESULTS
Table 4: The 13 kernels obtained for the laboratory test results.

No.  Kernel
1    scos(mhdist(vsdiv(VE1, 1.0), VE0))
2    scos(mhdist(VE1, vneg(svmul(-1.0, VE0))))
3    scos(mhdist(VE1, vspow(VE0, 1.0)))
4    scos(mhdist(VE1, vabs(vabs(VE0))))
5    scos(mhdist(VE1, vsdiv(VE0, 1.0)))
6    scos(mhdist(vspow(VE1, 1.0), VE0))
7    scos(mhdist(svmul(1.0, VE1), VE0))
8    scos(mhdist(vsmul(VE1, 1.0), VE0))
9    scos(mhdist(VE1, vvsub(vvsub(VE1, VE1), vneg(VE0))))
10   scos(mhdist(vabs(vspow(VE1, 1.0)), VE0))
11   scos(ssadd(slog(-1.0), mhdist(VE1, VE0)))
12   scos(mhdist(vvsub(svmul(1.0, VE1), vvsub(VE1, VE1)), VE0))
13   scos(mhdist(VE0, vsmul(VE1, 1.0)))

No.  Fitness  Kernel-making time  SVM learning time  SVM accuracy  Precision  Recall
1    0.269    0.878               0.005              79.412        1.000      0.794
2    0.269    0.854               0.004              79.412        1.000      0.794
3    0.269    0.902               0.004              79.412        1.000      0.794
4    0.269    0.840               0.004              79.412        1.000      0.794
5    0.269    0.964               0.004              79.412        1.000      0.794
6    0.269    0.901               0.004              79.412        1.000      0.794
7    0.269    0.895               0.007              79.412        1.000      0.794
8    0.269    0.897               0.006              79.412        1.000      0.794
9    0.269    0.984               0.004              79.412        1.000      0.794
10   0.269    1.002               0.004              79.412        1.000      0.794
11   0.269    0.998               0.005              79.412        1.000      0.794
12   0.269    1.227               0.004              79.412        1.000      0.794
13   0.269    1.146               0.005              79.412        1.000      0.794

*VE0 and VE1 are single data vectors (like X and Y).
Table 4 presents the top 13 kernels, all with identical fitness values, obtained through GP with the AE algorithm. Among them, the second kernel, which has one of the lowest computing times, is finally selected. When this kernel is applied to the original data, i.e., the laboratory test results, with 10-fold cross validation, it achieves 80% classification accuracy, while other classifiers such as the RBF kernel and the linear kernel achieve about 81%. The recall rate is 0.794 for GP with the AE algorithm versus 0.644 for the linear kernel. Even though the other kernels show similar performance in terms of accuracy and recall rate, the new method suggested in this study achieves the highest recall rate and requires less computing time than standard GP algorithms; computing time is reduced by as much as 28%.
The selected kernel has the expression shown in Eq. (2):

\[
\text{Kernel}(X, Y) = \cos\!\left(\sqrt{(X - Y)^{\top}\, \Sigma^{-1}\, (X - Y)}\right) \tag{2}
\]

where \(\Sigma^{-1}\) is the inverse of the covariance matrix of the original data, e.g., the laboratory test results, and X and Y are single data vectors in the original space.
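The selected kernel can be written directly in code. The sketch below implements Eq. (2) with numpy and plugs it into scikit-learn's SVC as a custom Gram-matrix kernel; the custom-kernel route and the placeholder data are our assumptions, not necessarily the authors' setup.

```python
# Eq. (2): K(x, y) = cos( sqrt( (x - y)^T  Sigma^{-1}  (x - y) ) ),
# i.e. the cosine of the Mahalanobis distance between two data vectors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((200, 32))                  # placeholder lab-test data
y_train = rng.integers(0, 2, 200)
inv_cov = np.linalg.pinv(np.cov(X_train, rowvar=False))   # Sigma^{-1}

def mahalanobis_cos_kernel(A, B):
    """Gram matrix of cos(Mahalanobis distance) for all pairs in A x B."""
    K = np.empty((A.shape[0], B.shape[0]))
    for i, a in enumerate(A):
        d = a - B                                # (n_B, dim) row differences
        q = np.einsum("ij,jk,ik->i", d, inv_cov, d)
        K[i] = np.cos(np.sqrt(np.maximum(q, 0.0)))  # clip tiny negatives
    return K

clf = SVC(kernel=mahalanobis_cos_kernel, C=1.0).fit(X_train, y_train)
print(clf.predict(X_train[:5]))
```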
6 CONCLUSIONS AND
DISCUSSION
This study focuses on generating an accurate diagnosis model from laboratory test results obtained from subjects with type 2 diabetes mellitus. An accurate model should be able to classify the subjects into a patient group and a normal group with high precision. There are 32 test parameters, including the fasting glucose level, in the laboratory test. Because these parameters are correlated and have complex relations, a diagnosis model cannot classify the subjects cleanly; in other words, misclassification of normal subjects into the patient group, or vice versa, can occur. This study suggests a new approach to optimizing SVM kernels with GP; in particular, GP is applied twice along with the AE algorithm. The resulting best kernel has 80% accuracy and a 0.794 recall rate. Other typical kernels such as RBF show similar accuracy; however, the method suggested by this study achieves the highest recall rate and needs less computing time even though it runs GP twice. In addition, another advantage of this study is the ease of generating a good diagnosis model from the developed kernel function without expert experience with clinical data. Nevertheless, the computing time should be further reduced by better approaches, and further research should investigate advanced
KernelGenerationsforaDiagnosisModelwithGP
61
methodologies to generate more PSD programs.
ACKNOWLEDGEMENTS
This research was supported by the Samsung Research Fund (2010-0683-000) of Sungkyunkwan University, Suwon, Republic of Korea.
REFERENCES
Chang, C. C. and Lin, C. J., 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, Vol. 2, No. 27, pp. 1-27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Cristianini, N. and Shawe-Taylor, J., 2000. An Introduction to Support Vector Machines. Cambridge University Press.
Friedrichs, F. and Igel, C., 2005. Evolutionary tuning of multiple SVM parameters, Neurocomputing, Vol. 64, pp. 107–117.
Gagne, C., Schoenauer, M., Sebag, M. and Tomassini, M., 2006. Genetic programming for kernel-based learning with co-evolving subsets selection, Proceedings of Parallel Problem Solving from Nature, LNCS, No. 4193, pp. 1006–1017.
Howley, T. and Madden, M. G., 2005. The genetic kernel
support vector machine: Description and evaluation,
Artif. Intell. Rev., Vol.24, No.3–4, pp.379–395.
Hsu, C. W., Chang, C. C. and Lin, C. J., 2010. A Practical Guide to Support Vector Classification, National Taiwan University, Taiwan.
Huang, C. L. and Wang, C. J., 2006. A GA-based feature
selection and parameter optimization for support
vector machines, Expert Systems with Applications,
Vol. 31, No. 2, pp. 231–240.
Huang, C. L., Chen, M. C. and Wang, C. J., 2007. Credit
scoring with a data mining approach based on support
vector machines, Expert Systems with Applications,
Vol. 33, No. 4, pp. 847–856.
Kim, Y. K., 2011. Evolution Algorithms, Chonnam National University Press, Republic of Korea.
Lessmann, S., Stahlbock, R. and Crone, S. F., 2006. Genetic algorithms for support vector machine model selection, Proceedings of the International Joint Conference on Neural Networks, pp. 3063–3069.
Methasate, I. and Theeramunkong, T., 2007. Kernel Trees for Support Vector Machines, IEICE Trans. Inf. & Syst., Vol. E90-D, No. 10, pp. 1550–1556.
Mierswa, I., 2006. Evolutionary learning with kernels: A
generic solution for large margin problems,
Proceedings of the Genetic and Evolutionary
Computation Conference, pp. 1553–1560.
Phienthrakul, T. and Kijsirikul, B., 2005. Evolutionary
strategies for multiscale radial basis function kernels
in support vector machines, Proceedings of
Conference on Genetic and Evolutionary Computation,
pp.905–911.
Phienthrakul, T. and Kijsirikul, B., 2008. Adaptive
stabilized multi-RBF kernel for support vector
regression, Proceedings of International Joint
Conference on Neural Networks, pp. 3545–3550.
Runarsson, T. P. and Sigurdsson, S., 2004. Asynchronous parallel evolutionary model selection for support vector machines, Neural Information Processing – Letters and Reviews, Vol. 3, No. 3, pp. 59–67.
Keerthi, S., Sindhwani, V. and Chapelle, O., 2007. An efficient method for gradient-based adaptation of hyperparameters in SVM models, Advances in Neural Information Processing Systems 19, MIT Press, Cambridge, MA, pp. 673–680.
Simian, D., 2008. A model for a complex polynomial SVM kernel, Proceedings of the 8th WSEAS International Conference on Simulation, Modelling and Optimization, pp. 164–169.
Simian, D. and Stoica, F., 2009. An evolutionary method for constructing complex SVM kernels, Proceedings of the 10th WSEAS International Conference on Mathematics and Computers in Biology and Chemistry, pp. 172–177.
Souza, B. F., Carvalho, A. C., Calvo, R. and Ishii, R. P., 2006. Multiclass SVM model selection using particle swarm optimization, Proceedings of the Sixth International Conference on Hybrid Intelligent Systems, pp. 31–34.
Sullivan, K. and Luke, S., 2007. Evolving Kernels for Support Vector Machine Classification, Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, GECCO, pp. 1702–1707.
DATA2012-InternationalConferenceonDataTechnologiesandApplications
62