Comparison of SVM-based Feature Selection Method for Biological
Omics Dataset
Xiao Gao (https://orcid.org/0000-0003-1520-8704)
Xi'an University of Posts & Telecommunications, Xi'an, Shaanxi Province, China
Keywords: Cancer Classification, Feature Selection, Support Vector Machines, Recursive Feature Elimination.
Abstract: With the development of omics technology, more and more data are being generated in cancer research,
and machine learning has become the main approach to analysing these data. Omics data are characterized by a
large number of features and a small number of samples, and many of these features are redundant for analysis.
Feature selection methods can be used to remove such redundant features. In this paper, we compare two
SVM-based feature selection methods. We evaluate the performance of these two methods on three omics datasets,
and the results show that the SVM-RFE method performs better than the pure SVM method on these cancer datasets.
1 INTRODUCTION
Genomics and other related omics technologies have
been widely adopted to obtain new insights into the
pathogenesis of cancer. Machine learning is
a commonly used approach to analyse these data, but
omics datasets contain a large number of repetitive and
strongly correlated features (Karahalil 2016).
Redundant features affect the efficiency and accuracy
of machine learning models (Bhola and Singh 2018).
Therefore, feature selection techniques are needed to
process these datasets and improve the efficiency
and performance of machine learning
models (Golub, Slonim 1999).
Feature selection methods fall into many categories,
such as penalty-based methods, tree-based methods and
recursive feature elimination methods. Penalty-based
feature selection automatically shrinks small
coefficient estimates to zero to reduce the
complexity of the model (Wang, Zhou, Wu, Chen,
Fan 2020). When a tree-based model is used for
feature selection, the feature importance attribute
available after training any tree model ranks the
features and thereby completes the feature selection process
(Jotheeswaran, Koteeswaran 2015). Recursive
feature elimination (RFE) is a very popular
and efficient feature selection method, which is
suitable for prediction models that produce feature weights as
a fitting result. The RFE algorithm obtains the
optimal combination of variables that maximizes the
model performance by removing features recursively
(Guyon, Weston, Barnhill 2002). The
process of feature selection using recursive feature
elimination is as follows: first, all feature variables
are used to train the model; second, the worst
feature is removed according to the
performance of each feature on the model; third, the
second step is repeated recursively until the number
of remaining features reaches the required number.
Many common feature selection
methods can be combined with the RFE
methodology, such as the support
vector machine (SVM) or the random forest (RF). Boser
et al. proposed the SVM classification
algorithm in 1992 (Boser 1992, Vapnik 1998), and
Mukherjee et al. later proposed an SVM-based feature
selection method (Weston, Mukherjee, Chapelle,
Pontil, Vapnik 2001). SVM classifies samples by
finding a hyperplane that maximizes the distance
between classes in the training data. Feature
selection with SVM ranks the
importance of features through the coefficient
attribute provided by the SVM. When SVM-RFE is used
for feature selection, the features are evaluated
according to the performance of each feature on the
model, and then those less important features are
recursively deleted until the remaining number of
features meets our requirements (Meng, Yang 2008).
We can also improve time efficiency by removing
multiple features at a time, but it may lead to a decline
in model performance (Tang, Zhang, Huang 2007).
Random Forest was formally proposed by Leo
Breiman et al. in 2001. The Random Forest feature
selection method (Genuer, Poggi, Tuleau-Malot
2016) accesses the feature importance attribute
after fitting a random forest classifier
and ranks the features according to their importance.
Similarly, RF-RFE follows the same RFE
procedure as SVM-RFE.
In this paper, we compare the
performance of the SVM and SVM-RFE feature
selection methods on omics datasets. We use these
two feature selection methods to select
features on three cancer datasets, and evaluate the
feature selection performance with logistic
regression (LR) and random forest (RF) models.
The rest of this article is organized as follows: Section Ⅱ
presents the theory of the support vector machine and
recursive feature elimination. Section Ⅲ presents the
results on different cancer datasets using the two feature
selection methods; in addition, we study the
influence of the number of features eliminated in each
SVM-RFE iteration on the model. Section Ⅳ
concludes our work and proposes future directions.
2 METHOD
2.1 Datasets
We used cancer datasets from the TCGA database
(TCGA Network 2012). The TCGA database is a
project jointly supervised by the National Cancer
Institute and the National Human Genome Research
Institute, and it provides very comprehensive cancer genomic data.
In this paper, we used the miRNA datasets from
three cancer types in TCGA to compare the
performance of SVM and SVM-RFE feature
selection methods, namely thyroid cancer (THCA),
glioma (GBMLGG) and lung squamous cell
carcinoma (LUSC). The total number of THCA
patients is 569, including 510 tumor samples and 59
normal samples. The total number of GBMLGG
patients is 529, including 487 tumor samples and 42
normal samples. The total number of LUSC patients
is 387, including 342 tumor samples and 45 normal
samples. In addition, we preprocessed the datasets by
deleting all genes whose median is zero across all samples, as
shown in Table 1 (a sketch of this filtering step is given after the table).
Table 1: The number of features in the datasets before and after preprocessing.

Dataset Name    Before preprocessing    After preprocessing
THCA            1046                    898
GBMLGG          1046                    856
LUSC            1046                    886
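The following is a minimal sketch of how this zero-median filtering could be implemented with pandas; the file name and the assumed data layout (rows are samples, columns are miRNA features) are illustrative only and not taken from the paper.

```python
# Illustrative zero-median filtering (assumed layout: rows = samples,
# columns = miRNA features; the file name is hypothetical).
import pandas as pd

def drop_zero_median_features(expr: pd.DataFrame) -> pd.DataFrame:
    """Remove features whose median across all samples is zero."""
    medians = expr.median(axis=0)           # per-feature median over samples
    return expr.loc[:, medians != 0]        # keep only non-zero-median features

if __name__ == "__main__":
    expr = pd.read_csv("THCA_miRNA.csv", index_col=0)   # hypothetical input file
    filtered = drop_zero_median_features(expr)
    print(expr.shape[1], "features before,", filtered.shape[1], "features after")
```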
2.2 Feature Selection Method
2.2.1 Support Vector Machine (SVM)-Based Feature Selection
Given a training set D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}, y_i ∈ {-1, +1},
we need to find a separating hyperplane in the sample space based on the
training set D. In the sample space, the separating hyperplane can be
described by the following linear equation:

ω^T x + b = 0    (1)
where ω is the normal vector, which determines the direction of the
hyperplane, and b is the displacement term, which determines the distance
between the hyperplane and the origin. The separating hyperplane is therefore
determined by the normal vector ω and the displacement b. The distance from
any point x in the sample space to the hyperplane (ω, b) can be expressed as:

r = |ω^T x + b| / ‖ω‖    (2)
If the hyperplane (ω, b) can correctly classify the training samples, then for
(x_i, y_i) ∈ D, ω^T x_i + b > 0 if y_i = +1, and ω^T x_i + b < 0 if y_i = −1.
After rescaling, this can be written as:

ω^T x_i + b ≥ +1, if y_i = +1
ω^T x_i + b ≤ −1, if y_i = −1    (3)

As shown in Figure 1, the training sample points closest to the hyperplane are
those for which the equalities in Equation (3) hold; they are called support
vectors. The sum of the distances from two heterogeneous support vectors to
the hyperplane is

γ = 2 / ‖ω‖    (4)

which is called the margin.
Figure 1: Support vectors and margin.
Only the support vectors matter when determining the
separating hyperplane; the other instances do not. Moving
a support vector changes the solution,
but moving, or even removing, instance points outside the
margin does not change it.
Since the support vectors play a decisive role in
determining the hyperplane, this model is called the support
vector machine. The number of support vectors is
generally small, so the support vector machine is
determined by a small number of important samples
(Drucker, Burges, Kaufman, et al. 1996).
Finding ω and b that maximize the margin γ gives the
maximum-margin separating hyperplane, that is, solving:

min_{ω,b} (1/2)‖ω‖²
s.t. y_i(ω^T x_i + b) ≥ 1, i = 1, 2, ..., m    (5)
To sum up, we obtain the following learning algorithm for the linearly
separable support vector machine (the maximum margin method):

Algorithm: Linearly separable support vector machine learning algorithm
Input: Linearly separable dataset D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}, y_i ∈ {-1, +1}.
Output: Maximum-margin separating hyperplane and classification decision function.
When we use the SVM for feature selection, the weights of the trained SVM
classifier are used to generate a feature ranking: after fitting, a linear SVM
provides a weight for each feature, which serves as the basis for ranking.
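As a hedged illustration of this idea (not the authors' code), the sketch below ranks features by the absolute value of the weights of a linear SVM and keeps the top k; the data X, labels y, and the value of k are placeholders.

```python
# Sketch of coefficient-based SVM feature ranking; X, y and k are placeholders.
import numpy as np
from sklearn.svm import SVC

def svm_rank_features(X, y, k=20, C=1.0):
    """Rank features by |w| from a linear SVM and return the top-k indices."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    weights = np.abs(clf.coef_).ravel()      # one weight per feature (binary case)
    ranking = np.argsort(weights)[::-1]      # feature indices, most important first
    return ranking[:k]
```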
The maximum-margin separating hyperplane of a linearly separable training
dataset exists and is unique.
Proof of existence:
Since the training dataset is linearly separable, problem (5)
of the maximum margin method must have a feasible
solution, and because the objective function has a
lower bound, (5) must have an optimal solution, denoted (ω*,
b*). Since there are both positive and negative points
in the training dataset, (ω, b) = (0, b) is not a
feasible optimal solution, so the optimal solution (ω*,
b*) must satisfy ω* ≠ 0. From this, we know that
a separating hyperplane exists.
Proof of uniqueness:
First, the uniqueness of ω* in the solution of the
optimization problem (5) is proved. Suppose problem
(5) has two optimal solutions (ω_1*, b_1*) and (ω_2*, b_2*).
Obviously ‖ω_1*‖ = ‖ω_2*‖ = c, where c is a constant. Let
ω = (ω_1* + ω_2*)/2 and b = (b_1* + b_2*)/2; it is easy to see that (ω, b)
is a feasible solution of problem (5), so

c ≤ ‖ω‖ ≤ (1/2)‖ω_1*‖ + (1/2)‖ω_2*‖ = c.

The above indicates that the inequalities must hold with equality, that is,
‖ω‖ = (1/2)‖ω_1*‖ + (1/2)‖ω_2*‖. Thus ω_1* = λω_2* with |λ| = 1. If λ = −1,
then ω = 0, and (ω, b) is not a feasible solution to problem (5). So λ = 1,
that is, ω_1* = ω_2*. Thus, the two optimal solutions (ω_1*, b_1*) and
(ω_2*, b_2*) can be written as (ω*, b_1*) and (ω*, b_2*), respectively.
It is further proved that b_1* = b_2*. Let x_1' and x_2' be the points in the
set {x_i | y_i = +1} for which the inequality constraints of the problem hold
with equality for (ω*, b_1*) and (ω*, b_2*), respectively, and let x_1'' and
x_2'' be the corresponding points in the set {x_i | y_i = −1}. Then from

b_1* = −(1/2)(ω*·x_1' + ω*·x_1''),  b_2* = −(1/2)(ω*·x_2' + ω*·x_2''),

we obtain

b_1* − b_2* = −(1/2)[ω*·(x_1' − x_2') + ω*·(x_1'' − x_2'')].

Because ω*·x_2' + b_1* ≥ 1 = ω*·x_1' + b_1* and ω*·x_1' + b_2* ≥ 1 =
ω*·x_2' + b_2*, we have ω*·(x_1' − x_2') = 0, and likewise
ω*·(x_1'' − x_2'') = 0. Therefore, b_1* − b_2* = 0. Combined with
ω_1* = ω_2*, the two optimal solutions (ω_1*, b_1*) and (ω_2*, b_2*) are the
same, and the uniqueness of the solution is proved. From the uniqueness of the
solution of formula (5), it is concluded that the separating hyperplane is
unique (Platt 1999).
2.2.2 Support Vector Machine-Recursive
Feature Elimination (SVM-RFE)
In the first round of training, all features are used to train the SVM, and
the hyperplane ω^T x + b = 0 is obtained. If there are n features, SVM-RFE
selects the feature i whose corresponding component of ω has the smallest
squared value and deletes it. In the second round, n−1 features remain; these
n−1 features and the output values are used to train the SVM again, and the
feature corresponding to the smallest squared component of ω is removed.
This continues until the number of remaining features meets our requirement.
In order to better evaluate the performance fluctuation during the feature
selection process, a resampling layer is added around the above
algorithm. This experiment uses K-fold cross-validation.
The overall process of the algorithm is as follows:

1. For each resampling iteration:
   1.1 Split the training set into a new training set and a validation set
   1.2 Train the model with the new training set and all feature variables
   1.3 Evaluate the model using the validation set
   1.4 Calculate and sort the importance of all feature variables
   1.5 For each variable subset S{i}, i = 1...S:
       1.5.1 Keep the S{i} most important feature variables
       1.5.2 Train the model on the reduced dataset
       1.5.3 Evaluate the model on the validation set
2. Determine an appropriate number of feature variables
3. Estimate the set of feature variables for final model construction
4. Select the optimal variable set and build the final model with the full training set
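A minimal sketch of this procedure is shown below, assuming scikit-learn's RFE as the elimination engine, a linear SVM as the ranking model, and logistic regression as the downstream classifier; it is an assumed implementation consistent with the steps above, not the authors' exact code, and X, y are placeholders.

```python
# Assumed sketch of SVM-RFE wrapped in an outer K-fold resampling layer.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def svm_rfe_cv(X, y, n_features=20, n_splits=5):
    """Run SVM-RFE inside each fold and score an LR classifier on the held-out fold."""
    X, y = np.asarray(X), np.asarray(y)
    scores = []
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, val_idx in cv.split(X, y):
        selector = RFE(SVC(kernel="linear", C=1.0),
                       n_features_to_select=n_features, step=1)  # remove 1 feature/iter
        selector.fit(X[train_idx], y[train_idx])
        X_tr, X_val = selector.transform(X[train_idx]), selector.transform(X[val_idx])
        model = LogisticRegression(max_iter=1000).fit(X_tr, y[train_idx])
        scores.append(f1_score(y[val_idx], model.predict(X_val)))
    return scores
```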
2.3 Evaluation of Feature Selection
Methods
In order to compare the performance of the
SVM and SVM-RFE feature selection methods, the
two methods are used to select the same number of
features on the three datasets, and the selected features are then
evaluated with LR and RF
classifiers under K-fold cross-validation. F1
and AUC are used as the evaluation metrics. To examine
how the number of features eliminated per iteration
affects SVM-RFE, we set three different per-iteration
elimination counts and compare the resulting model performance.
Specifically, a linear-kernel SVM is used in our
experiment, and the penalty coefficient C of the SVM is
set to 1 so that it has good generalization ability. In
the SVM-RFE method, one feature is deleted
recursively in each iteration to maximize the model
performance.
The metrics we use are based on True Positives (TP),
True Negatives (TN), False Positives (FP), and False
Negatives (FN). The first metric is F1, which
combines Precision and Recall and therefore gives a
more balanced evaluation:

F1 = 2 * Precision * Recall / (Precision + Recall)    (6)
The second evaluation metric is ROC-AUC. The ROC curve
plots the relationship between the FPR and the TPR; by drawing
the ROC curve, we can observe the performance of
the model. The better the performance of the model,
the closer the ROC curve is to the upper left corner of the plot.

The x-axis is the false positive rate (FPR):

FPR = FP / (FP + TN)    (7)

The y-axis is the true positive rate (TPR):

TPR = TP / (TP + FN)    (8)

AUC is the area under the ROC curve.
Obviously, the larger the AUC, the better the
classification performance of the classifier.
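The sketch below illustrates how these two metrics can be computed with scikit-learn; the labels and scores are made-up placeholders, not values from the experiments.

```python
# Toy illustration of F1 (Eq. 6) and ROC-AUC with placeholder labels/scores.
from sklearn.metrics import f1_score, roc_auc_score, roc_curve

y_true  = [0, 0, 1, 1, 1, 0]                   # ground-truth labels (placeholder)
y_pred  = [0, 1, 1, 1, 0, 0]                   # hard predictions from a classifier
y_score = [0.2, 0.6, 0.9, 0.8, 0.4, 0.1]       # predicted probabilities for class 1

print("F1  =", f1_score(y_true, y_pred))       # 2*Precision*Recall / (Precision+Recall)
print("AUC =", roc_auc_score(y_true, y_score)) # area under the ROC curve
fpr, tpr, _ = roc_curve(y_true, y_score)       # FPR (Eq. 7) and TPR (Eq. 8) points
```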
3 RESULTS
3.1 The Number of Features
Eliminated in Each Iteration
Affects Feature Selection
Performance
To investigate whether the number of features
removed in each iteration affects the performance of
the SVM-RFE feature selection method, we used
SVM-RFE to remove a different number of features
in each iteration on three cancer datasets. Finally, LR
and RF classifiers were used to compare the feature
selection results.
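For illustration, scikit-learn's RFE exposes the number of features removed per iteration as the step argument; the sketch below shows how such settings could be compared, using synthetic data as a stand-in for the real miRNA matrices. The step values 1, 5 and 10 are assumptions for illustration, not necessarily those used in the experiments.

```python
# Assumed sketch of varying the per-iteration elimination count via RFE's `step`.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for a preprocessed miRNA matrix and tumor/normal labels.
X, y = make_classification(n_samples=100, n_features=200, random_state=0)

for step in (1, 5, 10):                       # features removed per iteration (assumed)
    selector = RFE(SVC(kernel="linear", C=1.0),
                   n_features_to_select=20, step=step)
    selector.fit(X, y)
    print("step =", step, "->", int(selector.support_.sum()), "features kept")
```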
Table 2: Performance impact of eliminating different numbers of features in each iteration.
Table 2 shows the performance of SVM-RFE
when deleting different numbers of features in each
iteration. In the SVM-RFE feature selection method,
one or more features can be eliminated each time, and
the table shows that the model performs
best when one feature is eliminated per iteration. We
speculate that when a set of several features is
removed at a time, the overall importance of the set
is used as the evaluation criterion. The drawback is that,
although the relatively insignificant set of
features is removed, the importance of each individual feature
within the relatively important set cannot
be judged. It is possible that a group of
features has high overall importance, yet some
unimportant features in the group are not removed.
Therefore, eliminating multiple features at a time
may cause a certain degree of performance
degradation.
The relationship between the number of features
eliminated per RFE iteration and performance is
shown in Figures 2-4.
Figure 2: On the THCA dataset.
Figure 3: On the GBMLGG dataset.
Figure 4: On the LUSC dataset.
Table 3: The influence of the number of features eliminated in each iteration on feature selection time.
Table 3 shows the time cost when SVM-RFE
deletes different numbers of features in each iteration: the
more features are eliminated per iteration, the less
time is spent on feature selection.
3.2 Comparison between SVM and
SVM-RFE Feature Selection
Methods
Table 4: Evaluating feature selection methods using LR and
RF models.
Table 4 shows performance on LR and RF classifiers
after selecting 20 features from three TCGA cancer
datasets using SVM and SVM-RFE methods. The
results show that the SVM-RFE feature selection
method achieves better performance than the SVM
feature selection method on all three datasets.
For example, on the THCA dataset, the SVM-RFE method scores
about 0.7% higher than SVM, while on the GBMLGG
dataset, the SVM-RFE method scores about 0.2% to 2%
higher than SVM. On the LUSC dataset, the F1 score of
the SVM-RFE method is slightly lower than that of SVM only
when the LR classifier is used; all other scores are
higher than those of SVM.
4 CONCLUSIONS
In this paper, two feature selection methods based on
the SVM are compared. The methods are applied to
three different TCGA cancer datasets to verify and
compare their performance on two classifiers.
We conclude that the overall
performance of the SVM-RFE feature selection
method is better than that of the SVM feature
selection method.
In addition, we performed a further experiment on the
performance of SVM-RFE, eliminating different
numbers of features per iteration to explore the impact
on model performance. The
conclusion is that the model
performs best when one feature is removed in each
iteration, but this takes a long time. Eliminating
multiple features in each iteration improves time
efficiency but reduces performance.
This work is of significance to the study
of cancer: it further verifies the feasibility of machine
learning in cancer data analysis, helps doctors and
researchers reduce the burden of analysing cancer
data, and helps predict the patient's condition.
Suggestions for further work include analysing whether a
patient's condition is serious by judging whether the
patient has a primary cancer or
metastatic lesions, and dividing tumors into
types so that different treatment options can be adopted to
improve the patient's 5-year survival rate.
ACKNOWLEDGEMENTS
Throughout the writing of this dissertation, I have
received a great deal of support and assistance. I
would like to thank my parents for their wise counsel
and sympathetic ear. You are always there for me. I
could not have completed this dissertation without the
support of my friends, who provided stimulating
discussions as well as happy distractions to rest my
mind outside of my research.
REFERENCES
Wang, N., Zhou, W., Wu, J., Chen, S., Fan, Z. (2020). Comparison of
Penalty-based Feature Selection Approach on High Throughput Biological Data.
TCGA Network (2012). Comprehensive molecular portraits of human breast
tumours.
Jotheeswaran, J., Koteeswaran, S. (2015). Decision tree based feature
selection and multilayer perceptron for sentiment analysis.
Tang, Y., Zhang, Y.-Q., Huang, Z. (2007). Development of Two-Stage SVM-RFE
Gene Selection Strategy for Microarray Expression Data Analysis.
Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Vapnik, V. (2001).
Feature selection for support vector machines.
Guyon, I., Weston, J., Barnhill, S. (2002). Gene Selection for Cancer
Classification using Support Vector Machines. AT&T Labs, Red Bank, New
Jersey, USA, 7-14.
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov,
J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield,
C. D., Lander, E. S. (1999). Molecular Classification of Cancer: Class
Discovery and Class Prediction by Gene Expression Monitoring.
Shieh, M.-D., Yang, C.-C. (2002). Multiclass SVM-RFE for product form feature
selection.
Karahalil, B. (2016). Overview of Systems Biology and Omics Technologies.
Platt, J. C. (1999). Fast training of support vector machines using
sequential minimal optimization.
Boser, B. (1992); Vapnik, V. (1998). Support Vector Machines.
Drucker, H., Burges, C. J. C., Kaufman, L., et al. (1996). Support vector
regression machines. In: Advances in Neural Information Processing Systems 9,
NIPS 1996. MIT Press, 155-161.
Genuer, R., Poggi, J.-M., Tuleau-Malot, C. (2016). Variable selection using
Random Forests.
Bhola, A., Singh, S. (2018). Gene selection using high dimensional gene
expression data: An appraisal.