In this work, we have proposed a novel prognosis
prediction method based on SVM that builds
personalized predictive models. Two datasets,
NSCLC and WPBC, were selected; both are small in
size and high-dimensional.
The novelty of this work is three-fold. First, we
have modified the standard RBF kernel function in
the SVM model to fit the test data, which contain
hybrid feature types. This modification lets the model
meet the needs of practical application. Second, we
employ the SMOTE strategy to deal with
imbalanced training data. A series of
experiments has demonstrated the effectiveness of
the SMOTE strategy on imbalanced data
sets. Third, SVM-RFE is employed to extract
the feature subset with the greatest impact on the outcome.
The results show that SVM-RFE selected 17 of the
34 attributes of WPBC and 28 of the 37 attributes
of NSCLC, and that the selected subsets outperform
the full attribute sets.
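To make the first two points concrete, the following is a minimal sketch and not the implementation used in our experiments. It assumes one plausible form of an RBF kernel for hybrid features, in which numeric attributes contribute squared differences and categorical attributes contribute 1 on a mismatch, and it shows the basic SMOTE interpolation step for numeric minority samples. The function names, the parameters (gamma, k, seed), and the mismatch-based distance are all illustrative assumptions.

```python
import math
import random


def mixed_rbf_kernel(x, y, numeric_idx, categorical_idx, gamma=0.5):
    """RBF-style kernel over hybrid features (assumed form, for illustration).

    Numeric features add their squared difference to the distance;
    categorical features add 1 when the two values differ, 0 otherwise.
    """
    sq_dist = sum((x[i] - y[i]) ** 2 for i in numeric_idx)
    sq_dist += sum(1 for i in categorical_idx if x[i] != y[i])
    return math.exp(-gamma * sq_dist)


def _sq_euclid(p, q):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Only numeric features are handled in this sketch.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by distance to the base sample.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: _sq_euclid(base, p))[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + t * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

For example, with hypothetical records such as (1.0, 2.0, "male", "stage_I"), calling mixed_rbf_kernel with numeric_idx=[0, 1] and categorical_idx=[2, 3] returns 1.0 for identical records and a smaller value for each differing attribute. The mismatch term is equivalent, up to a constant factor, to a squared Euclidean distance over one-hot encoded categories, so the kernel remains a product of valid RBF kernels.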
 So far, only SVM models have been employed.
In future work, we plan an extensive set of
tests using other machine learning methods, such
as random forests and deep learning, following the
same procedure as for SVM.
ACKNOWLEDGMENT  
This work has been partially supported by the
National Natural Science Foundation of China (No.
31371340) and the Natural Science Foundation of
the Education Department of Anhui Province (No.
KJ2017A542).