
Table 3: Bias-corrected predictive performance for five different models.

         MI Final Model*       MI Clinical Model*    C-C Final Model   C-C Clinical Model   C-C Split Models*
C-index  0.744 [0.742-0.744]   0.734 [0.733-0.738]   0.747             0.731                0.728 [0.683-0.753]
AUC      0.748 [0.747-0.749]   0.738 [0.736-0.741]   0.749             0.732                0.732 [0.678-0.765]
Slope    0.961 [0.954-0.966]   0.981 [0.976-0.992]   0.949             0.975                0.956 [0.67-1.209]
NRI      14.6% [10%-17.9%]     /                     2.47%             /                    /

* Data are expressed as median [full range].
4 DISCUSSION 
In this study, we have illustrated the process of developing a prediction model from incomplete data. To obtain a more stable set of risk factors for CHD in T2DM from a list of clinical and genetic variables, we combined bootstrap resampling with backward variable selection on the imputed data sets.
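As a rough illustration of how bootstrap resampling and backward selection can be combined across imputed data sets, the sketch below counts how often each variable survives selection. It is a minimal sketch under stated assumptions, not the procedure used in this study: the data are simulated, the "imputed" data sets are noisy copies of one matrix, an ordinary least-squares fit with a |t| threshold stands in for the actual regression model and stopping rule, and the names `backward_select` and `inclusion_frequencies` are hypothetical.

```python
import numpy as np


def backward_select(X, y, t_threshold=2.0):
    """Backward elimination on a least-squares fit (stand-in model).

    Repeatedly drops the predictor with the smallest |t| statistic until all
    remaining predictors have |t| >= t_threshold. Returns kept column indices.
    """
    keep = list(range(X.shape[1]))
    while keep:
        A = np.column_stack([np.ones(len(y)), X[:, keep]])  # intercept + predictors
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        sigma2 = resid @ resid / (len(y) - A.shape[1])
        cov = sigma2 * np.linalg.pinv(A.T @ A)
        t = np.abs(beta[1:] / np.sqrt(np.diag(cov)[1:]))
        worst = int(np.argmin(t))
        if t[worst] >= t_threshold:
            break
        keep.pop(worst)
    return keep


def inclusion_frequencies(datasets, n_boot=50, t_threshold=2.0, seed=0):
    """Fraction of (imputed data set, bootstrap sample) pairs selecting each variable."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(datasets[0][0].shape[1])
    for X, y in datasets:
        n = len(y)
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)  # resample rows with replacement
            for j in backward_select(X[idx], y[idx], t_threshold):
                counts[j] += 1
    return counts / (len(datasets) * n_boot)


# Simulated example: columns 0 and 1 carry signal, columns 2 and 3 are noise.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 4))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
# Hypothetical stand-ins for multiply imputed data sets.
datasets = [(X + rng.normal(scale=0.05, size=X.shape), y) for _ in range(3)]
freq = inclusion_frequencies(datasets)
print(np.round(freq, 2))
```

Variables with a real effect should be retained in nearly every resample, while noise variables are retained only occasionally, which is the basis for ranking variables by inclusion frequency.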
Incomplete data are commonly encountered in medical research. Excluding all patients with any missing values may discard useful information and reduce the power of the prediction model, so that some variables fail to attain statistical significance, as seen for systolic BP and rs4607106 in our MI Final Model and C-C Final Model. In our study, the MI models are very similar to the C-C models because the missing rates are not high and the sample sizes are close, but imputation still gives variable selection more power.
Combining bootstrap resampling with variable selection benefits the stability of the selected variables. Under this procedure, variables with strong effects on the outcome are selected more frequently than those with weak or no effects. For model validation, data splitting is a simple and commonly used method, but model performance varies greatly across different splits, and bias is introduced. Our results showed that the bootstrap bias-corrected performance indicators were close to the median indicators produced by repeated training/test splits. Therefore, to ensure an honest model evaluation, it is better to evaluate models by generating multiple pairs of training/test sets or by using a bias-corrected method.
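The idea of a bootstrap bias-corrected performance indicator can be sketched as follows, using the common optimism-correction scheme: fit the model on each bootstrap sample, measure how much better it looks on that sample than on the original data, and subtract the average difference from the apparent performance. This is a hedged sketch rather than the exact analysis reported here: the data are simulated, a least-squares linear score stands in for the fitted model, AUC is the only indicator shown, and the function names are hypothetical.

```python
import numpy as np


def auc(score, y):
    """AUC as the proportion of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def fit_score(X, y):
    """Least-squares linear score (a stand-in for the actual regression model)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def optimism_corrected_auc(X, y, n_boot=200, seed=0):
    """Return (apparent AUC, optimism-corrected AUC)."""
    rng = np.random.default_rng(seed)
    apparent = auc(predict(fit_score(X, y), X), y)
    optimism = []
    n = len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if y[idx].min() == y[idx].max():
            continue  # skip degenerate resamples containing a single class
        beta = fit_score(X[idx], y[idx])
        auc_boot = auc(predict(beta, X[idx]), y[idx])  # performance on the bootstrap sample
        auc_orig = auc(predict(beta, X), y)            # same model on the original data
        optimism.append(auc_boot - auc_orig)
    return apparent, apparent - float(np.mean(optimism))


# Simulated example: one informative predictor and nine noise predictors.
rng = np.random.default_rng(2)
n = 150
X = rng.normal(size=(n, 10))
y = (X[:, 0] + rng.normal(size=n) > 0).astype(int)
apparent, corrected = optimism_corrected_auc(X, y)
print(f"apparent AUC = {apparent:.3f}, bias-corrected AUC = {corrected:.3f}")
```

Unlike a single training/test split, every observation contributes to both fitting and evaluation, so the corrected estimate does not depend on one arbitrary partition of the data.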
Importantly, three SNPs (rs2568958, rs7754840 and rs4607103, located at the NEGR1, CDKAL1 and ADAMTS9 genes, respectively) were selected with high inclusion frequencies, and the NRI results indicated that they contributed to CHD prediction. These three T2D-related SNPs may therefore also be associated with CHD. To validate the effect of these SNPs, we plan further analyses, such as a replication study.
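To make the NRI concrete, the sketch below computes a category-free (continuous) net reclassification improvement when moving from a baseline model to an extended model. The choice of the category-free variant is an assumption made for illustration, since the variant used is not restated in this section; the risks are made-up numbers, not study data, and `continuous_nri` is a hypothetical helper name.

```python
def continuous_nri(risk_old, risk_new, event):
    """Category-free NRI for a new model versus an old model.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)],
    where "up"/"down" mean the new model assigns a higher/lower predicted risk.
    """
    up_e = down_e = up_n = down_n = 0
    n_event = sum(event)
    n_nonevent = len(event) - n_event
    for old, new, e in zip(risk_old, risk_new, event):
        if new > old:
            if e:
                up_e += 1
            else:
                up_n += 1
        elif new < old:
            if e:
                down_e += 1
            else:
                down_n += 1
    return (up_e - down_e) / n_event + (down_n - up_n) / n_nonevent


# Made-up predicted risks for four events followed by four non-events.
event = [1, 1, 1, 1, 0, 0, 0, 0]
risk_old = [0.2, 0.3, 0.4, 0.5, 0.2, 0.3, 0.4, 0.5]  # e.g. clinical variables only
risk_new = [0.3, 0.4, 0.5, 0.4, 0.1, 0.2, 0.5, 0.5]  # e.g. clinical variables plus SNPs
nri = continuous_nri(risk_old, risk_new, event)
print(nri)  # events: (3 - 1) / 4 = 0.5; non-events: (2 - 1) / 4 = 0.25; total 0.75
```

A positive value means the extended model moves predicted risks in the correct direction more often than in the wrong one, which is how an NRI above zero supports a contribution from the added variables.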
In conclusion, this cohort study illustrated that MICE and the bootstrap can benefit the development of prediction models based on data sets containing clinical and genetic variables. An informative risk factor set for CHD, including three T2D-related SNPs, was identified from a prospective CHD cohort of Hong Kong Chinese patients with T2DM. Future research is needed to validate the effect of the selected SNPs.
ACKNOWLEDGEMENTS 
This work was supported by the Innovation and 
Technology Fund (ITS/487/09FP), RGC Central 
Allocation Scheme (CUHK 1/04C), RGC Earmarked
Research Grant (CUHK4724/07M), and the
CUHK Direct Grant (2150476 and 2141611). 