Authors:
Boshu Ru 1; Charles Warner-Hillard 1; Yong Ge 1 and Lixia Yao 2
Affiliations:
1 University of North Carolina at Charlotte, United States
2 Mayo Clinic and University of North Carolina at Charlotte, United States
Keyword(s):
Social Media, Drug Repositioning, Machine Learning, Patient-Reported Outcomes.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Health Information Systems; Pattern Recognition and Machine Learning; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Drug repositioning reduces safety risk and development cost, compared to developing new drugs.
Computational approaches have examined biological, chemical, literature, and electronic health record data
for systematic drug repositioning. In this work, we built an entire computational pipeline to investigate the
feasibility of mining a new data source, fast-growing online patient forum data, for identifying and
verifying drug-repositioning hypotheses. We curated a gold-standard dataset based on filtered drug reviews
from WebMD. Among 15,714 sentences, 447 mentioned novel desirable drug usages that were not listed as
known drug indications by WebMD and thus were defined as serendipitous drug usages. We then
constructed 347 features using text-mining methods and drug knowledge. Finally, we built SVM, random
forest and AdaBoost.M1 classifiers and evaluated their classification performance. Our best model achieved
an AUC score of 0.937 on the independent test dataset, with precision equal to 0.811 and recall equal to
0.476. It successfully predicted serendipitous drug usages, including metformin and bupropion for obesity,
tramadol for depression and ondansetron for irritable bowel syndrome with diarrhea. Machine learning
methods make this new data source feasible for studying drug repositioning. Our future efforts include
constructing more informative features, developing more effective methods to handle imbalanced data, and
verifying prediction results using other existing methods.
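
The metrics reported in the abstract (AUC, precision, recall) can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code: the function names and the toy labels and scores are assumptions made up for illustration. The only figures taken from the abstract are the sentence counts (447 of 15,714) and the reported precision and recall; the F1 value is derived from those two numbers here and is not reported in the abstract.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical toy data (NOT from the paper): 2 positives among 8 sentences.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

toy_precision, toy_recall = precision_recall(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)

# Figures taken from the abstract.
positive_rate = 447 / 15714           # share of serendipitous-usage sentences
reported_f1 = f1_score(0.811, 0.476)  # derived here, not stated in the abstract

print(f"toy precision={toy_precision:.3f} recall={toy_recall:.3f} auc={toy_auc:.3f}")
print(f"positive rate={positive_rate:.4f} derived F1={reported_f1:.3f}")
```

With positives making up under 3% of the gold-standard sentences, a high AUC can coexist with a modest recall, which is consistent with the abstract naming imbalanced data as a target for future work.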