Identifying Serendipitous Drug Usages in Patient Forum Data - A Feasibility Study

Boshu Ru, Charles Warner-Hillard, Yong Ge, Lixia Yao


Compared with developing new drugs, drug repositioning reduces safety risk and development cost. Computational approaches have examined biological, chemical, literature, and electronic health record data for systematic drug repositioning. In this work, we built a complete computational pipeline to investigate the feasibility of mining a new data source, fast-growing online patient forum data, for identifying and verifying drug-repositioning hypotheses. We curated a gold-standard dataset based on filtered drug reviews from WebMD. Among 15,714 sentences, 447 mentioned novel desirable drug usages that WebMD did not list as known drug indications; we defined these as serendipitous drug usages. We then constructed 347 features using text-mining methods and drug knowledge. Finally, we built SVM, random forest and AdaBoost.M1 classifiers and evaluated their classification performance. Our best model achieved an AUC of 0.937 on the independent test dataset, with a precision of 0.811 and a recall of 0.476. It successfully predicted serendipitous drug usages, including metformin and bupropion for obesity, tramadol for depression and ondansetron for irritable bowel syndrome with diarrhea. Machine learning methods make this new data source feasible for studying drug repositioning. Our future efforts include constructing more informative features, developing more effective methods to handle imbalanced data, and verifying prediction results using other existing methods.
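The abstract reports AUC together with precision and recall on a heavily imbalanced dataset (447 positive sentences out of 15,714, about 2.8%). The stdlib Python below is a minimal sketch of those metrics, assuming the standard binary-classification definitions (the abstract does not state its formulas). The function names are illustrative, not taken from the paper, and the toy scores are invented. It also computes the F1 score implied by the reported precision and recall, which works out to about 0.60.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). Labels are 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Share of positive (serendipitous-usage) sentences in the gold standard: ~0.028.
positive_rate = 447 / 15714
# F1 implied by the best model's reported precision and recall: ~0.600.
best_f1 = f1(0.811, 0.476)

# Toy usage: five sentences with invented classifier scores.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cut-off
auc = roc_auc(y_true, scores)                  # 5/6: one positive ranks below a negative
prec, rec = precision_recall(y_true, y_pred)   # (0.5, 0.5)
```

AUC depends only on how the scores rank positives against negatives, whereas precision and recall depend on the chosen decision threshold. With under 3% positives, a model can rank sentences well (high AUC) yet still miss many positives at a conservative threshold, which is consistent with the reported combination of high precision and modest recall.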



Paper Citation

in Harvard Style

Ru B., Warner-Hillard C., Ge Y. and Yao L. (2017). Identifying Serendipitous Drug Usages in Patient Forum Data - A Feasibility Study. In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2017), ISBN 978-989-758-213-4, pages 106-118. DOI: 10.5220/0006145201060118

in Bibtex Style

author={Boshu Ru and Charles Warner-Hillard and Yong Ge and Lixia Yao},
title={Identifying Serendipitous Drug Usages in Patient Forum Data - A Feasibility Study},
booktitle={Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2017)},

in EndNote Style

JO - Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2017)
TI - Identifying Serendipitous Drug Usages in Patient Forum Data - A Feasibility Study
SN - 978-989-758-213-4
AU - Ru B.
AU - Warner-Hillard C.
AU - Ge Y.
AU - Yao L.
PY - 2017
SP - 106
EP - 118
DO - 10.5220/0006145201060118