Authors:
Tian Yun 1; Deepti Garg 2 and Natalia Khuri 1
Affiliations:
1 Department of Computer Science, Wake Forest University, 1834 Wake Forest Road, Winston-Salem, U.S.A.
2 Department of Computer Science, San José State University, One Washington Square, San José, U.S.A.
Keyword(s):
Text Mining, Classification, Machine Learning, Support Vector Machine.
Abstract:
A comprehensive and detailed analysis of the gaps in knowledge about drugs’ safety and effectiveness in neonates, infants, children, and adolescents requires the analysis of large collections of complex, unstructured texts. In this work, machine learning algorithms were used to implement classifiers of biomedical texts and to extract information about the safety and efficacy of drugs in pediatric populations. Models were trained on approved drug product labels, and computational experiments were conducted to evaluate their accuracy. A Support Vector Machine with a radial kernel performed best, classifying short texts with an accuracy of 94% and excellent precision. The results show that classifiers perform better when trained on features comprising multiple words rather than single words. The proposed text classifier may be used to mine other sources of biomedical information, such as research publications and electronic health records.
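The distinction between single-word and multi-word features can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ngram_features`, the whitespace tokenization, and the example sentence are assumptions made for illustration only.

```python
from collections import Counter


def ngram_features(text, n_min=1, n_max=2):
    """Count word n-grams of length n_min..n_max in a text.

    Hypothetical helper: tokenization is a plain lowercase whitespace
    split, which is an assumption and not the paper's preprocessing.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the text
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Illustrative sentence, not taken from any actual drug label
sentence = "safety in pediatric patients has not been established"

unigrams = ngram_features(sentence, 1, 1)  # single-word features
bigrams = ngram_features(sentence, 2, 2)   # two-word features
```

In this sketch the two-word feature "not been" keeps the negation attached to its context, whereas the single-word features "not" and "been" lose it. This is one plausible reason multi-word features could help a classifier of label text.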