Extracting Patient Data from Tables in Clinical Literature - Case Study on Extraction of BMI, Weight and Number of Patients

Nikola Milosevic, Cassie Gregson, Robert Hernandez, Goran Nenadic

Abstract

Current biomedical text mining efforts focus mostly on extracting information from the body of research articles. However, tables contain important information, such as the key characteristics of clinical trials. Here, we examine the feasibility of information extraction from tables, focusing on data about clinical trial participants. We propose a rule-based method that decomposes tables into cell-level structures and then extracts information from these structures. Our method achieved an F-measure of 83.3% for extraction of the number of patients, 83.7% for patients' body mass index (BMI) and 57.75% for patients' weight. These results are promising and show that information extraction from tables in biomedical literature is feasible.
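
To make the idea of cell-level decomposition followed by rule-based extraction concrete, here is a minimal Python sketch. It is an illustration only, not the method from the paper: the `Cell` structure, the `decompose` and `extract_bmi` functions, the cue pattern, and the sample table are all hypothetical, and the sketch assumes a simple grid whose first row holds column headers and whose first column holds row labels (stubs).

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cell:
    """One data cell together with the navigational cells that describe it."""
    header: str  # column header above the cell
    stub: str    # row label to the left of the cell
    value: str   # raw text of the cell


def decompose(table: List[List[str]]) -> List[Cell]:
    """Flatten a simple grid into cell-level structures.

    Assumes the first row contains headers and the first column contains stubs.
    """
    headers = table[0]
    cells = []
    for row in table[1:]:
        stub = row[0]
        for col in range(1, len(row)):
            cells.append(Cell(header=headers[col], stub=stub, value=row[col]))
    return cells


# Hypothetical rule: a cue word in the stub, and the first number in the value.
BMI_CUE = re.compile(r"\b(bmi|body mass index)\b", re.IGNORECASE)
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_bmi(cells: List[Cell]) -> List[Tuple[str, float]]:
    """Return (column header, BMI value) pairs for cells whose stub mentions BMI."""
    results = []
    for cell in cells:
        if BMI_CUE.search(cell.stub):
            match = NUMBER.search(cell.value)
            if match:
                results.append((cell.header, float(match.group())))
    return results


# Made-up example table, not data from any study.
table = [
    ["Characteristic", "Placebo", "Treatment"],
    ["Number of patients", "52", "49"],
    ["BMI (kg/m2)", "27.4 ± 3.1", "26.9 ± 2.8"],
    ["Weight (kg)", "81.2 ± 10.5", "79.8 ± 9.9"],
]

if __name__ == "__main__":
    for arm, bmi in extract_bmi(decompose(table)):
        print(arm, bmi)
```

Real tables in clinical articles often have spanning cells, multi-level headers, and values split across cells, so a practical system would need a richer decomposition step and more rules than this sketch shows.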



Paper Citation


in Harvard Style

Milosevic N., Gregson C., Hernandez R. and Nenadic G. (2016). Extracting Patient Data from Tables in Clinical Literature - Case Study on Extraction of BMI, Weight and Number of Patients. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2016) ISBN 978-989-758-170-0, pages 223-228. DOI: 10.5220/0005660102230228


in Bibtex Style

@conference{healthinf16,
author={Nikola Milosevic and Cassie Gregson and Robert Hernandez and Goran Nenadic},
title={Extracting Patient Data from Tables in Clinical Literature - Case Study on Extraction of BMI, Weight and Number of Patients},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2016)},
year={2016},
pages={223-228},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005660102230228},
isbn={978-989-758-170-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2016)
TI - Extracting Patient Data from Tables in Clinical Literature - Case Study on Extraction of BMI, Weight and Number of Patients
SN - 978-989-758-170-0
AU - Milosevic N.
AU - Gregson C.
AU - Hernandez R.
AU - Nenadic G.
PY - 2016
SP - 223
EP - 228
DO - 10.5220/0005660102230228