CONDITIONAL RANDOM FIELDS FOR TERM EXTRACTION

Xing Zhang, Yan Song, Alex Chengyu Fang

2010

Abstract

In this paper, we describe how to construct a machine learning framework that uses syntactic information in the extraction of biomedical terms. Conditional random fields (CRF) are used as the basis of this framework. We investigate how syntactic information, including parent nodes, syntactic paths and term ratios, can best be used within this framework. The experimental results show that syntactic paths and term ratios can improve the precision of term extraction for both old terms and novel terms. However, the recall rate for novel terms still needs to be increased. This research serves as an example for constructing machine learning based term extraction systems that utilize linguistic information.
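
As a rough illustration of how syntactic information might be presented to a CRF sequence labeller, the sketch below builds one feature dictionary per token from the three kinds of information named in the abstract. It is not the authors' implementation. The token layout, the parse labels, the B/I/O tag scheme, the toy counts, and in particular the definition of `term_ratio` (the share of a word's occurrences that fall inside known terms) are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical token record: (word, parent_node, syntactic_path).
# The parse labels used below are invented for illustration only.
Token = Tuple[str, str, str]


def term_ratio(word: str, term_counts: Dict[str, int],
               total_counts: Dict[str, int]) -> float:
    """Assumed definition: share of a word's occurrences that lie inside known terms."""
    total = total_counts.get(word, 0)
    return term_counts.get(word, 0) / total if total else 0.0


def token_features(sentence: Sequence[Token], i: int,
                   term_counts: Dict[str, int],
                   total_counts: Dict[str, int]) -> Dict[str, object]:
    """Build the feature dict for token i, with a one-token context window."""
    word, parent, path = sentence[i]
    key = word.lower()
    feats: Dict[str, object] = {
        "word": key,
        "parent": parent,   # label of the parent node in the parse tree
        "path": path,       # path from the token's node up to the root
        "term_ratio": round(term_ratio(key, term_counts, total_counts), 2),
    }
    if i > 0:
        feats["prev_word"] = sentence[i - 1][0].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        feats["next_word"] = sentence[i + 1][0].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(sentence: Sequence[Token],
                      term_counts: Dict[str, int],
                      total_counts: Dict[str, int]) -> List[Dict[str, object]]:
    """Feature dicts for every token, in order, ready to pair with a label sequence."""
    return [token_features(sentence, i, term_counts, total_counts)
            for i in range(len(sentence))]


# Toy data (invented): one parsed sentence with B/I/O term labels.
sentence = [("Insulin", "NP", "NP>S"), ("receptor", "NP", "NP>S"), ("binds", "VP", "VP>S")]
labels = ["B", "I", "O"]
term_counts = {"insulin": 8, "receptor": 6}
total_counts = {"insulin": 10, "receptor": 12, "binds": 5}

features = sentence_features(sentence, term_counts, total_counts)
```

Each sentence then becomes a pair of equal-length sequences (`features`, `labels`), which is the input shape that CRF toolkits accepting per-token feature dictionaries typically expect.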



Paper Citation


in Harvard Style

Zhang X., Song Y. and Fang A. (2010). CONDITIONAL RANDOM FIELDS FOR TERM EXTRACTION. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 414-417. DOI: 10.5220/0003077304140417


in BibTeX Style

@conference{kdir10,
author={Xing Zhang and Yan Song and Alex Chengyu Fang},
title={CONDITIONAL RANDOM FIELDS FOR TERM EXTRACTION},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={414-417},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003077304140417},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - CONDITIONAL RANDOM FIELDS FOR TERM EXTRACTION
SN - 978-989-8425-28-7
AU - Zhang X.
AU - Song Y.
AU - Fang A.
PY - 2010
SP - 414
EP - 417
DO - 10.5220/0003077304140417