TEXT SEGMENTATION USING NAMED ENTITY RECOGNITION AND CO-REFERENCE RESOLUTION

Pavlina Fragkou

doi:10.5220/0003181603490354

2011

Abstract

In this paper we examine the benefit of performing named entity recognition and co-reference resolution on a benchmark used for text segmentation. The aim is to determine whether incorporating such information enhances the performance of text segmentation algorithms. Evaluation with three well-known text segmentation algorithms leads to the conclusion that the benefit depends strongly on the segment's topic, the number of named entity instances appearing in it, and the segment's length.

Paper Citation

in Harvard Style

Fragkou P. (2011). TEXT SEGMENTATION USING NAMED ENTITY RECOGNITION AND CO-REFERENCE RESOLUTION. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-40-9, pages 349-354. DOI: 10.5220/0003181603490354

in Bibtex Style

@conference{icaart11,
author={Pavlina Fragkou},
title={TEXT SEGMENTATION USING NAMED ENTITY RECOGNITION AND CO-REFERENCE RESOLUTION},
booktitle={Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2011},
pages={349-354},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003181603490354},
isbn={978-989-8425-40-9},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - TEXT SEGMENTATION USING NAMED ENTITY RECOGNITION AND CO-REFERENCE RESOLUTION
SN - 978-989-8425-40-9
AU - Fragkou P.
PY - 2011
SP - 349
EP - 354
DO - 10.5220/0003181603490354