A Tool to Evaluate Error Correction Resources and Processes Suited for Documents Improvement

Arnaud Renard, Sylvie Calabretto, Béatrice Rumpler

2012

Abstract

In this article, we present a solution to the difficulty of comparatively evaluating error correction systems and mechanisms. A survey of existing error correction approaches shows that most of them introduce their own evaluation process, which has a significant drawback: it is unclear whether one approach is better suited than another to correcting a specific type of error. No evaluation process is completely original, however, and similarities can be observed among them. We rely on these similarities to propose a general "evaluation design pattern", which we adapted to the case of error correction in textual documents. The underlying idea is to provide a standard way to integrate the required resources according to the family they belong to, where the families are defined beforehand in the evaluation model. Moreover, we developed a platform that relies on the OSGi specifications to provide a framework supporting the proposed evaluation model.


Paper Citation


in Harvard Style

Renard A., Calabretto S. and Rumpler B. (2012). A Tool to Evaluate Error Correction Resources and Processes Suited for Documents Improvement. In Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8565-11-2, pages 27-35. DOI: 10.5220/0003998800270035


in Bibtex Style

@conference{iceis12,
author={Arnaud Renard and Sylvie Calabretto and Béatrice Rumpler},
title={A Tool to Evaluate Error Correction Resources and Processes Suited for Documents Improvement},
booktitle={Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2012},
pages={27-35},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003998800270035},
isbn={978-989-8565-11-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - A Tool to Evaluate Error Correction Resources and Processes Suited for Documents Improvement
SN - 978-989-8565-11-2
AU - Renard A.
AU - Calabretto S.
AU - Rumpler B.
PY - 2012
SP - 27
EP - 35
DO - 10.5220/0003998800270035