A Pixel Labeling Framework for Comparing Texture Features Application to Digitized Ancient Books

Maroua Mehri, Petra Gomez-Krämer, Pierre Héroux, Alain Boucher, Rémy Mullot

2014

Abstract

This article presents and evaluates a complete framework for the comparative analysis of texture features, applied to the segmentation and characterization of ancient book pages. First, the content of an entire book is characterized by extracting the texture attributes of each page, using a multiresolution analysis. Second, a clustering approach automatically classifies the homogeneous regions of the book pages. Two approaches are compared, each based on a different statistical category of texture features (autocorrelation and co-occurrence), in order to segment the content of ancient book pages and find homogeneous regions with little a priori knowledge. Several clustering and classification accuracy measures are computed, and the results of the comparison show the effectiveness of the proposed framework. Tests on different book contents (text vs. graphics, manuscript vs. printed) show that these texture features are better suited to distinguishing textual regions from graphical ones than to distinguishing text fonts.
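
The pipeline summarized in the abstract (texture features at several resolutions, then unsupervised clustering of pixels) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the window sizes, the number of gray levels, the descriptors, or the clustering algorithm, so the values below, the three co-occurrence descriptors (contrast, energy, homogeneity), the plain k-means with farthest-first initialization, and all function names are assumptions chosen for illustration. Only the co-occurrence family is shown; the autocorrelation features are omitted.

```python
import numpy as np

def glcm(window, levels=8, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for offset (dx, dy)."""
    # Quantize 0..255 intensities to `levels` bins (assumed value, not from the paper).
    q = (window.astype(np.float64) / 256.0 * levels).astype(int)
    h, w = q.shape
    a = q[0:h - dy, 0:w - dx].ravel()   # reference pixels
    b = q[dy:h, dx:w].ravel()           # neighbors at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a, b), 1.0)           # count co-occurring level pairs
    m = m + m.T                         # make the matrix symmetric
    return m / m.sum()

def glcm_features(p):
    """Three common co-occurrence descriptors (an illustrative subset)."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    return np.array([contrast, energy, homogeneity])

def pixel_features(img, centers, sizes=(8, 16, 32)):
    """Concatenate descriptors from windows of several sizes around each pixel."""
    pad = max(sizes) // 2
    padded = np.pad(img, pad, mode="reflect")
    feats = []
    for r, c in centers:
        row = []
        for s in sizes:                 # hypothetical multiresolution window sizes
            half = s // 2
            win = padded[r + pad - half:r + pad + half,
                         c + pad - half:c + pad + half]
            row.append(glcm_features(glcm(win)))
        feats.append(np.concatenate(row))
    return np.array(feats)

def kmeans(x, k, iters=20):
    """Plain k-means with farthest-first initialization (a stand-in clusterer)."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels

# Synthetic "page": flat left half, noisy right half (toy data, not a book scan).
rng = np.random.default_rng(0)
img = np.full((128, 128), 128, dtype=np.uint8)
img[:, 64:] = rng.integers(0, 256, size=(128, 64))
rows = range(16, 113, 16)
left = [(r, c) for r in rows for c in (16, 28, 40)]
right = [(r, c) for r in rows for c in (88, 100, 112)]
feats = pixel_features(img, left + right)
labels = kmeans(feats, k=2)
```

On this toy image the sampled pixels are chosen so that every window lies entirely in one half, which keeps the two texture classes cleanly separable. On real pages, windows that straddle region boundaries produce mixed features, and the features would typically need to be normalized before clustering.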



Paper Citation


in Harvard Style

Mehri M., Gomez-Krämer P., Héroux P., Boucher A. and Mullot R. (2014). A Pixel Labeling Framework for Comparing Texture Features Application to Digitized Ancient Books. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-018-5, pages 553-560. DOI: 10.5220/0004804705530560


in Bibtex Style

@conference{icpram14,
author={Maroua Mehri and Petra Gomez-Krämer and Pierre Héroux and Alain Boucher and Rémy Mullot},
title={A Pixel Labeling Framework for Comparing Texture Features Application to Digitized Ancient Books},
booktitle={Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2014},
pages={553-560},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004804705530560},
isbn={978-989-758-018-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - A Pixel Labeling Framework for Comparing Texture Features Application to Digitized Ancient Books
SN - 978-989-758-018-5
AU - Mehri M.
AU - Gomez-Krämer P.
AU - Héroux P.
AU - Boucher A.
AU - Mullot R.
PY - 2014
SP - 553
EP - 560
DO - 10.5220/0004804705530560