# THE EXTENDED BOYER-MOORE-HORSPOOL ALGORITHM FOR LOCALITY-SENSITIVE PSEUDO-CODE

### Kengo Terasawa, Toshio Kawashima, Yuzuru Tanaka

#### Abstract

The Boyer-Moore-Horspool (BMH) algorithm is a well-known and highly efficient algorithm for finding where a user-specified string occurs within a longer text string. In this study, we propose the Extended Boyer-Moore-Horspool algorithm, which can retrieve a pattern in a sequence of real vectors rather than in a sequence of characters. We adapted the BMH algorithm to sequences of real vectors in two ways. First, we transform each vector into a pseudo-code expression consisting of multiple integers. Second, we introduce a novel binary relation called ‘semiequivalent.’ We confirmed the practical utility of our algorithm by applying it to string matching on images from the “Minutes of the Imperial Diet,” for which optical character recognition does not work well.
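The idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper. First, each pseudo-code is a tuple of integers produced by a p-stable (Gaussian) LSH in the style of Datar et al. This stands in for the paper's own locality-sensitive pseudo-code. Second, two codes are taken to be ‘semiequivalent’ when they agree on at least one component. The functions `make_encoder`, `semiequivalent`, `build_shift_tables` and `extended_bmh_search` are hypothetical names chosen for illustration.

Semiequivalence is not transitive, so a single BMH bad-character table is not enough. The sketch instead keeps one shift table per component and takes the minimum shift over all components. That minimum is still a safe skip.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Code = Tuple[int, ...]  # a pseudo-code: one integer per hash function


def make_encoder(dim: int, n_hashes: int, width: float, seed: int = 0):
    """Hypothetical encoder: p-stable (Gaussian) LSH, h(v) = floor((a.v + b) / w).

    This is a stand-in for the paper's locality-sensitive pseudo-code, not its scheme.
    """
    rng = random.Random(seed)
    params = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
              for _ in range(n_hashes)]

    def encode(v: Sequence[float]) -> Code:
        # One quantized random projection per hash function.
        return tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / width)
                     for a, b in params)

    return encode


def semiequivalent(x: Code, y: Code) -> bool:
    """Assumed definition: the two codes agree on at least one component."""
    return any(xi == yi for xi, yi in zip(x, y))


def build_shift_tables(pattern: List[Code]) -> List[Dict[int, int]]:
    """tables[i][v] = smallest s >= 1 such that pattern[m-1-s][i] == v."""
    m, k = len(pattern), len(pattern[0])
    tables: List[Dict[int, int]] = [{} for _ in range(k)]
    for j in range(m - 1):  # every pattern position except the last, left to right
        for i in range(k):
            # A later j gives a smaller shift, so overwriting keeps the minimum.
            tables[i][pattern[j][i]] = m - 1 - j
    return tables


def extended_bmh_search(text: List[Code], pattern: List[Code]) -> List[int]:
    """Return every start index where each pattern code is semiequivalent to the text code."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    tables = build_shift_tables(pattern)
    hits: List[int] = []
    pos = 0
    while pos <= n - m:
        # Compare right to left, as in BMH.
        j = m - 1
        while j >= 0 and semiequivalent(text[pos + j], pattern[j]):
            j -= 1
        if j < 0:
            hits.append(pos)
        last = text[pos + m - 1]
        # Take the smallest shift that brings some earlier pattern code that is
        # semiequivalent to `last` under it. If no component matches, shift by m.
        pos += min(t.get(last[i], m) for i, t in enumerate(tables))
    return hits
```

When every code has a single component, for example `[(ord(c),) for c in "abracadabra"]`, semiequivalence reduces to equality. The search then behaves like ordinary BMH and reports the matches of `"abra"` at indices 0 and 7.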

#### Paper Citation

#### in Harvard Style

Terasawa K., Kawashima T. and Tanaka Y. (2011). **THE EXTENDED BOYER-MOORE-HORSPOOL ALGORITHM FOR LOCALITY-SENSITIVE PSEUDO-CODE**. In *Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2011)*, ISBN 978-989-8425-47-8, pages 437-441. DOI: 10.5220/0003369004370441

#### in Bibtex Style

@conference{visapp11,

author={Kengo Terasawa and Toshio Kawashima and Yuzuru Tanaka},

title={THE EXTENDED BOYER-MOORE-HORSPOOL ALGORITHM FOR LOCALITY-SENSITIVE PSEUDO-CODE},

booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2011)},

year={2011},

pages={437-441},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0003369004370441},

isbn={978-989-8425-47-8},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2011)

TI - THE EXTENDED BOYER-MOORE-HORSPOOL ALGORITHM FOR LOCALITY-SENSITIVE PSEUDO-CODE

SN - 978-989-8425-47-8

AU - Terasawa K.

AU - Kawashima T.

AU - Tanaka Y.

PY - 2011

SP - 437

EP - 441

DO - 10.5220/0003369004370441