SUFFIX ARRAYS - A Competitive Choice for Fast Lempel-Ziv Compressions

Artur J. Ferreira, Arlindo L. Oliveira, Mário A. T. Figueiredo

2008

Abstract

Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used in a variety of applications. The LZ encoder and decoder are highly asymmetric in their time and memory requirements, with the encoder being much more demanding. Several techniques have been used to speed up the encoding process; among them is the use of suffix trees. In this paper, we explore the use of a simple data structure, the suffix array, to hold the dictionary of the LZ encoder, and propose an algorithm to search the dictionary. A comparison with the suffix-tree-based LZ encoder is carried out, showing that the compression ratios are roughly the same. The amount of memory required by the suffix array is fixed and much lower than the memory required by the suffix tree encoder, which varies with the text to encode. We conclude that suffix arrays are a very interesting option regarding the tradeoff between time, memory, and compression ratio when compared with suffix trees, which makes them preferable in some compression scenarios.
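To make the idea of holding an LZ dictionary in a suffix array concrete, the following is a minimal Python sketch. It is not the search algorithm proposed in the paper, which the abstract does not detail. The function names (build_suffix_array, common_prefix_len, longest_match) are hypothetical, the construction is a naive sort rather than a linear-time method, and the search is a plain binary search over the sorted suffixes followed by a check of the two neighbouring entries.

```python
def build_suffix_array(text: bytes) -> list[int]:
    """Naive construction: sort suffix start positions lexicographically.

    Assumption: simplicity over speed. Linear-time constructions exist,
    but this sketch just sorts the suffixes directly.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(dictionary: bytes, sa: list[int], lookahead: bytes) -> tuple[int, int]:
    """Return (position, length) of the longest prefix of `lookahead`
    that occurs in `dictionary`; length 0 means no match.
    """
    # Binary search for the point where `lookahead` would be inserted
    # among the lexicographically sorted suffixes.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if dictionary[sa[mid]:] < lookahead:
            lo = mid + 1
        else:
            hi = mid

    # The longest match is shared with one of the two suffixes adjacent
    # to the insertion point, so only those two need to be checked.
    best_pos, best_len = 0, 0
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            n = common_prefix_len(dictionary[sa[k]:], lookahead)
            if n > best_len:
                best_pos, best_len = sa[k], n
    return best_pos, best_len


dictionary = b"abracadabra"
sa = build_suffix_array(dictionary)
match = longest_match(dictionary, sa, b"acab")
```

With the dictionary "abracadabra" and the lookahead "acab", match is (3, 3): the three symbols "aca" are found at position 3. The suffix array stores exactly one integer per dictionary symbol, which illustrates why its memory footprint is fixed by the dictionary size and does not depend on the text content.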



Paper Citation


in Harvard Style

Ferreira, A. J., Oliveira, A. L. and Figueiredo, M. A. T. (2008). SUFFIX ARRAYS - A Competitive Choice for Fast Lempel-Ziv Compressions. In Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008) ISBN 978-989-8111-60-9, pages 5-12. DOI: 10.5220/0001935200050012


in Bibtex Style

@conference{sigmap08,
author={Artur J. Ferreira and Arlindo L. Oliveira and Mário A. T. Figueiredo},
title={SUFFIX ARRAYS - A Competitive Choice for Fast Lempel-Ziv Compressions},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008)},
year={2008},
pages={5-12},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001935200050012},
isbn={978-989-8111-60-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008)
TI - SUFFIX ARRAYS - A Competitive Choice for Fast Lempel-Ziv Compressions
SN - 978-989-8111-60-9
AU - Ferreira, Artur J.
AU - Oliveira, Arlindo L.
AU - Figueiredo, Mário A. T.
PY - 2008
SP - 5
EP - 12
DO - 10.5220/0001935200050012