Exploiting Progressions for Improving Inverted Index Compression

Christos Makris, Yannis Plegas

2013

Abstract

In this paper, we present an algorithmic technique for compressing the document identifier lists of an inverted index. The technique can be combined with state-of-the-art compression methods, such as algorithms from the PForDelta family or Interpolative Coding, to attain significant space savings. It first converts the lists of document identifiers into a set of arithmetic progressions, each of which is represented by at most three numbers. This conversion introduces an overhead: documents must be assigned multiple identifiers, so a secondary inverted index and an extra compression step (PForDelta or Interpolative Coding) are needed to represent them. Experiments on the ClueWeb09 dataset show that the proposed technique achieves superior space compaction.
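
To make the idea of a three-number representation concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the three numbers are the first identifier, the step, and the element count (the abstract does not say which three are used), and it simply splits an already sorted list greedily into maximal arithmetic runs. The paper's technique instead assigns multiple identifiers to documents so that lists become progressions; that step is not modelled here. The names `to_progressions` and `expand` are hypothetical.

```python
from typing import List, Tuple

# Assumed encoding of one arithmetic progression: (first, step, count).
Progression = Tuple[int, int, int]


def to_progressions(doc_ids: List[int]) -> List[Progression]:
    """Greedily split a strictly increasing list into maximal arithmetic runs."""
    runs: List[Progression] = []
    i, n = 0, len(doc_ids)
    while i < n:
        if i + 1 == n:
            # A single trailing identifier becomes a run of length one.
            runs.append((doc_ids[i], 0, 1))
            break
        step = doc_ids[i + 1] - doc_ids[i]
        j = i + 1
        # Extend the run while the gap between neighbours stays constant.
        while j + 1 < n and doc_ids[j + 1] - doc_ids[j] == step:
            j += 1
        runs.append((doc_ids[i], step, j - i + 1))
        i = j + 1
    return runs


def expand(runs: List[Progression]) -> List[int]:
    """Rebuild the identifier list from its progressions."""
    out: List[int] = []
    for first, step, count in runs:
        out.extend(first + k * step for k in range(count))
    return out


doc_ids = [3, 6, 9, 12, 20, 21, 22, 40]
runs = to_progressions(doc_ids)
print(runs)  # [(3, 3, 4), (20, 1, 3), (40, 0, 1)]
```

In this toy list, eight identifiers collapse into three triples, and `expand(runs)` reproduces the original list. The saving depends entirely on how regular the gaps are, which is why an identifier assignment that creates progressions matters.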


Paper Citation


in Harvard Style

Makris C. and Plegas Y. (2013). Exploiting Progressions for Improving Inverted Index Compression. In Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8565-54-9, pages 251-256. DOI: 10.5220/0004365402510256


in Bibtex Style

@conference{webist13,
author={Christos Makris and Yannis Plegas},
title={Exploiting Progressions for Improving Inverted Index Compression},
booktitle={Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2013},
pages={251-256},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004365402510256},
isbn={978-989-8565-54-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Exploiting Progressions for Improving Inverted Index Compression
SN - 978-989-8565-54-9
AU - Makris C.
AU - Plegas Y.
PY - 2013
SP - 251
EP - 256
DO - 10.5220/0004365402510256