
 
compression techniques by simply employing other
constructions instead of IPC and PFD.
6 CONCLUSIONS AND OPEN PROBLEMS
We have presented a novel technique for efficiently
compressing the document identifiers stored in
inverted files. The proposed constructions can be
seamlessly combined with other techniques proposed
in the literature, such as IPC and PFD, and they
produce compression results that are competitive
with, and in most cases better than, those of
previous works. As the careful reader may have
noticed, handling the secondary index that holds
the extra identifiers is the main burden of our
technique. This burden can be relieved by using
more arithmetic progressions to represent each
initial inverted list; the resulting tradeoff is
worth exploring further, since it could lead to a
whole family of parametric techniques. Moreover,
it would be interesting to investigate other ways
of handling the secondary index that could lead to
faster decompression.
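The tradeoff between the number of arithmetic progressions and the size of the secondary index can be illustrated with a small, hypothetical sketch. It is not the construction of this paper: it simply assumes that a strictly increasing list of document identifiers is split into maximal runs with a constant gap, that the k longest runs are kept as progressions (start, step, length), and that every remaining identifier goes to a secondary list. The names `Progression`, `cover`, `decode` and `stored_integers`, and the size proxy of three integers per progression plus one per secondary identifier, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Progression:
    """Hypothetical arithmetic progression start, start+step, ... with `length` terms."""
    start: int
    step: int
    length: int

    def terms(self) -> List[int]:
        return [self.start + i * self.step for i in range(self.length)]


def maximal_runs(doc_ids: List[int]) -> List[Progression]:
    """Split a strictly increasing list into maximal constant-gap runs, left to right."""
    runs: List[Progression] = []
    i, n = 0, len(doc_ids)
    while i < n:
        if i == n - 1:  # a lone trailing identifier forms a run of length 1
            runs.append(Progression(doc_ids[i], 1, 1))
            break
        step = doc_ids[i + 1] - doc_ids[i]
        j = i + 1
        while j + 1 < n and doc_ids[j + 1] - doc_ids[j] == step:
            j += 1
        runs.append(Progression(doc_ids[i], step, j - i + 1))
        i = j + 1
    return runs


def cover(doc_ids: List[int], k: int) -> Tuple[List[Progression], List[int]]:
    """Keep the k longest runs (length >= 2); all other identifiers go to the secondary list."""
    candidates = [r for r in maximal_runs(doc_ids) if r.length >= 2]
    chosen = sorted(candidates, key=lambda r: r.length, reverse=True)[:k]
    covered = {d for r in chosen for d in r.terms()}
    secondary = [d for d in doc_ids if d not in covered]
    return chosen, secondary


def decode(chosen: List[Progression], secondary: List[int]) -> List[int]:
    """Rebuild the original list by merging progression terms with the secondary list."""
    return sorted([d for r in chosen for d in r.terms()] + secondary)


def stored_integers(chosen: List[Progression], secondary: List[int]) -> int:
    """Crude size proxy (assumption): 3 integers per progression plus 1 per secondary id."""
    return 3 * len(chosen) + len(secondary)


if __name__ == "__main__":
    postings = [1, 2, 3, 4, 10, 20, 30, 40, 41]
    for k in range(3):
        chosen, secondary = cover(postings, k)
        assert decode(chosen, secondary) == postings  # the split is lossless
        print(k, len(secondary), stored_integers(chosen, secondary))
```

On the toy list [1, 2, 3, 4, 10, 20, 30, 40, 41], allowing k = 0, 1 and 2 progressions shrinks the secondary list from 9 to 5 to 1 identifiers, while the crude size proxy falls from 9 to 8 to 7 integers. On real lists each additional progression carries its own storage cost, so the gain is not guaranteed; this is the kind of tradeoff the conclusion suggests exploring.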
ACKNOWLEDGEMENTS 
This research has been co-financed by the European 
Union (European Social Fund-ESF) and Greek 
national funds through the Operational Program 
“Education and Lifelong Learning” of the National 
Strategic Reference Framework (NSRF)-Research 
Funding Program: Heracleitus II. Investing in 
knowledge society through the European Social 
Fund. 
This research has been co-financed by the 
European Union (European Social Fund-ESF) and 
Greek national funds through the Operational 
Program “Education and Lifelong Learning” of the 
National Strategic Reference Framework (NSRF)-
Research Funding Program: Thales. Investing in 
knowledge society through the European Social 
Fund. 
REFERENCES 
Baeza-Yates, R., Ribeiro-Neto, B., 2011. Modern
Information Retrieval: the concepts and technology
behind search, 2nd edition. Essex: Addison Wesley.
Callan, J., 2009. The ClueWeb09 Dataset. Available at
http://boston.lti.cs.cmu.edu/clueweb09 (accessed 1st
August 2012).
Chierichetti, F., Kumar, R., Raghavan, P., 2009.
Compressed web indexes. In: 18th International World
Wide Web Conference, pp. 451–460.
Ding, S., Attenberg, J., Suel, T., 2010. Scalable
Techniques for Document Identifier Assignment in
Inverted Indexes. In: Proceedings of the 19th
International Conference on World Wide Web, pp. 311–320.
He, J., Yan, H., Suel, T., 2009. Compact full-text indexing
of versioned document collections. In: Proceedings of the
18th ACM Conference on Information and Knowledge
Management, November 2–6, Hong Kong, China.
Heman, S., 2005. Super-scalar database compression
between RAM and CPU-cache. MS Thesis, Centrum
voor Wiskunde en Informatica, Amsterdam.
Moffat, A., Stuiver, L., 2000. Binary interpolative coding
for effective index compression. Information
Retrieval, 3, 25–47.
Navarro, G., Silva De Moura, E., Neubert, M., Ziviani,
N., Baeza-Yates, R., 2000. Adding Compression to
Block Addressing Inverted Indexes. Information
Retrieval, 3, 49–77.
Ntoulas, A., Cho, J., 2007. Pruning policies for two-tiered
inverted index with correctness guarantee. In:
Proceedings of the 30th Annual International ACM
SIGIR Conference on Research and Development in
Information Retrieval, July 23–27, Amsterdam, The
Netherlands.
Scholer, F., Williams, H. E., Yiannis, J., Zobel, J., 2002.
Compression of inverted indexes for fast query
evaluation. In: 25th Annual ACM SIGIR Conference,
pp. 222–229.
Witten, I. H., Moffat, A., Bell, T., 1999. Managing
Gigabytes: Compressing and Indexing Documents and
Images, 2nd edition. Morgan Kaufmann Publishers.
Yan, H., Ding, S., Suel, T., 2009. Inverted index
compression and query processing with optimized
document ordering. In: Proceedings of the 18th
International Conference on World Wide Web, April
20–24, Madrid, Spain.
Yan, H., Ding, S., Suel, T., 2009. Compressing term
positions in Web indexes. In: Proceedings of the
32nd Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval,
pp. 147–154.
Zhang, J., Long, X., Suel, T., 2008. Performance of
compressed inverted list caching in search engines. In:
17th International World Wide Web Conference (WWW).
Zobel, J., Moffat, A., 2006. Inverted Files for Text Search
Engines. ACM Computing Surveys, Vol. 38, No. 2,
Article 6.
Zukowski, M., Heman, S., Nes, N., Boncz, P., 2006.
Super-scalar RAM-CPU cache compression. In: 22nd
International Conference on Data Engineering (ICDE
2006).
WEBIST 2013 - 9th International Conference on Web Information Systems and Technologies