XPACK: A HIGH-PERFORMANCE WEB DOCUMENT ENCODING

Daniel Rocco, James Caverlee, Ling Liu

Abstract

XML is an increasingly popular data storage and exchange format. Its popularity can be attributed to its self-describing syntax, its acceptance as a data transmission and archival standard, strong internationalization support, and a plethora of supporting tools and technologies. However, XML’s verbose, repetitive, text-oriented document specification syntax is a liability for many emerging applications such as mobile computing and distributed document dissemination. This paper presents XPack, an efficient XML document compression system that exploits information inherent in the document structure to enhance compression quality. Additionally, the use of XML structure features in XPack’s design should provide valuable support for structure-aware queries over compressed documents. Taken together, the techniques employed in the XPack compression scheme provide a foundation for efficiently storing, transmitting, and operating over Web documents. Initial experimental results demonstrate that XPack can reduce the storage requirements for Web documents by up to 20% over previous XML compression techniques. More significantly, XPack can simultaneously support operations over the documents, providing up to two orders of magnitude performance improvement for certain document operations when compared to equivalent operations on unencoded XML documents.


Paper Citation


in Harvard Style

Rocco D., Caverlee J. and Liu L. (2005). XPACK: A HIGH-PERFORMANCE WEB DOCUMENT ENCODING. In Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 972-8865-20-1, pages 32-39. DOI: 10.5220/0001233000320039


in Bibtex Style

@conference{webist05,
author={Daniel Rocco and James Caverlee and Ling Liu},
title={XPACK: A HIGH-PERFORMANCE WEB DOCUMENT ENCODING},
booktitle={Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2005},
pages={32-39},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001233000320039},
isbn={972-8865-20-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - XPACK: A HIGH-PERFORMANCE WEB DOCUMENT ENCODING
SN - 972-8865-20-1
AU - Rocco D.
AU - Caverlee J.
AU - Liu L.
PY - 2005
SP - 32
EP - 39
DO - 10.5220/0001233000320039