Column-oriented Database Systems and XML Compression

Tyler Corbin, Tomasz Müldner, Jan Krzysztof Miziołek

2014

Abstract

The verbose nature of XML calls for data compression, which in turn makes efficient querying harder to implement. At the same time, renewed industrial and academic interest in column-oriented DBMSs (column-stores) has improved query efficiency in these systems. Nevertheless, there has been no research on the relationship between XML compression and column-stores. This paper describes an existing XML compressor and shows the inherent similarities between its compression technique and column-stores. Compression efficiency is tested using specially designed benchmark data.



Paper Citation


in Harvard Style

Corbin T., Müldner T. and Miziołek J. (2014). Column-oriented Database Systems and XML Compression. In Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-035-2, pages 107-115. DOI: 10.5220/0004995001070115


in Bibtex Style

@conference{data14,
author={Tyler Corbin and Tomasz Müldner and Jan Krzysztof Miziołek},
title={Column-oriented Database Systems and XML Compression},
booktitle={Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2014},
pages={107-115},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004995001070115},
isbn={978-989-758-035-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Column-oriented Database Systems and XML Compression
SN - 978-989-758-035-2
AU - Corbin T.
AU - Müldner T.
AU - Miziołek J.
PY - 2014
SP - 107
EP - 115
DO - 10.5220/0004995001070115