USING RELEVANT SETS FOR OPTIMIZING XML INDEXES

Paz Biber, Ehud Gudes

Abstract

Local bisimilarity has been proposed as an approximate structural summary for XML and other semi-structured databases. Approximate structural summaries, such as the A(k)-Index and the D(k)-Index, reduce the index's size (and therefore query evaluation time) by compromising on the accuracy of long path queries. We introduce the A(k)-Simplified and the A(k)-Relevant, approximate structural summaries for graph documents in general and for XML in particular. Like the A(k)-Index and the D(k)-Index, our indexes are based on local bisimilarity; however, unlike the previous indexes, they support the removal of non-relevant nodes. We also describe a way to eliminate false drops that might occur due to node removal. Our experiments show that A(k)-Simplified and A(k)-Relevant are much smaller than A(k)-Index and give accurate results with better performance for short relevant path queries.
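To make "local bisimilarity" and the trade-off on long path queries concrete, here is a minimal Python sketch of a plain A(k)-style summary. It assumes the usual definition of k-bisimilarity: two nodes are 0-bisimilar when they share a label, and k-bisimilar when they are (k-1)-bisimilar and their parents fall into the same set of (k-1)-bisimilarity classes. The functions `ak_partition`, `build_index`, `eval_path` and `eval_on_index`, as well as the toy graph, are hypothetical illustrations. The sketch does not implement A(k)-Simplified or A(k)-Relevant, whose construction the abstract does not detail.

```python
from collections import defaultdict


def ak_partition(labels, edges, k):
    """Map each node to an A(k) block id (nodes in a block are k-bisimilar)."""
    parents = defaultdict(set)
    for p, c in edges:
        parents[c].add(p)
    # k = 0: one block per label.
    ids = {}
    block = {n: ids.setdefault(labels[n], len(ids)) for n in labels}
    for _ in range(k):
        # Refine: split a block when its members' parents lie in
        # different sets of blocks from the previous round.
        ids = {}
        block = {
            n: ids.setdefault(
                (block[n], frozenset(block[p] for p in parents[n])), len(ids)
            )
            for n in labels
        }
    return block


def build_index(labels, edges, k):
    """Collapse each block into one index node; keep its extent (data nodes)."""
    block = ak_partition(labels, edges, k)
    extent = defaultdict(set)
    for n, b in block.items():
        extent[b].add(n)
    index_edges = {(block[p], block[c]) for p, c in edges}
    index_labels = {b: labels[next(iter(ns))] for b, ns in extent.items()}
    return index_labels, index_edges, extent


def eval_path(labels, edges, path):
    """Nodes reached by a label path such as ['a', 'c', 'd'], starting anywhere."""
    children = defaultdict(set)
    for p, c in edges:
        children[p].add(c)
    current = {n for n, lab in labels.items() if lab == path[0]}
    for step in path[1:]:
        current = {c for n in current for c in children[n] if labels[c] == step}
    return current


def eval_on_index(labels, edges, k, path):
    """Evaluate the path on the A(k) summary and return the matching extents."""
    index_labels, index_edges, extent = build_index(labels, edges, k)
    hits = eval_path(index_labels, index_edges, path)
    return set().union(*(extent[b] for b in hits))


# Toy document: root -> a -> c -> d and root -> b -> c -> d.
labels = {"r": "root", "a1": "a", "b1": "b", "c1": "c", "c2": "c",
          "d1": "d", "d2": "d"}
edges = [("r", "a1"), ("r", "b1"), ("a1", "c1"), ("b1", "c2"),
         ("c1", "d1"), ("c2", "d2")]
query = ["a", "c", "d"]  # a path with two edges

print("data:", sorted(eval_path(labels, edges, query)))
for k in range(3):
    print(f"A({k}):", sorted(eval_on_index(labels, edges, k, query)))
```

In this sketch the exact answer is `d1`. The A(0) and A(1) summaries also return `d2`, because the query path has two edges and those summaries only distinguish incoming paths of length at most 0 and 1. A(2) returns the exact answer, at the cost of more index nodes. This is the size-versus-precision trade-off the abstract refers to.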



Paper Citation


in Harvard Style

Biber P. and Gudes E. (2005). USING RELEVANT SETS FOR OPTIMIZING XML INDEXES. In Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 972-8865-20-1, pages 13-23. DOI: 10.5220/0001229400130023


in Bibtex Style

@conference{webist05,
author={Paz Biber and Ehud Gudes},
title={USING RELEVANT SETS FOR OPTIMIZING XML INDEXES},
booktitle={Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2005},
pages={13-23},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001229400130023},
isbn={972-8865-20-1},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI  - USING RELEVANT SETS FOR OPTIMIZING XML INDEXES
SN  - 972-8865-20-1
AU  - Biber P.
AU  - Gudes E.
PY  - 2005
SP  - 13
EP  - 23
DO  - 10.5220/0001229400130023
ER  - 