Subgraph Isomorphism Search in Massive Graph Databases

Chemseddine Nabti, Hamida Seba

2016

Abstract

Subgraph isomorphism search is a basic task in querying graph data. It consists of finding all embeddings of a query graph in a data graph. It arises in many real-world applications that require the management of structural data, such as bioinformatics and chemistry. However, subgraph isomorphism search is an NP-complete problem, which makes it prohibitively expensive in both memory and time on massive graph databases. To tackle this problem, we propose a new approach based on concepts that differ widely from those of existing work. Our approach relies on a summarized representation of the graph database that minimizes both the amount of space required to store data graphs and the processing time of querying them. Experimental results show that our approach performs well compared to the most efficient approach in the literature.
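
To make the problem statement concrete, the following is a minimal sketch of a naive backtracking search that enumerates every embedding of a query graph in a data graph. It is not the summarized-representation approach proposed in the paper; it only illustrates the task being solved. The function name `find_embeddings`, the adjacency-set graph encoding, and the choice of unlabeled, undirected graphs with non-induced embeddings are all assumptions made for this illustration.

```python
from typing import Dict, Hashable, Iterator, Set

# Assumed encoding: an undirected graph as a dict of adjacency sets.
Graph = Dict[Hashable, Set[Hashable]]


def find_embeddings(query: Graph, data: Graph) -> Iterator[Dict[Hashable, Hashable]]:
    """Yield every injective mapping of query vertices to data vertices
    that sends each query edge to a data edge (non-induced embedding)."""
    order = list(query)  # fixed matching order over the query vertices

    def extend(i, mapping, used):
        if i == len(order):
            yield dict(mapping)  # copy, since mapping is mutated on backtrack
            return
        u = order[i]
        for v in data:
            if v in used:
                continue  # keep the mapping injective
            # Every already-mapped neighbor of u must land on a neighbor of v.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                yield from extend(i + 1, mapping, used)
                del mapping[u]
                used.remove(v)

    yield from extend(0, {}, set())


# Hypothetical example: a triangle query against a triangle with a pendant vertex.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
data_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
embeddings = list(find_embeddings(triangle, data_graph))
```

In this example the query matches only the vertices 1, 2 and 3, once per ordering of the three query vertices. The search tries every candidate vertex at every step, so its running time grows exponentially with the query size in the worst case, which is the cost that indexing and summarization techniques aim to reduce.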



Paper Citation


in Harvard Style

Nabti C. and Seba H. (2016). Subgraph Isomorphism Search in Massive Graph Databases. In Proceedings of the International Conference on Internet of Things and Big Data - Volume 1: IoTBD, ISBN 978-989-758-183-0, pages 204-213. DOI: 10.5220/0005875002040213


in Bibtex Style

@conference{iotbd16,
author={Chemseddine Nabti and Hamida Seba},
title={Subgraph Isomorphism Search in Massive Graph Databases},
booktitle={Proceedings of the International Conference on Internet of Things and Big Data - Volume 1: IoTBD},
year={2016},
pages={204-213},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005875002040213},
isbn={978-989-758-183-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Internet of Things and Big Data - Volume 1: IoTBD
TI - Subgraph Isomorphism Search in Massive Graph Databases
SN - 978-989-758-183-0
AU - Nabti C.
AU - Seba H.
PY - 2016
SP - 204
EP - 213
DO - 10.5220/0005875002040213