A Proposal to Maintain the Semantic Balance in Cluster-based Data Integration Systems

Edemberg Rocha Silva, Bernadette Farias Lóscio, Ana Carolina Salgado

2014

Abstract

Given the large volume of data sources on the Web, a system is needed that integrates them so that users can query them transparently. To make queries efficient, integration systems can group these sources into clusters according to the semantic similarity of their schemas. However, the sources are autonomous: they may evolve their schemas, and they may join or leave the integration system at any time. This autonomy may cause a problem that we define as the semantic unbalance of clusters. Semantic unbalance can compromise the formation of clusters and hence the efficiency of submitted queries. In this paper, we propose a solution for maintaining the semantic balance of clusters in dynamic data integration systems based on self-organization. We also introduce a measure to evaluate how semantically unbalanced the clusters are.



Paper Citation


in Harvard Style

Rocha Silva E., Farias Lóscio B. and Carolina Salgado A. (2014). A Proposal to Maintain the Semantic Balance in Cluster-based Data Integration Systems. In Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-027-7, pages 90-98. DOI: 10.5220/0004897000900098


in Bibtex Style

@conference{iceis14,
author={Edemberg Rocha Silva and Bernadette Farias Lóscio and Ana Carolina Salgado},
title={A Proposal to Maintain the Semantic Balance in Cluster-based Data Integration Systems},
booktitle={Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2014},
pages={90-98},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004897000900098},
isbn={978-989-758-027-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - A Proposal to Maintain the Semantic Balance in Cluster-based Data Integration Systems
SN - 978-989-758-027-7
AU - Rocha Silva E.
AU - Farias Lóscio B.
AU - Carolina Salgado A.
PY - 2014
SP - 90
EP - 98
DO - 10.5220/0004897000900098