A LAHC-based Job Scheduling Strategy to Improve Big Data Processing in Geo-distributed Contexts

Marco Cavallo, Giuseppe Di Modica, Carmelo Polito, Orazio Tomarchio

2017

Abstract

The widespread adoption of IoT technologies has resulted in the generation of huge amounts of data, or Big Data, which must be collected, stored and processed with new techniques to produce value in the best possible way. Distributed computing frameworks such as Hadoop, based on the MapReduce paradigm, have been used to process such data by exploiting the computing power of many cluster nodes. Unfortunately, in many real big data applications the data to be processed reside in several computationally heterogeneous data centers distributed across different locations. In this context Hadoop's performance degrades dramatically. To address this issue, we developed a Hierarchical Hadoop Framework (H2F) capable of scheduling and distributing tasks among geographically distant clusters in a way that minimizes the overall job execution time. In this work the focus is on the definition of a job scheduling system based on a one-point iterative search algorithm that increases the framework's scalability while guaranteeing good job performance.
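The one-point iterative search named in the title is Late Acceptance Hill Climbing (LAHC), introduced by Burke and Bykov. The following is a minimal sketch of the generic LAHC loop applied to schedule search, written in Python for illustration only: the cost, neighbor, history_len and max_iters names are placeholders standing in for the framework's own job-execution-time estimator, schedule perturbation operator and tuning parameters, not the paper's actual implementation.

def lahc_schedule(initial_plan, cost, neighbor, history_len=50, max_iters=10000):
    # cost(plan): estimated overall job execution time of a candidate schedule
    # neighbor(plan): returns a randomly perturbed copy of a schedule
    current = initial_plan
    current_cost = cost(current)
    best, best_cost = current, current_cost
    history = [current_cost] * history_len   # fitness array of length L

    for i in range(max_iters):
        candidate = neighbor(current)
        candidate_cost = cost(candidate)
        v = i % history_len
        # LAHC acceptance rule: accept if the candidate is not worse than the
        # current solution, or not worse than the solution recorded L iterations ago.
        if candidate_cost <= current_cost or candidate_cost <= history[v]:
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        history[v] = current_cost

    return best, best_cost

In the H2F setting, the cost of a plan would presumably be derived from the framework's estimate of the overall job execution time across the geo-distributed clusters; the length of the history list controls how greedy the search is.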



Paper Citation


in Harvard Style

Cavallo M., Di Modica G., Polito C. and Tomarchio O. (2017). A LAHC-based Job Scheduling Strategy to Improve Big Data Processing in Geo-distributed Contexts. In Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS, ISBN 978-989-758-245-5, pages 92-101. DOI: 10.5220/0006307100920101


in Bibtex Style

@conference{iotbds17,
author={Marco Cavallo and Giuseppe Di Modica and Carmelo Polito and Orazio Tomarchio},
title={A LAHC-based Job Scheduling Strategy to Improve Big Data Processing in Geo-distributed Contexts},
booktitle={Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS},
year={2017},
pages={92-101},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006307100920101},
isbn={978-989-758-245-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS
TI - A LAHC-based Job Scheduling Strategy to Improve Big Data Processing in Geo-distributed Contexts
SN - 978-989-758-245-5
AU - Cavallo M.
AU - Di Modica G.
AU - Polito C.
AU - Tomarchio O.
PY - 2017
SP - 92
EP - 101
DO - 10.5220/0006307100920101