EME: An Automated, Elastic and Efficient Prototype for Provisioning Hadoop Clusters On-demand

Feras M. Awaysheh, Tomás F. Pena, José C. Cabaleiro

2017

Abstract

To enhance the Quality of Service (QoS) of MapReduce-based applications, many frameworks suggest a scale-out approach that statically adds new nodes to the cluster. Such frameworks are still expensive to acquire and do not consider the optimal usage of available resources in a dynamic manner. This paper introduces a prototype that addresses this issue by extending the MapReduce resource manager with dynamic provisioning and low-cost, on-demand uplift of resource capacity. We propose an Enhanced MapReduce Environment (EME) that supports heterogeneous environments by extending Apache Hadoop to an opportunistically containerized environment, which enhances system throughput by adding underused resources to a local or cloud-based cluster. The main architectural elements of this framework are presented, as well as the requirements, challenges, and opportunities of a first prototype.



Paper Citation


in Harvard Style

Awaysheh F., Pena T. and Cabaleiro J. (2017). EME: An Automated, Elastic and Efficient Prototype for Provisioning Hadoop Clusters On-demand. In Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-758-243-1, pages 737-742. DOI: 10.5220/0006379607370742


in Bibtex Style

@conference{closer17,
author={Feras M. Awaysheh and Tomás F. Pena and José C. Cabaleiro},
title={EME: An Automated, Elastic and Efficient Prototype for Provisioning Hadoop Clusters On-demand},
booktitle={Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2017},
pages={737-742},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006379607370742},
isbn={978-989-758-243-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - EME: An Automated, Elastic and Efficient Prototype for Provisioning Hadoop Clusters On-demand
SN - 978-989-758-243-1
AU - Awaysheh F.
AU - Pena T.
AU - Cabaleiro J.
PY - 2017
SP - 737
EP - 742
DO - 10.5220/0006379607370742