Cost Optimization on Public Cloud Provider for Big Geospatial Data

Joao Bachiega Junior, Marco Antonio Sousa Reis, Aleteia P. F. de Araujo, Maristela Holanda

2017

Abstract

Big geospatial data is the emerging paradigm for the enormous amount of information made available by the development and widespread use of Geographical Information System (GIS) software. However, this new paradigm presents challenges in data management: the sheer volume of data requires tools for large-scale processing. Spatial Cloud Computing offers facilities to overcome the challenges of a big data environment, providing significant computing power and storage. SpatialHadoop, a fully-fledged MapReduce framework with native support for spatial data, is one such tool for large-scale processing. However, in cloud environments, the high cost of processing and storage at the providers is a central challenge. To address this challenge, this paper presents a cost-efficient method for processing geospatial data on public cloud providers. The data used for validation came from OpenStreetMap (OSM). Test results show that the method can optimize the use of computational resources by up to 263% for the available SpatialHadoop datasets.


Paper Citation

in Harvard Style

Junior J., Sousa Reis M., Araujo A. and Holanda M. (2017). Cost Optimization on Public Cloud Provider for Big Geospatial Data. In Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-758-243-1, pages 82-90. DOI: 10.5220/0006237800820090

in BibTeX Style

@conference{closer17,
author={Joao Bachiega Junior and Marco Antonio Sousa Reis and Aleteia P. F. de Araujo and Maristela Holanda},
title={Cost Optimization on Public Cloud Provider for Big Geospatial Data},
booktitle={Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2017},
pages={82-90},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006237800820090},
isbn={978-989-758-243-1},
}

in EndNote Style

TY  - CONF
JO  - Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI  - Cost Optimization on Public Cloud Provider for Big Geospatial Data
SN  - 978-989-758-243-1
AU  - Junior J.
AU  - Sousa Reis M.
AU  - Araujo A.
AU  - Holanda M.
PY  - 2017
SP  - 82
EP  - 90
DO  - 10.5220/0006237800820090
ER  - 