Distributed Processing of Elevation Data by Means of Apache Hadoop in a Small Cluster

Jitka Komarkova, Jakub Spidlen, Devanjan Bhattacharya, Oldrich Horak

2013

Abstract

Geoinformation technologies require fast processing of large and rapidly growing volumes of all types of spatial data. Parallel computing and distributed systems are technologies that can provide the required services at reasonable cost. MapReduce is one example of such an approach. It has been successfully implemented on large clusters in several instances, with applications that include spatial and imagery data processing. This contribution deals with its implementation and operational performance on a very small cluster (a few commodity personal computers) used to process large volumes of spatial data. The open-source MapReduce implementation Apache Hadoop is used. The contribution focuses on a low-cost solution and examines processing speed and the distribution of processed files. The authors ran several experiments to evaluate the benefit of distributed data processing in a small cluster and to identify possible limitations. The size of the processed files and the number of processed values were used as the main criteria for performance evaluation. Point elevation data were used in the experiments.
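The abstract does not state which computation the experiments ran over the elevation points. As a hypothetical illustration of the MapReduce model it refers to, the Python sketch below assumes a whitespace-separated `x y z` point file, assigns each point to a square tile (the tile edge `TILE_SIZE` is an arbitrary assumption), and computes the count, minimum, maximum and mean elevation per tile. The names `map_point`, `reduce_tile` and `run_local` are illustrative and are not part of Hadoop; `run_local` merely imitates, in one process, the sort-and-group step Hadoop performs between the map and reduce phases.

```python
"""Hypothetical MapReduce-style sketch: per-tile statistics of point elevation data.

Assumptions (not taken from the paper): records are whitespace-separated
"x y z" lines, and points are grouped into square tiles of edge TILE_SIZE.
"""
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

TILE_SIZE = 100.0  # hypothetical tile edge length, in the units of x and y

TileKey = Tuple[int, int]


def map_point(line: str) -> Iterator[Tuple[TileKey, float]]:
    """Map phase: emit (tile_key, elevation) for one 'x y z' record."""
    parts = line.split()
    if len(parts) != 3:
        return  # skip empty or malformed records
    try:
        x, y, z = (float(p) for p in parts)
    except ValueError:
        return  # skip records with non-numeric fields
    # Floor division keeps tiles consistent for negative coordinates too.
    yield (int(x // TILE_SIZE), int(y // TILE_SIZE)), z


def reduce_tile(key: TileKey, elevations: Iterable[float]) -> Tuple[TileKey, int, float, float, float]:
    """Reduce phase: aggregate all elevations that share one tile key."""
    count, zmin, zmax, total = 0, float("inf"), float("-inf"), 0.0
    for z in elevations:
        count += 1
        zmin = min(zmin, z)
        zmax = max(zmax, z)
        total += z
    return key, count, zmin, zmax, total / count


def run_local(lines: Iterable[str]) -> List[Tuple[TileKey, int, float, float, float]]:
    """Single-process stand-in for a map -> sort/shuffle -> reduce job."""
    pairs = [kv for line in lines for kv in map_point(line)]
    # Hadoop sorts and groups map output by key before the reduce phase;
    # here a sort plus groupby plays that role.
    pairs.sort(key=itemgetter(0))
    return [
        reduce_tile(key, (z for _, z in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


if __name__ == "__main__":
    sample = ["10 20 250.5", "90 40 251.5", "150 20 300.0", "bad line"]
    for row in run_local(sample):
        print(row)
```

With the sample input, the first two points fall into tile `(0, 0)` and the third into tile `(1, 0)`, while the two-field line is skipped. On a real cluster the same logic could be split into separate mapper and reducer scripts run through Hadoop Streaming, each reading records from standard input; that wiring is not shown here.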



Paper Citation


in Harvard Style

Komarkova J., Spidlen J., Bhattacharya D. and Horak O. (2013). Distributed Processing of Elevation Data by Means of Apache Hadoop in a Small Cluster. In Proceedings of the 8th International Joint Conference on Software Technologies - Volume 1: ICSOFT-EA, (ICSOFT 2013) ISBN 978-989-8565-68-6, pages 340-344. DOI: 10.5220/0004591303400344


in Bibtex Style

@conference{icsoft-ea13,
author={Jitka Komarkova and Jakub Spidlen and Devanjan Bhattacharya and Oldrich Horak},
title={Distributed Processing of Elevation Data by Means of Apache Hadoop in a Small Cluster},
booktitle={Proceedings of the 8th International Joint Conference on Software Technologies - Volume 1: ICSOFT-EA, (ICSOFT 2013)},
year={2013},
pages={340-344},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004591303400344},
isbn={978-989-8565-68-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Software Technologies - Volume 1: ICSOFT-EA, (ICSOFT 2013)
TI - Distributed Processing of Elevation Data by Means of Apache Hadoop in a Small Cluster
SN - 978-989-8565-68-6
AU - Komarkova J.
AU - Spidlen J.
AU - Bhattacharya D.
AU - Horak O.
PY - 2013
SP - 340
EP - 344
DO - 10.5220/0004591303400344