SUSTAINABILITY OF HADOOP CLUSTERS

Luis Bautista, Alain April

2011

Abstract

Hadoop is a set of utilities and frameworks for developing and storing distributed applications in cloud computing environments; its core component is the Hadoop Distributed File System (HDFS). The NameNode is a key element of the HDFS architecture, and also its single point of failure. To address this issue, we propose a replication mechanism that protects the NameNode data in case of failure. The proposed solution has two distinct components: a BackupNode cluster that uses a leader election function to replace a failed NameNode, and a mechanism that replicates and synchronizes the file system namespace, which serves as the recovery point.
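
The two components named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and uses no Hadoop or ZooKeeper API: the `BackupNode` class, the `replicate` and `elect_leader` functions, and the election rule (prefer the live node with the most recent namespace transaction, break ties by lowest node id) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BackupNode:
    """Hypothetical stand-in for a BackupNode holding a namespace replica."""
    node_id: int
    alive: bool = True
    last_txid: int = 0                      # last namespace edit applied
    namespace: dict = field(default_factory=dict)

    def apply_edit(self, txid, path, value):
        # Apply edits strictly in order so a replica never skips a transaction.
        if self.alive and txid == self.last_txid + 1:
            self.namespace[path] = value
            self.last_txid = txid


def replicate(edit, backups):
    """Stream one (txid, path, value) namespace edit to every backup."""
    for node in backups:
        node.apply_edit(*edit)


def elect_leader(backups):
    """Pick the live backup with the most recent namespace (assumed rule).

    Ties are broken by the lowest node_id. Returns None if no node is alive.
    """
    live = [n for n in backups if n.alive]
    if not live:
        return None
    return max(live, key=lambda n: (n.last_txid, -n.node_id))


backups = [BackupNode(1), BackupNode(2), BackupNode(3)]
replicate((1, "/data/a", "block-1"), backups)
backups[2].alive = False                    # node 3 misses the next edit
replicate((2, "/data/b", "block-2"), backups)
backups[2].alive = True                     # node 3 returns, but is stale

# The NameNode has failed: choose its replacement from the BackupNode cluster.
leader = elect_leader(backups)
```

In this sketch the stale node 3 is never elected, so the replacement always starts from the most recent replicated namespace. A real deployment would also need failure detection and agreement among the nodes on the election result, which the sketch leaves out.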



Paper Citation


in Harvard Style

Bautista L. and April A. (2011). SUSTAINABILITY OF HADOOP CLUSTERS. In Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-8425-52-2, pages 587-590. DOI: 10.5220/0003332705870590


in Bibtex Style

@conference{closer11,
author={Luis Bautista and Alain April},
title={SUSTAINABILITY OF HADOOP CLUSTERS},
booktitle={Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2011},
pages={587-590},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003332705870590},
isbn={978-989-8425-52-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - SUSTAINABILITY OF HADOOP CLUSTERS
SN - 978-989-8425-52-2
AU - Bautista L.
AU - April A.
PY - 2011
SP - 587
EP - 590
DO - 10.5220/0003332705870590