A Hadoop Open Source Backup Solution

Heitor Faria, Rodrigo Hagstrom, Marco Reis, Breno G. S. Costa, Edward Ribeiro, Maristela Holanda, Priscila Solis Barreto, Aletéia P. F. Araújo

2018

Abstract

Backup is a traditional and critical business service that faces growing challenges, such as the snowballing volume of data. Distributed data-intensive applications, such as Hadoop, can give the false impression that they do not need backup replicas, but most researchers agree that backup is still necessary for the majority of their components. A brief survey reveals several disasters that can cause data loss in Hadoop HDFS clusters, and previous studies propose an entire second Hadoop cluster to host a backup replica. However, this method is much more expensive than using traditional backup software and media, such as a tape library, Network Attached Storage (NAS) or even Cloud Object Storage. To address these problems, this paper introduces a cheaper and faster Hadoop backup and restore solution. It compares the traditional redundant cluster replica technique with an alternative that uses Hadoop client commands to create multiple streams of data from HDFS files to Bacula, the most popular open source backup software, which can receive data from named pipes (FIFOs). The new mechanism is roughly 51% faster and consumes 75% less backup storage than the previous solutions.
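The streaming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`hdfs_cat_command`, `stream_to_fifo`, `start_streams`) and the FIFO file naming are hypothetical, and the sketch assumes a POSIX system with the `hdfs` client on the PATH. Each HDFS file is streamed with `hdfs dfs -cat` into its own named pipe, which backup software able to read FIFOs could then consume as a regular backup source.

```python
import os
import subprocess
import threading


def hdfs_cat_command(hdfs_path):
    """Build the Hadoop client command that streams one HDFS file to stdout."""
    return ["hdfs", "dfs", "-cat", hdfs_path]


def stream_to_fifo(command, fifo_path):
    """Run `command` and send its stdout into the named pipe at `fifo_path`.

    Opening a FIFO for writing blocks until a reader (here, the backup
    software) opens the other end, so this function runs in its own thread.
    """
    with open(fifo_path, "wb") as fifo:
        subprocess.run(command, stdout=fifo, check=True)


def start_streams(commands, fifo_dir):
    """Create one FIFO per command and start one writer thread per FIFO.

    Returns a list of (fifo_path, thread) pairs. The FIFO names are a
    hypothetical convention chosen for this sketch.
    """
    jobs = []
    for index, command in enumerate(commands):
        fifo_path = os.path.join(fifo_dir, f"stream{index}.fifo")
        os.mkfifo(fifo_path)  # POSIX only
        writer = threading.Thread(target=stream_to_fifo, args=(command, fifo_path))
        writer.start()
        jobs.append((fifo_path, writer))
    return jobs
```

A caller would pass something like `[hdfs_cat_command(p) for p in paths]` to `start_streams` and then point the backup job at the resulting FIFO paths. Using one pipe per file is what allows several streams to be read in parallel; how many streams are worthwhile depends on the cluster and the backup storage, which this sketch does not address.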

Paper Citation


in Harvard Style

Faria H., Hagstrom R., Reis M., G. S. Costa B., Ribeiro E., Holanda M., Barreto P. and Araújo A. (2018). A Hadoop Open Source Backup Solution. In Proceedings of the 8th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-758-295-0, pages 651-657. DOI: 10.5220/0006809206510657


in Bibtex Style

@conference{closer18,
author={Heitor Faria and Rodrigo Hagstrom and Marco Reis and Breno G. S. Costa and Edward Ribeiro and Maristela Holanda and Priscila Solis Barreto and Aletéia P. F. Araújo},
title={A Hadoop Open Source Backup Solution},
booktitle={Proceedings of the 8th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2018},
pages={651-657},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006809206510657},
isbn={978-989-758-295-0},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 8th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - A Hadoop Open Source Backup Solution
SN - 978-989-758-295-0
AU - Faria H.
AU - Hagstrom R.
AU - Reis M.
AU - G. S. Costa B.
AU - Ribeiro E.
AU - Holanda M.
AU - Barreto P.
AU - Araújo A.
PY - 2018
SP - 651
EP - 657
DO - 10.5220/0006809206510657