Authors: Dominique Thiébaut ; Yang Li ; Diana Jaunzeikare ; Alexandra Cheng ; Ellysha Raelen Recto ; Gillian Riggs ; Xia Ting Zhao ; Tonje Stolpestad and Cam Le T. Nguyen

Affiliation: Smith College, United States

ISBN: 978-989-8425-52-2

Keyword(s): Grid computing, XGrid, Hadoop, Wikipedia, Data mining, Performance.

Related Ontology Subjects/Areas/Topics: Cloud Applications Performance and Monitoring ; Cloud Computing ; Platforms and Applications

Abstract: We present a simple comparison of the performance of three different cluster platforms: Apple’s XGrid, a local cluster of Linux workstations running Hadoop (the open-source version of Google’s MapReduce), and an Elastic MapReduce cluster rented from Amazon. Performance is measured as the total execution time taken by each platform to parse a 27-GByte XML dump of the English Wikipedia. We show that for this specific workload, XGrid yields the fastest execution time, with the local Hadoop cluster a close second. The overhead of fetching data from Amazon’s Simple Storage Service (S3), along with the inability to skip the reduce, sort, and merge phases on Amazon, penalizes this platform, which is targeted at much larger data sets.

CC BY-NC-ND 4.0

Paper citation in several formats:
Thiébaut, D.; Li, Y.; Jaunzeikare, D.; Cheng, A.; Raelen Recto, E.; Riggs, G.; Ting Zhao, X.; Stolpestad, T. and Le T. Nguyen, C. (2011). PROCESSING WIKIPEDIA DUMPS - A Case-study Comparing the XGrid and MapReduce Approaches. In Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-8425-52-2, pages 391-396. DOI: 10.5220/0003385603910396

@conference{closer11,
author={Dominique Thiébaut and Yang Li and Diana Jaunzeikare and Alexandra Cheng and Ellysha Raelen Recto and Gillian Riggs and Xia Ting Zhao and Tonje Stolpestad and Cam Le T. Nguyen},
title={PROCESSING WIKIPEDIA DUMPS - A Case-study Comparing the XGrid and MapReduce Approaches},
booktitle={Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2011},
pages={391-396},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003385603910396},
isbn={978-989-8425-52-2},
}

TY - CONF

JO - Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - PROCESSING WIKIPEDIA DUMPS - A Case-study Comparing the XGrid and MapReduce Approaches
SN - 978-989-8425-52-2
AU - Thiébaut, D.
AU - Li, Y.
AU - Jaunzeikare, D.
AU - Cheng, A.
AU - Raelen Recto, E.
AU - Riggs, G.
AU - Ting Zhao, X.
AU - Stolpestad, T.
AU - Le T. Nguyen, C.
PY - 2011
SP - 391
EP - 396
DO - 10.5220/0003385603910396
