Table 2: Performance of different computing methods on dataset 2

Framework   Computing Method   Execution Time       Space overhead
Map         Parallel           15 s 676 ms          1744.05 MB
Map         Serial             3 min 3 s 860 ms     106.25 MB
Reduce      Parallel           6 s 241 ms           1573.60 MB
Reduce      Serial             1 min 40 s 903 ms    51.94 MB

4 DISCUSSION
This study presents a MapReduce framework built on
a serverless platform and uses it to count word
occurrences in two datasets. By drawing on cloud
computing and cloud storage, the framework removes the
restriction that MapReduce runs as a local rather than
an online operation, allowing it to process online
data and improving its real-time capability.
The study also compares the performance of parallel
and serial computing within the MapReduce framework.
The results confirm that parallel computing
substantially shortens large and complex computing
tasks, at the cost of higher memory use.
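The size of this trade-off can be read directly from Table 2. The short sketch below is only arithmetic on the reported dataset 2 figures; the helper names `to_seconds` and `ratios` are illustrative and not part of the study's code.

```python
def to_seconds(minutes=0, seconds=0, millis=0):
    """Convert a min/s/ms reading from Table 2 into seconds."""
    return minutes * 60 + seconds + millis / 1000

# (execution time in seconds, memory in MB), copied from Table 2
table2 = {
    "Map": {
        "parallel": (to_seconds(0, 15, 676), 1744.05),
        "serial": (to_seconds(3, 3, 860), 106.25),
    },
    "Reduce": {
        "parallel": (to_seconds(0, 6, 241), 1573.60),
        "serial": (to_seconds(1, 40, 903), 51.94),
    },
}

def ratios(stage):
    """Return (speedup, memory ratio) of parallel over serial for one stage."""
    par_time, par_mem = table2[stage]["parallel"]
    ser_time, ser_mem = table2[stage]["serial"]
    return ser_time / par_time, par_mem / ser_mem

for stage in table2:
    speedup, mem_ratio = ratios(stage)
    print(f"{stage}: {speedup:.1f}x faster, {mem_ratio:.1f}x more memory")
```

On these figures, the parallel Map stage is roughly 11.7 times faster while using about 16.4 times the memory, and the parallel Reduce stage is roughly 16.2 times faster while using about 30.3 times the memory.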
The study has three shortcomings. First, the
number of objects in a bucket far exceeds the number
that a single request can retrieve (1000). The
serial computing task therefore has to traverse the
bucket repeatedly, which adds time and space overhead.
Second, Garg (2021) points out that using MapReduce
requires programmers to divide an algorithm into Maps
and Reductions by hand. This study does not address
that problem. Third, Garg (2021) also notes that
MapReduce is weak at processing large graphs. In
future work, we will try to optimize the Pregel
framework proposed by Google.
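One common way to soften the first shortcoming is marker-based pagination, in which each listing request resumes where the previous one stopped, so every object is visited once. The sketch below is a platform-neutral illustration, not the study's implementation. `list_page` is a hypothetical callable standing in for whatever listing call the storage service provides, and `make_fake_bucket` is an in-memory stand-in used only to exercise the loop.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: given a marker (None for the first page) and a page
# size, return (keys, next_marker); next_marker is None on the last page.
ListPage = Callable[[Optional[str], int], Tuple[List[str], Optional[str]]]

def iter_object_keys(list_page: ListPage, page_size: int = 1000) -> Iterator[str]:
    """Yield every key in a bucket page by page, visiting each key once."""
    marker = None
    while True:
        keys, marker = list_page(marker, page_size)
        yield from keys
        if marker is None:  # no further pages
            break

def make_fake_bucket(total: int) -> ListPage:
    """In-memory stand-in for a bucket (an assumption for this example)."""
    all_keys = [f"part-{i:05d}.txt" for i in range(total)]

    def list_page(marker, max_keys):
        start = 0 if marker is None else int(marker)
        end = min(start + max_keys, len(all_keys))
        next_marker = str(end) if end < len(all_keys) else None
        return all_keys[start:end], next_marker

    return list_page

keys = list(iter_object_keys(make_fake_bucket(2500)))
print(len(keys))  # 2500 keys retrieved in three requests
```

Because the generator yields keys lazily, a serial job could process one page at a time instead of holding the full listing in memory.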
5 CONCLUSIONS
This study implements the MapReduce word count
application on a serverless platform and compares and
analyzes the efficiency of parallel and serial
computing. The results indicate that MapReduce based
on serverless computing can markedly reduce execution
time. Parallel computing trades space for time: it
uses more memory but processes data far faster.
However, the study also shows that MapReduce is
inconvenient to use. Implementing a set of MapReduce
jobs requires complicated scripts. Thus,
while MapReduce remains a powerful tool for big
data processing, its adoption may be hindered by the
complexity of setting up and managing MapReduce
jobs. Further research is needed into simplifying
these processes or exploring alternative technologies
that offer a better balance between performance and
convenience.
REFERENCES
Adel, N.T., Javadi, B., Iosup, A., Smirni, E. and Dustdar,
S., 2025. Serverless computing for next-generation
application development. Future Generation Computer
Systems, article 107573.
Che, Y., 2022. Serverless computing. Computer and
Network, 1, pp. 36–37.
Garg, U., 2021. Data analytic models that redress the
limitations of MapReduce. International Journal of
Web-Based Learning and Teaching Technologies
(IJWLTT), 6, pp. 1–15.
He, B., 2021. Big data computation analysis based on
MapReduce. Computer Programming Techniques and
Maintenance, 12, pp. 97–100.
Jiang, Q., 2021. Distributed cloud computing data mining
methods based on MapReduce. Journal of Jingdezhen
College, 6, pp. 106–108+128.
Minocha, S. and Singh, H., 2016. MapReduce technique:
Review and SWOT analysis. International Journal of
Engineering Research, 6, pp. 531–533.
Puliafito, C., Rana, O., Bittencourt, F.L., et al., 2024.
Serverless computing in the cloud-to-edge continuum.
Future Generation Computer Systems, 161, pp. 514–517.
Shukla, D. and Alim, A., 2018. A review on big data:
Views, categories and aspects. International Journal of
Computer Applications, 18, pp. 34–42.
Tang, J., Du, W. and Zhou, Y., 2024. Application of the
MapReduce model in large-scale data parallel mining.
Intelligent IoT Technology, 2, pp. 38–42.
Yang, B., Zhao, S. and Liu, F., 2022. Research on serverless
computing technology: A review. Computer
Engineering and Science, 4, pp. 611–619.
Zinuo, C., Zebin, C., Xinglei, C., Ruhui, M., Haibing, G.
and Rajkumar, B., 2024. SPSC: Stream processing
framework atop serverless computing for industrial big
data. IEEE Transactions on Cybernetics.