
facilitate the implementation and execution of graph
algorithms. Notable solutions include PowerGraph
(Gonzalez et al., 2012), GraphX (Gonzalez et al.,
2014), and Gemini (Zhu et al., 2016). Among the
most prominent are Apache Giraph (Apache, 2020)
and Pregel (Malewicz et al., 2010). Pregel, introduced
by Google, is a distributed graph processing frame-
work that has influenced the design of several subse-
quent systems, including Apache Giraph.
These tools are designed to address key challenges
in distributed graph processing: scalability, ease of
use, and fault tolerance (Heidari et al., 2018). Scal-
ability refers to the system’s ability to execute graph
algorithms across an arbitrary number of machines.
Ease of use aims to abstract away the complexity
of distributed computing, allowing developers to fo-
cus primarily on the algorithmic logic rather than
low-level implementation details. Fault tolerance en-
sures that the system can recover from node failures
without losing intermediate computation results. Despite
these strengths, such systems are often treated as
black boxes: users rely on their functionality without
a clear understanding of distributed computing or of
the systems’ internal mechanisms and execution models.
This paper presents the design and implementa-
tion of a framework to support the distributed pro-
cessing of graph algorithms. The primary objective
is to help developers understand key concepts in dis-
tributed computing and learn to transform sequential
algorithms into their distributed counterparts. This
framework, called Go-Pregel, uses Google’s Pregel as
a reference. It is designed to be algorithm-agnostic
and user-friendly, meaning that it is not tailored to
solve a specific problem but rather provides a general-
purpose platform for implementing and experiment-
ing with a wide range of graph algorithms in a dis-
tributed setting.
Go-Pregel is developed in Go (Golang), chosen
for its efficiency and robust native support for concurrency,
as well as its growing popularity in recent
years. To ensure portability and ease of deployment,
the framework is containerized using Docker,
enabling it to run seamlessly across various environ-
ments. In addition, a simple user interface is provided
to help users visualize and understand the intermedi-
ate steps involved in the execution of distributed algo-
rithms.
The remainder of this paper is structured as follows.
Section 2 presents the core concepts of
Google’s Pregel framework. Section 3 introduces the
proposed Go-Pregel framework, describing its archi-
tecture, implementation, and practical usage. Section
4 concludes the paper by discussing broader implica-
tions and outlining directions for future work.
2 PREGEL OVERVIEW
Pregel is a distributed graph processing framework
(Malewicz et al., 2010). It is designed to provide a
simple and scalable model for writing and executing
distributed graph algorithms. The input to a Pregel
program is a graph, and the output is also a graph.
Both vertices and edges can carry user-defined data,
with the data structure determined arbitrarily by the
user. The output graph produced by a Pregel algo-
rithm is often not the final solution itself but rather
a transformed graph from which the solution can be
more easily extracted.
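The graph-in, graph-out contract can be illustrated with a minimal Go sketch. The type names (Graph, Vertex, Edge) and the helper countReachable are hypothetical and are not taken from Pregel or Go-Pregel; the sketch only assumes that vertices and edges carry user-chosen payloads and that the solution is read off the output graph in a separate step.

```go
package main

import "fmt"

// Edge carries a user-defined payload E and the ID of its target vertex.
// (Hypothetical type, introduced only for this illustration.)
type Edge[E any] struct {
	To   string
	Data E
}

// Vertex carries a user-defined payload V and its outgoing edges.
type Vertex[V, E any] struct {
	ID    string
	Data  V
	Edges []Edge[E]
}

// Graph serves as both the input type and the output type of a computation.
type Graph[V, E any] map[string]*Vertex[V, E]

// countReachable extracts a solution from an output graph whose vertex
// payload is a hop distance, with -1 marking vertices that were never reached.
func countReachable(out Graph[int, float64]) int {
	n := 0
	for _, v := range out {
		if v.Data >= 0 {
			n++
		}
	}
	return n
}

func main() {
	// An assumed output graph: vertex payloads hold distances, edge payloads weights.
	out := Graph[int, float64]{
		"a": {ID: "a", Data: 0, Edges: []Edge[float64]{{To: "b", Data: 1.5}}},
		"b": {ID: "b", Data: 1},
		"c": {ID: "c", Data: -1},
	}
	fmt.Println(countReachable(out)) // prints 2
}
```

Here the output graph is not the answer itself; the answer (how many vertices are reachable) is obtained by a simple pass over the vertex payloads.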
Pregel treats each vertex as an independent
machine responsible for its own part of
the computation. The framework uses the Bulk Synchronous
Parallel (BSP) model (Cheatham et al., 1994) to coordinate
the work: the algorithm is divided into a sequence of
supersteps, in each of which every vertex executes a
computation phase, communicates with other vertices, and
then synchronizes with them. A vertex cannot read or
modify the values of other vertices, but it can read
and modify its own value at will. In the communication
phase, a vertex can send messages to any other vertex,
as long as the target vertex’s ID is known. These
messages do not require a reply, and they are only read
in the next superstep. The decisions made in the
computation phase are usually based on the messages
received in the previous superstep, since messages are
the only means of communication between vertices.
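The superstep structure described above can be sketched as a sequential simulation in Go. This is an illustrative sketch, not the Pregel or Go-Pregel implementation: the Vertex type, the runSupersteps function, and the choice of propagating the maximum value are assumptions made for the example. The point it shows is the double buffering of messages, so that anything sent in one superstep becomes readable only in the next.

```go
package main

import "fmt"

// Vertex holds an integer value and the IDs of its out-neighbors.
// (Hypothetical type, introduced only for this illustration.)
type Vertex struct {
	ID    int
	Value int
	Out   []int
}

// runSupersteps simulates a fixed number of BSP supersteps sequentially.
// Messages sent during superstep s are only visible in superstep s+1.
func runSupersteps(vertices map[int]*Vertex, steps int) {
	inbox := map[int][]int{} // messages readable in the current superstep
	for s := 0; s < steps; s++ {
		outbox := map[int][]int{} // messages buffered for the next superstep
		for _, v := range vertices {
			// Computation phase: a vertex reads and writes only its own value.
			for _, m := range inbox[v.ID] {
				if m > v.Value {
					v.Value = m
				}
			}
			// Communication phase: one-way messages addressed by vertex ID.
			for _, dst := range v.Out {
				outbox[dst] = append(outbox[dst], v.Value)
			}
		}
		// Synchronization barrier: buffers are swapped once every vertex is done.
		inbox = outbox
	}
}

func main() {
	// Directed chain 1 -> 2 -> 3 -> 4 with the largest value at vertex 1.
	g := map[int]*Vertex{
		1: {ID: 1, Value: 9, Out: []int{2}},
		2: {ID: 2, Value: 3, Out: []int{3}},
		3: {ID: 3, Value: 5, Out: []int{4}},
		4: {ID: 4, Value: 1},
	}
	runSupersteps(g, 2)
	fmt.Println(g[1].Value, g[2].Value, g[3].Value, g[4].Value) // prints 9 9 5 5
}
```

After two supersteps the value 9 has reached only vertex 2, because a message needs one full superstep per hop; with four supersteps every vertex in this chain holds 9.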
During the computation phase, each vertex can vote
to halt and thereby become inactive. A halted vertex
is excluded from the computation phase of subsequent
supersteps unless it receives a message from another
vertex, in which case it is automatically reactivated
and its vote to halt is canceled. When every vertex in
the graph has voted to halt and no messages remain in
transit, the Pregel algorithm is considered finished,
and the resulting graph is written to the output.
Deciding when a vertex should vote to halt is the
responsibility of the user, and this decision must be
defined according to the logic of the algorithm being
implemented.
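The halting and reactivation rules can be illustrated with a small sequential Go sketch. The vertex type, the run function, and the user rule chosen here (send only when the value changed, then vote to halt) are assumptions made for this example and do not reproduce the Pregel or Go-Pregel API.

```go
package main

import "fmt"

// vertex is a hypothetical type: a value, out-neighbors, and a halted flag.
type vertex struct {
	id     int
	value  int
	out    []int
	halted bool
}

// run executes supersteps until every vertex has voted to halt and no
// messages are pending. It returns the number of supersteps executed.
func run(vs []*vertex) int {
	inbox := map[int][]int{}
	steps := 0
	for {
		active := false
		for _, v := range vs {
			if len(inbox[v.id]) > 0 {
				v.halted = false // an incoming message cancels the vote to halt
			}
			if !v.halted {
				active = true
			}
		}
		if !active {
			return steps // all halted, nothing in transit: finished
		}
		outbox := map[int][]int{}
		for _, v := range vs {
			if v.halted {
				continue // halted vertices skip the computation phase
			}
			changed := steps == 0 // every vertex announces its value once
			for _, m := range inbox[v.id] {
				if m > v.value {
					v.value = m
					changed = true
				}
			}
			if changed {
				for _, dst := range v.out {
					outbox[dst] = append(outbox[dst], v.value)
				}
			}
			v.halted = true // user rule assumed here: vote to halt every superstep
		}
		inbox = outbox
		steps++
	}
}

func main() {
	// Directed chain 1 -> 2 -> 3 -> 4 with the largest value at vertex 1.
	vs := []*vertex{
		{id: 1, value: 9, out: []int{2}},
		{id: 2, value: 3, out: []int{3}},
		{id: 3, value: 5, out: []int{4}},
		{id: 4, value: 1},
	}
	fmt.Println("supersteps:", run(vs))
	for _, v := range vs {
		fmt.Println(v.id, v.value, v.halted)
	}
}
```

In this run vertex 1 never receives a message, so it stays halted after the first superstep, while the other vertices are reactivated only in the supersteps in which a message reaches them. The loop ends after four supersteps, when all vertices are halted and the outbox is empty.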
When working with Pregel, the user is expected to
implement some methods. The first is the
Compute method, which is invoked on each active vertex
in every superstep. This method encompasses both the
computation and communication phases of the BSP
(Bulk Synchronous Parallel) model. In it, the user
defines the rules for when and how to send messages
to other vertices, using the method SendMessageTo.
The user also has to define, in the Compute method,
the rules for when a vertex should vote to halt, us-
WEBIST 2025 - 21st International Conference on Web Information Systems and Technologies