Resource Allotment Utilizing Multi-Armed Bandit Fostered Reinforcement Learning in Mobile Edge Computing Ecosystems

Priyabrata Nayak¹ (https://orcid.org/0009-0002-9069-3420), Dipti Dash¹ (https://orcid.org/0000-0002-4625-2727), Sapthak Mohajon Turjya¹ (https://orcid.org/0009-0004-5190-0305) and Anjan Bandyopadhyay² (https://orcid.org/0000-0001-7670-2269)

¹Kalinga Institute of Industrial Technology, Bhubaneswar, 751024, India
²Department of Computer Science Engineering, India
Keywords: Mobile Edge Computing, Multi-Armed Bandit, Reinforcement Learning, Resource Allotment, Upper Confidence Bound, Social Welfare.
Abstract: Mobile Edge Computing (MEC) leverages the nearness of computational elements to end-users in wireless networks, pledging low latency and elevated throughput for emerging mobile utilities. Nevertheless, efficient resource allotment remains a notable challenge in MEC ecosystems due to the varying and heterogeneous character of mobile networks. Classic static resource allotment strategies often fail to adjust to varying network conditions, resulting in suboptimal allotments. In this paper, we propose a novel strategy for resource allotment in MEC ecosystems utilizing a Multi-Armed Bandit (MAB) based Reinforcement Learning (RL) approach. By viewing the resource allotment problem as an MAB problem, our strategy enables the dynamic assignment of resources founded on real-time feedback, thereby enhancing resource usage and user satisfaction. We offer a comprehensive evaluation of our technique through simulations in diverse MEC scenarios, including a comparison with the round-robin task scheduling algorithm, exhibiting its efficacy in acclimating to changing network conditions and surpassing traditional static allocation procedures. Our results showcase the promise of MAB-based RL strategies in improving resource administration in MEC ecosystems, paving the way for more adaptive and productive mobile edge computing applications.
1 INTRODUCTION
Mobile Edge Computing (MEC) has appeared as
a revolutionary paradigm in wireless communica-
tion systems, striving to provide low-latency, high-
throughput services to mobile users by integrating
computational elements at the network edge (Wang
et al., 2023b). With the expansion of latency-sensitive utilities like augmented reality, autonomous automobiles, and Internet of Things (IoT) gadgets, the need for efficient resource allotment in MEC ecosystems has become paramount. Nevertheless, conventional static allotment strategies often fail to dynamically adjust to evolving network requirements and user needs, leading to suboptimal resource usage and degraded quality of service (QoS).
Consider an augmented reality (AR) (Ateya et al., 2023) utility where users interact with virtual entities superimposed onto the real-world ecosystem in real time. In such a scenario, the latency between user interaction and the processing of virtual entities plays a crucial role in delivering a seamless and immersive experience. Static allotment of computational elements may lead to erratic performance, as the need for resources varies dynamically with user mobility and utility needs (Gong et al., 2024). Similarly, in the context of autonomous automobiles, which depend heavily on real-time data processing for navigation and decision-making, productive resource allotment becomes critical (Ray and Banerjee, 2024).
Consider also peak traffic hours: the processing requirement for the data generated by various sensors running complicated algorithms changes with the traffic variance across regions. Resource allotment with static procedures will result in underutilization of resources in areas where crowd accumulations are minimal and heightened latency in crowded regions, which may result in security breaches and reduced reliability of autonomous driving vehicles (Gong et al., 2024).

Figure 1: The MAB fostered RL approach in mobile edge computing ecosystems
In the IoT domain, deployed devices produce data in variable quantities that demand special attention for processing and for extracting useful information. Productive resource allotment is paramount for efficient processing and optimal utilization of computational resources (Wang et al., 2023a). In smart cities, for example, integrated sensors capture data such as air pollution levels, traffic movement/hotspots, and energy usage, so the employed resource allotment procedures must adapt to the rapid variance of the data produced along with its dynamic processing demands. To handle these dynamic issues, there has been growing interest in adjustable resource allotment procedures that can function in dynamic MEC ecosystems. Such a mechanism must be able to consider user requirements and real-time feedback when allotting computational resources. This research paper therefore focuses on introducing and enacting a Multi-Armed Bandit (MAB) (Simchi-Levi and Wang, 2023) fostered Reinforcement Learning (RL) technique for allotting resources in MEC ecosystems (Preil and Krapp, 2023) (fig. 1). The resource allotment problem is formulated as an MAB problem, and the proposed technique enables diligent allotment decisions, thereby improving resource utilization, lessening latency, and enhancing user satisfaction.
2 RELATED WORKS
Vast research has been carried out in the domain of
Mobile Edge Computing ecosystems focusing on re-
source allotment, which has examined various tech-
niques to enhance system productivity/performance
while functioning on dynamic edge ecosystems (Chi
et al., 2023). Conventional allotment strategies like Round-Robin (Zhou et al., 2023) are simple, yet they suffer when exposed to varying network and resource needs, resulting in a substantial drop in system performance. To tackle these inefficiencies, resource allotment procedures that perform dynamically have grown increasingly popular. RL strategies like the MAB algorithms enable dynamic allotment of resources founded upon input given in real time, thereby improving resource usage and enhancing system performance (Galli et al., 2023).
Researchers have worked on strategies like multi-objective enhancement to handle various aspects like throughput, latency/delay, and efficient consumption of energy (Dehghani and Movahhedinia, 2023). These strategies employ genetic procedures, evolutionary procedures, and Pareto enhancement techniques to address the sophisticated trade-offs contained in MEC resource allotment situations (Khan et al., 2024).
Combining autonomous procedures with edge intelligence has stood out as a noteworthy solution for resource allotment in edge ecosystems. Approaches like edge caching and federated learning provide the facility of dispersed decision enactment for variable resource allotment at the network edge. By utilizing edge intelligence and automation, MEC systems can dynamically adjust to varying conditions, improving resource allotment in real time and enriching overall system productivity and performance (Kar et al., 2023).
Likewise, real-world integrations and case anal-
yses provide valuable understandings of the practi-
cal feasibility and efficacy of resource allotment algo-
rithms in myriad MEC ecosystems, comprising smart
cities, industrial IoT, and vehicular networks. These
analyses close the gap between theoretical improve-
ments and real-world applications, validating the ef-
fectiveness of resource allotment procedures and ad-
vising future research pathways in Mobile Edge Com-
puting.
3 THE PROPOSED SYSTEM
MODEL
Considering a Mobile Edge Computing (MEC) ecosystem comprising a group of $N$ edge servers represented by $SR = \{sr_1, sr_2, \ldots, sr_N\}$, serving a set of $M$ mobile users represented by $UR = \{ur_1, ur_2, \ldots, ur_M\}$. The objective is to dynamically allot computational elements to mobile users to reduce latency and increase system throughput while guaranteeing efficient resource usage.
We model the resource allotment scenario as a Multi-Armed Bandit (MAB) problem, where every edge server $sr_j$ serves as an arm and the available computational elements at each server determine the arm's reward distribution. Let $RW_j(t)$ represent the reward received from allotting resources at server $sr_j$ at time $t$, where $RW_j(t)$ is a random variable with an unknown distribution.

At every time step $t$, the MEC regulator chooses an edge server founded on a policy that balances exploration (trying out various servers to learn their reward distributions) and exploitation (allotting resources to servers with potentially higher rewards founded on current knowledge). Letting $AC(t)$ indicate the action taken by the regulator at time $t$, $AC(t) \in \{1, 2, \ldots, N\}$ denotes the chosen edge server index.
The goal is to maximize the cumulative reward over $T$ time steps:

$$\max_{\pi} \sum_{t=1}^{T} RW_{AC(t)}(t) \quad (1)$$

where $\pi$ denotes the policy utilized for selecting actions.
To perform this, we utilize the Upper Confidence Bound (UCB) algorithm, an RL strategy that functions by balancing exploration and exploitation. The algorithm uses upper confidence bounds on the expected rewards to select actions, enabling exploration of new arms while continuing to exploit arms that have yielded high rewards.
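To make the bandit formulation concrete, the following is a minimal Python sketch of the system model; the Gaussian reward model, the noise level, and the class name MECBanditEnvironment are illustrative assumptions for simulation purposes, not part of the paper's specification.

    import random

    class MECBanditEnvironment:
        """Each edge server sr_j is an arm; choosing arm j yields a stochastic
        reward RW_j(t) drawn from a distribution unknown to the regulator
        (assumed Gaussian here, clamped to [0, 1])."""

        def __init__(self, true_means, noise_std=0.05):
            self.true_means = true_means  # unknown true mean reward per server
            self.noise_std = noise_std    # reward noise level (an assumption)
            self.num_servers = len(true_means)

        def pull(self, j):
            """Allot resources at server j and observe the reward RW_j(t)."""
            reward = random.gauss(self.true_means[j], self.noise_std)
            return min(max(reward, 0.0), 1.0)  # keep rewards bounded in [0, 1]

Bounding rewards in [0, 1] keeps the sketch consistent with the Hoeffding-style analysis of Section 4.4, which presumes bounded rewards.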
4 THE PROPOSED MECHANISM
Now, we provide the algorithmic design for the proposed Multi-Armed Bandit (MAB) fostered Reinforcement Learning (RL) procedure for dynamic resource allotment in MEC ecosystems. The proposed procedure strives to intelligently select the intended edge servers and allot them to the mobile edge users efficiently, which in turn reduces latency and improves resource usage and throughput, thereby enhancing overall system performance for the users.
4.1 Action Choosing
The Upper Confidence Bound (UCB) procedure is used by the MEC regulator to choose the intended edge server at each time step $t$. The action $AC(t)$ is calculated as:

$$AC(t) = \arg\max_{j} \left( \widehat{RW}_j(t) + \sqrt{\frac{2\log(t)}{N_j(t)}} \right) \quad (2)$$

where:
$\widehat{RW}_j(t)$ is the estimated average reward for server $sr_j$ up to time $t$;
$N_j(t)$ is the number of times edge server $sr_j$ was chosen up to time $t$;
$t$ is the present time step.
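A direct transcription of Equation (2) in Python might look as follows; this is a sketch that adopts the common convention of trying each untried server once first, since $N_j(t) = 0$ would otherwise make the confidence term undefined:

    import math

    def choose_action(est_rewards, counts, t):
        """AC(t) = argmax_j ( RW_hat_j(t) + sqrt(2 log(t) / N_j(t)) )."""
        for j, n in enumerate(counts):
            if n == 0:
                return j  # try each untried server once before applying the bound
        return max(range(len(counts)),
                   key=lambda j: est_rewards[j] + math.sqrt(2 * math.log(t) / counts[j]))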
4.2 The Reward Calculation
At time $t$, the estimated average reward $\widehat{RW}_j(t)$ for server $sr_j$ is computed as the sample mean of the rewards observed from that server:

$$\widehat{RW}_j(t) = \frac{\sum_{\tau=1}^{t} RW_j(\tau) \cdot I(AC(\tau) = j)}{N_j(t)} \quad (3)$$

where:
$RW_j(\tau)$ is the observed reward received from server $sr_j$ at time $\tau$;
$I(AC(\tau) = j)$ is an indicator function equal to 1 if server $sr_j$ was chosen at time $\tau$, and 0 otherwise;
$N_j(t)$ is the number of times server $sr_j$ has been chosen up to time $t$.
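In practice, the sample mean of Equation (3) need not be recomputed from the full reward history at every step; it is equivalent to the standard incremental update

$$\widehat{RW}_j(t) = \widehat{RW}_j(t-1) + \frac{RW_j(t) - \widehat{RW}_j(t-1)}{N_j(t)},$$

applied only to the server chosen at time $t$, which keeps the per-step update cost for that server constant. This restatement is a standard identity, not an additional assumption.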
4.3 The Proposed Algorithm
Algorithm 1: MAB-RL for Dynamic Resource Allotment in MEC.

1: Initialize estimated rewards $\widehat{RW}_j(0)$ and selection counts $N_j(0)$ for all servers $sr_j$.
2: for time step $t = 1, 2, \ldots, T$ do
3:   Select action $AC(t)$ using the UCB algorithm:
4:   $AC(t) = \arg\max_j \left( \widehat{RW}_j(t-1) + \sqrt{\frac{2\log(t)}{N_j(t-1)}} \right)$
5:   Execute action $AC(t)$ and observe reward $RW_{AC(t)}(t)$.
6:   Update estimated mean reward $\widehat{RW}_{AC(t)}(t)$:
7:   $\widehat{RW}_{AC(t)}(t) = \frac{\sum_{\tau=1}^{t} RW_{AC(t)}(\tau) \cdot I(AC(\tau) = AC(t))}{N_{AC(t)}(t)}$
8:   Increment selection count: $N_{AC(t)}(t) = N_{AC(t)}(t-1) + 1$
9: end for
The algorithm initializes the estimated rewards and selection counts for all servers. At each time step, it chooses an action (edge server) founded on the UCB algorithm. It executes the selected action, observes the reward, and updates the estimated mean reward for the selected server. Finally, it increments the selection count for the chosen server. The process continues until the specified time horizon $T$ is attained. This algorithm enables dynamic resource allotment in MEC ecosystems by adaptively choosing edge servers founded on real-time feedback, thus improving system performance.
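Putting the pieces together, a minimal runnable sketch of Algorithm 1 is given below; it reuses the hypothetical MECBanditEnvironment and choose_action helpers sketched earlier and the incremental mean update from Section 4.2:

    def run_mab_rl(env, horizon):
        """MAB-RL loop of Algorithm 1: initialize, select via UCB, observe, update."""
        G = env.num_servers
        est_rewards = [0.0] * G  # estimated rewards RW_hat_j(0)
        counts = [0] * G         # selection counts N_j(0)
        total_reward = 0.0
        for t in range(1, horizon + 1):
            j = choose_action(est_rewards, counts, t)  # steps 3-4: UCB selection
            reward = env.pull(j)                       # step 5: execute and observe
            counts[j] += 1                             # step 8: increment N_AC(t)(t)
            est_rewards[j] += (reward - est_rewards[j]) / counts[j]  # steps 6-7
            total_reward += reward
        return est_rewards, counts, total_reward

For instance, under the assumed environment, run_mab_rl(MECBanditEnvironment([0.5, 0.3, 0.2]), 1000) should concentrate most selections on the first server, whose assumed true mean reward is highest.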
4.4 Algorithm Properties
Proposition 1: The UCB algorithm guarantees that the estimated mean reward converges to the true mean reward with high probability.

Proof: Let $\widehat{RW}_j(t)$ denote the estimated mean reward for server $sr_j$ at time $t$, and let $RW_j(t)$ represent the true mean reward. By Hoeffding's inequality, we have:

$$\mathrm{Prb}\left( \left| \widehat{RW}_j(t) - RW_j(t) \right| > \varepsilon \right) \le e^{-2\varepsilon^2 N_j(t)} \quad (4)$$

where $\varepsilon > 0$ is a constant depicting the deviation from the true mean reward, and $N_j(t)$ is the number of times server $sr_j$ has been chosen up to time $t$.

To confirm that $\widehat{RW}_j(t)$ is near $RW_j(t)$ with high probability, we set $\varepsilon = \sqrt{\log(t)/N_j(t)}$, so that the bound becomes $e^{-2\log(t)} = t^{-2}$. Then we have:

$$\mathrm{Prb}\left( \left| \widehat{RW}_j(t) - RW_j(t) \right| > \sqrt{\frac{\log(t)}{N_j(t)}} \right) \le \frac{1}{t^2} \quad (5)$$

By the union bound, the probability that any server $sr_j$ deviates from its true mean reward by more than this margin declines rapidly with time. Therefore, the UCB algorithm guarantees convergence to the true mean reward for all servers with high probability.
Proposition 2 (Regret Bound for the UCB Algorithm): The regret of the UCB algorithm is bounded by $O(G\log(T))$, where $G$ is the number of servers and $T$ is the time horizon.

Proof: Let $RW^{*}$ represent the maximum mean reward among all servers. The regret at time $T$ is defined as:

$$\mathrm{Regret}(T) = T \cdot RW^{*} - \sum_{j=1}^{G} \mathbb{E}\left[ \sum_{t=1}^{T} RW_j(t) \right] \quad (6)$$

By the UCB algorithm's exploration-exploitation trade-off, the expected cumulative reward of the algorithm satisfies:

$$\mathbb{E}\left[ \sum_{t=1}^{T} RW_{AC(t)}(t) \right] \ge \sum_{j=1}^{G} \mathbb{E}\left[ \sum_{t=1}^{T} RW_j(t) \right] - \sum_{j=1}^{G} \sqrt{\frac{2\log(T)}{N_j(T)}} \quad (7)$$

Using Proposition 1, we have:

$$\sum_{j=1}^{G} \sqrt{\frac{2\log(T)}{N_j(T)}} = O\left(\sqrt{G\log(T)}\right) \quad (8)$$

Thus, the regret of the UCB algorithm is bounded by $O(G\log(T))$, demonstrating sublinear regret growth with respect to the time horizon $T$.

These propositions deliver theoretical assurances for the convergence and productivity of the UCB algorithm in dynamic resource allotment.
4.5 Time Complexity Analysis
The time complexity study of the suggested Multi-
Armed Bandit (MAB) based Reinforcement Learn-
ing (RL) mechanism for dynamic resource allotment
in Mobile Edge Computing (MEC) ecosystems con-
cerns evaluating the computational cost of key pro-
cesses within the algorithm.
UCB-based action choosing: The time complexity of choosing an action utilizing the UCB algorithm relies on computing the upper confidence bounds for each arm (edge server). For $G$ servers, the time complexity of action choosing at each of the $T$ time steps is $O(G)$.
Reward Calculation: Revising the calculated
mean reward for each server concerns calculating the
sample average of observed rewards. For every time
step t, the time complexity of revising the calculated
mean reward for G servers is O(G).
Overall Time Complexity: The overall per-step time complexity of the suggested strategy is $O(G)$, where $G$ is the number of servers (arms); over the full horizon of $T$ time steps this amounts to $O(GT)$. This complexity results from making action decisions and evaluating rewards at each time step.
Regret Bound Analysis: The regret bound involves calculating the cumulative regret across $T$ time steps. Regret increases sublinearly with $T$ (bounded by $O(G\log(T))$), resulting in a modest processing cost relative to the major operations of action selection and reward computation.
The suggested approach for dynamic resource al-
location in MEC ecosystems is highly impacted by
the number of edge servers (G) and the time pe-
riod (T ). The algorithm’s complexity scales linearly
with both G and T , making it computationally feasi-
ble for practical usages in real-time MEC procedures.
Furthermore, the regret-bound investigation demon-
strates the algorithm’s efficacy in having sublinear re-
gret growth with respect to time, further backing its
scalability and usefulness.
4.6 Scenario Analysis
Considering a scenario where there are 3 edge servers (arms), represented by $sr_1$, $sr_2$, and $sr_3$, providing service to a group of mobile users. The objective is to dynamically allot computational elements to reduce latency and increase throughput.

Step 1. Initialization:
At first, the algorithm initializes the estimated mean rewards $\widehat{RW}_j(t)$ and choice counts $N_j(t)$ for each server $sr_j$. Let's assume the initial estimated mean rewards are $\widehat{RW}_1(0) = 0.5$, $\widehat{RW}_2(0) = 0.3$, and $\widehat{RW}_3(0) = 0.2$, and the initial choice counts are all set to 0: $N_1(0) = N_2(0) = N_3(0) = 0$.
Step 2. Action Choosing (Time Steps 1-5):
At each time step, the algorithm chooses an action (server) utilizing the UCB algorithm. For instance, at time step 1, the UCB algorithm chooses server $sr_1$ as it has the maximum upper confidence bound. Likewise, actions are chosen for time steps 2 to 5 founded on the UCB algorithm.

Step 3. Reward Observation and Computation:
After choosing an action, the algorithm performs it, observes the reward, and revises the estimated mean reward for the chosen server. For example, at time step 1, if server $sr_1$ obtains a reward of 0.6, the estimated mean reward $\widehat{RW}_1(t)$ is updated accordingly.

Step 4. Repetition of Steps 2-3 (Time Steps 6-10):
The procedure of action selection, reward observation, and computation persists for subsequent time steps.

Step 5. Performance Evaluation:
After a certain number of time steps, the algorithm's productivity is assessed in terms of cumulative reward, regret, and convergence to the true mean reward.
By iteratively revising the estimated mean rewards
and choosing actions founded on the UCB algorithm,
the algorithm adjusts to the various ecosystem dy-
namics and enhances resource allotment to improve
system productivity. In real-world cases, the algo-
rithm would run continually, dynamically modifying
resource allotment founded on real-time feedback and
user needs.
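As a worked illustration of Equation (2) with these numbers, suppose each server has been tried once by $t = 4$ and server $sr_1$ returned the reward 0.6 observed in Step 3 (the single-trial counts are an assumption for this example); server $sr_1$'s upper confidence bound is then

$$\widehat{RW}_1(4) + \sqrt{\frac{2\log(4)}{N_1(4)}} = 0.6 + \sqrt{\frac{2 \times 1.386}{1}} \approx 2.27,$$

showing that early in the run the exploration term dominates the estimated mean, which is why every server is sampled before exploitation takes over.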
5 SOCIAL WELFARE OF THE
PROPOSED MECHANISM
In the context of resource allotment in Mobile Edge Computing (MEC) ecosystems, social welfare can be described as a metric that captures the overall benefit derived from the allotment of computational utilities to mobile users. Social welfare regards not only the individual utilities of users but also system-wide purposes such as lowering latency, increasing throughput, and improving resource usage.
Social welfare ($SW$) can be represented as the summation of the individual utilities ($UT_j$) of all mobile users:

$$SW = \sum_{j=1}^{M} UT_j \quad (9)$$

where:
$M$ is the total number of mobile users;
$UT_j$ denotes the utility acquired by user $j$.
The individual utility $UT_j$ can be determined based on diverse factors such as latency, throughput, quality of service (QoS), energy usage, or any other suitable performance metric. For example, in the context of MEC, the utility of a user may rely on the latency experienced in completing a task, the quantity of computational resources allotted, and the trustworthiness of the service delivered.
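As a consistency check on Equation (9), averaging the per-device social-welfare values reported later in Table 1 reproduces the mean of Table 4:

$$\overline{SW} = \frac{850 + 820 + 870 + 880 + 840}{5} = \frac{4260}{5} = 852.$$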
Figure 2: Comparison between the proposed and round
robin strategy in handling latency (ms) between dynamic
workloads, bursty traffic, and network congestion
Figure 3: Impact of the proposed and round robin strategy
in managing system throughput (Mbps) between dynamic
workloads, bursty traffic, and network congestion
Figure 4: Impact of the proposed and round robin strategy
in managing system utilization (%) between dynamic work-
loads, bursty traffic, and network congestion
Figure 5: Impact of the proposed and round robin strategy
in achieving social welfare between dynamic workloads,
bursty traffic, and network congestion
6 THE SIMULATION RESULTS
6.1 Simulation Setup
Our evaluation spans three domains: dynamic workloads, bursty traffic, and network congestion, assessing the efficacy of our suggested methodology in a simulation ecosystem. For comparison, we employed the round-robin task scheduling algorithm, a static scheduling algorithm that preempts executing processes after a fixed time quantum and resumes them from the ready queue until the tasks are accomplished. Parameters such as latency (ms), throughput (Mbps), resource utilization (%), and social welfare are estimated by allotting five mobile devices among three edge servers. This setup assures clarity while retaining applicability to real-world designs.
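For reference, the static baseline can be sketched in a few lines of Python; round_robin_assignment below is a hypothetical helper (not the paper's implementation) that cycles devices across servers in fixed order, ignoring all feedback:

    def round_robin_assignment(num_devices, num_servers):
        """Static round-robin: device i is always mapped to server i mod num_servers,
        regardless of load, latency, or observed rewards."""
        return {f"Device {i + 1}": f"Server {(i % num_servers) + 1}"
                for i in range(num_devices)}

    # Five mobile devices among three edge servers, as in the simulation setup:
    print(round_robin_assignment(5, 3))
    # {'Device 1': 'Server 1', 'Device 2': 'Server 2', 'Device 3': 'Server 3',
    #  'Device 4': 'Server 1', 'Device 5': 'Server 2'}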
6.2 Results
In Table 1, we demonstrate the allotment of 5 mobile devices among 3 edge servers utilizing the proposed methodology, estimating latency (ms), throughput (Mbps), resource utilization (%), and social welfare under dynamic workloads. Similarly, in Table 2 and Table 3, we execute the same investigation for the bursty traffic and network congestion scenarios respectively, utilizing the proposed methodology. The same approach is performed for the round-robin procedure, but for the sake of simplicity its per-device tables are not included in this research paper; only the mean values are reported.
Utilizing the above tables, Tables 4 and 5 report the mean latency, mean throughput, mean resource usage, and mean social welfare for managing dynamic workloads utilizing both the proposed methodology and the round-robin procedure.
Table 1: Dynamic workload management via the proposed methodology for allotting edge servers to mobile devices. ES: Edge Server, LT: Latency (ms), TP: Throughput (Mbps), RU: Resource Utilization (%), SW: Social Welfare
Mobile Device ES LT TP RU SW
Device 1 Server 1 50 100 70 850
Device 2 Server 2 60 95 65 820
Device 3 Server 3 45 105 75 870
Device 4 Server 1 55 98 80 880
Device 5 Server 2 52 102 72 840
Table 2: Bursty traffic management via the proposed methodology for allotting edge servers to mobile devices. ES: Edge Server, LT: Latency (ms), TP: Throughput (Mbps), RU: Resource Utilization (%), SW: Social Welfare
Mobile Device ES LT TP RU SW
Device 1 Server 1 60 80 60 700
Device 2 Server 2 70 75 55 680
Device 3 Server 3 55 85 65 720
Device 4 Server 1 65 78 70 730
Device 5 Server 2 62 82 62 710
Similarly, Tables 6 and 7 supply these metrics for managing bursty traffic utilizing the suggested methodology and the round-robin procedure. Eventually, in Tables 8 and 9, the same investigation is performed for handling network congestion utilizing both methodologies.
Visualizing the tabular data, fig. 2 demonstrates
the effectiveness of the suggested methodology in
comparison to the round-robin technique in lower-
ing latency across three domains: handling dynamic
workloads, handling bursty traffic, and network con-
gestion. Fig. 3 indicates the usefulness of the
suggested procedure in improving throughput cor-
responding to the round-robin technique across the
same domains. Similarly, Fig. 4 showcases the
progress in resource utilization accomplished by the
suggested methodology over round-robin across these
domains. Lastly, Fig. 5 portrays the enhancement in social welfare, indicating better user satisfaction, attained by the presented methodology over the round-robin procedure across the same domains.
7 CONCLUSION AND FUTURE
WORKS
This research paper presents a novel resource allotment approach for Mobile Edge Computing (MEC) utilizing Multi-Armed Bandit (MAB) fostered Reinforcement Learning (RL). By presenting resource allotment as an MAB problem and employing the Upper Confidence Bound (UCB) algorithm, it allows adaptive allotment of computational resources in real time, improving performance and minimizing latency. The investigation exhibits the effectiveness of the procedure in dynamically selecting edge servers founded on real-time feedback, exceeding static allotment strategies. Future research avenues comprise upgrading dynamic MEC models, investigating multi-objective improvements, and handling security issues. Real-world integration and partnership with industry stakeholders are crucial for validation and implementation. Overall, the research contributes to advancing MEC systems for latency-sensitive utilities in the 5G era and beyond.
ACKNOWLEDGMENT
We wholeheartedly thank the Department of Com-
puter Science Engineering of Kalinga Institute of In-
dustrial Technology for giving us this opportunity
to work in this cutting-edge field. Furthermore, we
thank our esteemed professors who constantly sup-
ported us throughout this research study by reviewing
our work with productive feedback and suggestions to
improve further.
REFERENCES
Ateya, A. A., Muthanna, A., Koucheryavy, A., Maleh, Y.,
and El-Latif, A. A. A. (2023). Energy efficient of-
floading scheme for mec-based augmented reality sys-
tem. Cluster Computing, 26(1):789–806.
Chi, H. R., Silva, R., Santos, D., Quevedo, J., Corujo, D., Abboud, O., Radwan, A., Hecker, A., and Aguiar, R. L. (2023). Multi-criteria dynamic service migration for ultra-large-scale edge computing networks. IEEE Transactions on Industrial Informatics.
Table 3: Network congestion management via the proposed methodology for allotting edge servers to mobile devices. ES: Edge Server, LT: Latency (ms), TP: Throughput (Mbps), RU: Resource Utilization (%), SW: Social Welfare
Mobile Device ES LT TP RU SW
Device 1 Server 1 70 60 50 600
Device 2 Server 2 80 55 45 580
Device 3 Server 3 65 65 55 620
Device 4 Server 1 75 58 60 630
Device 5 Server 2 72 62 52 610
Table 4: Mean values comprising all the edge servers and
the mobile devices in managing dynamic workloads utiliz-
ing the proposed methodology
Metric Mean Value
Mean Latency (ms) 52.4
Mean Throughput (Mbps) 100.0
Mean Resource Utilization (%) 72.4%
Mean Social Welfare 852
Table 5: Mean values comprising all the edge servers and
the mobile devices in managing dynamic workloads utiliz-
ing the round robin methodology
Metric Mean Value
Mean Latency (ms) 70.0
Mean Throughput (Mbps) 80.0
Mean Resource Utilization (%) 55.0%
Mean Social Welfare 750
Table 6: Mean values comprising all the edge servers and
the mobile devices in managing bursty traffic utilizing the
proposed methodology
Metric Mean Value
Mean Latency (ms) 62.4
Mean Throughput (Mbps) 80.0
Mean Resource Utilization (%) 62.4%
Mean Social Welfare 708
Table 7: Mean values comprising all the edge servers and
the mobile devices in managing bursty traffic utilizing the
round-robin methodology
Metric Mean Value
Mean Latency (ms) 72.4
Mean Throughput (Mbps) 70.0
Mean Resource Utilization (%) 52.4%
Mean Social Welfare 628
Table 8: Mean values comprising all the edge servers and
the mobile devices in managing network congestion utiliz-
ing the proposed methodology
Metric Mean Value
Mean Latency (ms) 72.4
Mean Throughput (Mbps) 60.0
Mean Resource Utilization (%) 52.4%
Mean Social Welfare 608
Table 9: Mean values comprising all the edge servers and the mobile devices in managing network congestion utilizing the round-robin methodology
Metric Mean Value
Mean Latency (ms) 80.3
Mean Throughput (Mbps) 50.0
Mean Resource Utilization (%) 46.4%
Mean Social Welfare 528
Dehghani, F. and Movahhedinia, N. (2023). On the energy-delay trade-off in CCN caching strategy: A multi-objective optimization problem. Wireless Networks, pages 1–15.
Galli, A., Moscato, V., Romano, S. P., and Sperlí, G. (2023). Playing with a multi armed bandit to optimize resource allocation in satellite-enabled 5G networks. IEEE Transactions on Network and Service Management.
Gong, Y., Huang, J., Liu, B., Xu, J., Wu, B., and Zhang, Y.
(2024). Dynamic resource allocation for virtual ma-
chine migration optimization using machine learning.
arXiv preprint arXiv:2403.13619.
Kar, B., Yahya, W., Lin, Y.-D., and Ali, A. (2023). Of-
floading using traditional optimization and machine
learning in federated cloud–edge–fog systems: A sur-
vey. IEEE Communications Surveys & Tutorials,
25(2):1199–1226.
Khan, S., Jiangbin, Z., Irfan, M., Ullah, F., and Khan, S.
(2024). An expert system for hybrid edge to cloud
computational offloading in heterogeneous mec-mcc
environments. Journal of Network and Computer Ap-
plications, page 103867.
Preil, D. and Krapp, M. (2023). Genetic multi-armed
bandits: a reinforcement learning approach for dis-
crete optimization via simulation. arXiv preprint
arXiv:2302.07695.
Ray, K. and Banerjee, A. (2024). Autonomous automotives
on the edge. In 2024 37th International Conference
on VLSI Design and 2024 23rd International Confer-
ence on Embedded Systems (VLSID), pages 264–269.
IEEE.
Simchi-Levi, D. and Wang, C. (2023). Multi-armed
bandit experimental design: Online decision-making
and adaptive inference. In International Conference
on Artificial Intelligence and Statistics, pages 3086–
3097. PMLR.
Wang, D., Bai, Y., Huang, G., Song, B., and Yu, F. R. (2023a). Cache-aided MEC for IoT: Resource allocation using deep graph reinforcement learning. IEEE Internet of Things Journal.
Wang, X., Li, J., Ning, Z., Song, Q., Guo, L., Guo, S.,
and Obaidat, M. S. (2023b). Wireless powered mobile
edge computing networks: A survey. ACM Computing
Surveys, 55(13s):1–37.
Zhou, J., Lilhore, U. K., Hai, T., Simaiya, S., Jawawi,
D. N. A., Alsekait, D., Ahuja, S., Biamba, C., and
Hamdi, M. (2023). Comparative analysis of meta-
heuristic load balancing algorithms for efficient load
balancing in cloud computing. Journal of Cloud Computing, 12(1):85.