framework into a POMDP, and then solve the resulting decision-making problem with MARL algorithms.
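For concreteness, a minimal sketch of such a partially observable grid-search environment is given below. The class and parameter names (GridSearchEnv, obs_radius) and the reward values are illustrative assumptions, not the paper's actual implementation; the point is that each agent observes only a local window of the grid, which is what makes the task a POMDP rather than a fully observable MDP.

```python
# Minimal sketch of a partially observable grid-search setting; names and
# reward scheme are illustrative assumptions, not the paper's implementation.
import numpy as np

class GridSearchEnv:
    """Agents move on a grid with random obstacles; each one observes only
    a small local window, so the joint task is a POMDP, not an MDP."""

    def __init__(self, size=10, n_agents=3, obstacle_rate=0.2,
                 obs_radius=1, seed=0):
        rng = np.random.default_rng(seed)
        self.size, self.n_agents, self.obs_radius = size, n_agents, obs_radius
        self.grid = (rng.random((size, size)) < obstacle_rate).astype(np.int8)
        free = np.argwhere(self.grid == 0)
        picks = free[rng.choice(len(free), n_agents + 1, replace=False)]
        self.pos, self.target = picks[:n_agents], picks[-1]

    def observe(self, i):
        # Egocentric local window, padded with obstacles at the border:
        # this limited view is the source of partial observability.
        r, (x, y) = self.obs_radius, self.pos[i]
        padded = np.pad(self.grid, r, constant_values=1)
        return padded[x:x + 2 * r + 1, y:y + 2 * r + 1].ravel()

    def step(self, actions):
        moves = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])
        reward = 0.0
        for i, a in enumerate(actions):
            nxt = np.clip(self.pos[i] + moves[a], 0, self.size - 1)
            if self.grid[tuple(nxt)]:
                reward -= 1.0                    # collision penalty
            else:
                self.pos[i] = nxt
            if (self.pos[i] == self.target).all():
                reward += 10.0                   # target located
        # One shared team reward -> cooperative MARL setting.
        return [self.observe(i) for i in range(self.n_agents)], reward, False
```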
In the experiments, to verify the validity of the Extensible MARL Framework, the paper tests three MARL algorithms: IQL, VDN, and QMIX, and evaluates them in two scenarios with different obstacle rates: 20% (sparse) and 40% (dense). The experimental results show that IQL and VDN adapt to the environment faster than QMIX, and at the 20% obstacle rate the reward values of IQL and VDN are slightly higher than that of QMIX. When the obstacle rate increases to 40%, VDN performs slightly better than the other two algorithms.
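The gap between these methods comes down to how per-agent Q-values are combined into a team value. The sketch below follows the published formulations (Sunehag et al., 2017; Rashid et al., 2020) in simplified form, not the paper's exact code: IQL uses no mixer at all, VDN sums the per-agent Q-values, and QMIX mixes them through a state-conditioned monotonic network (reduced here to single-layer hypernetworks for brevity).

```python
# How the compared methods form the team value Q_tot from per-agent Q-values.
# Simplified from the cited papers; not the paper's exact code.
import torch
import torch.nn as nn

class VDNMixer(nn.Module):
    # IQL skips this step entirely: each agent learns its own Q in isolation.
    # VDN: Q_tot is the plain sum of per-agent Q-values.
    def forward(self, agent_qs, state=None):
        return agent_qs.sum(dim=-1, keepdim=True)

class QMIXMixer(nn.Module):
    # QMIX: Q_tot is a state-conditioned monotonic mixing of per-agent
    # Q-values; taking abs() of the hypernetwork weights enforces
    # dQ_tot / dQ_i >= 0.
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.b1 = nn.Linear(state_dim, embed_dim)
        self.w2 = nn.Linear(state_dim, embed_dim)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        bs = agent_qs.size(0)
        w1 = torch.abs(self.w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(agent_qs.view(bs, 1, self.n_agents) @ w1 + b1)
        w2 = torch.abs(self.w2(state)).view(bs, self.embed_dim, 1)
        return (hidden @ w2).view(bs, 1) + self.b2(state)

# Smoke test with random tensors (3 agents, assumed state dim 16).
qs, s = torch.randn(4, 3), torch.randn(4, 16)
print(VDNMixer()(qs).shape, QMIXMixer(3, 16)(qs, s).shape)  # both (4, 1)
```

This state-conditioned mixing network is precisely the component whose weak performance motivates the future work below.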
In future research, we will further optimize the algorithm to address the poor performance of QMIX's mixing network. We will also design more dynamic and complex scenarios to test the robustness of the various algorithms, as well as more realistic experimental scenarios to ensure that MARL algorithms can be truly integrated into practical applications.
REFERENCES
Aggarwal, S., & Kumar, N. (2020). Path planning techniques for unmanned aerial vehicles: A review, solutions, and challenges. Computer Communications, 149, 270-299.
Cao, X., Li, M., Tao, Y., & Lu, P. (2024). HMA-SAR: Multi-agent search and rescue for unknown located dynamic targets in completely unknown environments. IEEE Robotics and Automation Letters.
Cetinsaya, B., Reiners, D., & Cruz-Neira, C. (2024). From PID to swarms: A decade of advancements in drone control and path planning – A systematic review (2013–2023). Swarm and Evolutionary Computation, 89, 101626.
Fang, W., Liao, Z., & Bai, Y. (2024). Improved ACO algorithm fused with improved Q-Learning algorithm for Bessel curve global path planning of search and rescue robots. Robotics and Autonomous Systems, 182, 104822.
Wang, F., Zhu, X., Zhou, Z., & Tang, Y. (2024). Deep-reinforcement-learning-based UAV autonomous navigation and collision avoidance in unknown environments. Chinese Journal of Aeronautics, 37(3), 237-257.
Hildmann, H., & Kovacs, E. (2019). Using unmanned aerial
vehicles (UAVs) as mobile sensing platforms (MSPs)
for disaster response, civil security and public
safety. Drones, 3(3), 59.
Hou, Y., Zhao, J., Zhang, R., Cheng, X., & Yang, L. (2023).
UAV swarm cooperative target search: A multi-agent
reinforcement learning approach. IEEE Transactions on
Intelligent Vehicles.
Kostrikov, I., Nair, A., & Levine, S. (2021). Offline reinforcement learning with implicit Q-learning. arXiv preprint arXiv:2110.06169.
Lyu, M., Zhao, Y., Huang, C., & Huang, H. (2023). Unmanned aerial vehicles for search and rescue: A survey. Remote Sensing, 15(13), 3266.
Martinez-Alpiste, I., Golcarenarenji, G., Wang, Q., & Alcaraz-Calero, J. M. (2021). Search and rescue operation using UAVs: A case study. Expert Systems with Applications, 178, 114937.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
Moon, J., Papaioannou, S., Laoudias, C., Kolios, P., & Kim, S. (2021). Deep reinforcement learning multi-UAV trajectory control for target tracking. IEEE Internet of Things Journal, 8(20), 15441-15455.
Ni, J., Tang, G., Mo, Z., Cao, W., & Yang, S. X. (2020). An
improved potential game theory based method for
multi-UAV cooperative search. IEEE Access, 8,
47787-47796.
Raap, M., Preuß, M., & Meyer-Nieberg, S. (2019). Moving target search optimization – a literature review. Computers & Operations Research, 105, 132-140.
Rashid, T., Samvelyan, M., De Witt, C. S., Farquhar, G., Foerster, J., & Whiteson, S. (2020). Monotonic value function factorisation for deep multi-agent reinforcement learning. Journal of Machine Learning Research, 21(178), 1-51.
Samvelyan, M., Rashid, T., De Witt, C. S., Farquhar, G., Nardelli, N., Rudner, T. G., ... & Whiteson, S. (2019). The StarCraft multi-agent challenge. arXiv preprint arXiv:1902.04043.
Shixin, Z., Feng, P., Anni, J., Hao, Z., & Qiuqi, G. (2024).
The unmanned vehicle on-ramp merging model based
on AM-MAPPO algorithm. Scientific Reports, 14(1),
19416.
Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M.,
Zambaldi, V., Jaderberg, M., ... & Graepel, T. (2017).
Value-decomposition networks for cooperative multi-
agent learning. arXiv preprint arXiv:1706.05296.
Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., ... & Vicente, R. (2017). Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE, 12(4), e0172395.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., & De Freitas, N. (2016, June). Dueling network architectures for deep reinforcement learning. In International Conference on Machine Learning (pp. 1995-2003). PMLR.
Zhang, J., Li, M., Xu, Y., He, H., Li, Q., & Wang, T. (2024).
StrucGCN: Structural enhanced graph convolutional
networks for graph embedding. Information Fusion,
102893.
Zhang, M., Han, Y., Chen, S., Liu, M., He, Z., & Pan, N.
(2023). A multi-strategy improved differential evolu-