Analysis of Upper Confidence Boundary Algorithms for the Multi-Armed Bandit Problem
Yitong Song
2024
Abstract
The Multi-Armed Bandit (MAB) problem captures the exploration-exploitation dilemma inherent in sequential decision-making under uncertainty: a learner must balance gaining new knowledge (exploration) against leveraging existing knowledge to maximize immediate performance (exploitation). This paper examines the Upper Confidence Bound (UCB) strategy, a robust solution that does not require advance knowledge of the suboptimality gaps. Its methodological contribution is a systematic characterization and comparison of several UCB variants: the classic UCB, Asymptotically Optimal UCB, KL-UCB, and MOSS. Each variant assigns a UCB index to every arm and, in each round, selects the arm with the largest index, thereby balancing the exploration/exploitation trade-off dynamically. Notably, these algorithms avoid an abrupt transition from exploration to exploitation, which yields a more seamless and adaptive decision-making process. The paper concludes by underscoring the efficacy of UCB algorithms in optimizing long-term rewards in uncertain environments and their practical relevance in fields where machine learning algorithms must operate with minimal prior knowledge.
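The index rule described in the abstract can be illustrated with a minimal sketch of the classic UCB1 variant. This is not taken from the paper: the Bernoulli reward model, the exploration bonus sqrt(2 ln t / n), and the names `ucb1_index` and `run_ucb1` are assumptions chosen for illustration, and the other variants (Asymptotically Optimal UCB, KL-UCB, MOSS) would use different index formulas.

```python
import math
import random


def ucb1_index(mean, pulls, t):
    """Empirical mean plus the UCB1 exploration bonus sqrt(2 ln t / pulls)."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(arm_means, horizon, seed=0):
    """Play Bernoulli arms for `horizon` rounds and return per-arm pull counts.

    `arm_means` holds the (hypothetical) true success probabilities; the
    learner never reads them directly, only the sampled rewards.
    Assumes horizon >= number of arms so every arm is initialised.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of times each arm was pulled
    sums = [0.0] * k   # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: pull each arm once so every index is defined.
            arm = t - 1
        else:
            # Select the arm with the largest UCB index in this round.
            arm = max(
                range(k),
                key=lambda i: ucb1_index(sums[i] / counts[i], counts[i], t),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Calling, for example, `run_ucb1([0.1, 0.9], horizon=2000)` returns the pull counts per arm. The exploration bonus shrinks as an arm is pulled more often, so the shift toward the better arm happens gradually rather than at a fixed switch-over point.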
Paper Citation
in Harvard Style
Song Y. (2024). Analysis of Upper Confidence Boundary Algorithms for the Multi-Armed Bandit Problem. In Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI; ISBN 978-989-758-713-9, SciTePress, pages 463-468. DOI: 10.5220/0012953100004508
in Bibtex Style
@conference{emiti24,
author={Yitong Song},
title={Analysis of Upper Confidence Boundary Algorithms for the Multi-Armed Bandit Problem},
booktitle={Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI},
year={2024},
pages={463-468},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012953100004508},
isbn={978-989-758-713-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI
TI - Analysis of Upper Confidence Boundary Algorithms for the Multi-Armed Bandit Problem
SN - 978-989-758-713-9
AU - Song Y.
PY - 2024
SP - 463
EP - 468
DO - 10.5220/0012953100004508
PB - SciTePress