Reinforcement Learning Considering Worst Case and Equality within Episodes

Toshihiro Matsui

doi:10.5220/0009178603350342

2020

Abstract

Reinforcement learning has been studied as an unsupervised learning framework. The goal of standard reinforcement learning methods is to find the optimal policy that minimizes the total cost or maximizes the total reward. In several practical situations, however, the cost or reward values within an episode may need to be equalized. This class of problems can be considered multi-objective, where each part of an episode has its own cost or reward that should be considered separately. In a previous study, this concept was applied to search algorithms for shortest path problems. We investigate how a similar criterion that considers the worst case and the equality of the objectives can be applied to the Q-learning method. Our experimental results demonstrate the effect and influence of optimization with this criterion.

Paper Citation

in Harvard Style

Matsui T. (2020). Reinforcement Learning Considering Worst Case and Equality within Episodes. In Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-395-7, pages 335-342. DOI: 10.5220/0009178603350342

in Bibtex Style

@conference{icaart20,
author={Toshihiro Matsui},
title={Reinforcement Learning Considering Worst Case and Equality within Episodes},
booktitle={Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2020},
pages={335-342},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009178603350342},
isbn={978-989-758-395-7},
}

in EndNote Style

TY - CONF

JO - Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Reinforcement Learning Considering Worst Case and Equality within Episodes
SN - 978-989-758-395-7
AU - Matsui T.
PY - 2020
SP - 335
EP - 342
DO - 10.5220/0009178603350342