# Reinforcement Learning Considering Worst Case and Equality within Episodes

### Toshihiro Matsui

#### Abstract

Reinforcement learning has been studied as an unsupervised learning framework. The goal of standard reinforcement learning methods is to find the optimal policy, which minimizes the total cost or maximizes the total reward. In several practical situations, the cost or reward values within an episode may also need to be equalized. This class of problems can be considered multi-objective, where each part of an episode has individual costs or rewards that should be considered separately. In a previous study, this concept was applied to search algorithms for shortest path problems. We investigate how a similar criterion that considers the worst case and the equality of the objectives can be applied to the Q-learning method. Our experimental results demonstrate the effect and influence of optimization with this criterion.
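To make the contrast between the standard total-cost objective and a worst-case objective concrete, the following is a minimal sketch and not the paper's method. The abstract does not define its criterion, so the sketch swaps in a plain worst-case target (the largest single step cost along the episode) and ignores the equality part. The toy graph, the function names `q_learning` and `greedy_path`, and all parameter values are assumptions made for illustration.

```python
import random

# Hypothetical toy graph (not from the paper):
# state -> {action: (next_state, step_cost)}.
GRAPH = {
    "S": {"a": ("A1", 1.0), "b": ("B1", 3.0)},
    "A1": {"go": ("A2", 1.0)},
    "A2": {"go": ("G", 6.0)},
    "B1": {"go": ("B2", 3.0)},
    "B2": {"go": ("G", 3.0)},
}
GOAL = "G"


def q_learning(criterion, episodes=2000, alpha=0.5, seed=0):
    """Tabular Q-learning over costs with a selectable backup target.

    criterion="sum": standard target, step cost plus best future total.
    criterion="max": assumed worst-case target, the larger of the step
    cost and the best future worst-case value.
    """
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in acts} for s, acts in GRAPH.items()}
    for _ in range(episodes):
        state = "S"
        while state != GOAL:
            # Pure random exploration keeps the sketch short.
            action = rng.choice(list(GRAPH[state]))
            nxt, cost = GRAPH[state][action]
            future = 0.0 if nxt == GOAL else min(q[nxt].values())
            if criterion == "sum":
                target = cost + future
            else:
                target = max(cost, future)
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_path(q):
    """Follow the lowest-valued action from the start state to the goal."""
    state, path = "S", ["S"]
    while state != GOAL:
        action = min(q[state], key=q[state].get)
        state = GRAPH[state][action][0]
        path.append(state)
    return path


if __name__ == "__main__":
    print("sum:", greedy_path(q_learning("sum")))
    print("max:", greedy_path(q_learning("max")))
```

In this toy graph the upper route has step costs 1, 1, 6 (total 8, worst step 6) and the lower route has 3, 3, 3 (total 9, worst step 3), so the two targets prefer different routes. The lower route is also the more even one, which hints at why worst-case and equality considerations are often discussed together.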

#### Paper Citation

#### in Harvard Style

Matsui T. (2020). **Reinforcement Learning Considering Worst Case and Equality within Episodes**. In *Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART*, ISBN 978-989-758-395-7, pages 335-342. DOI: 10.5220/0009178603350342

#### in Bibtex Style

@conference{icaart20,

author={Toshihiro Matsui},

title={Reinforcement Learning Considering Worst Case and Equality within Episodes},

booktitle={Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},

year={2020},

pages={335-342},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0009178603350342},

isbn={978-989-758-395-7},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART

TI - Reinforcement Learning Considering Worst Case and Equality within Episodes

SN - 978-989-758-395-7

AU - Matsui T.

PY - 2020

SP - 335

EP - 342

DO - 10.5220/0009178603350342