Learning Optimal Behavior in Environments with Non-stationary Observations

Ilio Boone; Gavin Rens

doi:10.5220/0010898200003116

2022

Abstract

In sequential decision-theoretic systems, the dynamics may be Markovian (behavior in the next step is independent of the past, given the present) or non-Markovian (behavior in the next step depends on the past). One approach to representing non-Markovian behavior is to employ deterministic finite automata (DFA) with inputs and outputs (e.g. Mealy machines), and several researchers have proposed frameworks for learning such DFA-based models. There are at least two reasons for a system to be non-Markovian: (i) rewards are gained from temporally dependent tasks, and (ii) observations are non-stationary. Rens et al. (2021) tackle learning the applicable DFA for the first case with their ARM algorithm, but ARM cannot deal with the second case. Toro Icarte et al. (2019) tackle the second case with their LRM algorithm. In this paper, we extend ARM to deal with the second case too. The advantage of ARM for learning and acting in non-Markovian systems is that it is based on well-understood formal methods with many available tools.

Paper Citation

in Harvard Style

Boone I. and Rens G. (2022). Learning Optimal Behavior in Environments with Non-stationary Observations. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, ISBN 978-989-758-547-0, pages 729-736. DOI: 10.5220/0010898200003116

in Bibtex Style

@conference{icaart22,
author={Ilio Boone and Gavin Rens},
title={Learning Optimal Behavior in Environments with Non-stationary Observations},
booktitle={Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2022},
pages={729-736},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010898200003116},
isbn={978-989-758-547-0},
}

in EndNote Style

TY - CONF

JO - Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Learning Optimal Behavior in Environments with Non-stationary Observations
SN - 978-989-758-547-0
AU - Boone I.
AU - Rens G.
PY - 2022
SP - 729
EP - 736
DO - 10.5220/0010898200003116