Discounted Markov Decision Processes with Fuzzy Rewards Induced by
Non-fuzzy Systems
Karla Carrero-Vera¹, Hugo Cruz-Suárez¹ and Raúl Montes-de-Oca²
¹ Benemérita Universidad Autónoma de Puebla, Av. San Claudio y Río Verde,
Col. San Manuel, CU, Puebla, Pue. 72570, Mexico
² Departamento de Matemáticas, Universidad Autónoma Metropolitana-Iztapalapa, Av. San Rafael Atlixco 186,
Keywords:
Markov Decision Processes, Dynamic Programming, Optimal Policy, Fuzzy Sets, Triangular Fuzzy Numbers.
Abstract:
This paper concerns discounted Markov decision processes with a fuzzy reward function that is triangular in shape.
Starting from a standard, non-fuzzy Markov control model (Hernández-Lerma, 1989) with compact action
sets and reward R, a new control model is induced by replacing only R in the standard model with a suitable
triangular fuzzy function R̃, which models, in a fuzzy sense, the fact that the reward R is “approximately”
received. For this induced model, a discounted optimal control problem with fuzzy objective functions is
considered over both finite and infinite horizons. To obtain the optimal solution, the partial
order on the α-cuts of fuzzy numbers is used, and the optimal solution of the fuzzy Markov decision process
is obtained from the optimal solution of the corresponding standard Markov decision process. At the end of the
paper, several examples illustrate the theory developed: an inventory system model and two
further models in an economic and financial context.
1 INTRODUCTION
In various applied areas, such as engineering,
operations research, economics, finance, and
artificial intelligence, the data required to propose
a mathematical model often present ambiguity,
vagueness, or only approximate characteristics of the
problem of interest (see, for instance, (Fakoor et al.,
2016), (Efendi et al., 2018)). In this context, the
literature uses fuzzy numbers to incorporate such
characteristics into mathematical models. The basic
theory on the subject was proposed by L. Zadeh in
his seminal 1965 article, “Fuzzy Sets” (Zadeh, 1965).
Since then, numerous research articles and texts on
fuzzy theory have appeared. Moreover, the theory has
been extended to other fields of the mathematical
sciences, such as control theory; see (Driankov et
al., 2013).
In this manuscript, the authors present a Markov
decision process (MDP; plural MDPs) with a finite
state space, compact action sets, and fuzzy
characteristics in its payoff or reward function. The
idea is the following: a crisp Markov control model
(MCM) with reward R is taken as a basis, that is, an
MCM of the type analyzed in (Hernández-Lerma, 1989),
and a new MCM is induced by replacing only R with a
fuzzy-valued reward function. Specifically, the
authors assume that the fuzzy reward function is
triangular. The fuzzy control problem then consists
of determining a control policy that maximizes the
expected total discounted fuzzy reward, where the
maximization is with respect to the partial order on
the α-cuts of fuzzy numbers.
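As an informal illustration of the partial order mentioned above, the following minimal Python sketch represents a triangular fuzzy number by its three defining points and compares two such numbers through their α-cuts. The names Triangular, alpha_cut and leq are hypothetical and chosen only for this sketch. It assumes the usual convention that A ≤ B when, for every α in [0, 1], both endpoints of the α-cut of A are no larger than the corresponding endpoints of the α-cut of B. Because the endpoints of a triangular number are linear in α, it is enough to check α = 0 and α = 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triangular:
    """Triangular fuzzy number with support [low, high] and peak at `peak`."""
    low: float
    peak: float
    high: float

    def alpha_cut(self, alpha):
        """Return the closed interval {x : membership(x) >= alpha}, 0 <= alpha <= 1."""
        left = self.low + alpha * (self.peak - self.low)
        right = self.high - alpha * (self.high - self.peak)
        return (left, right)


def leq(a, b):
    """Assumed partial order: endpoint-wise comparison of all alpha-cuts.

    The endpoints are linear in alpha, so comparing them at alpha = 0
    and alpha = 1 is equivalent to comparing them for every alpha.
    """
    for alpha in (0.0, 1.0):
        a_left, a_right = a.alpha_cut(alpha)
        b_left, b_right = b.alpha_cut(alpha)
        if a_left > b_left or a_right > b_right:
            return False
    return True


a = Triangular(1.0, 2.0, 3.0)
b = Triangular(2.0, 3.0, 4.0)
c = Triangular(0.0, 2.5, 5.0)

a_below_b = leq(a, b)                        # a lies entirely below b
a_c_comparable = leq(a, c) or leq(c, a)      # a and c cannot be ranked
```

In this toy data, a ≤ b holds, while a and c are not comparable in either direction, which is why the order is only partial and why "maximization" in the fuzzy problem has to be interpreted with respect to it.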
It is important to mention that triangular fuzzy
numbers have been extensively studied and applied in
fuzzy control (Pedrycz, 1994). Furthermore,
triangular fuzzy numbers can be used to approximate
an arbitrary fuzzy number (see (Ban, 2009) and (Zeng
and Li, 2007)).
The methodology followed in this article to
guarantee the existence of optimal policies in the
fuzzy problem consists of applying the existence of
optimal policies and the validity of dynamic
programming for the crisp control problem, together
with certain properties of triangular fuzzy numbers.
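The strategy just described can be sketched numerically as follows. This is only an illustration under stated assumptions, not the construction used in the paper: the action sets are taken to be finite rather than compact, the two-state model and its numbers are invented for the example, and the fuzzy reward is assumed to have the hypothetical form (bR, R, cR) with 0 < b < 1 < c and R ≥ 0. Under that assumed form, the expected discounted fuzzy reward of any policy is the triangular number (bV, V, cV), where V is its crisp value, so a policy that maximizes V maximizes all three components at once.

```python
def value_iteration(P, R, discount, tol=1e-10, max_iter=10_000):
    """Dynamic programming for a crisp discounted MDP.

    P[a][x][y]: probability of moving from state x to y under action a.
    R[x][a]:    crisp reward for taking action a in state x.
    Returns the (approximate) optimal value function and a greedy policy.
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        # One application of the dynamic programming operator.
        Q = [[R[x][a] + discount * sum(P[a][x][y] * V[y] for y in range(n_states))
              for a in range(n_actions)]
             for x in range(n_states)]
        V_new = [max(row) for row in Q]
        gap = max(abs(v_new - v) for v_new, v in zip(V_new, V))
        V = V_new
        if gap < tol:
            break
    policy = [row.index(max(row)) for row in Q]
    return V, policy


# Invented toy model: action 0 moves to state 0, action 1 moves to state 1.
P = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
R = [[1.0, 0.0],
     [0.0, 2.0]]

V, policy = value_iteration(P, R, discount=0.9)

# Assumed (hypothetical) triangular form of the fuzzy reward: (b*R, R, c*R).
b, c = 0.8, 1.2
fuzzy_V = [(b * v, v, c * v) for v in V]
```

For this toy model the crisp optimal policy moves to state 1 and stays there, and the resulting fuzzy values are triangular numbers centered at the crisp optimal values. With a different shape of fuzzy reward, the passage from the crisp to the fuzzy solution would need its own argument.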
To illustrate the theory developed, several
examples are given: an inventory system model and two
more in an economic and financial context.
In a short summary, the main contribution of the
Carrero-Vera, K., Cruz-Suárez, H. and Montes-de-Oca, R.
Discounted Markov Decision Processes with Fuzzy Rewards Induced by Non-fuzzy Systems.
DOI: 10.5220/0010231400490059
In Proceedings of the 10th International Conference on Operations Research and Enterprise Systems (ICORES 2021), pages 49-59
ISBN: 978-989-758-485-5
Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved