Risk-sensitive Markov Decision Processes with Risk Constraints of
Coherent Risk Measures in Fuzzy and Stochastic Environment
Yuji Yoshida
Faculty of Economics and Business Administration, The University of Kitakyushu,
4-2-1 Kitagata, Kokuraminami, Kitakyushu 802-8577, Japan
Keywords:
Risk-sensitive Reward, Risk Constraint, Coherent Risk Measure, Weighted Average Value-at-Risk, Risk
Averse Utility, Fuzzy Random Variable, Perception-based Extension.
Abstract:
Risk-sensitive decision making with constraints of coherent risk measures is discussed in Markov decision
processes. Risk-sensitive expected rewards under utility functions are approximated by weighted average
value-at-risks, and risk constraints are described by coherent risk measures. In this paper, coherent risk mea-
sures are represented as weighted average value-at-risks with the best risk spectrum derived from decision
maker’s risk averse utility, and the risk spectrum can inherit the risk averse property of the decision maker’s
utility as weighting. By perception-based extension for fuzzy random variables, a dynamic portfolio model
with coherent risk measures is introduced. To find feasible regions, firstly a dynamic risk-minimizing prob-
lem is discussed by mathematical programming. Next a risk-sensitive reward maximization problem under
the feasible coherent risk constraints is demonstrated. A few numerical examples are given to understand the
obtained results.
1 INTRODUCTION
Risk-sensitive decision making is one of the most important themes in management science and related fields. Risk-sensitive expected rewards and risk measures are reasonable and effective tools in risk-sensitive decision making. Risk-sensitive expectation, which was introduced by (Howard and Matheson, 1972), is given by
$$f^{-1}(E(f(\cdot))), \tag{1}$$
where $f$ and $f^{-1}$ are the decision maker's utility function and its inverse function, and $E(\cdot)$ is an expectation. Risk-sensitive expectation is a method to estimate random risks through utility functions, and it has been studied by several authors (Bäuerle and Rieder, 2014). However, this criterion with non-linear utility functions $f$ is computationally complex in general. For example, let $\{X_t\}$ be a sequence of random variables. Then $\sum_t E(f(X_t))$ is a sum of the decision maker's expected utility values, and it is not meaningful as a reward. In contrast, $f^{-1}(E(f(X_t)))$ belongs to the space of values that the random variables $X_t$ take, and their sum with respect to $t$ has meaning. Therefore, in dynamic optimization problems we need to compute a sum of values under criterion (1) with the inverse function $f^{-1}$ by Bellman equations. When $f$ is a non-linear utility function, it is difficult to compute the optimal values immediately (Bäuerle and Rieder, 2014).
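As a rough illustration of criterion (1), the following sketch evaluates $f^{-1}(E(f(X)))$ on a small sample. It assumes the exponential utility $f(x) = (1 - e^{-\tau x})/\tau$ that appears later in Example 2; the reward values and the function name `certainty_equivalent` are hypothetical and are not taken from the paper.

```python
import math

def certainty_equivalent(samples, tau):
    """Return f^{-1}(E(f(X))) for f(x) = (1 - exp(-tau * x)) / tau."""
    def f(x):
        return (1.0 - math.exp(-tau * x)) / tau

    def f_inverse(y):
        return -math.log(1.0 - tau * y) / tau

    expected_utility = sum(f(x) for x in samples) / len(samples)
    return f_inverse(expected_utility)

# Hypothetical, equally likely rewards (not data from the paper).
rewards = [-0.10, 0.02, 0.08, 0.20]
mean_reward = sum(rewards) / len(rewards)
risk_sensitive_value = certainty_equivalent(rewards, tau=1.0)
print(mean_reward, risk_sensitive_value)
```

Because $f$ is concave, the risk-sensitive value lies below the plain mean, and it stays in the same units as the rewards, which is why sums over $t$ remain meaningful.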
In decision making, several risk measures have been proposed for economic theory, financial analysis, asset management and engineering. The variance was classically used as a risk measure in decision processes, and the risk measure has been improved from both practical and theoretical aspects. Nowadays drastic declines of asset prices are studied, and value-at-risk (VaR) is used widely to estimate the risk of asset price decline in practical management (Jorion, 2006). VaR is defined by percentiles at a specified probability; however, it is not coherent. Coherent risk measures have been studied to improve the criterion of risks with worst scenarios (Artzner et al., 1999). Several improved risk measures based on VaR have been proposed: for example, conditional value-at-risk, expected shortfall and entropic value-at-risk (Rockafellar and Uryasev, 2000), (Tasche, 2002). Recently (Kusuoka, 2001) gave a spectral representation for coherent risk measures, and (Acerbi, 2002) and (Adam et al., 2008) discussed its applications to portfolio selection and so on. Further, (Yoshida, 2018) has introduced a spectral weighted average value-at-risk as the best coherent risk measure derived from utility functions. Using this derived coherent risk measure, the risk measure can inherit the risk averse property of the decision maker's utility function as risk spectrum weighting. This paper adopts the spectral weighted average value-at-risks to estimate risk-sensitive rewards under constraints, which is also a kind of risk-sensitive extended model of (Yoshida, 2017).

Yoshida, Y. Risk-sensitive Markov Decision Processes with Risk Constraints of Coherent Risk Measures in Fuzzy and Stochastic Environment. DOI: 10.5220/0007957502690277. In Proceedings of the 11th International Joint Conference on Computational Intelligence (IJCCI 2019), pages 269-277. ISBN: 978-989-758-384-1. Copyright © 2019 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved.
Fuzzy random variables, which were introduced
by (Kwakernaak, 1978), are applied to decision mak-
ing under uncertainty with fuzziness such as linguistic
data in engineering, economics and other fields. To represent uncertainty, we use fuzzy random variables which have
two kinds of uncertainties, i.e. randomness and fuzzi-
ness. In this paper, randomness is used to repre-
sent the uncertainty regarding the belief degree of fre-
quency, and fuzziness is applied to linguistic impre-
cision of data because of a lack of information about
the current stock market. In this paper, using fuzzy
random variables, we deal with optimization of port-
folio allocation in an environment with both random-
ness and fuzziness. We extend coherent risk measures
and a risk-sensitive estimation for real-valued random
variables to one regarding fuzzy random variables
from the viewpoint of perception-based method in
(Yoshida, 2007), and we apply the perception-based
criteria to estimate the uncertainties. (Yoshida, 2006)
introduced the mean, the variance and the covariances
of fuzzy random variables, using evaluation weights
and θ-mean functions. This paper estimates fuzzy
numbers and fuzzy random variables by probabilis-
tic expectation and these criteria, which are charac-
terized by possibility and necessity criteria for sub-
jective estimation and pessimistic-optimistic indexes
for subjective decision.
In Section 2, we introduce coherent risk measures and their spectral representation based on (Kusuoka, 2001), and a coherent risk measure is given with the best risk spectrum derived from the decision maker's utility. In Section
3, we introduce coherent risk measures and a risk-
sensitive estimation for fuzzy random variables by
perception-based extension, and we give estimation
tools with evaluation weights and θ-mean functions
in order to evaluate the randomness and fuzziness for
fuzzy random variables. In Section 4, we discuss a
risk-sensitive decision problem under risk constraints
by use of coherent risk measures. Then risk-sensitive
rewards are approximated by weighted average value-
at-risks with the risk spectrum derived from the util-
ity, and the risk constraints are described by coherent
risk measures which are represented by weighted av-
erage value-at-risks. In Section 5 we investigate the
lower bound of risk values to find feasible regions of
the constraints. In Section 6 we discuss maximiza-
tion of risk-sensitive rewards under risk conditions.
In Section 7, we give a few numerical examples to
understand the obtained results.
2 COHERENT RISK MEASURES
DERIVED FROM RISK AVERSE
UTILITY
Let $\mathbb{R} = (-\infty, \infty)$ and let $P$ be a non-atomic probability on a sample space $\Omega$. Let $\mathcal{X}$ be the family of all integrable real-valued random variables $X$ on $\Omega$ with a continuous distribution $x \mapsto F_X(x) = P(X < x)$ for which there exists a non-empty open interval $I$ such that $F_X(\cdot) : I \to (0,1)$ is strictly increasing and onto. Then there exists a strictly increasing and continuous inverse function $F_X^{-1} : (0,1) \to I$. For a probability $p \in (0,1)$, value-at-risk (VaR) is given by the percentile of the distribution $F_X$, i.e.
$$\mathrm{VaR}_p(X) = F_X^{-1}(p). \tag{2}$$
Then average value-at-risk (AVaR) at a probability $p \in (0,1]$ is given by
$$\mathrm{AVaR}_p(X) = \frac{1}{p} \int_0^p \mathrm{VaR}_q(X)\, dq. \tag{3}$$
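Definitions (2) and (3) can be sketched numerically as follows. This is a minimal illustration that assumes a normally distributed $X$ (the case treated later in Example 2); the function names, the midpoint-rule step count and the parameter values are hypothetical choices, and the closed form used for comparison is the standard normal-case identity rather than a formula from the paper.

```python
from statistics import NormalDist

def value_at_risk(p, mu=0.0, sigma=1.0):
    """VaR_p(X) = F_X^{-1}(p) as in (2), for X ~ N(mu, sigma^2)."""
    return NormalDist(mu, sigma).inv_cdf(p)

def average_value_at_risk(p, mu=0.0, sigma=1.0, steps=20000):
    """AVaR_p(X) as in (3), integrating VaR_q over (0, p) by the midpoint rule."""
    width = p / steps
    total = sum(value_at_risk((k + 0.5) * width, mu, sigma) for k in range(steps))
    return total * width / p

def average_value_at_risk_normal(p, mu=0.0, sigma=1.0):
    """Closed form for the normal case with p < 1: mu - sigma * pdf(z_p) / p."""
    standard = NormalDist()
    return mu - sigma * standard.pdf(standard.inv_cdf(p)) / p

numeric = average_value_at_risk(0.05, mu=0.1, sigma=0.2)
closed = average_value_at_risk_normal(0.05, mu=0.1, sigma=0.2)
print(numeric, closed)
```

The averaged quantity is smaller than the single percentile at the same $p$, since it also accounts for the outcomes below that percentile.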
The following fundamental concepts are well-known (Artzner et al., 1999, Kusuoka, 2001).
Definition 1. Let $\rho : \mathcal{X} \to \mathbb{R}$ be a map.
(i) Random variables $X\,(\in \mathcal{X})$ and $Y\,(\in \mathcal{X})$ are called comonotonic if $(X(\omega) - X(\omega'))(Y(\omega) - Y(\omega')) \ge 0$ holds for almost all $\omega, \omega' \in \Omega$.
(ii) $\rho$ is called comonotonically additive if $\rho(X + Y) = \rho(X) + \rho(Y)$ holds for all comonotonic $X, Y \in \mathcal{X}$.
(iii) $\rho$ is called law invariant if $\rho(X) = \rho(Y)$ holds for all $X, Y \in \mathcal{X}$ satisfying $P(X < \cdot) = P(Y < \cdot)$.
(iv) $\rho$ is called continuous if $\lim_{n \to \infty} \rho(X_n) = \rho(X)$ holds for $\{X_n\} \subset \mathcal{X}$ and $X \in \mathcal{X}$ such that $\lim_{n \to \infty} X_n = X$ almost surely.
The following definition characterizes coherent risk measures (Artzner et al., 1999).
Definition 2. A map $\rho : \mathcal{X} \to \mathbb{R}$ is called a coherent risk measure if it satisfies the following (i) – (iv):
(i) $\rho(X) \ge \rho(Y)$ for $X, Y \in \mathcal{X}$ satisfying $X \le Y$. (monotonicity)
(ii) $\rho(cX) = c\rho(X)$ for $X \in \mathcal{X}$ and $c \in \mathbb{R}$ satisfying $c \ge 0$. (positive homogeneity)
(iii) $\rho(X + c) = \rho(X) - c$ for $X \in \mathcal{X}$ and $c \in \mathbb{R}$. (translation invariance)
(iv) $\rho(X + Y) \le \rho(X) + \rho(Y)$ for $X, Y \in \mathcal{X}$. (sub-additivity)
It is known (Artzner et al., 1999) that $-\mathrm{AVaR}_p(\cdot)$ is a coherent risk measure, whereas $-\mathrm{VaR}_p(\cdot)$ is not coherent because sub-additivity (iv) does not hold, where $-$ denotes the minus sign. Conditional value-at-risk and expected shortfall are also well-known coherent risk measures (Rockafellar and Uryasev, 2000, Tasche, 2002). Now, for a probability $p \in (0,1]$ and a non-increasing right-continuous function $\lambda$ on $[0,1]$ satisfying $\int_0^1 \lambda(q)\, dq = 1$, we define a weighted average value-at-risk with weighting $\lambda$ on $(0,p)$ by
$$\mathrm{AVaR}^{\lambda}_p(X) = \int_0^p \mathrm{VaR}_q(X)\, \lambda(q)\, dq \bigg/ \int_0^p \lambda(q)\, dq. \tag{4}$$
Then $\lambda$ is called a risk spectrum, and $-\mathrm{AVaR}^{\lambda}_p$ becomes a coherent risk measure. Further, (Kusuoka, 2001) proved that coherent risk measures are represented by weighted average value-at-risks in the following spectral representation (Yoshida, 2018).

FCTA 2019 - 11th International Conference on Fuzzy Computation Theory and Applications
Lemma 1. Let $\rho : \mathcal{X} \to \mathbb{R}$ be a law invariant, comonotonically additive, continuous coherent risk measure. Then there exists a risk spectrum $\lambda$ such that
$$\rho(X) = -\mathrm{AVaR}^{\lambda}_1(X) \tag{5}$$
for $X \in \mathcal{X}$. Further, $-\mathrm{AVaR}^{\lambda}_p$ is a coherent risk measure on $\mathcal{X}$ for $p \in (0,1)$.
In this paper we use a law invariant, comonotonically additive, continuous coherent risk measure $\rho$, and we also deal with a case when value-at-risks are represented as
$$\mathrm{VaR}_p(X) = E(X) + \kappa(p) \cdot \sigma(X) \tag{6}$$
with the mean $E(X)$ and the standard deviation $\sigma(X)$ of random variables $X \in \mathcal{X}$, where $\kappa : (0,1) \to \mathbb{R}$ is an increasing function. From (4) and (6) we have
$$\mathrm{AVaR}^{\lambda}_p(X) = E(X) + \kappa^{\lambda}(p) \cdot \sigma(X), \tag{7}$$
where
$$\kappa^{\lambda}(p) = \int_0^p \kappa(q)\lambda(q)\, dq \bigg/ \int_0^p \lambda(q)\, dq. \tag{8}$$
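The constant $\kappa^{\lambda}(p)$ in (8) can be approximated by simple quadrature. The sketch below assumes that $\kappa$ is the standard normal quantile function (the choice made later in Example 2); the two spectra `flat` and `tilted`, the step count and the function names are hypothetical and serve only to show how a non-increasing spectrum shifts weight toward worse outcomes.

```python
from statistics import NormalDist

def kappa(q):
    """Assumed kappa: the standard normal quantile function."""
    return NormalDist().inv_cdf(q)

def kappa_weighted(p, spectrum, steps=20000):
    """Approximate (8) on (0, p) with the midpoint rule."""
    width = p / steps
    points = [(k + 0.5) * width for k in range(steps)]
    numerator = sum(kappa(q) * spectrum(q) for q in points)
    denominator = sum(spectrum(q) for q in points)
    return numerator / denominator

def weighted_avar(p, mean, std, spectrum):
    """Weighted average value-at-risk in the mean/standard-deviation form (7)."""
    return mean + kappa_weighted(p, spectrum) * std

def flat(q):
    return 1.0                 # lambda = 1 recovers the plain AVaR of (3)

def tilted(q):
    return 2.0 * (1.0 - q)     # hypothetical non-increasing spectrum, integral 1

print(kappa_weighted(0.05, flat), kappa_weighted(0.05, tilted))
print(weighted_avar(0.05, 0.1, 0.2, flat))
```

Once $\kappa^{\lambda}(p)$ has been tabulated, every later evaluation of (7) needs only a mean and a standard deviation.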
Let $f : I \to \mathbb{R}$ be a $C^2$-class risk averse utility function satisfying $f' > 0$ and $f'' \le 0$ on $I$, where $I$ is an open interval. For a probability $p \in (0,1]$ and a random variable $X \in \mathcal{X}$, a non-linear risk-sensitive form
$$f^{-1}\left( \frac{1}{p} \int_0^p f(\mathrm{VaR}_q(X))\, dq \right) \tag{9}$$
is an average value-at-risk of $X$ on the downside $(0,p)$ under utility $f$. We note that (9) is reduced to (3) if $f$ is risk-neutral, i.e. it is a linear increasing function. Hence we have the following lemma from (Yoshida, 2018).
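Lemma 2. A risk spectrum $\lambda$ which minimizes the distance between (9) and (4):
$$\sum_{X \in \mathcal{X}} \left( f^{-1}\left( \frac{1}{p} \int_0^p f(\mathrm{VaR}_q(X))\, dq \right) - \mathrm{AVaR}^{\lambda}_p(X) \right)^2 \tag{10}$$
for $p \in (0,1]$ is given by
$$\lambda(p) = e^{-\int_p^1 C(q)\, dq}\, C(p) \tag{11}$$
with a component function $C$ in (Yoshida, 2018) if $\lambda$ is non-increasing.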
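To see how a spectrum is obtained from a component function through (11), the following sketch evaluates $\lambda(p) = e^{-\int_p^1 C(q)\,dq}\, C(p)$ numerically. It assumes the exponent carries a minus sign, which is what lets $\lambda$ integrate to one. The component functions `inverse_q` and `half_inverse_q` are hypothetical test cases chosen because their spectra are known in closed form ($\lambda \equiv 1$ and $\lambda(p) = 0.5\,p^{-1/2}$); they are not the component functions derived in (Yoshida, 2018).

```python
import math

def spectrum_from_component(component, p, steps=20000):
    """lambda(p) = exp(-integral_p^1 C(q) dq) * C(p), using the midpoint rule."""
    width = (1.0 - p) / steps
    integral = sum(component(p + (k + 0.5) * width) for k in range(steps)) * width
    return math.exp(-integral) * component(p)

def inverse_q(q):
    return 1.0 / q      # hypothetical C; yields the constant spectrum lambda = 1

def half_inverse_q(q):
    return 0.5 / q      # hypothetical C; yields lambda(p) = 0.5 * p ** -0.5

print(spectrum_from_component(inverse_q, 0.3))
print(spectrum_from_component(half_inverse_q, 0.25))
```

The second case gives a strictly decreasing spectrum, so it weights the lowest percentiles most heavily.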
For the exponential utility function $f$, the corresponding component function $C$ is given concretely in Example 2. The component functions $C$ for several utilities $f$ are also investigated in (Yoshida, 2018). In Lemma 2 the coherent risk measure $-\mathrm{AVaR}^{\lambda}_p$ has a kind of semi-linear property such as Definition 2(ii)(iii), which brings us effective computation, and the risk spectrum $\lambda$ can also inherit the risk averse property of the non-linear utility function $f$ as weighting on $(0,p)$. Regarding risk-sensitive rewards (1), in the sequel we use the risk spectrum $\lambda$ in Lemma 2 because $-\mathrm{AVaR}^{\lambda}_p$ is the best coherent risk measure derived from the risk averse utility $f$.
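The gap that the spectrum of Lemma 2 is meant to absorb can be seen by comparing the non-linear form (9) with the plain average (3). This sketch assumes a normal $X$ and the exponential utility $f(x) = (1 - e^{-\tau x})/\tau$ of Example 2, for which $f^{-1}$ of an average of $f$ simplifies to $-\log(\text{mean of } e^{-\tau x})/\tau$; the parameter values and function names are hypothetical.

```python
import math
from statistics import NormalDist

def midpoint_quantiles(p, mu, sigma, steps=20000):
    """VaR_q(X) at the midpoints of an equal partition of (0, p), X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    width = p / steps
    return [dist.inv_cdf((k + 0.5) * width) for k in range(steps)]

def plain_avar(p, mu, sigma):
    """Average value-at-risk (3)."""
    values = midpoint_quantiles(p, mu, sigma)
    return sum(values) / len(values)

def risk_sensitive_avar(p, mu, sigma, tau):
    """Form (9) under the exponential utility with parameter tau."""
    values = midpoint_quantiles(p, mu, sigma)
    mean_exp = sum(math.exp(-tau * v) for v in values) / len(values)
    return -math.log(mean_exp) / tau

averse = risk_sensitive_avar(0.05, 0.1, 0.2, tau=1.0)
neutral = plain_avar(0.05, 0.1, 0.2)
print(averse, neutral)
```

The risk averse value is lower than the risk-neutral one, and the two agree as $\tau$ tends to zero, in line with the remark that (9) reduces to (3) for linear $f$.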
3 FUZZINESS AND EXTENDED
CRITERIA
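A fuzzy number is represented by its membership function $\tilde{n} : \mathbb{R} \to [0,1]$ which is normal, upper-semicontinuous, fuzzy convex and has a compact support (Zadeh, 1965). Let $\mathcal{N}$ be the set of all fuzzy numbers. For a fuzzy number $\tilde{n} \in \mathcal{N}$, its α-cuts are given by closed intervals $\tilde{n}_\alpha = \{x \in \mathbb{R} \mid \tilde{n}(x) \ge \alpha\} = [\tilde{n}^-_\alpha, \tilde{n}^+_\alpha]$ for $\alpha \in (0,1]$. An addition and a scalar multiplication for fuzzy numbers are defined by their α-cuts. For fuzzy numbers $\tilde{n}, \tilde{m} \in \mathcal{N}$, fuzzy max order $\tilde{n} \succeq \tilde{m}$ means that $\tilde{n}^{\pm}_\alpha \ge \tilde{m}^{\pm}_\alpha$ for all $\alpha \in (0,1]$. A fuzzy-number-valued map $\tilde{X} : \Omega \to \mathcal{N}$ is called a fuzzy random variable if $\tilde{X}^{\pm}_\alpha \in \mathcal{X}$ for all $\alpha \in (0,1]$, where $\tilde{X}_\alpha(\omega) = \{x \in \mathbb{R} \mid \tilde{X}(\omega)(x) \ge \alpha\} = [\tilde{X}^-_\alpha(\omega), \tilde{X}^+_\alpha(\omega)]$ for $\omega \in \Omega$. Let $\tilde{\mathcal{X}}$ be the family of all fuzzy random variables on $\Omega$. (Kruse and Meyer, 1987) gave the expectation of fuzzy random variables $\tilde{X} \in \tilde{\mathcal{X}}$ in the following perception-based definition based on Zadeh's extension principle:
$$\tilde{E}(\tilde{X})(x) = \sup_{X \in \mathcal{X} :\, E(X) = x}\ \inf_{\omega \in \Omega} \tilde{X}(\omega)(X(\omega)) \tag{12}$$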
for $x \in \mathbb{R}$, where $E(\cdot)$ is the expectation for real-valued random variables. Then the expectation $\tilde{E}(\tilde{X})$ is a fuzzy number with α-cut $\tilde{E}(\tilde{X})_\alpha = [E(\tilde{X}^-_\alpha), E(\tilde{X}^+_\alpha)]$. Define criterion (1) by
$$\varphi(X) = f^{-1}(E(f(X))) \tag{13}$$
for $X \in \mathcal{X}$. For a weighted average value-at-risk $\mathrm{AVaR}^{\lambda}_p$, the criterion $\varphi$ and a coherent risk measure $\rho$, their extensions for a fuzzy random variable $\tilde{X} \in \tilde{\mathcal{X}}$ are also fuzzy numbers:
$$\widetilde{\mathrm{AVaR}}^{\lambda}_p(\tilde{X})(x) = \sup_{X \in \mathcal{X} :\, \mathrm{AVaR}^{\lambda}_p(X) = x}\ \inf_{\omega \in \Omega} \tilde{X}(\omega)(X(\omega)), \tag{14}$$
$$\tilde{\varphi}(\tilde{X})(x) = \sup_{X \in \mathcal{X} :\, \varphi(X) = x}\ \inf_{\omega \in \Omega} \tilde{X}(\omega)(X(\omega)), \tag{15}$$
$$\tilde{\rho}(\tilde{X})(x) = \sup_{X \in \mathcal{X} :\, \rho(X) = x}\ \inf_{\omega \in \Omega} \tilde{X}(\omega)(X(\omega)) \tag{16}$$
for $x \in \mathbb{R}$. Then their α-cuts are given respectively by $\tilde{\varphi}(\tilde{X})_\alpha = [\varphi(\tilde{X}^-_\alpha), \varphi(\tilde{X}^+_\alpha)]$ and $\tilde{\rho}(\tilde{X})_\alpha = [\rho(\tilde{X}^+_\alpha), \rho(\tilde{X}^-_\alpha)]$, and the extended measure $\tilde{\rho}(\cdot)$ has the following properties similarly to Definition 2 (Yoshida, 2008).
Lemma 3. $\tilde{\rho}(\cdot)$ is monotonically decreasing, positively homogeneous, translation invariant and sub-additive.
In the later sections we use a coherent risk measure $\rho$ in Lemma 1 and its extension $\tilde{\rho}$ in (16) to estimate risks in a financial model. We also need defuzzification methods. A defuzzification of a fuzzy number $\tilde{n} \in \mathcal{N}$ with a θ-mean and an evaluation weight $w(\alpha)$ is given by
$$E^{\theta}(\tilde{n}) = \int_0^1 \left( \theta \cdot \tilde{n}^-_\alpha + (1 - \theta) \cdot \tilde{n}^+_\alpha \right) w(\alpha)\, d\alpha \bigg/ \int_0^1 w(\alpha)\, d\alpha, \tag{17}$$
where $\tilde{n}_\alpha = [\tilde{n}^-_\alpha, \tilde{n}^+_\alpha]$. Here $\theta$ is called the decision maker's pessimistic index if $\theta = 1$, and it is called the optimistic index if $\theta = 0$. $w(\alpha)$ is called the possibility evaluation if $w(\alpha) = 1$ for $\alpha \in [0,1]$, and it is called the necessity evaluation if $w(\alpha) = 1 - \alpha$ for $\alpha \in [0,1]$ (Yoshida, 2006, 2008). Then $E^{\theta}(\cdot)$ has the following properties.
Lemma 4. For $\theta \in [0,1]$, $E^{\theta}(\cdot)$ is positively homogeneous, additive and monotonically increasing.
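The defuzzification (17) can be sketched for a symmetric triangular fuzzy number $\tilde{a}(x) = \max\{1 - |x|/c, 0\}$, the shape used later in Example 2, whose α-cuts are $[-c(1-\alpha),\, c(1-\alpha)]$. The function name `defuzzify`, the step count and the spread value are hypothetical; the closed forms $c(1-2\theta)/2$ (possibility) and $2c(1-2\theta)/3$ (necessity) follow directly from (17) for this shape.

```python
def defuzzify(lower_cut, upper_cut, theta, weight, steps=10000):
    """E^theta of a fuzzy number given its alpha-cut end points, as in (17)."""
    width = 1.0 / steps
    numerator = 0.0
    denominator = 0.0
    for k in range(steps):
        alpha = (k + 0.5) * width
        mixed = theta * lower_cut(alpha) + (1.0 - theta) * upper_cut(alpha)
        numerator += mixed * weight(alpha)
        denominator += weight(alpha)
    return numerator / denominator

c = 0.008   # hypothetical spread of the triangular fuzzy number

def lower(alpha):
    return -c * (1.0 - alpha)

def upper(alpha):
    return c * (1.0 - alpha)

def possibility(alpha):
    return 1.0

def necessity(alpha):
    return 1.0 - alpha

optimistic = defuzzify(lower, upper, theta=0.0, weight=possibility)
pessimistic = defuzzify(lower, upper, theta=1.0, weight=necessity)
print(optimistic, pessimistic)
```

The optimistic index reads the fuzzy number from its upper end points and the pessimistic index from its lower end points, so the two values bracket zero for a symmetric shape.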
The randomness of fuzzy random variables is evaluated by probabilistic expectation, and its fuzziness is estimated by the θ-mean and the weight $w(\alpha)$ as follows: for a fuzzy random variable $\tilde{X} \in \tilde{\mathcal{X}}$, the mean of the expectation $E(E^{\theta}(\tilde{X}))$ is a real number
$$E(E^{\theta}(\tilde{X})) = E\left( \int_0^1 \left( \theta \cdot \tilde{X}^-_\alpha + (1 - \theta) \cdot \tilde{X}^+_\alpha \right) w(\alpha)\, d\alpha \bigg/ \int_0^1 w(\alpha)\, d\alpha \right). \tag{18}$$
From Lemma 4, we obtain the following results (Yoshida, 2006, 2007).
Lemma 5. For $\theta \in [0,1]$, $E(E^{\theta}(\cdot))$ is positively homogeneous, additive and monotonically increasing, and it has the following properties (i) and (ii):
(i) $E(E^{\theta}(\cdot)) = E^{\theta}(\tilde{E}(\cdot))$.
(ii) $E(E^{\theta}(\tilde{n})) = E^{\theta}(\tilde{n})$ and $E(E^{\theta}(X)) = E(X)$ for $\tilde{n} \in \mathcal{N}$ and $X \in \mathcal{X}$.
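Let $\tilde{\mathcal{X}}^a$ be a family of fuzzy random variables $\tilde{X} \in \tilde{\mathcal{X}}$ for which there exist a random variable $X \in \mathcal{X}$ and a fuzzy number $\tilde{n} \in \mathcal{N}$ such that
$$\tilde{X}(\omega)(x) = 1_{\{X(\omega)\}}(x) + \tilde{n}(x) \tag{19}$$
for $\omega \in \Omega$ and $x \in \mathbb{R}$, where $1_{\{\cdot\}}$ denotes the characteristic function of a singleton. Then we can easily check the following proposition for the weighted average value-at-risks $\mathrm{AVaR}^{\lambda}_p$, coherent risk measures $\rho$ and their extensions $\widetilde{\mathrm{AVaR}}^{\lambda}_p$ and $\tilde{\rho}$ (Yoshida, 2008).
Proposition 1. For $\theta \in [0,1]$, it holds that
$$E^{\theta}(\widetilde{\mathrm{AVaR}}^{\lambda}_p(\tilde{X})) = \mathrm{AVaR}^{\lambda}_p(E^{\theta}(\tilde{X})), \tag{20}$$
$$E^{1-\theta}(\tilde{\rho}(\tilde{X})) = \rho(E^{\theta}(\tilde{X})) \tag{21}$$
for fuzzy random variables $\tilde{X} \in \tilde{\mathcal{X}}^a$.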
4 RISK ALLOCATION WITH
COHERENT RISK MEASURES
Let the state space be $S = \mathbb{R}$ and the action space be $A = \{(x_1, x_2, \ldots, x_n) \in \mathbb{R}^n \mid \sum_{i=1}^n x_i = 1 \text{ and } x_i \ge 0\ (i = 1, 2, \ldots, n)\}$, where $n$ is a positive integer. In this paper we focus on risk-sensitive expected rewards to choose alternatives consisting of $n$ assets. Let a positive integer $T$ be a terminal time, and let time $t = 1, 2, \ldots, T$. Let $\tilde{X}^i_t\,(\in \tilde{\mathcal{X}})$ be a fuzzy reward for asset $i\,(= 1, 2, \ldots, n)$. We denote their expectations and covariances respectively by $\mu^i_t = E^{\theta}(\tilde{E}(\tilde{X}^i_t)) = E(E^{\theta}(\tilde{X}^i_t))$ and $\sigma^{ij}_t = E\big((E^{\theta}(\tilde{X}^i_t) - \mu^i_t)(E^{\theta}(\tilde{X}^j_t) - \mu^j_t)\big)$ for $i, j = 1, 2, \ldots, n$. We give Markov policies by $\pi = \{\pi_t\}_{t=1}^T$ with mappings $\pi_t = (\pi^1_t, \pi^2_t, \ldots, \pi^n_t) : \mathbb{R} \to A$ for $t = 1, 2, \ldots, T$, and then $\pi_t$ is called a strategy. They are chosen depending only on the current state $X^{\pi}_{t-1}$. Let $\Pi$ denote the collection of all Markov policies. A reward with a strategy $\pi_t = (\pi^1_t, \pi^2_t, \ldots, \pi^n_t)$ is given by
$$\tilde{X}^{\pi}_t = \sum_{i=1}^n \pi^i_t \tilde{X}^i_t. \tag{22}$$
Let $p \in (0,1)$ be a probability and let $\delta$ be a positive constant. Let $f$ be a $C^2$-class risk averse utility function as given in Section 2, and let $\rho$ be a coherent risk measure for risk constraints. Let $\beta$ be a positive constant. We focus on the following optimization problem with (13), (15) and (16).
Problem (P1). Maximize the risk-sensitive estimation
$$\sum_{t=1}^T \beta^{t-1} f^{-1}\big(E(f(E^{\theta}(\tilde{X}^{\pi}_t)))\big) \tag{23}$$
with respect to strategies $\pi_t \in \Pi$ under risk constraint
$$E^{1-\theta}(\tilde{\rho}(\tilde{X}^{\pi}_t)) \le \delta \tag{24}$$
for time $t = 1, 2, \ldots, T$.
From the results of Lemma 2, $f^{-1}(E(f(\cdot))) = f^{-1}\big(\int_0^1 \mathrm{VaR}_q(f(\cdot))\, dq\big)$ is approximated by $\mathrm{AVaR}^{\lambda}_1(\cdot)$ with a risk spectrum $\lambda$. On the other hand, by Lemma 1 there exists a risk spectrum $\nu$ such that $\rho = -\mathrm{AVaR}^{\nu}_p$. Hence we estimate the downside risks on $(0,p)$. By Proposition 1 this paper discusses the following optimization instead of Problem (P1).
Problem (P2). Maximize the risk-sensitive estimation
$$\sum_{t=1}^T \beta^{t-1} \mathrm{AVaR}^{\lambda}_1(E^{\theta}(\tilde{X}^{\pi}_t)) \tag{25}$$
with respect to strategies $\pi_t \in \Pi$ under risk constraint
$$-\mathrm{AVaR}^{\nu}_p(E^{\theta}(\tilde{X}^{\pi}_t)) \le \delta \tag{26}$$
for time $t = 1, 2, \ldots, T$.
In (25) and (26), the risk spectra $\lambda$ and $\nu$ are different in general; however, we can select the same risk spectrum, i.e. $\lambda = \nu$. From (22) the expectation and the standard deviation of reward $\tilde{X}^{\pi}_t$ are
$$E(E^{\theta}(\tilde{X}^{\pi}_t)) = \sum_{i=1}^n \pi^i_t \mu^i_t \tag{27}$$
and
$$\sigma(E^{\theta}(\tilde{X}^{\pi}_t)) = \sqrt{\sum_{i=1}^n \sum_{j=1}^n \pi^i_t \pi^j_t \sigma^{ij}_t}. \tag{28}$$
Together with (7), we also have weighted average value-at-risk
$$\mathrm{AVaR}^{\nu}_p(E^{\theta}(\tilde{X}^{\pi}_t)) = \sum_{i=1}^n \pi^i_t \mu^i_t + \kappa^{\nu}(p) \sqrt{\sum_{i=1}^n \sum_{j=1}^n \pi^i_t \pi^j_t \sigma^{ij}_t}, \tag{29}$$
where
$$\kappa^{\nu}(p) = \int_0^p \kappa(q)\nu(q)\, dq \bigg/ \int_0^p \nu(q)\, dq. \tag{30}$$
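Formulas (27) to (29) translate directly into code. The sketch below is a minimal illustration with two assets; the means, covariances, weights and the value of $\kappa^{\nu}(p)$ are hypothetical numbers, not data from the paper.

```python
import math

def portfolio_mean(weights, means):
    """Expectation (27) of the reward under the given strategy."""
    return sum(w * m for w, m in zip(weights, means))

def portfolio_std(weights, cov):
    """Standard deviation (28) of the reward under the given strategy."""
    n = len(weights)
    variance = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)

def portfolio_avar(weights, means, cov, kappa_nu):
    """Weighted average value-at-risk (29) for a precomputed kappa^nu(p)."""
    return portfolio_mean(weights, means) + kappa_nu * portfolio_std(weights, cov)

# Hypothetical two-asset data.
means = [0.10, 0.08]
cov = [[0.04, 0.01], [0.01, 0.03]]
weights = [0.5, 0.5]
result = portfolio_avar(weights, means, cov, kappa_nu=-2.0)
print(result)
```

With a negative $\kappa^{\nu}(p)$, a larger standard deviation lowers the value, so the risk constraint (26) penalizes volatile allocations.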
In this paper we assume $\kappa^{\lambda}(1) \le 0$ and $\kappa^{\nu}(p) < 0$. Let $\Pi_t(\delta)$ be the collection of strategies $\pi_t$ satisfying risk constraint (26), and let $\Pi_t = \sup_{\delta > 0} \Pi_t(\delta)$. In the rest of this section we investigate the lower bound of $-\mathrm{AVaR}^{\nu}_p(E^{\theta}(\tilde{X}^{\pi}_t))$ for the feasibility of constraint (26) in Problem (P2), i.e. $\Pi_t(\delta) \ne \emptyset$. From (29), we first discuss the following maximization problem for $\mathrm{AVaR}^{\nu}_p(E^{\theta}(\tilde{X}^{\pi}_t))$.
Problem (P3). Maximize weighted average value-at-risk
$$\mathrm{AVaR}^{\nu}_p(E^{\theta}(\tilde{X}^{\pi}_t)) = \sum_{i=1}^n \pi^i_t \mu^i_t + \kappa^{\nu}(p) \sqrt{\sum_{i=1}^n \sum_{j=1}^n \pi^i_t \pi^j_t \sigma^{ij}_t} \tag{31}$$
with respect to strategies $\pi_t = (\pi^1_t, \pi^2_t, \ldots, \pi^n_t)$.
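Let $\gamma \in \mathbb{R}$. From (27), under a constraint
$$E(E^{\theta}(\tilde{X}^{\pi}_t)) = \sum_{i=1}^n \pi^i_t \mu^i_t = \gamma, \tag{32}$$
Problem (P3) is solved by quadratic programming and then the corresponding value (31) is
$$\gamma + \kappa^{\nu}(p) \sqrt{\frac{A_t \gamma^2 - 2 B_t \gamma + C_t}{\Delta_t}}, \tag{33}$$
where
$$\mu_t = \begin{pmatrix} \mu^1_t \\ \mu^2_t \\ \vdots \\ \mu^n_t \end{pmatrix}, \quad \Sigma_t = \begin{pmatrix} \sigma^{11}_t & \sigma^{12}_t & \cdots & \sigma^{1n}_t \\ \sigma^{21}_t & \sigma^{22}_t & \cdots & \sigma^{2n}_t \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^{n1}_t & \sigma^{n2}_t & \cdots & \sigma^{nn}_t \end{pmatrix}, \quad \mathbf{1} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix},$$
$A_t = \mathbf{1}^{\mathrm{T}} \Sigma_t^{-1} \mathbf{1}$, $B_t = \mathbf{1}^{\mathrm{T}} \Sigma_t^{-1} \mu_t$, $C_t = \mu_t^{\mathrm{T}} \Sigma_t^{-1} \mu_t$, $\Delta_t = A_t C_t - B_t^2$ and $\mathrm{T}$ denotes the transpose of a vector. If $A_t > 0$, $\Delta_t > 0$ and $\kappa^{\nu}(p) < -\sqrt{\Delta_t / A_t}$ are satisfied, we can easily check that the real-valued function (33) of $\gamma$ is concave and that it has the maximum $\frac{B_t}{A_t} - \frac{\sqrt{A_t \kappa^{\nu}(p)^2 - \Delta_t}}{A_t}$ at $\gamma = \frac{B_t}{A_t} + \frac{\Delta_t}{A_t \sqrt{A_t \kappa^{\nu}(p)^2 - \Delta_t}}$. Since $\sup_{\pi_t \in \Pi} (31) = \sup_{\gamma} \big\{ \sup_{\pi_t \in \Pi :\, \sum_{i=1}^n \pi^i_t \mu^i_t = \gamma} (31) \big\}$, we obtain the following analytical solutions for Problem (P3).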
Theorem 1. Let $A_t > 0$, $\Delta_t > 0$ and $\kappa^{\nu}(p) < -\sqrt{\Delta_t / A_t}$. Then the following (i) and (ii) hold.
(i) The maximum weighted average value-at-risk of Problem (P3) is
$$\frac{B_t}{A_t} - \frac{\sqrt{A_t \kappa^{\nu}(p)^2 - \Delta_t}}{A_t} \tag{34}$$
at the expected reward
$$\gamma^* = \frac{B_t}{A_t} + \frac{\Delta_t}{A_t \sqrt{A_t \kappa^{\nu}(p)^2 - \Delta_t}}. \tag{35}$$
The corresponding strategy is given by
$$\pi^*_t = \xi_t \Sigma_t^{-1} \mathbf{1} + \eta_t \Sigma_t^{-1} \mu_t \tag{36}$$
if $\pi^*_t \ge \mathbf{0}$, where $\xi_t = \frac{C_t - B_t \gamma^*}{\Delta_t}$ and $\eta_t = \frac{A_t \gamma^* - B_t}{\Delta_t}$.
(ii) If $\Sigma_t^{-1} \mathbf{1} \ge \mathbf{0}$, $\Sigma_t^{-1} \mu_t \ge \mathbf{0}$ and $\kappa^{\nu}(p) \le -\sqrt{C_t}$, then the strategy (36) satisfies $\pi^*_t \ge \mathbf{0}$.
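The closed-form solution of Theorem 1 can be checked on a toy case. The sketch below assumes two assets, so that $\Sigma_t^{-1}$ is available by Cramer's rule, and assumes the signs in (34) to (36) are as reconstructed here (maximum $B_t/A_t - \sqrt{A_t\kappa^{\nu}(p)^2 - \Delta_t}/A_t$). All numbers and function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def solve_2x2(cov, rhs):
    """Solve cov * x = rhs for a 2x2 matrix by Cramer's rule."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    return [(d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]

def avar(weights, mu, cov, kappa_nu):
    """Objective (31): mean plus kappa_nu times standard deviation."""
    n = len(weights)
    variance = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(n) for j in range(n))
    return dot(weights, mu) + kappa_nu * math.sqrt(variance)

def max_avar(mu, cov, kappa_nu):
    """Return (maximum value, expected reward, strategy) as in (34)-(36)."""
    ones = [1.0, 1.0]
    inv_ones, inv_mu = solve_2x2(cov, ones), solve_2x2(cov, mu)
    A, B, C = dot(ones, inv_ones), dot(ones, inv_mu), dot(mu, inv_mu)
    Delta = A * C - B * B
    if not (A > 0 and Delta > 0 and kappa_nu < -math.sqrt(Delta / A)):
        raise ValueError("assumptions of Theorem 1 are not satisfied")
    root = math.sqrt(A * kappa_nu ** 2 - Delta)
    value = B / A - root / A                       # (34)
    gamma = B / A + Delta / (A * root)             # (35)
    xi, eta = (C - B * gamma) / Delta, (A * gamma - B) / Delta
    strategy = [xi * s + eta * m for s, m in zip(inv_ones, inv_mu)]   # (36)
    return value, gamma, strategy

# Hypothetical two-asset data and a hypothetical kappa^nu(p).
mu = [0.10, 0.08]
cov = [[0.04, 0.01], [0.01, 0.03]]
value, gamma, strategy = max_avar(mu, cov, kappa_nu=-2.0627)
print(value, gamma, strategy)
```

For this data the strategy is non-negative and sums to one, so it is an admissible action, and evaluating (31) at it reproduces the value (34).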
5 RISK-SENSITIVE REWARD
MAXIMIZATION UNDER
FEASIBLE RISK
CONSTRAINTS
Let $p \in (0,1)$ be a probability and let $\nu$ be a risk spectrum, as given in Section 4. From Theorem 1, we define the lower bound of $-\mathrm{AVaR}^{\nu}_p(E^{\theta}(\tilde{X}^{\pi}_t))$ by a constant $\delta_t(p)$:
$$\delta_t(p) = \inf_{\pi_t \in \Pi_t} \big(-\mathrm{AVaR}^{\nu}_p(E^{\theta}(\tilde{X}^{\pi}_t))\big) = -\sup_{\pi_t \in \Pi_t} \mathrm{AVaR}^{\nu}_p(E^{\theta}(\tilde{X}^{\pi}_t)) = -\frac{B_t}{A_t} + \frac{\sqrt{A_t \kappa^{\nu}(p)^2 - \Delta_t}}{A_t}. \tag{37}$$
Thus the feasible range of $\delta$ in risk constraint (26) is $\{\delta \mid \Pi_t(\delta) \ne \emptyset\} = [\delta_t(p), \infty)$. Now we take a risk level $\delta \in [\delta_t(p), \infty)$, and then we have $\sup_{\pi_t \in \Pi_t(\delta)} (31) = \sup_{\gamma} \big\{ \sup_{\pi_t \in \Pi_t(\delta) :\, \sum_{i=1}^n \pi^i_t \mu^i_t = \gamma} (31) \big\}$. Thus, from the viewpoint of (33), Problem (P2) is reduced to the following problem with constraint (32), i.e. $\sum_{i=1}^n \pi^i_t \mu^i_t = \gamma$.
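Problem (P4). Maximize the risk-sensitive estimation
$$\gamma_t + \kappa^{\lambda}(1) \sqrt{\frac{A_t \gamma_t^2 - 2 B_t \gamma_t + C_t}{\Delta_t}} \tag{38}$$
with respect to $\gamma_t \in \mathbb{R}$ under risk constraint
$$-\gamma_t - \kappa^{\nu}(p) \sqrt{\frac{A_t \gamma_t^2 - 2 B_t \gamma_t + C_t}{\Delta_t}} \le \delta. \tag{39}$$
Hence (39) is equivalent to $\gamma_t \in [\gamma^-_t, \gamma^+_t]$, where
$$\gamma^{\pm}_t = \frac{B_t \kappa^{\nu}(p)^2 + \Delta_t \delta}{A_t \kappa^{\nu}(p)^2 - \Delta_t} \mp \frac{\sqrt{\Delta_t}\, \kappa^{\nu}(p) \sqrt{A_t \delta^2 + 2 B_t \delta + C_t - \kappa^{\nu}(p)^2}}{A_t \kappa^{\nu}(p)^2 - \Delta_t}. \tag{40}$$
By solving the concave maximization (38) within the constraint $[\gamma^-_t, \gamma^+_t]$ in Problem (P4), we easily obtain the following results for Problem (P2).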
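Theorem 2. Let $A_t > 0$, $\Delta_t > 0$, $\kappa^{\nu}(p) \le \kappa^{\lambda}(1) \le 0$ and $\kappa^{\nu}(p) < -\sqrt{\Delta_t / A_t}$. Then the maximum risk-sensitive estimation in Problem (P2) is
$$\varphi^*_t = \begin{cases} \dfrac{B_t}{A_t} - \dfrac{\sqrt{A_t \kappa^{\lambda}(1)^2 - \Delta_t}}{A_t} \ \text{ at } \ \gamma^*_t = \dfrac{B_t}{A_t} + \dfrac{\Delta_t}{A_t \sqrt{A_t \kappa^{\lambda}(1)^2 - \Delta_t}} & \text{if } \delta^+_t \le \delta \text{ and } \kappa^{\lambda}(1) < -\sqrt{\Delta_t / A_t}, \\[2ex] \gamma^+_t - \dfrac{\kappa^{\lambda}(1)}{\kappa^{\nu}(p)} (\delta + \gamma^+_t) \ \text{ at } \ \gamma^*_t = \gamma^+_t & \text{otherwise}, \end{cases} \tag{41}$$
where $\delta^+_t = -\dfrac{B_t}{A_t} + \dfrac{A_t \kappa^{\lambda}(1) \kappa^{\nu}(p) - \Delta_t}{A_t \sqrt{A_t \kappa^{\lambda}(1)^2 - \Delta_t}}$.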
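The two cases of (41) can be sketched as a small function of the coefficients $A_t$, $B_t$, $C_t$. This is an illustration under stated assumptions: the sign conventions in (39) to (41) are taken as reconstructed here (constraint $-\gamma_t - \kappa^{\nu}(p)\sqrt{\cdot} \le \delta$, upper root $\gamma^+_t$ obtained with $\kappa^{\nu}(p) < 0$), and all coefficient values, risk levels and function names are hypothetical.

```python
import math

def quadratic_term(gamma, A, B, C):
    """sqrt((A*gamma^2 - 2*B*gamma + C) / Delta) with Delta = A*C - B^2."""
    return math.sqrt((A * gamma ** 2 - 2 * B * gamma + C) / (A * C - B * B))

def objective(gamma, A, B, C, kappa_lambda):
    """Risk-sensitive estimation (38)."""
    return gamma + kappa_lambda * quadratic_term(gamma, A, B, C)

def risk(gamma, A, B, C, kappa_nu):
    """Left-hand side of the risk constraint (39)."""
    return -gamma - kappa_nu * quadratic_term(gamma, A, B, C)

def solve_p4(A, B, C, kappa_lambda, kappa_nu, level):
    """Return (phi*, gamma*) following the two cases of (41)."""
    Delta = A * C - B * B
    # Upper root gamma^+ of (40); since kappa_nu < 0, subtracting gives the larger root.
    inner = A * level ** 2 + 2 * B * level + C - kappa_nu ** 2
    gamma_plus = (B * kappa_nu ** 2 + Delta * level
                  - math.sqrt(Delta) * kappa_nu * math.sqrt(inner)) / (A * kappa_nu ** 2 - Delta)
    if kappa_lambda < -math.sqrt(Delta / A):
        root = math.sqrt(A * kappa_lambda ** 2 - Delta)
        level_plus = -B / A + (A * kappa_lambda * kappa_nu - Delta) / (A * root)
        if level_plus <= level:
            # The unconstrained maximizer of (38) is feasible.
            return B / A - root / A, B / A + Delta / (A * root)
    # Otherwise the constraint binds at gamma^+.
    return gamma_plus - kappa_lambda / kappa_nu * (level + gamma_plus), gamma_plus

# Hypothetical coefficients and spectrum constants.
A, B, C = 500.0 / 11.0, 4.0, 0.36
kappa_lambda, kappa_nu = -0.3, -2.0627
loose = solve_p4(A, B, C, kappa_lambda, kappa_nu, level=0.25)
tight = solve_p4(A, B, C, kappa_lambda, kappa_nu, level=0.22)
print(loose, tight)
```

With the looser risk level the unconstrained maximizer is admissible; with the tighter one the optimum moves to the boundary $\gamma^+_t$, which is the kink described for Fig.2.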
6 DYNAMIC MAXIMUM
RISK-SENSITIVE REWARD
UNDER FEASIBLE RISK
CONSTRAINTS
Let the initial state be a real number $X^{\pi}_0 = x_0$. Then $E(X_0) = \gamma_0 = x_0$ and $\sigma(X_0)^2 = 0$. For a Markov policy $\pi = \{\pi_t\}_{t=1}^T \in \Pi$, the expectation and the standard deviation of terminal rewards $X^{\pi}_T = x_0 + \sum_{t=1}^T R^{\pi}_t$ are
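Problem (P5). Maximize the total risk-sensitive expected immediate reward
$$\sum_{t=1}^T \beta^{t-1} \left( \gamma_t + \kappa^{\lambda}(1) \sqrt{\frac{A_t \gamma_t^2 - 2 B_t \gamma_t + C_t}{\Delta_t}} \right) \tag{42}$$
with respect to $(\gamma_1, \gamma_2, \ldots, \gamma_T) \in \mathbb{R}^T$ under risk constraint
$$\gamma_t \in [\gamma^-_t, \gamma^+_t] \tag{43}$$
for all $t = 1, 2, \ldots, T$.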
Lemma 6. Let $\{v_t\}$ be a sequence given by the following optimality equations
$$v_t = \sup_{\gamma_t \in [\gamma^-_t, \gamma^+_t]} \left\{ \gamma_t + \kappa^{\lambda}(1) \sqrt{\frac{A_t \gamma_t^2 - 2 B_t \gamma_t + C_t}{\Delta_t}} + \beta v_{t+1} \right\} \tag{44}$$
for $t = 1, 2, \ldots, T$ and $v_{T+1} = 0$. Then $v_1$ is the maximum total risk-sensitive expected immediate reward for Problem (P5).
From Theorem 2, we have the following results.
Theorem 3. Let $A_t > 0$, $\Delta_t > 0$, $\kappa^{\nu}(p) \le \kappa^{\lambda}(1) \le 0$ and $\kappa^{\nu}(p) < -\sqrt{\Delta_t / A_t}$ for $t = 1, 2, \ldots, T$.
(i) Let $\{v_t\}$ be a sequence given by the following optimality equations
$$v_t = \varphi^*_t + \beta v_{t+1} \tag{45}$$
for $t = 1, 2, \ldots, T$ and $v_{T+1} = 0$. Then $v_1$ is the maximum of the total risk-sensitive expected rewards in Problem (P5).
(ii) Further, the optimal portfolios of (44) in Lemma 6 are given by
$$w^*_t = \xi_t \Sigma_t^{-1} \mathbf{1} + \eta_t \Sigma_t^{-1} \mu_t \tag{46}$$
for $t = 1, 2, \ldots, T$, where $\gamma^*_t$ is given by (41), $\xi_t = \frac{C_t - B_t \gamma^*_t}{\Delta_t}$ and $\eta_t = \frac{A_t \gamma^*_t - B_t}{\Delta_t}$.
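(iii) Further, one sufficient condition for $w^*_t \ge \mathbf{0}$ is the following: $\kappa^{\lambda}(1)^2 \ge C_t$, $C_t \kappa^{\nu}(p)^2 \ge (C_t + B_t \delta)^2$, $A_t \kappa^{\nu}(p)^2 \le (A_t \delta + B_t)^2$, $\Sigma_t^{-1} \mathbf{1} \ge \mathbf{0}$ and $\Sigma_t^{-1} \mu_t \ge \mathbf{0}$ for $t = 1, 2, \ldots, T$.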
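The optimality equations (45) are a plain backward recursion. The sketch below uses the horizon $T = 20$ and discount rate $\beta = 0.95$ of the dynamic example, but the per-period values $\varphi^*_t$ are a hypothetical constant sequence, so the printed total is not the value reported in the paper.

```python
def backward_values(phi, beta):
    """Return [v_1, ..., v_T] with v_t = phi_t + beta * v_{t+1} and v_{T+1} = 0."""
    values = []
    v_next = 0.0
    for phi_t in reversed(phi):
        v_next = phi_t + beta * v_next
        values.append(v_next)
    values.reverse()
    return values

T, beta = 20, 0.95
phi = [0.05] * T            # hypothetical per-period optima phi*_t
v = backward_values(phi, beta)
print(v[0])
```

Because each $\varphi^*_t$ is already available in closed form from Theorem 2, the dynamic problem needs no numerical optimization inside the recursion.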
7 NUMERICAL EXAMPLES
We give a few examples to understand the results in
the previous sections.
Example 1. Let a domain $I = \mathbb{R}$ and let $f$ be a risk neutral utility function $f(x) = ax + b$ for $x \in \mathbb{R}$ with constants $a\,(> 0)$ and $b\,(\in \mathbb{R})$. Then its risk spectrum in Lemma 2 is given by $\lambda(p) = 1$. The corresponding weighted average value-at-risk (4) is reduced to the average value-at-risk (3), and we have
$$f^{-1}(E(f(X))) = E(X) = \mathrm{AVaR}_1(X) \tag{47}$$
for $X \in \mathcal{X}$ (Yoshida, 2018).
Example 2. Let a domain $I = \mathbb{R}$ and let a risk averse exponential utility function
$$f(x) = \frac{1 - e^{-\tau x}}{\tau} \tag{48}$$
for $x \in \mathbb{R}$ with a positive constant $\tau$. Then $-\frac{f''}{f'} = \tau$ is the degree of the decision maker's absolute risk aversion (Arrow, 1971). Fig.1 illustrates utility functions $f(x)$. Let $\mathcal{X}$ be a family of random variables $X$ which have normal distribution functions. Define the cumulative distribution function $G : \mathbb{R} \to (0,1)$ of the standard normal distribution by
$$G(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-\frac{z^2}{2}}\, dz \tag{49}$$
for $x \in \mathbb{R}$, and define an increasing function $\kappa : (0,1) \to \mathbb{R}$ by its inverse function $\kappa(p) = G^{-1}(p)$ for probabilities $p \in (0,1)$. Then we have value-at-risk $\mathrm{VaR}_p(X) = \mu + \kappa(p) \cdot \sigma$ for $X \in \mathcal{X}$ with mean $\mu$ and standard deviation $\sigma$. Suppose there exists a distribution $\psi : \mathbb{R} \times (0, \infty) \to [0, \infty)$ such that $\psi(\mu, \sigma) = \phi(\mu) \cdot \frac{2^{1-n/2}}{\Gamma(n/2)} \sigma^{n-1} e^{-\frac{\sigma^2}{2}}$ for $(\mu, \sigma) \in \mathbb{R} \times [0, \infty)$, where $\phi(\mu)$ is some probability distribution, $\Gamma(\cdot)$ is a gamma function and $\frac{2^{1-n/2}}{\Gamma(n/2)} \sigma^{n-1} e^{-\frac{\sigma^2}{2}}$ is a chi distribution with degree of freedom $n$. We take a utility $f(x) = \frac{1 - e^{-0.05x}}{0.05}$ with $\tau = 0.05$ in (48), and by Lemma 2 there exists a risk spectrum $\lambda$ satisfying $f^{-1}(E(f(\cdot))) \approx \mathrm{AVaR}^{\lambda}_1(\cdot)$.
Then, by (Yoshida, 2018), the best risk spectrum in Lemma 2 is given by
$$\lambda(p) = e^{-\int_p^1 C(q)\, dq}\, C(p) \tag{50}$$
for $p \in (0,1]$ with the component function
$$C(p) = \frac{1}{p} \cdot \frac{\displaystyle \int_0^{\infty} \left( 1 - \frac{1}{\frac{1}{p} \int_0^p e^{\tau \sigma (\kappa(p) - \kappa(q))}\, dq} \right) \sigma^n e^{-\frac{\sigma^2}{2}}\, d\sigma}{\displaystyle \int_0^{\infty} \log\left( \frac{1}{p} \int_0^p e^{\tau \sigma (\kappa(p) - \kappa(q))}\, dq \right) \sigma^n e^{-\frac{\sigma^2}{2}}\, d\sigma} \tag{51}$$
with $\tau = 0.05$. From (49) and (50), we have $\kappa^{\lambda}(1) = \int_0^1 \kappa(q)\lambda(q)\, dq \big/ \int_0^1 \lambda(q)\, dq = -0.03$. On the other hand, for risk measures $\rho$ we use another utility $g(x) = 1 - e^{-x}$ with $\tau = 1$ in (48). Then by Lemma 1 there exists a risk spectrum $\nu$ such that $\rho(\cdot) = -\mathrm{AVaR}^{\nu}_p(\cdot)$. We discuss a case of risk probability 5%, i.e. $p = 0.05$, in the normal distribution, and then similarly we can calculate $\kappa^{\nu}(0.05) = \int_0^{0.05} \kappa(q)\nu(q)\, dq \big/ \int_0^{0.05} \nu(q)\, dq = -2.29701$. We give fuzzy rewards by fuzzy random variables $\tilde{X}^i_t \in \tilde{\mathcal{X}}^a$ $(i = 1, 2, \ldots, n)$ as follows:
$$\tilde{X}^i_t(\omega)(\cdot) = 1_{\{X^i_t(\omega)\}}(\cdot) + \tilde{a}^i_t(\cdot) \tag{52}$$
for $\omega \in \Omega$, where $X^i_t\,(\in \mathcal{X})$ is a random variable and $\tilde{a}^i_t$ is a fuzzy number $\tilde{a}^i_t(x) = \max\{1 - |x| / c^i_t, 0\}$ for $x \in \mathbb{R}$ with a positive number $c^i_t$, which is a fuzzy factor. Let $n = 4$ be the number of assets. We give the expectations $\mu^i_t$ of rewards $X^i_t$ and the fuzzy factors $c^i_t$ in Table 1, and the covariances $\sigma^{ij}_t$ of rewards $X^i_t$ in Table 2. We deal with an optimistic and possibility case, i.e. $\theta = 0$ and $w(\alpha) = 1$ for $\alpha \in [0,1]$. Then we have $A_t = 15.1405 > 0$ and $\Delta_t = 0.003449 > 0$, and we can easily check $\kappa^{\nu}(0.05) < \kappa^{\lambda}(1) < -\sqrt{\Delta_t / A_t} = -0.0150931$. From (37) we also have $\delta_t(p) = 0.495737$. Therefore we take a risk level $\delta = 0.55$ in the feasible range $[0.495737, \infty)$. From Theorem 2, we obtain the maximum risk-sensitive estimation $\varphi^*_t = 0.0879143$ at the expected reward $\gamma^*_t = 0.0968356$ for Problem (P2) with an optimal strategy $\pi^*_t = (0.453443, 0.159892, 0.313565, 0.0731002)$. Hence the difference between the real expected reward $\gamma^*_t = 0.0968356$ and the decision maker's maximum risk-sensitive estimation $\varphi^*_t = 0.0879143$ comes from the decision maker's risk averse feeling.

Table 1: Expectations $\mu^i_t$ and fuzzy factors $c^i_t$.
          μ^i_t    c^i_t
i = 1     0.098    0.008
i = 2     0.084    0.008
i = 3     0.091    0.007
i = 4     0.088    0.006

Table 2: Variance-covariances $\sigma^{ij}_t$.
σ^{ij}_t   j = 1   j = 2   j = 3   j = 4
i = 1      0.38    0.09    0.07    0.05
i = 2      0.09    0.39    0.08    0.06
i = 3      0.07    0.08    0.38    0.06
i = 4      0.05    0.06    0.06    0.37
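As a quick arithmetic cross-check of the values reported in this example, the sketch below recomputes two quantities from the printed constants alone: the threshold $\sqrt{\Delta_t / A_t}$, and the gap $\gamma^*_t - \varphi^*_t$ implied by the first case of (41), which equals $\Delta_t / (A_t s) + s / A_t$ with $s = \sqrt{A_t \kappa^{\lambda}(1)^2 - \Delta_t}$. It assumes $\kappa^{\lambda}(1) = \pm 0.03$ (only its square enters) and uses rounded inputs, so agreement is expected only to about five decimal places. The variable names are illustrative.

```python
import math

# Constants as printed in Example 2 (rounded in the source).
A_t, Delta_t = 15.1405, 0.003449
kappa_lambda = -0.03
gamma_star, phi_star = 0.0968356, 0.0879143
strategy = (0.453443, 0.159892, 0.313565, 0.0731002)

threshold = math.sqrt(Delta_t / A_t)
root = math.sqrt(A_t * kappa_lambda ** 2 - Delta_t)
gap = Delta_t / (A_t * root) + root / A_t     # gamma* - phi* from (41), first case

print(threshold)                  # compare with the reported 0.0150931
print(gap, gamma_star - phi_star)
print(sum(strategy))              # the strategy should sum to one
```

The recomputed threshold and gap agree with the reported figures up to the rounding of the inputs, and the strategy components sum to one.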
Figure 1: Utility functions $f(x)$ (curves for Example 1 and Example 2; horizontal axis $x$).
We can also use a pessimistic and necessity case, i.e. $\theta = 1$ and $w(\alpha) = 1 - \alpha$ for $\alpha \in [0,1]$. From Tables 1 and 2, we find the maximum risk-sensitive estimation is in $0.0785810 \le \varphi^*_t \le 0.0874494$, and the expected reward is in $0.0875022 \le \gamma^*_t \le 0.0968356$. Fig.2 illustrates the maximum risk-sensitive estimation $\varphi^*_t$ for $\delta$ in Theorem 2, and we see that the two lines are cut and connected at $\delta^+_t$. Fig.3 illustrates the maximum risk-sensitive estimation $\varphi^*_t$ and the expected reward $\gamma^*_t$ for $\delta$. We see that $\varphi^*_t$ is smaller than $\gamma^*_t$ because $\gamma^*_t$ represents actual expected rewards while $\varphi^*_t$ contains the decision maker's risk aversion under his utility. Fig.4 also shows the feasible range $\{(p, \delta) \mid \Pi_t(\delta) \ne \emptyset\} = \{(p, \delta) \mid \delta_t(p) \le \delta\}$ in Example 2 ($\tau = 1$).
We discuss a dynamic case with an expiration date $T = 20$ and a discount rate $\beta = 0.95$. Then by Theorem 3 we obtain the optimal total weighted average value-at-risk $v_1 = 1.67037$ for Problem (P5), and we can observe the sequence $\{v_t\}$ of sub-total weighted average value-at-risks after time $t$ in Theorem 3.

Table 3: Risk-sensitive estimation $\varphi^*_t$ and expected reward $\gamma^*_t$.
          Pess. & Nec.   Opti. & Poss.
γ*_t      0.0875022      0.0968356
φ*_t      0.0785810      0.0879143

Figure 2: The max. risk-sensitive estimation $\varphi^*_t$ (horizontal axis: risk level $\delta$ from 0.55 to 0.70, with $\delta_t(p)$ and $\delta^+_t$ marked; vertical axis: max risk-sensitive expectation).

Figure 3: The max. risk-sensitive estimation $\varphi^*_t$ and the expected reward $\gamma^*_t$ (horizontal axis: risk level $\delta$ from 0.55 to 0.70, with $\delta_t(p)$ and $\delta^+_t$ marked).
Concluding Remarks. Using Lemma 2, we can incorporate the decision maker's risk averse attitude into coherent risk measures as weighting for average value-at-risks. As we have seen in Examples 1 and 2, risk-sensitive estimations with utility $f$ are approximated by weighted average risks with a spectrum $\lambda$ in (50) and (51), and the coherent risk measures $\rho$ with fuzzy factors are given by weighted average risks with a spectrum $\nu$. The proposed method brings us reasonable and computable risk-sensitive optimization models under risk constraints, and it is useful for other subjective optimization in management sciences. We can immediately return risk values $\rho = -\mathrm{AVaR}^{\lambda}_p$ from (7) once the constants (8) are prepared, and this approach will be applicable to timely and quick risk-sensitive decision making together with AI computing, for example, stock trading, automated driving and so on (Yoshida, to appear).

Figure 4: Feasible region for risk levels $\delta$.

Figure 5: Sequences $\{v_t\}$ for Example 1 and Example 2 ($\tau = 1$) ($p = 0.05$) (horizontal axis: time $t$ from 0 to 20).
ACKNOWLEDGEMENTS
This research is supported by JSPS KAKENHI
Grant Number JP 16K05282.
REFERENCES
Acerbi, C., Spectral measures of risk: A coherent represen-
tation of subjective risk aversion, Journal of Banking
& Finance, vol.26, pp.1505-1518, 2002.
Adam, A., Houkari, M., Laurent, J.-P., Spectral risk mea-
sures and portfolio selection, Journal of Banking &
Finance, vol.32, pp.1870-1882, 2008.
Arrow, K.J., Essays in the Theory of Risk-Bearing,
Markham, Chicago, 1971.
Artzner, P., Delbaen, F., Eber, J.-M., Heath, D., Coher-
ent measures of risk, Mathematical Finance, vol.9,
pp.203-228, 1999.
Bäuerle, N., Rieder, U., More risk-sensitive Markov decision
processes, Math. Oper. Res., vol.39, pp.105-120,
2014.
Howard, R., Matheson, J., Risk-sensitive Markov decision
processes, Management Science, vol.18, pp.356-369,
1972.
Jorion, P., Value at Risk: The New Benchmark for Managing
Financial Risk, McGraw-Hill, New York, 2006.
Kruse, R., Meyer, K.D., Statistics with Vague Data, Reidel
Publ. Co., Dordrecht, 1987.
Kusuoka, S., On law-invariant coherent risk measures, Ad-
vances in Mathematical Economics, vol.3, pp.83-95,
2001.
Kwakernaak, H., Fuzzy random variables I. Definitions
and theorem, Inform. Sci., vol.15, pp.1-29, 1978.
Rockafellar, R.T., Uryasev, S., Optimization of conditional
value-at-risk, J. of Risk, vol.2, pp.21-41, 2000.
Rockafellar, R.T., Uryasev, S., Conditional value-at-risk for
general loss distribution functions, J. of Banking and
Finance, vol.26, pp.1443-1471, 2002.
Tasche, D., Expected shortfall and beyond, J. of Banking
and Finance, vol.26, pp.1519-1533, 2002.
Yoshida, Y., Mean values, measurement of fuzziness and
variance of fuzzy random variables for fuzzy opti-
mization, Proceedings of SCIS & ISIS 2006, pp.2277-
2282, Sept. 2006.
Yoshida, Y., Fuzzy extension of estimations with random-
ness: The perception-based approach, in: P.Melin
et al. eds., IFSA2007, LNAI 4617, pp.381-391,
Springer, Sept. 2007.
Yoshida, Y., Perception-based estimations of fuzzy random
variables: Linearity and convexity, International Jour-
nal of Uncertainty, Fuzziness and Knowledge-Based
Systems, vol.16, suppl., pp.71-87, 2008.
Yoshida, Y., Maximization of returns under an average
value-at-risk constraint in fuzzy asset management,
Procedia Computer Science, Vol.112, pp.11-20, 2017.
Yoshida, Y., Coherent risk measures derived from utility
functions, in: V.Torra and Y.Narukawa eds., Model-
ing Decisions for Artificial Intelligence - MDAI 2018,
LNAI 11144, Springer, pp.15-26, 2018.
Yoshida, Y., Portfolio optimization in fuzzy asset manage-
ment with coherent risk measures derived from risk
averse utility, Neural Computing and Applications, to
appear, https://doi.org/10.1007/s00521-018-3683-y.
Zadeh, L.A., Fuzzy sets, Inform. and Control, vol.8,
pp.338-353, 1965.