Optimal Policies for Payment of Dividends through a Fixed Barrier at
Discrete Time
Raúl Montes-de-Oca¹, Patricia Saavedra¹, Gabriel Zacarías-Espinoza¹ and Daniel Cruz-Suárez²
¹ Departamento de Matemáticas, Universidad Autónoma Metropolitana-Iztapalapa,
Av. San Rafael Atlixco 186, Col. Vicentina, Cd. de México 09340, Mexico
² División Académica de Ciencias Básicas, Universidad Juárez Autónoma de Tabasco,
Km 1 Carr. Cunduacán-Jalpa, Cunduacán, Tabasco 86690, Mexico
Keywords:
Reserve Processes, Discounted Markov Decision Processes, Ruin Probability, Optimal Premiums, Dividends.
Abstract:
In this paper a discrete-time reserve process with a fixed barrier is presented and modelled as a discounted
Markov Decision Process. The non-payment of dividends is penalized. The minimization of this penalty
results in an optimal control problem. This work focuses on determining the sequence of premiums that mini-
mize penalty costs, and obtaining a rate for the probability of ruin to ensure a sustainable reserve operation.
1 INTRODUCTION
This work is related to risk theory, which describes
the behavior of the reserve process of an insurance
company. The classical model was introduced by Filip
Lundberg in 1903 (Lundberg, 1909) and developed
by Harald Cramér in 1930 (Cramér, 1930). In this
model, the premiums are received continuously at a
constant rate and the total amount of claims over a
period of time t is given by a compound Poisson process.
The main problem of the classical model was
to determine the ruin probability of the reserve process.
More recently, several other interesting
problems have been studied: minimization
of the ruin probability, the distribution of dividends
to shareholders, the reinsurance problem, the collection
of premiums dependent on the history of each
customer, and the analysis of the reserve process when claims
have sub-exponential distributions, to mention a
few (see (Azcue and Muler, 2014), (Dickson, 2005),
(Dickson and Waters, 2004), (Gerber, 1981), (Gerber
et al., 2006), (Rolski et al., 1999), and (Schmidli,
2009)).
In particular, the problem of interest for the au-
thors of this article is the definition of policies for the
distribution of dividends in fixed periods of time when
the claims are of light or heavy tails. This issue is rel-
evant because in the classical model, if the intensity of
the premiums is higher than the average total amount
of claims (the security loading is positive), then with
probability 1, the paths of the reserve tend to infin-
ity when the time t increases indefinitely, (see (De-
Finetti, 1957)). Therefore, dividends appear as a way
to control an unlimited increment of the reserves.
Dividend policies aim to attract shareholders (or
investors) in order to address risks. One possible
policy is to determine the dividend strategy that
maximizes the discounted expected value of a utility
function by means of control techniques. This approach
has been studied in continuous time in, for example,
(Azcue and Muler, 2014), (Dickson, 2005), (Dickson
and Waters, 2004), (Gerber, 1981), (Gerber et al.,
2006), and (Schmidli, 2009). On the other hand,
discrete-time problems of risk theory have been studied,
for instance, in (Bulinskaya and Muromskaya,
2014), (Diasparra and Romera, 2009), (Martínez-Morales,
1991), (Martin-Löf, 1994), (Schäl, 2004),
and (Schmidli, 2009), which apply optimal
control theory to insurance companies. In particular,
in (Martin-Löf, 1994) control techniques were
introduced for the first time by means of the theory of
discounted Markov Decision Processes.
Discounted Markov Decision Processes (MDPs) in discrete time
(see (Hernández-Lerma and Lasserre, 1996)) are processes that
are observed periodically, whose state transitions are subject
to uncertainty, and which can be influenced by the application
of controls. An MDP is generally described as follows: at a
particular time n, the system is observed and, depending on its
current state, a control is applied; then a cost is paid and, by
a predetermined transition law, the system moves to a new state.
The sequence of controls is called a policy, and its quality is
assessed through a performance criterion. The Optimal Control
Problem (OCP) consists in determining a policy which optimizes
the performance criterion. One way to solve the OCP is to use
the technique of dynamic programming introduced by
Bellman in the middle of the last century.

Montes-de-Oca R., Saavedra P., Zacarías-Espinoza G. and Cruz-Suárez D.
Optimal Policies for Payment of Dividends through a Fixed Barrier at Discrete Time.
DOI: 10.5220/0006193701400149
In Proceedings of the 6th International Conference on Operations Research and Enterprise Systems (ICORES 2017), pages 140-149
ISBN: 978-989-758-218-9
Copyright © 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
From this perspective, the problem of dividends
is modeled here by using discrete-time MDPs.
MDPs are a natural framework because similar control
problems, such as those for dams, inventories and other
storage systems, have been solved successfully; see
(Finch, 1960) and (Ghosal, 1970). On the other hand,
discrete-time is used here as it was suggested in (Li
et al., 2009). This type of analysis is important in it-
self as it presents an approximation of the continuous
problem and as it is also more realistic from the ap-
plications point of view. One approach that will be
followed in this work is to study the problem of div-
idends by fixing an objective capital, (barrier) Z > 0.
If the reserve exceeds Z, then the dividends are dis-
tributed. A model with a fixed barrier reserve of an
insurance company is proposed. The reserve process
is modelled as an MDP whose admissible control be-
longs to a compact subset. The bounds of this sub-
set depend on two principles for premium calculation:
the expectation principle and the standard deviation
principle (see (Dickson, 2005)). The distribution of
the total amount of claims, by time interval, repre-
sents a compound process which is supposed to be
general, in the sense that it only requires for its den-
sity to be continuous almost everywhere. The pro-
posed performance criterion is the expected total dis-
counted cost, where the cost penalizes both the fail-
ure to pay dividends and the difference between the
admissible premiums and a constant which depends
on the standard deviation principle for premium calculation.
In addition, the dynamic programming technique
yields the optimal solution explicitly, and
a rate for the ruin probability is
established, which makes it possible to determine long periods of
sustainability of the company.
The paper is organized as follows: in the sec-
ond section the mathematical tools that will be used
throughout this work (mainly MDPs and stochastic
orders) are presented. The reserve process with a
fixed barrier is presented in the third section with an
analysis of dividend policies. In the fourth and fifth
sections the main results are given: the optimal pre-
mium and a rate for the ruin probability with a couple
of examples where the theory obtained in this work is
applied. Finally, research conclusions are presented.
2 PRELIMINARIES
This section presents some results on the theory that
will be used to solve the problem stated in the paper.
2.1 Stochastic Orders
Let $X$ be a Borel space (i.e., a Borel subset of a separable
metric space) and suppose that $X$ is complete and
partially ordered. The partial order in $X$ is denoted by
$\preceq$. Moreover, a function $g : X \to \mathbb{R}$ is considered to be
increasing if $x, y \in X$, $x \preceq y$, imply that $g(x) \le g(y)$,
where $\le$ is the usual order in $\mathbb{R}$. Besides, the Borel
σ-algebra of $X$ is denoted by $\mathcal{B}(X)$.
Definition 2.1. Let $X$ be a complete Borel space and
suppose that $X$ is partially ordered. Let $P$ and $P'$ be
probability measures on $(X, \mathcal{B}(X))$. It is said that $P'$
dominates $P$ stochastically if $\int g\,dP \le \int g\,dP'$ for all
$g : X \to \mathbb{R}$ measurable, bounded and increasing; this is
written $P \le_{st} P'$.
Remark 2.2. Let $P$ and $P'$ be probability measures on
$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. In this case, $P \le_{st} P'$ if $F'(x) \le F(x)$ for all
$x \in \mathbb{R}$, where $F$ and $F'$ are the distribution functions of
$P$ and $P'$, respectively (see (Lindvall, 1992) p. 127).
Lemma 2.3. ((Cruz-Suárez et al., 2004), Lemma 2.6)
Let $X$ be a complete Borel space, and suppose also
that $X$ is partially ordered. Let $P$ and $P'$ be probability
measures on $(X, \mathcal{B}(X))$ such that $P \le_{st} P'$. Then
$\int H\,dP \le \int H\,dP'$ for every $H : X \to \mathbb{R}$ which is measurable,
nonnegative, nondecreasing, and (possibly) unbounded.
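The criterion of Remark 2.2 can be checked numerically on a simple case. The following is a minimal sketch, assuming two exponential laws with means 1 and 2 (these laws, the grid, and the helper names `exp_cdf`, `cdf_ordered_on_grid` and `mean_of_capped` are illustrative assumptions, not part of the paper). It compares the distribution functions on a finite grid and then evaluates both sides of the inequality in Definition 2.1 for the bounded increasing function g(x) = min(x, 1).

```python
import math

def exp_cdf(mean):
    """Return the distribution function of an exponential law with the given mean."""
    def cdf(x):
        return 1.0 - math.exp(-x / mean) if x > 0 else 0.0
    return cdf

def cdf_ordered_on_grid(F, F_prime, grid):
    """Check F'(x) <= F(x) at every grid point (a finite-grid check, not a proof)."""
    return all(F_prime(x) <= F(x) for x in grid)

def mean_of_capped(mean, cap):
    """Closed form of E[min(X, cap)] for X exponential with the given mean."""
    return mean * (1.0 - math.exp(-cap / mean))

grid = [0.1 * i for i in range(201)]       # points in [0, 20]
F, F_prime = exp_cdf(1.0), exp_cdf(2.0)    # P has mean 1, P' has mean 2 (assumed laws)
ordered = cdf_ordered_on_grid(F, F_prime, grid)
lhs = mean_of_capped(1.0, cap=1.0)         # integral of g dP with g(x) = min(x, 1)
rhs = mean_of_capped(2.0, cap=1.0)         # integral of g dP'
print(ordered, lhs <= rhs)
```

The grid comparison only samples the condition of Remark 2.2; for these two laws the ordering also follows directly from the closed-form distribution functions.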
2.2 Discounted Markov Decision
Processes
Let $X$ and $Y$ be complete Borel spaces. A stochastic
kernel on $X$ given $Y$ is a function $P(\cdot \mid \cdot)$ such that
$P(\cdot \mid y)$ is a probability measure on $X$ for each fixed
$y \in Y$, and $P(B \mid \cdot)$ is a measurable function on $Y$ for
each fixed $B \in \mathcal{B}(X)$.
Let $(X, A, \{A(x) \mid x \in X\}, Q, c)$ be a discrete-time
Markov Control Model (see (Bäuerle and Rieder,
2011) or (Hernández-Lerma and Lasserre, 1996) for
notation and terminology). This model consists of the
state space $X$, the control set $A$, the transition law $Q$,
and the cost-per-stage $c$. For each $x \in X$, there is a
nonempty measurable set $A(x) \subseteq A$ whose elements
are the feasible actions when the state of the system
is $x$. Define $\mathbb{K} := \{(x,a) : x \in X,\ a \in A(x)\}$. The cost $c$ is
assumed to be a nonnegative and measurable function
on $\mathbb{K}$.
The transition law $Q$ is often induced by an equation
of the form
$$x_{n+1} = G(x_n, a_n, \xi_n), \tag{1}$$
$n = 0, 1, \dots$, with $x_0 \in X$ given, where $\{x_n\}$ and $\{a_n\}$
are the sequences of the states and controls, respectively,
and $\{\xi_n\}$ is a sequence of independent and identically
distributed (i.i.d.) random variables, with values
in some space $S$, common density function $\Delta$, and
independent of the initial state $x_0$; $G : \mathbb{K} \times S \to X$ is a
measurable function.
Assumption 2.4. (a) $A(x)$ is compact for all $x \in X$;
(b) $c$ is lower semicontinuous and nonnegative;
(c) The transition law $Q$ is strongly continuous, that
is, the function $h'$, defined on $\mathbb{K}$ by
$$h'(x,a) := \int h(y)\,Q(dy \mid x,a), \tag{2}$$
is continuous and bounded for every measurable
bounded function $h$ on $X$.
Using the standard notation and definitions in
(Hernández-Lerma and Lasserre, 1996), $\Pi$ denotes
the set of all policies and $\mathbb{F}$ is the subset of stationary
policies. Each stationary policy $f \in \mathbb{F}$ is identified
with the measurable function $f : X \to A$ such that
$f(x) \in A(x)$ for every $x \in X$.
Remark 2.5. Given an initial state $x \in X$ and a stationary
policy $f \in \mathbb{F}$, the process determined by (1) is
a homogeneous Markov process with transition kernel
$Q(\cdot \mid x, f)$ (see (Hernández-Lerma and Lasserre, 1996)
Proposition 2.3.5 p. 19).
For a discrete-time Markov Control Model
$(X, A, \{A(x) \mid x \in X\}, Q, c)$, the performance
criterion considered in this paper is the Expected Total
Discounted Cost, defined as
$$v(\pi, x) := E_x^{\pi}\Big[\sum_{n=0}^{+\infty} \alpha^n c(x_n, a_n)\Big], \tag{3}$$
when using the policy $\pi \in \Pi$, given the initial state
$x_0 = x \in X$. Here, $\alpha \in (0,1)$ is a given discount
factor, and $E_x^{\pi}$ denotes the expectation with respect to
the probability measure $P_x^{\pi}$ induced by $\pi$ and $x$ (see
(Hernández-Lerma and Lasserre, 1996)).
A policy $\pi^*$ is said to be optimal if
$$v(\pi^*, x) = V^*(x), \tag{4}$$
for each $x \in X$, where
$$V^*(\cdot) := \inf_{\pi \in \Pi} v(\pi, \cdot) \tag{5}$$
is the so-called optimal value function.
Remark 2.6. Assumptions 2.4a) and 2.4b) imply that
$c$ is inf-compact on $\mathbb{K}$, that is, for every $x \in X$ and
$r \in \mathbb{R}$, the set
$$A_r(x) := \{a \in A(x) \mid c(x,a) \le r\} \tag{6}$$
is compact. Therefore, Assumption 2.4 implies Assumption
1a) and 1b) in (Hernández-Lerma and
Lasserre, 1996). Consequently, the validity of the next
lemma is guaranteed.
Lemma 2.7. ((Hernández-Lerma and Lasserre,
1996), Theorem 4.2.3 and Lemma 4.2.8) Under Assumption 2.4,
(a) The optimal value function $V^*$ satisfies the optimality
equation
$$V^*(x) = \inf_{a \in A(x)} \Big\{ c(x,a) + \alpha \int V^*(y)\,Q(dy \mid x,a) \Big\}, \tag{7}$$
for each $x \in X$.
(b) There exists an optimal stationary policy $f^* \in \mathbb{F}$
such that
$$V^*(x) = c(x, f^*(x)) + \alpha \int V^*(y)\,Q(dy \mid x, f^*(x)), \tag{8}$$
for each $x \in X$.
(c) $V_n(x) \to V^*(x)$ when $n \to \infty$, where $V_n$ is defined
by
$$V_n(x) = \inf_{a \in A(x)} \Big\{ c(x,a) + \alpha \int V_{n-1}(y)\,Q(dy \mid x,a) \Big\}, \tag{9}$$
for each $x \in X$, with $V_0(\cdot) = 0$.
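The recursion (9) can be sketched for a model with finitely many states and actions, where the integral reduces to a finite sum. This is only an illustration under that finiteness assumption; the function `value_iteration`, its argument names, and the two-state example are hypothetical and do not come from the paper.

```python
def value_iteration(states, actions, cost, kernel, alpha, n_iter):
    """Iterate V_n(x) = min_a { c(x,a) + alpha * sum_y V_{n-1}(y) Q(y|x,a) }, V_0 = 0."""
    V = {x: 0.0 for x in states}
    for _ in range(n_iter):
        # The right-hand side is evaluated with the previous iterate V_{n-1}
        V = {
            x: min(
                cost(x, a) + alpha * sum(p * V[y] for y, p in kernel(x, a).items())
                for a in actions(x)
            )
            for x in states
        }
    return V

# Hypothetical two-state example: the action selects the next state with
# certainty, and a cost of 1 is paid whenever the current state is 1.
states = [0, 1]
actions = lambda x: [0, 1]
cost = lambda x, a: float(x)
kernel = lambda x, a: {a: 1.0}

V = value_iteration(states, actions, cost, kernel, alpha=0.9, n_iter=50)
print(V)
```

In this toy model the minimizing action is always 0 (move to the cost-free state), so the iterates stabilize after the first step.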
3 RESERVE PROCESS
A Risk Process (see (Asmussen, 2010), (Dickson,
2005), and (Schmidli, 2009)) consists of a pair
$(P_t, S_t)$, $t \ge 0$, which describes the premiums earned
and the total amount of claims during the period of
time $[0,t]$, respectively.
The relationship between $P_t$ and $S_t$ is given as follows:
$$R_t = R_0 + P_t - S_t, \tag{10}$$
$t \ge 0$, where $R_0 = u > 0$ is the initial reserve of the
company. In this case, $R_t$ represents the reserve of the
company at time $t$. The process $\{R_t\}_{t \ge 0}$ is called the
Reserve Process.
The ruin of the company occurs at the first instant at which $R_t$
takes a negative value. The main objective then is to
determine the probability of this event, which is made precise
in the following definition.
Definition 3.1. The ruin probability $\psi(u)$, with initial
reserve $u > 0$, is defined by
$$\psi(u) := \Pr[\tau(u) < +\infty], \tag{11}$$
where $\tau(u) := \inf\{t > 0 \mid R_t < 0\}$, with $\tau(u) = +\infty$ if
$R_t > 0$ for all $t \ge 0$.
In the classical model of Lundberg and Cramér,
the premiums are determined continuously and deterministically,
i.e., $P_t = Ct$ where $C > 0$ and $t \ge 0$.
In addition, the total amount of claims $S_t$ depends
on two processes: a homogeneous Poisson process
$\{N(t)\}_{t \ge 0}$, with intensity $\lambda > 0$, and a claim amounts
process $\{Y_i : i = 1, 2, \dots\}$, where the $Y_i$ are independent
and identically distributed random variables. Thus,
the total amount of claims until time $t$ is given by
$$S_t = \sum_{i=1}^{N(t)} Y_i, \tag{12}$$
where $S_t = 0$ if $t = 0$.
Thus, the classical reserve process is described by
$$R_t = u + Ct - \sum_{i=1}^{N(t)} Y_i = u + Ct - S_t.$$
Observe that if $E[S_t]$ denotes the expectation of $S_t$,
and $E[S_t] < +\infty$, then, taking the expectation in the
last equation, it is obtained that
$$E[R_t] = u + (C - \lambda E[Y_1])\,t. \tag{13}$$
Choosing $C > \lambda E[Y_1]$, it is concluded that the average
reserves of the company grow indefinitely. In
other words, the reserve $R_t$ tends to infinity when $t$
does so with probability $1 - \psi(u)$. The assumption
$C > \lambda E[Y_1]$ is known as the Safety Loading Condition.
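Equation (13) can be compared with a simulation of the classical model. The sketch below assumes exponentially distributed claim sizes and particular parameter values; both are illustrative choices, and `simulate_reserve` is a hypothetical helper. It averages simulated values of the reserve at a fixed time and sets the result against the closed form.

```python
import random

def simulate_reserve(u, C, lam, mean_claim, t, rng):
    """One draw of R_t = u + C*t - sum_{i <= N(t)} Y_i with exponential claims (assumed)."""
    total = 0.0
    clock = rng.expovariate(lam)              # time of the first claim
    while clock <= t:
        total += rng.expovariate(1.0 / mean_claim)
        clock += rng.expovariate(lam)         # next inter-arrival time
    return u + C * t - total

rng = random.Random(0)
u, C, lam, mean_claim, t = 5.0, 1.5, 1.0, 1.0, 10.0   # C > lam * E[Y]: safety loading holds
n_paths = 20000
estimate = sum(simulate_reserve(u, C, lam, mean_claim, t, rng) for _ in range(n_paths)) / n_paths
exact = u + (C - lam * mean_claim) * t                 # equation (13)
print(estimate, exact)
```

The Monte Carlo average fluctuates around the exact value; only the closed form is deterministic.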
As mentioned above, in the classical model, the
safety loading condition allows the reserves of an insurance
company to accumulate indefinitely, which is unrealistic.
Although there seems to be some controversy about
this point, it has been suggested to establish an upper
limit (barrier) $Z$ for the accumulation of earnings
in order to sustain the risks (see (Azcue and Muler,
2014), (De-Finetti, 1957), (Dickson, 2005), (Dickson
and Waters, 2004), and (Schmidli, 2009)). To this
end, the reserves of the company must be reduced
to $Z$ from time to time, for example, by paying dividends
to shareholders.
Remark 3.2. It is important to mention that in a more
general setting, some of the assumptions of the classical
model may be relaxed, e.g., $\{N(t)\}$ could be a
non-homogeneous Poisson process or a more general
renewal process. It is also possible to assume that
the claim size cumulative distribution function is of a
particular parametric form, e.g., gamma, Weibull, etc.
(see Assumption 3.5 and Examples 1 and 2, below).
Dividends can be understood as payments made
by a company to its shareholders, either in cash or
in shares. The arguments about the advantages of a
dividend refer to the intention of the investors to earn
income in the present and to reduce uncertainty. Formally,
the dividends, $d_t$, are defined as $d_t = [R_t - Z]^+$,
where $[z]^+ = \max\{0, z\}$.
On the other hand, in the existing literature, differ-
ent methods are proposed to determine the premium
value for the safety loading condition to hold (see
(Dickson, 2005) and (Schmidli, 2009)). In this work
the expectation principle will be used.
3.1 Discrete-time Reserve Process
Now, a discrete-time reserve model will be developed.
The discretization is reasonable because, in practice,
decisions of the company about its operations are
taken at fixed points of time (see (Bulinskaya and
Muromskaya, 2014), (Diasparra and Romera, 2009),
(Li et al., 2009), and (Schmidli, 2009)).
Let $\{R_t\}$ be a reserve process with initial reserve
$R_0 = u > 0$, and $\{t_n\}$ be an increasing sequence of
nonnegative real numbers with $t_0 = 0$. Then, equation (10)
implies that
$$R_{t_{n+1}} - R_{t_n} = (P_{t_{n+1}} - P_{t_n}) - (S_{t_{n+1}} - S_{t_n}), \tag{14}$$
for $n = 0, 1, \dots$, where $(P_{t_{n+1}} - P_{t_n})$ and $(S_{t_{n+1}} - S_{t_n})$
are the premiums earned and the total amount of
claims during the period $(t_n, t_{n+1}]$, respectively.
Let $x_{t_n} := R_{t_n}$, $a_{t_n} := (P_{t_{n+1}} - P_{t_n})$ and
$\xi_{t_n} := (S_{t_{n+1}} - S_{t_n})$. Without loss of generality, it is
possible to assume that $t_n = n$ for $n > 0$. Then, the
discrete-time reserve model is as follows:
$$x_{n+1} = x_n + a_n - \xi_n, \tag{15}$$
with $x_0 = u > 0$.
In this case, $x_{n+1}$ represents the reserve at time
$t = n + 1$. Moreover, the discrete-time ruin probability
is determined by
$$\psi_d(u) := \Pr[\tau_d(u) < +\infty], \tag{16}$$
where $\tau_d(u) := \inf\{n \ge 1 \mid x_n \le 0\}$, with $\tau_d(u) = +\infty$ if
$x_n > 0$ for all $n > 0$.
According to the ruin probability defined above,
the ruin of the company is attained when
$x_n + a_n - \xi_n \le 0$ for some $n > 0$.
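For a given realization of premiums and claims, the stopping time in (16) can be read off directly from the recursion (15). The following sketch is an illustration only; the function name `ruin_time` and the sample sequences are assumptions.

```python
def ruin_time(u, premiums, claims):
    """Return the first n >= 1 with x_n <= 0 under x_{n+1} = x_n + a_n - xi_n, or None."""
    x = u
    for n, (a, xi) in enumerate(zip(premiums, claims), start=1):
        x = x + a - xi          # equation (15)
        if x <= 0:
            return n
    return None                 # no ruin within the observed horizon

# Hypothetical realizations: a large claim in the second period causes ruin.
first = ruin_time(2.0, [1.0, 1.0, 1.0], [0.5, 4.0, 0.0])
second = ruin_time(2.0, [1.0, 1.0], [0.5, 0.5])
print(first, second)
```

Returning `None` stands for "no ruin observed on this finite path", which is weaker than the event that ruin never occurs.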
If the following dynamics is considered:
$$x_{n+1} = [x_n + a_n - \xi_n]^+, \tag{17}$$
for $n = 0, 1, \dots$, with $x_0 = u > 0$, then the dynamics
in (17) determines the ruin when $x_n = 0$ for some
$n = 1, 2, \dots$. However, just as in the continuous-time
model, if the safety loading condition holds, $E[x_n] \to +\infty$
when $n \to +\infty$.
Remark 3.3. The dynamics described in (17) is
known as the Lindley random walk (see (Asmussen,
2010)), which has various applications, for example,
in storage processes, waiting time models and queue size
models, to name a few (Asmussen, 2010). (See Remark 3.4, below.)
3.2 Reserve Process with a Fixed
Barrier
This subsection provides a reserve process which is
modelled as a discounted Markov Decision Process
in discrete time. The motivation comes from
the previous subsection, that is, the possibility of discretizing
the reserve process, and the existence of a
fixed barrier which defines the payments of dividends
(see (Azcue and Muler, 2014), (De-Finetti, 1957),
(Dickson, 2005), and (Martínez-Morales, 1991)).
Let $Z$ be a fixed barrier such that, if at time $t_n$,
$x_n > Z$, the surplus $x_n - Z$ is used to pay dividends.
Thus, the study of the reserve process focuses on the
reserves below the barrier $Z$. Mathematically, this is described
by the following dynamics:
$$x_{n+1} = \min\{[x_n + a_n - \xi_n]^+, Z\}, \tag{18}$$
with $x_0 = u > 0$.
In this case, $x_n$, $a_n$ and $\xi_n$ denote, respectively, the
reserve, the premium and the total amount of claims of
the company at the beginning of the period $(n, n+1]$.
Remark 3.4. The dynamics given in equation (18)
has been used to describe storage processes with finite
capacity such as: dams, inventory, waiting time model
and queue sizes, to name a few (see (Finch, 1960) and
(Ghosal, 1970)).
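One step of the dynamics (18) can be written as a small function. The sketch below also reports the amount by which the reserve would exceed the barrier before truncation; treating that excess as the dividend paid in the period is an interpretive assumption, as are the names `barrier_step` and `barrier_path` and the sample data.

```python
def barrier_step(x, a, xi, Z):
    """Apply x' = min{[x + a - xi]^+, Z}; also return the excess over Z (assumed dividend)."""
    reserve = max(x + a - xi, 0.0)        # positive part, as in (17)
    dividend = max(reserve - Z, 0.0)      # surplus above the barrier
    return min(reserve, Z), dividend

def barrier_path(u, premiums, claims, Z):
    """Run the recursion from x_0 = u for given premium and claim sequences."""
    path, dividends = [u], []
    for a, xi in zip(premiums, claims):
        x, d = barrier_step(path[-1], a, xi, Z)
        path.append(x)
        dividends.append(d)
    return path, dividends

# Hypothetical data: the barrier is hit in period 1, the reserve is exhausted afterwards.
path, dividends = barrier_path(3.0, [2.0, 2.0, 2.0], [0.5, 6.0, 9.0], Z=4.0)
print(path, dividends)
```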
Assumption 3.5. Suppose that $\{\xi_n\}$ is a sequence of
i.i.d. random variables with values in $[0, \infty)$, and a
common distribution $F$ whose density $\Delta$ is continuous
almost everywhere (a.e.), with $E[\xi] < +\infty$ ($\xi$ is a
generic element of the sequence $\{\xi_n\}$).
In the rest of this paper Assumption 3.5 will not be
mentioned in each result, but it is supposed to hold.
Remark 3.6. Observe that Assumption 3.5 considers
general distributions which, in practice, permits us to
work with distributions with light or heavy tails (see
(Azcue and Muler, 2014)).
Using the expectation principle for premium calculation,
it is ensured that the safety loading condition
for the process described in equation (18) holds. Define
$$K := (1 + \varepsilon)E[\xi] \tag{19}$$
and
$$M := (1 + \beta)E[\xi], \tag{20}$$
where $0 < \varepsilon < \beta$. Then, by ((Dickson, 2005) and
(Schmidli, 2009)), $K < M$; therefore, the admissible
premium set is the compact subset $[K, M]$. (Note that
for every premium $a \in A(x) = [K, M]$, the safety loading
condition is satisfied, and $\beta$ is fixed in order to be
competitive in the insurance market.)
Every time the reserve is below the barrier $Z$,
the non-payment of dividends is penalized. Therefore,
the following cost function is proposed:
$$c(x,a) := [Z - x]^+, \tag{21}$$
for each $x \in [0, +\infty)$ and $a \in [K, M]$.
Remark 3.7. This model defines an MDP: take $X = [0, +\infty)$
as the state space; $A = [K, M]$ as the action
space; $A(x) = [K, M]$ as admissible actions for each
$x \in X$; the transition law $Q$ is induced by the function
$G(x,a,s) := \min\{[x + a - s]^+, Z\}$ for each $(x,a) \in \mathbb{K}$
and $s \in [0, +\infty)$ (see equation (1)). Finally, the cost
function is defined in (21).
According to Remark 3.7, the problem (an
OCP) is to determine the sequence of premiums
$\pi = \{a_n\}$ which minimizes
$$v(\pi, x) := E_x^{\pi}\Big[\sum_{n=0}^{+\infty} \alpha^n [Z - x_n]^+\Big], \tag{22}$$
where $x \ge 0$ is the initial reserve, and $\alpha$ is a given
discount factor.
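The criterion (22) can be approximated by simulation for a constant premium. The sketch below is an illustration under several assumptions that are not in the paper: the infinite sum is truncated at a finite horizon, claims are exponential, and the helper names `discounted_cost` and `estimate_v` are hypothetical. Using the same seed for two premiums gives both estimates the same claim paths.

```python
import random

def discounted_cost(u, a, Z, alpha, claims):
    """Sum of alpha^n * [Z - x_n]^+ along one path of (18) with constant premium a."""
    x, total, discount = u, 0.0, 1.0
    for xi in claims:
        total += discount * max(Z - x, 0.0)     # cost (21) at the current state
        x = min(max(x + a - xi, 0.0), Z)        # dynamics (18)
        discount *= alpha
    return total

def estimate_v(u, a, Z, alpha, horizon, n_paths, seed, mean_claim=1.0):
    """Monte Carlo estimate of (22), truncated at `horizon` periods (an approximation)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_paths):
        claims = [rng.expovariate(1.0 / mean_claim) for _ in range(horizon)]
        acc += discounted_cost(u, a, Z, alpha, claims)
    return acc / n_paths

one_path = discounted_cost(1.0, 2.0, 4.0, 0.5, [1.0, 1.0])
v_low = estimate_v(1.0, 1.2, 4.0, 0.9, horizon=100, n_paths=500, seed=1)
v_high = estimate_v(1.0, 2.0, 4.0, 0.9, horizon=100, n_paths=500, seed=1)
print(one_path, v_low, v_high)
```

Because the recursion is monotone in the premium along each shared claim path, the estimate for the larger premium cannot exceed the one for the smaller premium in this setup.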
4 OPTIMAL PREMIUMS
In this section the research results are presented using
MDPs theory.
By the definition of the cost function in (21) it
is concluded that it is nonnegative and continuous.
Moreover, for each x X , A(x) = [K,M] is a compact
set. So, now it is only necessary to show Assumption
2.4c) which is presented in the following lemma.
Lemma 4.1. The transition law $Q$, induced by (18),
is strongly continuous.
Proof. Let $h : X \to \mathbb{R}$ be a measurable function
bounded by the constant $\gamma$. Using the Variable Change
Theorem ((Ash and Doléans-Dade, 2000) p. 52), it
follows that
$$\int h(y)\,Q(dy \mid x,a) = \int_0^{\infty} h(\min\{[x + a - s]^+, Z\})\,\Delta(s)\,ds, \tag{23}$$
$(x,a) \in \mathbb{K}$.
Furthermore,
\begin{align}
\int_0^{\infty} h(\min\{[x + a - s]^+, Z\})\,\Delta(s)\,ds &= \tag{24}\\
&\phantom{+}\ h(0)\,(1 - F(x + a)) \tag{25}\\
&+ h(Z)\,F(x + a - Z) \tag{26}\\
&+ \int_{x+a-Z}^{x+a} h(x + a - s)\,\Delta(s)\,ds, \tag{27}
\end{align}
$(x,a) \in \mathbb{K}$, where $F$ is the common distribution function
of $\xi$.
Since the density $\Delta$ is a continuous function a.e. (see
Assumption 3.5), $F$ is also continuous (see (Ash and
Doléans-Dade, 2000), p. 175).
Given the above, it suffices to prove that
$$\int_{x+a-Z}^{x+a} h(x + a - s)\,\Delta(s)\,ds \tag{28}$$
is a continuous function of $(x,a) \in \mathbb{K}$.
For this purpose, let $\{(x_k, a_k)\}$ be a sequence in
$\mathbb{K}$ converging to $(x,a) \in \mathbb{K}$. By the Variable Change
Theorem ((Ash and Doléans-Dade, 2000) p. 52),
$$\int_{x+a-Z}^{x+a} h(x + a - s)\,\Delta(s)\,ds = \int_0^Z h(y)\,\Delta(x + a - y)\,dy. \tag{29}$$
Consider the following functions defined by
$$h_k(y) := h(y)\,\Delta(x_k + a_k - y)\,I_{[0,Z]}(y), \tag{30}$$
$$g_k(y) := \gamma\,\Delta(x_k + a_k - y)\,I_{[0,Z]}(y), \tag{31}$$
for $k = 1, 2, \dots$, $y \in [0, +\infty)$, where $I_B(\cdot)$ denotes the
indicator function of the set $B$.
Note that $|h_k| \le g_k$ for all $k \ge 1$. Furthermore, $\{g_k\}$
converges a.e. to the function $g$ which is defined by
$$g(y) := \gamma\,\Delta(x + a - y)\,I_{[0,Z]}(y), \tag{32}$$
$y \in [0, +\infty)$.
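Furthermore,
\begin{align*}
\int g_k(y)\,dy &= \gamma \int_0^Z \Delta(x_k + a_k - y)\,dy \\
&= \gamma \Pr[x_k + a_k - Z \le \xi \le x_k + a_k] \\
&= \gamma\,\big(F(x_k + a_k) - F(x_k + a_k - Z)\big),
\end{align*}
and, as the distribution $F$ is continuous,
$$\lim_{k \to \infty} \int g_k(y)\,dy = \int g(y)\,dy. \tag{33}$$
Finally, by the Dominated Convergence Theorem
((Royden, 1988) p. 92),
\begin{align*}
\lim_{k \to \infty} \int_{x_k+a_k-Z}^{x_k+a_k} h(x_k + a_k - s)\,\Delta(s)\,ds
&= \lim_{k \to \infty} \int h_k(y)\,dy \\
&= \int \lim_{k \to \infty} h_k(y)\,dy \\
&= \int_0^Z h(y)\,\Delta(x + a - y)\,dy \\
&= \int_{x+a-Z}^{x+a} h(x + a - s)\,\Delta(s)\,ds,
\end{align*}
and therefore the result holds.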
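The three-term decomposition used in the proof of Lemma 4.1 can be checked numerically. The sketch below assumes exponential claims with mean 1 and the test function h(y) = 1 + y², and it uses a plain midpoint rule; the density, the test function, the truncation point `s_max` and all helper names are illustrative assumptions. It evaluates the integral over the whole half-line and the sum of the three terms for one state below and one state above the barrier level.

```python
import math

def density(s):
    """Assumed claim density: exponential with mean 1, zero for negative arguments."""
    return math.exp(-s) if s >= 0 else 0.0

def cdf(s):
    """Distribution function matching `density`."""
    return 1.0 - math.exp(-s) if s >= 0 else 0.0

def midpoint(f, lo, hi, n):
    """Midpoint quadrature of f over [lo, hi] with n equal cells."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

def full_integral(h, x, a, Z, s_max=50.0, n=200_000):
    """Integral of h(min{[x+a-s]^+, Z}) * density(s) over [0, s_max] (truncated range)."""
    return midpoint(lambda s: h(min(max(x + a - s, 0.0), Z)) * density(s), 0.0, s_max, n)

def decomposition(h, x, a, Z, n=40_000):
    """h(0)(1 - F(x+a)) + h(Z) F(x+a-Z) + integral of h(x+a-s) density(s) on [x+a-Z, x+a]."""
    c = x + a
    middle = midpoint(lambda s: h(c - s) * density(s), c - Z, c, n)
    return h(0.0) * (1.0 - cdf(c)) + h(Z) * cdf(c - Z) + middle

h = lambda y: 1.0 + y * y
gap_low = abs(full_integral(h, 1.0, 2.0, 4.0) - decomposition(h, 1.0, 2.0, 4.0))
gap_high = abs(full_integral(h, 3.0, 2.0, 4.0) - decomposition(h, 3.0, 2.0, 4.0))
print(gap_low, gap_high)
```

Both gaps should be at the level of the quadrature error, which is consistent with the identity but does not replace the proof.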
By Lemma 4.1, Assumption 2.4 holds, and therefore
Lemma 2.7 guarantees the existence of the optimal
policy $f^* \in \mathbb{F}$, which, in the context of the reserve
process, describes the sequence of optimal premiums
that minimizes the performance index given in
(22).
Lemma 4.2. a) The transition law $Q$, induced by
(18), is stochastically ordered, i.e.,
$$Q(\cdot \mid x,a) \le_{st} Q(\cdot \mid w,b) \tag{34}$$
for each $(x,a), (w,b) \in \mathbb{K}$ with $x \le w$ and $a \le b$.
b) The optimal value function $V^*(\cdot)$, and the value
iteration functions $V_n(\cdot)$, defined in (9), are decreasing
on $X$.
Proof. a) Let $(x,a), (w,b) \in \mathbb{K}$ with $x \le w$ and $a \le b$.
Observe that
$$[x + a - s]^+ \le [w + b - s]^+, \tag{35}$$
$s \in [0, +\infty)$.
On the other hand, if $\min\{[w + b - s]^+, Z\} = Z$,
then $\min\{[x + a - s]^+, Z\} \le \min\{[w + b - s]^+, Z\}$,
and if $\min\{[w + b - s]^+, Z\} = [w + b - s]^+$, then by
(35) $\min\{[x + a - s]^+, Z\} \le \min\{[w + b - s]^+, Z\}$.
Therefore
$$\min\{[x + a - s]^+, Z\} \le \min\{[w + b - s]^+, Z\}, \tag{36}$$
$s \in [0, +\infty)$. Thus, by (36), if $\min\{[w + b - \xi]^+, Z\} \le \varsigma$,
then $\min\{[x + a - \xi]^+, Z\} \le \varsigma$, and
therefore
$$Q(\min\{[w + b - \xi]^+, Z\} \le \varsigma \mid w,b) \le Q(\min\{[x + a - \xi]^+, Z\} \le \varsigma \mid x,a). \tag{37}$$
Finally, by Remark 2.2, the result holds.
b) First it will be shown that $V_n$ is decreasing on $X$.
The proof is made by mathematical induction.
Let $x, w \in X$ with $x \le w$. By definition of $V_n$, for
$n = 1$,
$$V_1(x) = \inf_{a \in A(x)} \{[Z - x]^+\}; \tag{38}$$
this implies that $V_1(x) = [Z - x]^+$, therefore $V_1$ is
decreasing on $X$.
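Now, for $n = 2$,
\begin{align*}
V_2(x) &= \inf_{a \in A(x)} \Big\{ c(x,a) + \alpha \int V_1(\min\{[x + a - s]^+, Z\})\,\Delta(s)\,ds \Big\} \\
&= \inf_{a \in A(x)} \Big\{ c(x,a) + \alpha \int [Z - \min\{[x + a - s]^+, Z\}]^+\,\Delta(s)\,ds \Big\} \\
&= \inf_{a \in A(x)} \Big\{ c(x,a) + \alpha \int (Z - \min\{[x + a - s]^+, Z\})\,\Delta(s)\,ds \Big\} \\
&= \inf_{a \in A(x)} \Big\{ [Z - x]^+ + \alpha Z - \alpha \int \min\{[x + a - s]^+, Z\}\,\Delta(s)\,ds \Big\} \\
&= \inf_{a \in A(x)} \Big\{ [Z - x]^+ + \alpha Z - \alpha \int y\,Q(dy \mid x,a) \Big\}.
\end{align*}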
Hence, by part (a) of this lemma and using
Lemma 2.3 with $H(y) = y$, $y \in X$, the function
$g$, defined by
$$g(a) := -\alpha \int y\,Q(dy \mid x,a), \tag{39}$$
$a \in [K, M]$, is decreasing, and so its minimum is attained at
$M$. This implies that
$$V_2(x) = [Z - x]^+ + \alpha Z - \alpha \int y\,Q(dy \mid x,M). \tag{40}$$
Since $x \le w$, after some calculations it is obtained
that $V_2(w) \le V_2(x)$. As $x$ and $w$ are arbitrary,
$V_2$ is a decreasing function on $X$. Suppose
that $V_n$ is decreasing on $X$ for some $n > 2$.
Again, take $x, w \in X$ with $x \le w$. Then
\begin{align}
V_{n+1}(x) &= \inf_{a \in A(x)} \Big\{ c(x,a) + \alpha \int V_n(\min\{[x + a - s]^+, Z\})\,\Delta(s)\,ds \Big\} \notag\\
&= \inf_{a \in A(x)} \Big\{ [Z - x]^+ + \alpha \int V_n(y)\,Q(dy \mid x,a) \Big\}. \tag{41}
\end{align}
Let $a \in [K, M]$. By the induction hypothesis and by
the stochastic order of $Q$, it follows that
$$[Z - w]^+ + \alpha \int V_n(y)\,Q(dy \mid w,a) \le [Z - x]^+ + \alpha \int V_n(y)\,Q(dy \mid x,a);$$
then, taking the minimum over $a \in [K, M]$ on both sides
of the inequality, it is obtained that $V_{n+1}(w) \le V_{n+1}(x)$.
Therefore, $V_{n+1}$ is decreasing. By
Lemma 2.7c), $V_n(x) \to V^*(x)$, $x \in X$, which implies
that $V^*$ is a decreasing function on $X$.
Theorem 4.3. The optimal policy for the reserve process
with dividends, induced by (18), is $f^*(\cdot) \equiv M$.
Proof. Let $x \in X$ be fixed. By Lemma 2.7, $V^*$ satisfies
the optimality equation (7), that is,
$$V^*(x) = \inf_{a \in A(x)} \Big\{ [Z - x]^+ + \alpha \int V^*(y)\,Q(dy \mid x,a) \Big\}.$$
Also, by Lemma 4.2, $V^*$ is decreasing and $Q$ is
stochastically ordered. Then, if $a, b \in [K, M]$, with
$a \le b$, it is obtained that
$$\alpha \int V^*(y)\,Q(dy \mid x,b) \le \alpha \int V^*(y)\,Q(dy \mid x,a). \tag{42}$$
Adding $[Z - x]^+$ on both sides of the inequality above,
it is concluded that, for $a \in [K, M]$,
$$H(a) := [Z - x]^+ + \alpha \int V^*(y)\,Q(dy \mid x,a) \tag{43}$$
is a decreasing function and its minimum is attained
at $M$. Since $x$ is arbitrary, the result follows.
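The conclusion of Theorem 4.3 can be illustrated on a lattice version of the model, where value iteration is exact up to rounding. The sketch below is only an analogue under assumptions that are not in the paper: integer states 0..Z, two integer premiums K = 2 and M = 3, and a hypothetical integer-valued claim distribution with mean 1.9 (so both premiums satisfy the safety loading condition). The function name `solve_lattice_model` is likewise hypothetical.

```python
def solve_lattice_model(Z, actions, pmf, alpha, n_iter):
    """Value iteration for x' = min{[x + a - s]^+, Z} on the integer states 0..Z."""
    def expected(V, x, a):
        # Expectation of V at the next state under premium a
        return sum(p * V[min(max(x + a - s, 0), Z)] for s, p in pmf.items())

    V = [0.0] * (Z + 1)
    for _ in range(n_iter):
        # On the lattice x <= Z, so the cost [Z - x]^+ equals Z - x
        V = [(Z - x) + alpha * min(expected(V, x, a) for a in actions)
             for x in range(Z + 1)]
    # H[x][a]: the bracket of the optimality equation at the final iterate
    H = [{a: (Z - x) + alpha * expected(V, x, a) for a in actions}
         for x in range(Z + 1)]
    return V, H

# Hypothetical claim distribution (mean 1.9) and premiums K = 2 < M = 3.
pmf = {0: 0.3, 1: 0.3, 2: 0.2, 4: 0.1, 8: 0.1}
V, H = solve_lattice_model(Z=6, actions=[2, 3], pmf=pmf, alpha=0.9, n_iter=200)
best = [min(H[x], key=H[x].get) for x in range(7)]
print(best)
print([round(v, 3) for v in V])
```

In this discretized setting the approximate value function is nonincreasing in the state and the largest premium minimizes the bracket at every state, in line with Lemma 4.2 and Theorem 4.3.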
Finally, in this section, by Theorem 4.3 it is obtained
that the optimal value function is of the form
$$V^*(x) = v(M, x) = E_x^{M}\Big[\sum_{n=0}^{+\infty} \alpha^n [Z - x_n]^+\Big], \tag{44}$$
for each $x \in X$. That is, $V^*(x)$ is the expected total
cost of the penalties for not reaching the barrier $Z$ (and
therefore not paying dividends to shareholders),
brought to present value with the discount factor $\alpha$.
5 RATES FOR RUIN
PROBABILITY
This section presents a rate for the ruin probability which
makes it possible to determine a period of sustainability for
the company under the optimal reserve process, that
is, the process under the optimal policy (premium)
$f^*(\cdot) \equiv M$,
$$x^M_{n+1} = \min\{[x^M_n + M - \xi_n]^+, Z\}, \tag{45}$$
with $x^M_0 = u > 0$.
To this end,
$$\psi^N_d(u) := \Pr[x^M_0 = u,\ x^M_1 \ne 0,\ \dots,\ x^M_{N-1} \ne 0,\ x^M_N = 0] \tag{46}$$
is defined for $u > 0$ and $N > 2$.
Observe that $\psi^N_d(u)$ is the ruin probability when
$\tau_d(u) = N$, where $\tau_d$ is the stopping time for the state
zero (see equation (16)).
Theorem 5.1. Let $\{x^M_n\}$ be the optimal reserve process
generated by the optimal policy $f^* \equiv M$, with
$x^M_0 = u > 0$ and $N > 2$. Then
$$\psi^N_d(u) \le (\Pr[\xi < Z + M])^{N-2} \cdot \Pr[\xi < u + M]. \tag{47}$$
Proof. The optimal process $\{x^M_n\}$ is a homogeneous
Markov process with transition law $Q$ (see Remark
2.5).
Consider the following sets: $B_0 = \{x^M_0 = u\}$, $B_N = \{x^M_N = 0\}$
and $B_i = \{x^M_i \ne 0\}$, for $i = 1, 2, \dots, N-1$,
and observe that $B_i \in \mathcal{B}(X)$ for $i = 1, 2, \dots, N$.
Then, by Proposition 7.3 p. 130 in (Breiman,
1992),
\begin{align*}
\psi^N_d(u) &= \Pr[x^M_0 = u,\ x^M_1 \ne 0,\ \dots,\ x^M_{N-1} \ne 0,\ x^M_N = 0] \\
&= \int_{B_{N-1}} \cdots \int_{B_0} Q(B_N \mid w_{N-1}, M)\,Q(dw_{N-1} \mid w_{N-2}, M) \cdots Q(dw_1 \mid w_0, M)\,\rho(dw_0),
\end{align*}
where the initial distribution $\rho$ is the Dirac measure
concentrated on $u$.
On the other hand, observe that
$$Q(B_N \mid w_{N-1}, M) \le 1. \tag{48}$$
Therefore
$$\psi^N_d(u) \le \int_{B_{N-1}} \cdots \int_{B_0} Q(dw_{N-1} \mid w_{N-2}, M) \cdots Q(dw_1 \mid w_0, M)\,\rho(dw_0).$$
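Furthermore, for each $i = 1, 2, \dots, N-1$,
$B_i \subseteq \{\xi_{i-1} < x^M_{i-1} + M\} \subseteq \{\xi < Z + M\}$; this implies that
$$Q(B_i \mid w_{i-1}, M) \le \Pr[\xi_{i-1} < x^M_{i-1} + M] \le \Pr[\xi < Z + M]. \tag{49}$$
So
$$\psi^N_d(u) \le \int_{B_{N-2}} \cdots \int_{B_0} \Pr[\xi < Z + M]\,Q(dw_{N-2} \mid w_{N-3}, M) \cdots Q(dw_1 \mid w_0, M)\,\rho(dw_0).$$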
Finally, iterating this way $N - 3$ times and since $\rho$
is concentrated in $B_0$, it is obtained that
$$\psi^N_d(u) \le (\Pr[\xi < Z + M])^{N-2}\,Q(B_1 \mid u, M), \tag{50}$$
where $Q(B_1 \mid u, M) = Q(x^M_1 \ne 0 \mid u, M) = \Pr[\xi < u + M]$.
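The right-hand side of (47) is straightforward to evaluate once the claim distribution function is available. The sketch below assumes a continuous distribution, so that Pr[ξ < s] = F(s), and uses an exponential law with mean 1 purely for illustration; the function name `ruin_rate_bound` and the parameter values are hypothetical.

```python
import math

def ruin_rate_bound(N, u, Z, M, cdf):
    """Right-hand side of (47): cdf(Z + M)**(N - 2) * cdf(u + M), stated for N > 2."""
    if N <= 2:
        raise ValueError("the bound is stated for N > 2")
    return cdf(Z + M) ** (N - 2) * cdf(u + M)

# Assumed claim law for illustration: exponential with mean 1.
exp_cdf = lambda s: 1.0 - math.exp(-s) if s > 0 else 0.0

bounds = [ruin_rate_bound(N, u=1.0, Z=4.0, M=2.0, cdf=exp_cdf) for N in (3, 10, 100)]
print(bounds)
```

Since the factor Pr[ξ < Z + M] is below 1 for claims with unbounded support, the bound decays geometrically in N.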
The examples that follow illustrate the application
of Theorem 5.1. To do this, the ruin probability
$\psi^N_d(u) = 0.001$ and $\nu := 1 - \psi^N_d(u)$ are considered.
Table 1: Gamma distribution.
u    κ = 1 (Z = 4.503, M = 2)    κ = 3 (Z = 6.928, M = 4.732)
     years (N)                   years (N)
1    19.07                       18.70
2    19.11                       18.99
3    19.12                       19.08
4    19.17                       19.09
5.1 Example 1
Suppose that $\xi$ has a Gamma distribution with parameters
$(\lambda, \kappa)$ whose density is of the form
$$\Delta(s) = \frac{1}{\lambda\,\Gamma(\kappa)} \Big(\frac{s}{\lambda}\Big)^{\kappa - 1} e^{-(s/\lambda)}, \quad s > 0, \tag{51}$$
where $\Gamma(\kappa) = \int_0^{+\infty} s^{\kappa - 1} e^{-s}\,ds$ is the Gamma function.
The Gamma distribution function has no closed form
for general $\kappa$, so it is necessary to resort to tables
for this distribution, given in (Wilks, 2011) Appendix B,
Table B.2.
In this case, the optimal premium is
$$M = \kappa + \beta \sqrt{\kappa}, \tag{52}$$
where $\beta$ is the loading factor.
Given $\lambda = \beta = 1$ and different values of $u$, the values of $Z$ and $M$
and the respective periods of sustainability (in years)
are calculated for $\kappa = 1, 3$. These values are shown in
Table 1.
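For integer shape parameters the Gamma distribution function can be evaluated with the finite Erlang sum, which offers an alternative to tables in those cases. The sketch below takes scale 1 and the values of Z from Table 1; the helper names `erlang_cdf` and `gamma_premium`, and the factor `lam` in the premium, are assumptions added for illustration. It computes the premium (52) and evaluates F(Z + M) for both columns without claiming to reproduce how the table entries were obtained.

```python
import math

def erlang_cdf(kappa, s):
    """Gamma distribution function for integer shape kappa and scale 1."""
    if s <= 0:
        return 0.0
    tail = math.exp(-s) * sum(s ** j / math.factorial(j) for j in range(kappa))
    return 1.0 - tail

def gamma_premium(kappa, beta=1.0, lam=1.0):
    """Mean plus beta standard deviations; reduces to (52) when lam = 1 (assumed scaling)."""
    return lam * (kappa + beta * math.sqrt(kappa))

M1, M3 = gamma_premium(1), gamma_premium(3)
nu1 = erlang_cdf(1, 4.503 + M1)    # F(Z + M) with the Table 1 value of Z for kappa = 1
nu3 = erlang_cdf(3, 6.928 + M3)    # same for kappa = 3
print(M1, M3, nu1, nu3)
```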
5.2 Example 2
Suppose that $\xi$ has a Weibull distribution with parameters
$(\lambda, \kappa)$. It is known that the distribution function
is as follows:
$$F(s) = 1 - e^{-(s/\lambda)^{\kappa}}, \quad s > 0. \tag{53}$$
Since $F(M + Z) = \nu$, it follows that
$$Z = \lambda\,\big(\ln (1 - \nu)^{-1}\big)^{1/\kappa} - M. \tag{54}$$
In this case, the optimal premium is
$$M = \lambda\,\Big(\Gamma(1 + 1/\kappa) + \beta \sqrt{\Gamma(1 + 2/\kappa) - \Gamma^2(1 + 1/\kappa)}\Big), \tag{55}$$
where $\beta$ is the loading factor.
Given $\lambda = \beta = 1$ and different values of $u$, the values of $Z$ and $M$
and the respective periods of sustainability are calculated
for $\kappa = 0.8, 0.6$. These values are shown in Table 2.
Table 2: Weibull distribution.
u    κ = 0.8 (Z = 8.64, M = 2.56)    κ = 0.6 (Z = 20.91, M = 4.14)
     years (N)                       years (N)
1    19.00                           18.98
2    19.08                           19.03
3    19.12                           19.07
4    19.15                           19.10
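Formulas (54) and (55) can be evaluated directly with the Gamma function from the standard library. The sketch below uses λ = β = 1 and ν = 0.999, as in the example; the function names `weibull_premium` and `weibull_barrier` are hypothetical. The printed values can be compared with the Z and M entries of Table 2, which agree up to rounding in the last digit.

```python
import math

def weibull_premium(kappa, lam=1.0, beta=1.0):
    """Equation (55): lam * (mean factor + beta * standard deviation factor)."""
    mean = math.gamma(1.0 + 1.0 / kappa)
    var = math.gamma(1.0 + 2.0 / kappa) - mean ** 2
    return lam * (mean + beta * math.sqrt(var))

def weibull_barrier(kappa, nu, lam=1.0, beta=1.0):
    """Equation (54): solve F(M + Z) = nu for Z."""
    M = weibull_premium(kappa, lam, beta)
    return lam * math.log(1.0 / (1.0 - nu)) ** (1.0 / kappa) - M

for kappa in (0.8, 0.6):
    M = weibull_premium(kappa)
    Z = weibull_barrier(kappa, nu=0.999)
    print(kappa, round(M, 3), round(Z, 3))
```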
6 CONCLUSIONS
With the theory presented in this paper, a discrete-time
reserve process with a fixed barrier was studied
by modelling it as a discounted Markov Decision
Process. The dynamics presented in Equation
(18) describes the behavior of the reserves of the company
when these are below the barrier. This allows us
to set a penalty that accounts for the non-payment of
dividends. By controlling the process through the
premiums, it is found that the optimal policy is the constant premium M.
On the other hand, the rate presented in Theorem
5.1 makes it possible to determine the periods of sustainability
of the company given a ruin probability and an initial
reserve. This bound depends on the distribution of
the total amount of claims per time interval. It should
also be noted that these random variables are only assumed
to have a density that is continuous almost everywhere,
with finite first and second moments. This condition
is satisfied by a wide range of distributions. The examples
illustrate how to apply the rate in the case of
distributions with light or heavy tails.
ACKNOWLEDGEMENTS
R. Montes-de-Oca, P. Saavedra, and D. Cruz-Suárez
dedicate this article to the memory of their co-worker
and co-author of the present work, Gabriel Zacarías-Espinoza,
whose deeply felt death occurred on November 10, 2015.
This work was partially supported by CONACYT
(México) and ASCR (Czech Republic) under Grant
No. 171396.
REFERENCES
Ash, R. B. and Doléans-Dade, C. (2000). Probability and
Measure Theory. Elsevier, London, 2nd edition.
Asmussen, S. (2010). Ruin Probabilities. World Scientific,
Singapore, 2nd edition.
Azcue, P. and Muler, N. (2014). Stochastic Optimization
in Insurance: A Dynamic Programming Approach.
Springer, London.
Breiman, L. (1992). Probability. SIAM, Berkeley.
Bulinskaya, Y. G. and Muromskaya, A. (2014). Discrete-
time insurance model with capital injections and rein-
surance. Methodol. Comput. Appl. Probab.
Bäuerle, N. and Rieder, U. (2011). Markov Decision Processes
with Applications to Finance. Springer, Berlin.
Cramér, H. (1930). On the Mathematical Theory of Risk.
Skandia Jubilee Volume, Stockholm.
Cruz-Suárez, D., Montes-de-Oca, R., and Salem-Silva, F. (2004).
Conditions for the uniqueness of optimal policies of
discounted Markov decision processes. Math. Methods
Oper. Res., 60:415–436.
De-Finetti, B. (1957). Su un'impostazione alternativa della
teoria collettiva del rischio. Trans. XV. Int. Congr.
Act., 2:433–443.
Diasparra, M. A. and Romera, R. (2009). Bounds for the
ruin probability of a discrete-time risk process. J.
Appl. Probab., 46:99–112.
Dickson, D. C. M. (2005). Insurance Risk and Ruin. Cam-
bridge University Press, Cambridge.
Dickson, D. C. M. and Waters, H. R. (2004). Some optimal
dividend problems. ASTIN Bull., 34:49–74.
Finch, P. D. (1960). Deterministic customer impatience in
the queueing system GI/M/1. Biometrika, 47:45–52.
Gerber, H. U. (1981). On the probability of ruin in the presence
of a linear dividend barrier. Scand. Actuarial J.,
pages 105–115.
Gerber, H. U., Shiu, E. S. W., and Smith, N. (2006). Max-
imizing dividends without bankruptcy. ASTIN Bull.,
36:5–23.
Ghosal, A. (1970). Some Aspects of Queueing and Storage
Systems. Springer Verlag, New York.
Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-
time Markov Control Processes: Basic Optimality
Criteria. Springer Verlag, New York.
Li, S., Lu, Y., and Garrido, J. A. (2009). A review of
discrete-time risk models. Rev. R. Acad. Cien. Serie
A. Mat., 103(2):321–337.
Lindvall, T. (1992). Lectures on the Coupling Method. Wi-
ley, New York.
Lundberg, F. (1909). Über die Theorie der Rückversicherung.
Transactions of the VIth International Congress of Actuaries,
1:877–948.
Martin-Löf, A. (1994). Lectures on the use of control theory
in insurance. Scand. Actuarial J., pages 1–25.
Martínez-Morales, M. (1991). Adaptive Premium in an Insurance
Risk Process, Doctoral Thesis. Texas Tech
University, Texas.
Rolski, T., Schmidli, H., Schmidt, V., and Teugels, J. L.
(1999). Stochastic Processes for Insurance and Fi-
nance. Wiley, Chichester.
Royden, H. L. (1988). Real Analysis. Macmillan, New
York.
Schmidli, H. (2009). Stochastic Control in Insurance.
Springer, London.
Schäl, M. (2004). On discrete-time dynamic programming
in insurance: Exponential utility and minimizing the
ruin probability. Scand. Actuarial J., pages 189–210.
Wilks, D. S. (2011). Statistical Methods in the Atmospheric
Sciences. Academic Press, Burlington.