EVOLUTIONARY COMPUTATION FOR DISCRETE AND
CONTINUOUS TIME OPTIMAL CONTROL PROBLEMS
Yechiel Crispin
Department of Aerospace Engineering
Embry-Riddle University
Daytona Beach, FL 32114
Keywords:
Optimal Control, Rocket Dynamics, Goddard’s Problem, Evolutionary Computation, Genetic Algorithms.
Abstract:
Nonlinear discrete time and continuous time optimal control problems with terminal constraints are solved
using a new evolutionary approach which seeks the control history directly by evolutionary computation.
Unlike methods that use the ﬁrst order necessary conditions to determine the optimum, the main advantage of
the present method is that it does not require the development of a Hamiltonian formulation and consequently,
it eliminates the requirement to solve the adjoint problem which usually leads to a difﬁcult two-point boundary
value problem. The method is veriﬁed on two benchmark problems. The ﬁrst problem is the discrete time
velocity direction programming problem with the effects of gravity, thrust and drag and a terminal constraint
on the ﬁnal vertical position. The second problem is a continuous time optimal control problem in rocket
dynamics, the Goddard’s problem. The solutions of both problems compared favorably with published results
1 INTRODUCTION
An optimal control problem consists of ﬁnding the
time histories of the controls and the state variables
such as to maximize an integral performance index
over a ﬁnite period of time, subject to dynamical con-
straints in the form of a system of ordinary differential
equations (Bryson, 1975). In a discrete-time optimal
control problem, the time period is divided into a ﬁ-
nite number of time intervals of equal duration T .
The controls are kept constant over each time inter-
val. This results in a considerable simpliﬁcation of the
continuous time problem, since the ordinary differen-
tial equations can be reduced to difference equations
and the integral performance index can be reduced to
a ﬁnite sum over the discrete time counter (Bryson,
1999). In some problems, additional constraints may
be prescribed on the ﬁnal states of the system.
Modern methods for solving the optimal control
problem are extensions of the classical methods of the
calculus of variations (Fox, 1950). These methods are
known as indirect methods and are based on the max-
imum principle of Pontryagin, which is a statement of
the ﬁrst order necessary conditions for optimality, and
results in a two-point boundary value problem (TP-
BVP) for the state and adjoint variables (L.S. Pon-
tryagin and Mishchenko, 1962).
It has been known, however, that the TPBVP is
much more difﬁcult to solve than the initial value
problem (IVP). As a consequence, a second class of
solutions, known as the direct method has evolved.
For example, attempts have been made to recast the
original dynamic optimization problem as a static
optimization problem by direct transcription (Betts,
2001) or some other discretisation method, eventually
reformulating the original problem as a nonlinear pro-
gramming (NLP) problem. This is often achieved by
parameterisation of the state variables or the controls,
or both. The original differential equations or differ-
ence equations are reduced to algebraic equality con-
straints. A signiﬁcant advantage of this method is that
the Hamiltonian formulation is completely avoided,
which can be advantageous to practicing engineers
who have not been exposed to the theoretical frame-
work of optimal control. However, there are some
problems with this approach. First, it might result in a
large scale NLP problem which might suffer from nu-
merical stability and convergence problems and might
require excessive computing time. Also, the para-
meterisation might introduce spurious local minima
which are not present in the original problem.
With the advent of computing power and the
45
Crispin Y. (2005).
EVOLUTIONARY COMPUTATION FOR DISCRETE AND CONTINUOUS TIME OPTIMAL CONTROL PROBLEMS.
In Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics, pages 45-54
DOI: 10.5220/0001171200450054
c
SciTePress
progress made in methods that are based on op-
timization analogies from nature, it became pos-
sible to achieve a remedy to some of the above
mentioned disadvantages through the use of global
methods of optimization. These include stochas-
tic methods, such as simulated annealing (Laarhoven
and Aarts, 1989), (Kirkpatrick and Vecchi, 1983)
and evolutionary computation methods (Fogel, 1998),
(Schwefel, 1995) such as genetic algorithms (GAs)
Krawczyk, 1992) for an interesting treatment of the
linear discrete-time problem.
Genetic algorithms provide a powerful mechanism
towards a global search for the optimum, but in many
cases, the convergence is very slow. However, as will
be shown in this paper, if the GA is supplemented
by problem speciﬁc heuristics, the convergence can
be accelerated signiﬁcantly. It is well known that
GAs are based on a guided random search through
the genetic operators and evolution by artiﬁcial se-
lection. This process is inherently very slow, be-
cause the search space is very large and evolution pro-
gresses step by step, exploring many regions with so-
lutions of low ﬁtness. However, it is often possible to
guide the search further, by incorporating qualitative
knowledge about potential good solutions. In many
problems, this might involve simple heuristics, which
when combined with the genetic search, provide a
powerful tool for ﬁnding the optimum very quickly.
The purpose of the present work is to incorporate
problem speciﬁc heuristic arguments, which when
combined with a modiﬁed hybrid GA, can solve
the discrete-time optimal control problem very eas-
ily. There are signiﬁcant advantages to this approach.
First, the need to solve a difﬁcult two-point boundary
value problem (TPBVP) is completely avoided. In-
stead, only initial value problems (IVP) need to be
solved. Second, after ﬁnding an optimal solution,
we verify that it approximately satisﬁes the ﬁrst-order
necessary conditions for a stationary solution, so the
mathematical soundness of the traditional necessary
conditions is retained. Furthermore, after obtaining
a solution by direct genetic search, the static and dy-
namic Lagrange multipliers, i.e., the adjoint variables,
can be computed and compared with the results from
a gradient method. All this is achieved without di-
rectly solving the TPBVP. There is a price to be paid,
however, since, in the process, we are solving many
initial value problems (IVPs). This might present a
challenge in more advanced and difﬁcult problems,
where the dynamics are described by higher order
systems of ordinary differential equations, or when
the equations are difﬁcult to integrate over the re-
quired time interval and special methods of numer-
ical integration are required. On the other hand, if
the system is described by discrete-time difference
equations that are relatively well behaved and easy
to iterate, the need to solve the initial value problem
many times does not represent a serious problem. For
instance, the example problem presented here , the
discrete velocity programming problem (DVDP) with
the combined effects of gravity, thrust and drag, to-
gether with a terminal constraint (Bryson, 1999), runs
on a 1.6 GHz pentium 4 processor in less than one
minute CPU time.
In the next section, a mathematical formulation of
the discrete time optimal control problem is given.
This formulation is used to study a speciﬁc example
of a discrete time problem, namely the velocity di-
rection programming of a body moving in a viscous
ﬂuid. Details of this problem are given in Section 3.
The evolutionary computation approach to the solu-
tion is then described in Section 4 where results are
presented and compared with the results of an indi-
rect gradient method developed by Bryson (Bryson,
1999). In Section 5, a mathematical formulation of
the continuous time optimal control problem for non-
linear dynamical systems is presented. A speciﬁc il-
lustrative example of a continuous time optimal con-
trol problem is described in Section 6, where we study
the Goddard’s problem of rocket dynamics using the
proposed evolutionary computation method. Finally
conclusions are summarized in Section 7.
2 OPTIMAL CONTROL OF
DISCRETE TIME NONLINEAR
SYSTEMS
In this section, a formulation is developed for the
nonlinear discrete-time optimal control problem sub-
ject to terminal constraints. Consider the nonlinear
discrete-time dynamical system described by differ-
ence equations with initial conditions
x(i + 1) = f[x(i), u(i), i] (2.1)
x(0) = x
0
(2.2)
where x R
n
is the vector of state variables, u
R
p
, p < n is the vector of control variables and i
[0, N 1] is a discrete time counter. The function f
is a nonlinear function of the state vector, the control
vector and the discrete time i, i.e., f : R
n
x R
p
x
R 7→ R
n
. Next, deﬁne a performance index
J[x(i), u(i), i, N] = φ[x(N)]+Σ
M
i=0
L[x(i), u(i), i]
(2.3)
where M = N 1, φ : R
n
7→ R, L : R
n
x R
p
x
R 7→ R
Here L is the Lagrangian function and φ[x(N)] is
a function of the terminal value of the state vector
ICINCO 2005 - INTELLIGENT CONTROL SYSTEMS AND OPTIMIZATION
46
x(N). In some problems, additional terminal con-
straints can be prescribed through the use of functions
ψ of the state variables x(N)
ψ[x(N)] = 0, ψ : R
n
7→ R
k
k n (2.4)
The optimal control problem consists of ﬁnding the
control sequence u(i) such as to maximize (or mini-
mize) the performance index deﬁned by (2.3), subject
to the dynamical equations (2.1) with initial condi-
tions (2.2) and terminal constraints (2.4). This formu-
lation is known as the Bolza problem in the calculus
of variations. In an alternative formulation, due to
Mayer, the state vector x
j
, j [1, n] is augmented by
n+1
which satisﬁes the
initial value problem:
x
n+1
(i + 1) = x
n+1
(i) + L[x(i), u(i), i] (2.5)
x
n+1
(0) = 0 (2.6)
The performance index can then be written in the
following form
J(N) = φ[x(N)] + x
n+1
(N) φ
a
[x
a
(N)] (2.7)
where x
a
= [x x
n+1
]
T
is the augmented state
vector and φ
a
the augmented performance index. In
this paper, the Meyer formulation is used.
We next deﬁne an augmented performance index
straints f [x(i), u(i), i]x(i+1) = 0, with static and
dynamical Lagrange multipliers ν and λ, respectively,
in the following form:
J
a
= φ + ν
T
ψ + λ
T
(0)[x
0
x(0)]+
M
i=0
λ
T
(i + 1){f[x(i), u(i), i] x(i + 1)} (2.8)
Deﬁne a Hamiltonian function as
H(i) = λ
T
(i + 1)f [x(i), u(i), i] (2.9)
Rewriting the augmented performance index in
terms of the Hamiltonian function, we get
J
a
= φ + ν
T
ψ λ
T
(N)x(N ) + λ
T
(0)x
0
+
M
i=0
[H(i) λ
T
(i)x(i)] (2.10)
A ﬁrst order necessary condition for J
a
to reach a
stationary solution is given by the discrete version of
the Euler-Lagrange equations
λ
T
(i) = H
x
(i) = λ
T
(i+1)f
x
[x(i), u(i), i] (2.11)
with ﬁnal conditions
λ
T
(N) = φ
x
+ ν
T
ψ
x
(2.12)
The control u(i) satisﬁes the optimality condition:
H
u
(i) = λ
T
(i + 1)f
u
[x(i), u(i), i] = 0 (2.13)
If we deﬁne an augmented function Φ as
Φ = φ + ν
T
ψ (2.14)
then the ﬁnal conditions can be written in terms
of the augmented function Φ in a similar way to the
problem without terminal constraints
λ
T
(N) = Φ
x
= φ
x
+ ν
T
ψ
x
(2.15)
The indirect approach to optimal control uses the
necessary conditions for an optimum to obtain a solu-
tion. In this approach, the state equations (2.1) with
initial conditions (2.2) need to be solved together with
the adjoint equations (2.11) and the ﬁnal conditions
(2.15), where the control sequence u(i) is to be de-
termined from the optimality condition (2.13). This
represents a coupled system of nonlinear difference
equations with part of the boundary conditions speci-
ﬁed at the initial time i = 0 and the rest of the bound-
ary conditions speciﬁed at the ﬁnal time i = N. This
is a nonlinear two-point boundary value problem (TP-
BVP) in difference equations. Except for some spe-
cial simpliﬁed cases, it is usually very difﬁcult to ob-
tain solutions for such nonlinear TPBVPs in closed
form. Therefore, many numerical methods have been
developed to tackle this problem.
Several gradient based methods have been pro-
posed for solving the discrete-time optimal con-
trol problem (Mayne, 1966). For example, Mur-
ray and Yakowitz (Murray and Yakowitz, 1984) and
(Yakowitz and Rutherford, 1984) developed a differ-
ential dynamic programming and Newton’s method
for the solution of discrete optimal control problems,
and Mayne, 1970), (Ohno, 1978), (Pantoja, 1988) and
(Dunn and Bertsekas, 1989). Similar methods have
been further developed by Liao and Shoemaker (Liao
and Shoemaker, 1991). Another method, the trust
region method, was proposed by Coleman and Liao
(Coleman and Liao, 1995) for the solution of uncon-
strained discrete-time optimal control problems. Al-
though conﬁned to the unconstrained problem, this
method works for large scale minimization problems.
In contrast to the indirect approach, in the present
proposed approach, the optimality condition (2.13)
and the adjoint equations (2.11) together with their
ﬁnal conditions (2.15) are not used in order to obtain
EVOLUTIONARY COMPUTATION FOR DISCRETE AND CONTINUOUS TIME OPTIMAL CONTROL PROBLEMS
47
the optimal solution. Instead, the optimal values of
the control sequence u(i) are found by a modiﬁed ge-
netic search method starting with an initial population
of solutions with values of u(i) randomly distributed
within a given domain of admissible controls. Dur-
ing the search, approximate, not necessarily optimal
values of the solutions u(i) are found for each gen-
eration. With these approximate values known, the
state equations (2.1) together with their initial condi-
tions (2.2) are very easy to solve as an initial value
problem, by a straightforward iteration of the differ-
ence equations from i = 0 to i = N 1. At the
end of this iterative process, the ﬁnal values x(N)
are obtained, and the ﬁtness function can be deter-
mined. The search than seeks to maximize the ﬁtness
function F such as to fulﬁll the goal of the evolution,
which is to maximize J(N), as given by the follow-
ing Eq.(2.16), subject to the terminal constraints as
deﬁned by Eq.(2.17).
maximize J(N) = φ[x(N )] (2.16)
subject to the dynamical equality constraints, Eqs.
(2.1-2.2) and to the terminal constraints (2.4), which
are repeated here for convenience as Eq.(2.17)
ψ[x(N)] = 0
ψ : R
n
7→ R
k
k n (2.17)
Since we are using a direct search method, condi-
tion (2.17) can also be stated as a search for a maxi-
mum, namely we can set a goal which is equivalent to
(2.17) in the form
maximize J
1
(N) = ψ
T
[x(N)]ψ[x(N)] (2.18)
The ﬁtness function F can now be deﬁned by
F (N) = αJ(N ) + (1 α)J
1
(N) =
= αφ[x(N)] (1 α)ψ
T
[x(N)]ψ[x(N)] (2.19)
with α [0, 1] and x(N ) determined from a solu-
tion of the original initial value problem for the state
variables:
x(i + 1) = f[x(i), u(i), i], i [0, N 1] (2.20)
x(0) = x
0
(2.21)
3 VELOCITY DIRECTION
CONTROL OF A BODY IN A
VISCOUS FLUID
In this section, we treat the case of controlling the mo-
tion of a particle moving in a viscous ﬂuid medium
by varying the direction of a thrust vector of constant
magnitude. We describe the motion in a cartesian sys-
tem of coordinates in which x is pointing to the right
and y is positive downward. The constant thrust force
F is acting along the path, i.e. in the direction of the
velocity vector V with magnitude F = amg. The ac-
celeration of gravity g is acting downward in the pos-
itive y direction. The drag force is proportional to the
square of the speed and acts in a direction opposite
to the velocity vector V . The motion is controlled
by varying the angle γ, which is positive downward
from the horizontal. The velocity direction γ is to be
programmed such as to achieve maximum range and
fulﬁll a prescribed terminal constraint on the vertical
ﬁnal location y
f
. Newton’s second law of motion for
a particle of mass m can be written as
mdV/dt = mg(a + sinγ)
1
2
ρV
2
C
D
S (3.1)
where ρ is the ﬂuid density, C
D
is the coefﬁcient
of drag and S is a typical cross-section area of the
body. For example, if the motion of the center of
gravity of a spherical submarine vehicle is considered,
then S is the maximum cross-section area of the ve-
hicle and C
D
would depend on the Reynolds number
Re = ρV d/µ , where µ is the ﬂuid viscosity and d the
diameter of the vehicle. Dividing (3.1) by the mass m,
we obtain
dV/dt = g(a + sinγ) V
2
/L
c
(3.2)
The length L
c
= 2m/(ρSC
D
) is a typical hydro-
dynamic length. The other equations of motion are:
dx/dt = V cosγ (3.3)
dy/dt = V sinγ (3.4)
with initial conditions and ﬁnal constraint
V (0) = 0, x(0) = 0, y(0) = 0 (3.5)
y(t
f
) = y
f
(3.6)
In order to rewrite the equations in nondimen-
sional form, we introduce the following nondimen-
sional variables, denoted by primes:
t = (L
c
/g)
1/2
t, V = (gL
c
)
1/2
V
ICINCO 2005 - INTELLIGENT CONTROL SYSTEMS AND OPTIMIZATION
48
x = L
c
x, y = L
c
y (3.7)
where we have chosen L
c
as the characteristic
length. Substituting the nondimensional variables
(3.7) in the equations of motion (3.2-3.4) and omit-
ting the prime notation, we obtain the nondimensional
state equations
dV/dt = a + sinγ V
2
(3.8)
dx/dt = V cosγ (3.9)
dy/dt = V sinγ (3.10)
In order to formulate a discrete time version of this
problem, we ﬁrst rewrite (3.8) in separated variables
form as dV/(a + sinγ V
2
) = dt. Integrating and
using the conditionV (0) = 0, we get
(1/b)argtanh(V/b) = t (3.11)
Solving for the speed, we obtain
V = btanh(bt) (3.12)
b = (a + sinγ)
1/2
(3.13)
We now develop a discrete time model by dividing
the trajectory into a ﬁnite number N of straight line
segments of ﬁxed duration T = t
f
/N along which
the control γ is kept constant. The time at the end of
each segment is given by t(i) = iT , with i a time
step counter at point i. The time is normalized by
(L
c
/g)
1/2
, so the nondimensional ﬁnal time is t
f
=
t
f
/(L
c
/g)
1/2
and the nondimensional time at step i
is t(i) = it
f
/N. The nondimensional time interval
is (∆T ) = t
f
/N. Writing (3.12) at t(i + 1) = (i +
1)∆T , we obtain the velocity at the point (i+1) along
the trajectory
V (i + 1) = b(i)tanh[b(i)(i + 1)t
f
/N] (3.14)
b(i) = (a + sinγ(i))
1/2
(3.15)
Similarly, substituting the time t(i) = it
f
/N in
(3.11), the following expression is obtained, which we
deﬁne as the function G
0
(i).
ib(i)t
f
/N = argtanh[V (i)/b(i)] G
0
(i) (3.16)
Introducing a second function G
1
(i) deﬁned by
G
1
(i) = G
0
(i) + b(i)t
f
/N (3.17)
Eq.(3.14) can be written as
V (i + 1) = b(i)tanh[G
1
(i)] (3.18)
We now determine the coordinates x and y as a
function of time. Using the state equation (3.9) to-
gether with the result (3.12) and deﬁning θ = bt, we
obtain
dx = V cosγdt = bcosγtanh(bt)dt =
= cosγtanhθ (3.19)
Integrating along a straight line segment between
points i and i + 1, we get
x(i + 1) = x(i)+
+cosγ(i)[logcoshθ(i + 1) logcoshθ(i)] (3.20)
θ(i) = ib(i)t
f
/N = G
0
(i) (3.21)
θ(i + 1) = b(i)(i + 1)t
f
/N = ib(i)t
f
/N+
+b(i)t
f
/N = G
0
(i) + b(i)t
f
/N = G
1
(i) (3.22)
Substituting (3.21-3.22) in (3.20), we obtain the
following discrete-time state equation (3.24) for the
location x(i + 1). The equation for the coordinate
y(i+1) can be developed in a similar way to x(i+1),
with cosγ(i) replaced by sinγ(i). Adding the state
equation (3.18) for the velocityV (i + 1), which is re-
peated here as Eq.(3.23), the state equations become:
V (i + 1) = b(i)tanh[G
1
(i)] (3.23)
x(i + 1) = x(i)+
+cosγ(i) log[coshG
1
(i)/coshG
0
(i)] (3.24)
y(i + 1) = y(i)+
+sinγ(i)log[coshG
1
(i)/coshG
0
(i)] (3.25)
with initial conditions and terminal constraint
V (0) = 0, x(0) = 0, y(0) = 0 (3.26)
y(N) = y
f
= y
f
/L
c
(3.27)
The optimal control problem now consists of ﬁnd-
ing the sequence γ(i) for i [0, N 1] such as to
maximize the range x(N ), subject to the state equa-
tions (3.23-3.25), the initial conditions (3.26) and the
terminal constraint (3.27), where y
f
is in units of L
c
and the ﬁnal time t
f
in units of (L
c
/g)
1/2
.
EVOLUTIONARY COMPUTATION FOR DISCRETE AND CONTINUOUS TIME OPTIMAL CONTROL PROBLEMS
49
4 EVOLUTIONARY APPROACH
TO OPTIMAL CONTROL
We now describe the proposed direct approach which
is based on a genetic search method. As was previ-
ously mentioned, an important advantage of this ap-
proach is that there is no need to solve the two-point
boundary value problem described by the state equa-
tions (2.1) and the adjoint equations (2.11), together
with the initial conditions (2.2), the ﬁnal conditions
(2.15), the terminal constraints (2.4) and the optimal-
ity condition (2.13) for the optimal control u(i).
method allows us to evolve a population of solutions
such as to maximize the objective function or ﬁt-
ness function F (N). The initial population is built
by generating a random population of solutions γ(i),
i [0, N 1], uniformly distributed within a domain
γ [γ
min
, γ
max
]. Typical values are γ
max
= π/2 and
either γ
min
= π/2 or γ
min
= 0 depending on the
problem. The genetic algorithm evolves this initial
population using the operations of selection, mutation
and crossover over many generations such as to max-
imize the ﬁtness function:
F (N) = αJ(N ) + (1 α)J
1
(N) =
= αφ[ξ(N)] (1 α)ψ
T
[ξ(N )]ψ[ξ(N)] (4.1)
with α [0, 1] and J(N ) and J
1
(N) given by:
J(N) = φ[ξ(N)] = x(N) (4.2)
J
1
(N) = ψ
2
[ξ(N )] = (y(N) y
f
)
2
(4.3)
For each member in the population of solutions,
the ﬁtness function depends on the ﬁnal values x(N)
and y(N), which are determined by solving the initial
value problem deﬁned by the state equations (3.23-
3.25) together with the initial conditions (3.26). This
process is repeated over many generations. Here, we
run the genetic algorithm for a predetermined num-
ber of generations and then we check if the terminal
constraint (3.27) is fulﬁlled. If the constraint is not
fulﬁlled, we can either increase the number of gener-
ations or readjust the weight α [0, 1].
We now present results obtained by solving this
problem using the proposed approach. We ﬁrst treat
the case where x(N)is maximized with no constraint
placed on y
f
. We solve an example where the value
of the thrust is a = 0.05 and the ﬁnal time is t
f
= 5.
The evolution of the solution over 50 generations
is shown in Fig.1. The control sequence γ(i), the op-
timal trajectory and the velocity V
2
(i) are displayed
in Fig.2. The sign of y is reversed for plotting. It
can be seen from the plots that the angle varies at the
beginning and at the end of the motion, but remains
constant in the middle of the maneuver, resulting in
a dive along a straight line along a considerable por-
tion of the trajectory. This ﬁnding agrees well with
the results obtained by Bryson (Bryson, 1999) using
0 5 10 15 20 25 30 35 40 45 50
−2.75
−2.7
−2.65
−2.6
−2.55
−2.5
Best f
0 5 10 15 20 25 30 35 40 45 50
−2.8
−2.7
−2.6
−2.5
−2.4
Generations
Average f
Figure 1: Convergence of the DVDP solution with gravity,
thrust a=0.05 and drag, with no terminal constraint on y
f
.
x(N) is maximized. With ﬁnal time t
f
= 5.
0 10 20 30 40 50 60
0
0.5
1
i
gama / ( pi/2 )
0 0.5 1 1.5 2 2.5 3
−2
−1.5
−1
−0.5
0
x / L
c
y / L
c
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35
−2
−1.5
−1
−0.5
0
v
2
/ 2 g L
c
y / L
c
Figure 2: The control sequence, the optimal trajectory and
the velocity V
2
(i) for the DVDP problem with gravity,
thrust a=0.05 and drag. No terminal constraint on y
f
. Final
time t
f
= 5. The sign of y is reversed for plotting.
5 NONLINEAR CONTINUOUS
TIME OPTIMAL CONTROL
In this section, a formulation is developed for the non-
linear continuous time optimal control problem sub-
ject to terminal constraints. Consider the continuous
time nonlinear problem described by a system of or-
dinary differential equations with initial conditions
ICINCO 2005 - INTELLIGENT CONTROL SYSTEMS AND OPTIMIZATION
50
dx/dt = f [x(t), u(t), t] (5.1)
x(0) = x
0
(5.2)
where x R
n
is the vector of state variables,
u R
p
, p < n is the vector of control variables and
t [0, t
f
] is the continuous time. The function f is a
nonlinear function of the state vector, the control vec-
tor and the time t, i.e., f : R
n
x R
p
x R 7→ R
n
.
Next, deﬁne a performance index
J[x(t), u(t), t
f
] = φ[x(t
f
)] +
tf
0
L[x(t), u(t), t]dt
(5.3)
φ : R
n
7→ R, L : R
n
xR
p
xR 7→ R
Here L is the Lagrangian function and φ[x(t
f
)] is
a function of the terminal value of the state vector
x(t
f
). In some problems, additional terminal con-
straints can be prescribed through the use of functions
ψ of the state variables x(t
f
)
ψ[x(t
f
)] = 0, ψ : R
n
7→ R
k
, k n (5.4)
The formulation of the optimal control problem ac-
cording to Bolza consists of ﬁnding the control u(t)
such as to maximize the performance index deﬁned
by (5.3), subject to the state equations (5.1) with ini-
tial conditions (5.2) and terminal constraints (5.4). In
the alternative formulation, due to Mayer, the state
vector x
j
, j [1, n] is augmented by an additional
state variable x
n+1
which satisﬁes the following ini-
tial value problem:
dx
n+1
/dt = L[x(t), u(t), t] (5.5)
x
n+1
(0) = 0 (5.6)
The performance index can then be written as
J(t
f
) = φ[x(t
f
)] + x
n+1
(t
f
) φ
a
[x
a
(t
f
)] (5.7)
where x
a
= [x x
n+1
]
T
is the augmented state
vector and φ
a
the augmented performance index. We
next deﬁne an augmented performance index with ad-
joint constraints ψ and adjoint dynamical constraints
f [x(t), u(t), t ]dx/dt = 0, with static and dynam-
ical Lagrange multipliers ν and λ as:
J
a
(t
f
) = φ[x(t
f
)]+ν
T
ψ[x(t
f
)]+λ
T
(0)[x
0
x(0)]
+
tf
0
λ
T
(t)[f[x(t), u(t), t] dx/dt]dt (5.8)
Introducing a Hamiltonian function
H(t) = λ
T
(t)f[x(t), u(t), t] (5.9)
and rewriting the augmented performance index in
terms of the Hamiltonian, we get
J
a
(t
f
) = φ + ν
T
ψ λ
T
(t
f
)x(t
f
) + λ
T
(0)x
0
+
+
tf
0
[H(t) λ
T
(t)x(t)]dt (5.10)
A ﬁrst order necessary condition for J
a
to reach
a stationary solution is given by the Euler-Lagrange
equations
T
/dt = H
x
(t) = λ
T
(t)f
x
[x(t), u(t), t]
(5.11)
with ﬁnal conditions
λ
T
(t
f
) = φ
x
[x(t
f
)] + ν
T
ψ
x
[x(t
f
)] (5.12)
The control u(t) satisﬁes the optimality condition:
H
u
(t) = λ
T
(t)f
u
[x(t), u(t), t] = 0 (5.13)
If we deﬁne an augmented function Φ[x(t
f
)] as
Φ[x(t
f
)] = φ[x(t
f
)] + ν
T
ψ[x(t
f
)] (5.14)
then the ﬁnal conditions can be written in terms
of the augmented function Φ in a similar way to the
problem without terminal constraints
λ
T
(t
f
) = Φ
x
[x(t
f
)] = φ
x
[x(t
f
)] + ν
T
ψ
x
[x(t
f
)]
(5.15)
In the indirect approach to optimal control, the nec-
essary conditions are used to obtain an optimal solu-
tion: the state equations (5.1) with initial conditions
(5.2) have to be solved together with the adjoint equa-
tions (5.11) and the ﬁnal conditions (5.15). The con-
trol history u(t) is determined from the optimality
condition (5.13). Consequently, this approach leads
to a coupled system of nonlinear ordinary differential
equations with the boundary conditions for the state
variables speciﬁed at the initial time t = 0 and the
boundary conditions for the adjoint variables speci-
ﬁed at the ﬁnal time t = t
f
. This is a nonlinear two-
point boundary value problem (TPBVP) in ordinary
differential equations. Except for some special sim-
pliﬁed cases, it is usually very difﬁcult to obtain solu-
tions for such nonlinear TPBVPs analytically. Many
numerical methods have been developed in order to
obtain approximate solutions to this problem.
EVOLUTIONARY COMPUTATION FOR DISCRETE AND CONTINUOUS TIME OPTIMAL CONTROL PROBLEMS
51
6 GODDARD’S OPTIMAL
CONTROL PROBLEM IN
ROCKET DYNAMICS
We now illustrate the above approach with a contin-
uous time optimal control example. We apply the
optimal control formulation described in the previous
section for continuous time dynamical systems to the
study of the vertical climb of a single stage sound-
ing rocket launched vertically from the ground. This
is known in the literature as the Goddard’s problem.
The problem is to control the thrust of the rocket such
as to maximize the ﬁnal velocity or the ﬁnal altitude.
There are two versions to this problem: in the ﬁrst
version, the ﬁnal mass of the rocket is prescribed and
the ﬁnal time is free. In the second version, the ﬁ-
nal time is prescribed and the ﬁnal mass is free. The
second version of this problem will be presented.
Let h(t) denote the altitude of the rocket as mea-
sured from sea level and v(t) and m(t) the velocity
and the mass of the rocket, respectively. Here the
time t is continuous. The trajectory of the rocket is a
vertical straight line. The forces acting on the rocket
are the thrust T (t), which is used as the control vari-
able or control history, the aerodynamic drag force
D(h, v), which is a function of altitude and speed
and the weight of the rocket m(t)g, where m(t) is
the mass and g is the acceleration of gravity, assumed
constant. The equations of motion are:
dh/dt = v (6.1)
mdv/dt = T D mg (6.2)
dm/dt = T/c (6.3)
where the drag force is given by
D(h, v) = D
0
v
2
exp(h/h
r
) (6.4)
Hereh
r
= 23800 ft is a characteristic altitude and
D
0
is a characteristic drag force given by
D
0
= 0.711T
M
/c
2
(6.5)
where T
M
is the maximum thrust developed by
the rocket. The speed c is the propellant jet exhaust
speed. An important parameter in rocket dynamics is
the thrust to weight ratio τ = T
M
/(m
0
g), where m
0
is the initial mass of the vehicle and g = 32.2 ft/s
2
.
In this example, a ratio of 2 is chosen:
τ = T
M
/(m
0
g) = 2 (6.6)
Here we take a value of m
0
= 3 slugs for a small
experimental rocket. A typical value of the exhaust
speed c and the speciﬁc impulse I
sp
for an early
rocket such as the one tested by Goddard is given by
c = (3.264gh
r
)
1/2
= 1581ft/s
I
sp
= c/g = 49.14sec (6.7)
The initial conditions are
h(0) = 0, v(0) = 0, m(0) = m
0
= 3 slugs
(6.8)
The optimal control problem is to ﬁnd the control
history T (t) such as to maximize the ﬁnal altitude (or
the altitude at burnout) h(t
f
) in a given time t
f
, where
a value t
f
= 18 sec was used in this example. The
state equations (6.1-6.3) with the initial conditions
(6.8) are to be solved in the optimization process. Be-
fore solving this problem, we ﬁrst restate the problem
in non-dimensional form. Choosing the characteristic
speed (gh
r
)
1/2
= 875 ft/s and the characteristic time
(h
r
/g)
1/2
= 27.2 sec, we introduce nondimensional
variables, denoted here by primes:
h = h
r
h, v = (gh
r
)
1/2
v, t = (h
r
/g)
1/2
t
m = m
0
m, T = T
M
T , D = T
M
D (6.9)
Introducing the variables from Eq.(6.9) in the state
equations (6.1-6.3) and simplifying, the following
system of non-dimensional equations is obtained:
dh/dt = v (6.10)
mdv/dt = τ T τσ
2
v
2
exp(h) m (6.11)
dm/dt = 1.186 στ T (6.12)
In this system of equations (6.10-6.12) all the vari-
ables are non-dimensional and the prime notation has
been omitted. Two independent non-dimensional pa-
rameters characterizing this problem are obtained: the
thrust to weight ratio τ, introduced before and a ratio
of two speeds σ deﬁned by
τ = T
M
/(m
0
g) = 2, T
M
= 193 lbs
σ = (0.711gh
r
)
1/2
/c = 0.467 (6.13)
The non-dimensional initial conditions are:
h(0) = 0, v(0) = 0, m(0) = 1 (6.14)
The optimal control problem is to ﬁnd the control
history T (t) such as to maximize the ﬁnal altitude (the
altitude at burnout) h(t
f
) in a given normalized time
ICINCO 2005 - INTELLIGENT CONTROL SYSTEMS AND OPTIMIZATION
52
t
f
= t
f
/(h
r
/g)
1/2
= 0.662
subject to the state equations (6.10-6.12) and the
initial conditions (6.14). The results of this contin-
uous time optimal control problem are given in Fig-
ures (3-4). Figure 3 shows the control history of the
thrust as a function of the time in seconds. In the
genetic search, the search range for the normalized
thrust was between a lower bound of (T/T
max
)
L
= 0.1
and an upper bound (T/T
max
)
U
= 1. It can be seen that
the thrust increases sharply during the ﬁrst two sec-
onds of the ﬂight and remains closer to 1 afterwards.
Figure 4 displays the state variables, the altitude, the
velocity and the mass of the rocket as a function of
time. The mass of the rocket decreases almost linearly
during the ﬂight due to the optimal control which re-
quires an almost constant thrust. This is in agreement
with the results of Betts (J. Betts and Huffman, 1993)
and (A.S. Bondarenko and More, 1999) who used a
More, 2000).
0 2 4 6 8 10 12 14 16 18 20
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
T/Tmax
0 2 4 6 8 10 12 14 16 18 20
120
140
160
180
200
Thrust (lbs)
time (sec)
Figure 3: The thrust control history. The upper graph dis-
plays the value of the thrust normalized by maximum thrust.
The lower graph shows the actual thrust in lbs.
7 CONCLUSIONS
A new method for solving both discrete time and con-
tinuous time nonlinear optimal control problems with
terminal constraints has been presented. Unlike other
methods that use the ﬁrst-order necessary conditions
to ﬁnd the optimum, the present method seeks the best
control history directly by a modiﬁed genetic search.
As a consequence of this direct search approach, there
is no need to develop a Hamiltonian formulation and
therefore there is no need to solve a difﬁcult two-point
boundary value problem for the state and adjoint vari-
0 2 4 6 8 10 12 14 16 18 20
0
2000
4000
6000
8000
h (ft)
0 2 4 6 8 10 12 14 16 18 20
0
500
1000
1500
v (ft/sec)
0 2 4 6 8 10 12 14 16 18 20
0
1
2
3
time (sec)
m (slugs)
Figure 4: The state variables, the altitude, velocity and mass
of the rocket as a function of time
vanced and higher order problems where it is difﬁ-
cult to solve the two point boundary value problem
(TPBVP) with large systems of ordinary differential
equations. There is a computational price to be paid,
however, since the method involves repetitive solu-
tions of the initial value problem (IVP) for the state
variables during the evolutionary process.
The method was demonstrated by solving a
discrete-time optimal control problem, namely, the
discrete velocity direction programming problem
(DVDP) of a body with the effects of gravity, thrust
and hydrodynamic drag. Benchmark problems of this
kind were pioneered by Bryson who used analytical
and gradient methods. This discrete time problem was
solved easily using the proposed approach and the re-
sults compared favorably with the results of Bryson.
The method was also applied to a continuous time
nonlinear optimal control problem, the Goddard’s
problem of rocket dynamics. The results compared
favorably with published results obtained by a non-
linear programming method.
REFERENCES
A.S. Bondarenko, D. B. and More, J. (1999). Cops: Large
scale nonlinearly constrained optimization problems.
In Argonne National Laboratory Technical Report
ANL/MCS-TM-237.
Betts, J. (2001). Practical Methods for Optimal Control Us-
ing Nonlinear Programming. SIAM, Society for In-
dustrial and Applied Mathematics, Philadelphia, PA.
Bryson, A. (1999). Dynamic Optimization. Addison-
Wesley Longman, Menlo Park, CA.
Coleman, T. and Liao, A. (1995). An efﬁcient trust region
method for unconstrained discrete-time optimal con-
trol problems. In Computational Optimization and
Applications, 4:47-66.
EVOLUTIONARY COMPUTATION FOR DISCRETE AND CONTINUOUS TIME OPTIMAL CONTROL PROBLEMS
53
Dolan, E. and More, J. (2000). Benchmarking optimization
software with cops. In Argonne National Labo-ratory
Technical Report ANL/MCS-TM-246.
Dunn, J. and Bertsekas, D. (1989). Efﬁcient dynamic pro-
gramming implementations of newton’s method for
unconstrained optimal control problems. In J. of Op-
timization Theory and Applications,63, pp. 23-38.
Fogel, D. (1998). Evolutionary Computation, The Fossil
Record. IEEE Press, New York.
Fox, C. (1950). An Introduction to the Calculus of Varia-
tions. Oxford University Press, London.
J. Betts, S. E. and Huffman, W. (1993). Sparse nonlin-
ear programming test problems. In Technical Report
BCSTECH-93-047. Boeing Computer Services, Seat-
tle, Washington.
Jacobson, D. and Mayne, D. (1970). Differential Dynamic
Programming. Elsevier Science Publishers, Amster-
dam, Netherland.
Kirkpatrick, G. and Vecchi (1983). Optimization by simu-
lated annealing. In Science, 220: 671-680. AAAS.
Laarhoven, P. and Aarts, E. (1989). Simulated Annealing:
Theory and Applications. Kluwer, Amsterdam.
Liao, L. and Shoemaker, C. (1991). Convergence in uncon-
strained discrete-time differential dynamic program-
ming. In IEEE Transactions on Automatic Control,
36, pp. 692-706. IEEE.
L.S. Pontryagin, V.G. Boltyanskii, R. G. and Mishchenko,
E. (1962). The Mathematical Theory of Opti-
mal Processes. translated by K.N. Trirogoff, L.W.
Neustadt (Ed.), Moscow, interscience, new york edi-
tion.
Mayne, D. (1966). A second-order gradient method for
determining optimal trajectories of nonlinear discrete
time systems. In International Journal of Control, 3,
pp. 85-95.
Michalewicz, Z. (1992). Genetic Algorithms + Data Struc-
tures= Evolution Programs. Springer-Verlag, Berlin.
Murray, D. and Yakowitz, S. (1984). Differential dynamic
programming and newton’s method for discrete opti-
mal control problems. In J. of Optimization Theory
and Applications, 43:395-414.
Ohno, K. (1978). A new approach of differential dynamic
programming for discrete time systems. In IEEE
Transactions on Automatic Control, 23, pp. 37-47.
IEEE.
Pantoja, J. (1988). Differential dynamic programming and
newton’s method. In International Journal of Control,
53: 1539-1553.
Schwefel, H. (1995). Evolution and Optimum Seeking. Wi-
ley, New York.
Yakowitz, S. and Rutherford, B. (1984). Computational as-
pects of discrete-time optimal control. In Appl. Math.
Comput., 15, pp. 29-45.
Z. Michalewicz, C. J. and Krawczyk, J. (1992). A mod-
iﬁed genetic algorithm for optimal control problems.
In Computers Math. Appl., 23(12),8394.
ICINCO 2005 - INTELLIGENT CONTROL SYSTEMS AND OPTIMIZATION
54