ON THE JOINT ESTIMATION OF UNKNOWN PARAMETERS AND DISTURBANCES IN LINEAR STOCHASTIC TIME-VARIANT SYSTEMS
Stefano Perabò and Qinghua Zhang
IRISA, INRIA Rennes, Campus universitaire de Beaulieu, Avenue du General Leclerc, 35042 Rennes Cedex, France
Keywords: Fault detection, parameter estimation, linear stochastic time-varying systems, adaptive signal processing.
Abstract: Motivated by fault detection and isolation problems, we present an approach to the design of estimators of unknown parameters and disturbances for linear time-variant stochastic systems. The main features of the proposed method are: (a) the joint estimation of parameters and disturbances can be carried out; (b) it is a fully stochastic approach: the unknown parameters and disturbances are random quantities, and prior information, in terms of means and covariances, can easily be taken into account; (c) the estimator structure is not fixed a priori, but rather derived from the optimal infinite-dimensional one by means of a sliding window approximation. The advantages with respect to the widely used parity space approach are presented.
1 INTRODUCTION
The following discrete-time linear stochastic system is considered in this brief paper:
\[
x_{k+1} = A_k x_k + B_k u_k + \Psi_k p + E_k d_k + w_k \tag{1a}
\]
\[
y_k = C_k x_k + v_k \tag{1b}
\]
for $k \geq 0$, with $A_k \in \mathbb{R}^{n \times n}$, $B_k \in \mathbb{R}^{n \times m}$, $\Psi_k \in \mathbb{R}^{n \times q}$, $E_k \in \mathbb{R}^{n \times f}$ and $C_k \in \mathbb{R}^{l \times n}$ known time-variant matrices. The vector sequences $\{x_k\}$, $\{u_k\}$ and $\{y_k\}$ denote respectively the state, input and output stochastic processes. The sequences $\{w_k\}$ and $\{v_k\}$ are assumed to be zero mean, white and uncorrelated wide-sense stochastic processes, with $E[w_k w_k^T] = Q_k$ and $E[v_k v_k^T] = R_k \succ O$ (positive definite), where $E[\cdot]$ denotes the mathematical expectation operator. The initial condition $x_0$ has known mean $E[x_0] = \mu_0$ and covariance $E[(x_0 - \mu_0)(x_0 - \mu_0)^T] = P_0$. Both the initial condition $x_0$ and the input process $\{u_k\}$ are assumed uncorrelated with the noise sequences.
The term $E_k d_k$ accounts for unknown disturbances acting on the system, or for faults; hence the sequence $\{d_k\}$ is an unknown (and uncontrolled) input modeled as a wide-sense stochastic process, not necessarily stationary. The disturbances are further assumed uncorrelated with the initial state, the noise and the input processes, respectively $x_0$, $\{w_k\}$, $\{v_k\}$ and $\{u_k\}$. Finally, the term $\Psi_k p$ can account for the occurrence of parametric faults in the system (for instance with the meaning that when $p$ is zero no faults are present) or for constant parameters that need to be estimated on-line. Here $p$ is a random variable uncorrelated with the noise, input and disturbance processes.
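To make the setting concrete, the following minimal sketch simulates a realization of model (1); the specific dimensions, system matrices, noise levels and fault profile are illustrative assumptions only, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions): n states, m inputs, l outputs, f disturbances, q parameters.
n, m, l, f, q = 2, 1, 1, 1, 1

def system_matrices(k):
    """Return (A_k, B_k, C_k, Psi_k, E_k) of model (1); time-invariant here for simplicity."""
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    C = np.array([[1.0, 0.0]])
    Psi = np.array([[0.5], [0.0]])
    E = np.array([[0.0], [1.0]])
    return A, B, C, Psi, E

Q = 0.01 * np.eye(n)   # covariance of the process noise w_k
R = 0.04 * np.eye(l)   # covariance of the measurement noise v_k

def simulate(N, p, d, u, x0):
    """Simulate (1a)-(1b): returns y_0, ..., y_N given u_0..u_{N-1}, d_0..d_{N-1} and the parameter p."""
    x, ys = np.asarray(x0, dtype=float), []
    for k in range(N + 1):
        A, B, C, Psi, E = system_matrices(k)
        ys.append(C @ x + rng.multivariate_normal(np.zeros(l), R))   # output equation (1b)
        if k < N:
            w = rng.multivariate_normal(np.zeros(n), Q)
            x = A @ x + B @ u[k] + Psi @ p + E @ d[k] + w            # state equation (1a)
    return np.array(ys)

# Example: a constant parametric fault p = 0.2 and a disturbance step at k = 30.
N = 60
u = np.zeros((N, m))
d = np.array([[0.0] if k < 30 else [0.5] for k in range(N)])
y = simulate(N, np.array([0.2]), d, u, x0=np.zeros(n))
```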
The problem to be solved is the following: find, for each $N \geq 0$, the minimum variance unbiased linear estimators of the disturbance sequence $d_0^{N-1} = \{d_k : 0 \leq k \leq N-1\}$ and of the parameters $p$, given the input and output sequences $u_0^{N-1}$ and $y_0^N$, and the conditions guaranteeing the uniqueness of the corresponding estimates. These estimators will be denoted respectively by $\hat{d}_{k|N}$ and $\hat{p}_{|N}$ (since $p$ does not depend on time).
The following two related problems will also be discussed in this paper. First, how to weaken the uniqueness conditions by considering the quantities $\hat{d}_{k|N+D}$ for $0 \leq k \leq N-1$ and some appropriate delay $D > 0$, which will be called, with an abuse of terms, "delayed estimators". Second, how to recursively and reliably compute the estimates $\hat{p}_{|N+D}$ and $\hat{d}_{k|N+D}$ once sample paths (measurements) $u_0^{N+D-1}$ and $y_0^{N+D}$ of the input and output processes become available (by convention, italic characters will denote samples from the corresponding random variables, which will instead be denoted by roman characters).
The solution proposed in this work shares many similarities with the so-called parity space method (Chow and Willsky, 1984; Gustafsson, 2001), which finds wide application in fault detection problems. However, it has some advantageous features that will be presented at the end of the exposition.
Once the disturbance and parameter estimates have been computed, state estimation becomes straightforward and can also be easily performed on demand. This topic is discussed in (Perabò and Zhang, 2007).
2 BASIC EQUATIONS FOR ESTIMATION
Pretend for a while that the parameters and the disturbance sequence are known quantities, i.e. as if they were inputs of the system described by (1), and assume the following:
Assumption 1. $(A_k, C_k)$ is uniformly completely observable and $(A_k, Q_k^{1/2})$ is uniformly completely reachable.

Assumption 2. The parameters $p$ and the disturbance sequence $\{d_k\}$ are uncorrelated with the initial state $x_0$ and the noise sequences $\{w_k\}$ and $\{v_k\}$.
Hence there is no feedback from the output to the parameters and disturbances (see (Gevers and Anderson, 1982) for details) and, by applying well-known results of linear estimation theory (Kailath et al., 2000), the following innovation representation of the output process $\{y_k\}$ can be derived:
\[
\hat{x}^*_{k+1|k} = A_k \hat{x}^*_{k|k-1} + B_k u_k + \Psi_k p + E_k d_k + K_k e_k \tag{2a}
\]
\[
y_k = C_k \hat{x}^*_{k|k-1} + e_k, \tag{2b}
\]
the recursion being initiated by setting $\hat{x}^*_{0|-1} = x_0$, where $\hat{x}^*_{k+1|k}$ is the one-step minimum variance unbiased linear predictor of the state. Each term of the innovation sequence $\{e_k\}$ has zero mean and covariance $\Lambda_k$ given by the recursive solution of the same Riccati equation which is solved in the standard Kalman filter (i.e. with no disturbances and unknown parameters). With respect to the latter, however, the superscript $*$ in (2) emphasizes that the "estimates" $\{\hat{x}^*_{k+1|k}\}$ cannot be computed, because the realizations of $p$ and $\{d_k\}$ are not actually available. Also the gains $K_k$ are computed exactly as in the Kalman filter. By defining recursively the quantities
\[
\Upsilon_0 = O, \qquad \Upsilon_{k+1} = (A_k - K_k C_k)\Upsilon_k + \Psi_k \tag{3a}
\]
\[
s_0 = 0, \qquad s_{k+1} = (A_k - K_k C_k)s_k + E_k d_k \tag{3b}
\]
\[
z_0 = x_0, \qquad z_{k+1} = (A_k - K_k C_k)z_k + B_k u_k + K_k y_k \tag{3c}
\]
and by using (2b) it is not difficult to check that the following is true:
\[
C_k s_k + C_k \Upsilon_k p + e_k = y_k - C_k z_k. \tag{4}
\]
Note that a realization of the sequence $\{z_k\}$ can be computed from available data only, i.e. system matrices, input and output sequences. As a matter of fact, (3c) is exactly the Kalman filter equation that would be obtained if $p \equiv 0$ and $d_k \equiv 0$ for all $k$.
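As a sketch (under the same illustrative conventions as the earlier snippet, with the time-stamped matrices and noise covariances stored in Python lists of length $N+1$), the gains $K_k$, the innovation covariances $\Lambda_k$ and the computable recursions (3a) and (3c) could be implemented as follows. Note that $z_0$ is initialised here with the prior mean $\mu_0$, a practical choice when a realization of $x_0$ is not available.

```python
import numpy as np

def riccati_gains(A_seq, C_seq, Q_seq, R_seq, P0, N):
    """Standard (disturbance-free) Kalman predictor recursion: returns the gains K_k and the
    innovation covariances Lambda_k for k = 0, ..., N, as used in (2) and (3)."""
    Ks, Lambdas, P = [], [], np.asarray(P0, dtype=float)
    for k in range(N + 1):
        A, C, Q, R = A_seq[k], C_seq[k], Q_seq[k], R_seq[k]
        Lam = C @ P @ C.T + R                          # innovation covariance Lambda_k
        K = A @ P @ C.T @ np.linalg.inv(Lam)           # one-step predictor gain K_k
        P = A @ P @ A.T + Q - K @ Lam @ K.T            # Riccati update of the prediction error covariance
        Ks.append(K)
        Lambdas.append(Lam)
    return Ks, Lambdas

def recursions_upsilon_z(A_seq, B_seq, C_seq, Psi_seq, Ks, u, y, mu0, N):
    """Recursions (3a) and (3c): Upsilon_k and the data-driven sequence z_k for k = 0, ..., N."""
    n, q = A_seq[0].shape[0], Psi_seq[0].shape[1]
    Ups = [np.zeros((n, q))]                           # Upsilon_0 = O
    z = [np.asarray(mu0, dtype=float)]                 # z_0 (initialised with the prior mean mu_0 here)
    for k in range(N):
        F = A_seq[k] - Ks[k] @ C_seq[k]                # A_k - K_k C_k
        Ups.append(F @ Ups[k] + Psi_seq[k])            # (3a)
        z.append(F @ z[k] + B_seq[k] @ u[k] + Ks[k] @ y[k])   # (3c)
    return Ups, z
```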
It is possible to arrange in matrix form the set of equations obtained from (4) for $k = 1, 2, \ldots, N$. For example, for $N = 4$ one obtains
\[
\begin{bmatrix}
C_4\Phi^4_1 E_0 & C_4\Phi^4_2 E_1 & C_4\Phi^4_3 E_2 & C_4 E_3 & C_4\Upsilon_4 \\
C_3\Phi^3_1 E_0 & C_3\Phi^3_2 E_1 & C_3 E_2 & O & C_3\Upsilon_3 \\
C_2\Phi^2_1 E_0 & C_2 E_1 & O & O & C_2\Upsilon_2 \\
C_1 E_0 & O & O & O & C_1\Upsilon_1
\end{bmatrix}
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \\ p \end{bmatrix}
+
\begin{bmatrix} e_4 \\ e_3 \\ e_2 \\ e_1 \end{bmatrix}
=
\begin{bmatrix} y_4 - C_4 z_4 \\ y_3 - C_3 z_3 \\ y_2 - C_2 z_2 \\ y_1 - C_1 z_1 \end{bmatrix}, \tag{5}
\]
where the transition matrices $\Phi^k_h$ are defined by
\[
\Phi^h_h = I, \qquad \Phi^{k+1}_h = (A_k - K_k C_k)\Phi^k_h. \tag{6}
\]
For an arbitrary $N$, left multiply the above system by the block diagonal matrix $\mathrm{blkdiag}\{\Lambda_N^{-1/2}, \ldots, \Lambda_1^{-1/2}\}$ in such a way that the covariance of the zero mean vector $e^* = \mathrm{vec}[\Lambda_N^{-1/2} e_N \;\ldots\; \Lambda_1^{-1/2} e_1]$ is equal to the identity matrix. A system of the form
\[
A g + e^* = r \tag{7}
\]
is thus obtained, where the matrix $A \in \mathbb{R}^{lN\times(fN+q)}$ has the same structure as in (5), $g = \mathrm{vec}[d_0 \;\ldots\; d_{N-1}\; p]$ is the unknown term, and the vector $r = \mathrm{vec}[r_N \;\ldots\; r_1]$ contains the computable residuals
\[
r_k = \Lambda_k^{-1/2}(y_k - C_k z_k). \tag{8}
\]
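For illustration, the normalized matrix $A$ and the residual vector $r$ of (7)-(8) can be assembled block by block from the quantities computed in the previous snippet. The whitening factor used below is a Cholesky-based inverse square root of $\Lambda_k$ (any $W_k$ with $W_k \Lambda_k W_k^T = I$ serves the purpose); this is only a sketch, with block rows ordered $k = N, \ldots, 1$ as in (5).

```python
import numpy as np

def assemble_A_r(A_seq, C_seq, E_seq, Ks, Lambdas, Ups, z, y, N, f, q):
    """Build the matrix A (size lN x (fN + q)) and the residual vector r of (7)-(8)."""
    l = C_seq[0].shape[0]
    n = A_seq[0].shape[0]
    Amat = np.zeros((l * N, f * N + q))
    r = np.zeros(l * N)
    for i, k in enumerate(range(N, 0, -1)):                  # block rows k = N, N-1, ..., 1
        W = np.linalg.inv(np.linalg.cholesky(Lambdas[k]))    # whitening factor playing the role of Lambda_k^{-1/2}
        rows = slice(i * l, (i + 1) * l)
        r[rows] = W @ (y[k] - C_seq[k] @ z[k])               # residual r_k of (8)
        Amat[rows, f * N:] = W @ C_seq[k] @ Ups[k]           # parameter block: Lambda_k^{-1/2} C_k Upsilon_k
        Phi = np.eye(n)                                      # Phi^k_k = I, see (6)
        for j in range(k - 1, -1, -1):                       # disturbance blocks for d_j, j = k-1, ..., 0
            Amat[rows, f * j:f * (j + 1)] = W @ C_seq[k] @ Phi @ E_seq[j]
            Phi = Phi @ (A_seq[j] - Ks[j] @ C_seq[j])        # Phi^k_j = Phi^k_{j+1} (A_j - K_j C_j)
    return Amat, r
```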
If $d_k \equiv 0$ for each $k$ and $p \equiv 0$, then $r = e^*$, i.e. the vector of residuals has zero mean and its covariance equals the identity matrix. Any statistical test indicating a deviation from this condition can be used to detect the presence of non-null disturbances and/or parameters.
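The paper leaves the choice of statistical test open; a minimal example of one such test, assuming the residual vector has been built as above, is a chi-square test on its squared norm (which has $lN$ degrees of freedom under the nominal hypothesis).

```python
import numpy as np
from scipy.stats import chi2

def detect(r, alpha=0.01):
    """Flag a deviation from the nominal hypothesis r ~ N(0, I): reject when the squared norm
    of the residual vector exceeds the (1 - alpha) quantile of the chi-square distribution."""
    statistic = float(np.dot(r, r))
    threshold = chi2.ppf(1.0 - alpha, df=r.size)
    return statistic > threshold
```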
Since samples of $r$ are available, whereas $e^*$ cannot be observed, the most appealing approach to estimate $g$ is to compute its minimum variance linear estimator $\hat{g}$ given the random vector $r$. Thanks to Assumption 2, the following holds:
\[
E[e_k d_h^T] = O, \qquad E[e_k p^T] = O \qquad \forall k, h \geq 0. \tag{9}
\]
As a result, $g$ and $e^*$ in (7) are in fact uncorrelated. Provided that prior information on the random vector $g$ is given in terms of its mean $\mu_g$ and covariance $\Sigma_g$ (assume $\Sigma_g$ invertible and the factorization $\Sigma_g^{-1} = B^T B$), a straightforward application of linear estimation formulas shows that $\hat{g}$ and the covariance of the error $\tilde{g} = g - \hat{g}$ can be obtained from
\[
(A^T A + B^T B)(\hat{g} - \mu_g) = A^T(r - A\mu_g) \tag{10a}
\]
\[
\Sigma_{\tilde{g}} = (A^T A + B^T B)^{-1}. \tag{10b}
\]
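Numerically, it is preferable not to form the normal equations (10a) explicitly: since $\Sigma_g^{-1} = B^T B$, the same estimate solves the stacked least squares problem $\min_\delta \|A\delta - (r - A\mu_g)\|^2 + \|B\delta\|^2$ with $\hat{g} = \mu_g + \delta$. A sketch along these lines follows; the Cholesky-based choice of $B$ is an assumption, any factor with $B^T B = \Sigma_g^{-1}$ works.

```python
import numpy as np

def estimate_with_prior(Amat, r, mu_g, Sigma_g):
    """Minimum variance linear estimate of g given r and the prior (mu_g, Sigma_g), cf. (10a)-(10b)."""
    B = np.linalg.cholesky(np.linalg.inv(Sigma_g)).T        # B such that B^T B = Sigma_g^{-1}
    M = np.vstack([Amat, B])                                # stacked least squares matrix
    rhs = np.concatenate([r - Amat @ mu_g, np.zeros(B.shape[0])])
    delta, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    g_hat = mu_g + delta                                    # solves (10a)
    Sigma_err = np.linalg.inv(Amat.T @ Amat + B.T @ B)      # error covariance (10b)
    return g_hat, Sigma_err
```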
One could suspect, at this point, that the information about the unknown terms available from knowledge of the input and output sequences is not fully exploited if the only quantities used for the estimation of the disturbances and parameters are the residuals defined in (8). However, as long as linear estimators are considered, it is possible to prove that the proposed method is optimal, in the sense that estimating $g$ from (7) (instead of from a different linear relation with the measurable sequences $\{u_0^{N-1}, y_0^N\}$) in fact minimizes the estimation error variance.
When sample paths of the input and output sequences, say $\{u_0^{N-1}\}$ and $\{y_0^N\}$, are available, one is faced with the problem of numerically computing the estimate $\hat{g} = \mathrm{vec}[\hat{d}_{0|N} \;\ldots\; \hat{d}_{N-1|N}\; \hat{p}_{|N}]$ from the corresponding realization of the vector $r$. To this end, the availability or lack of prior information makes a difference. In the following the latter case is discussed.
3 NO PRIOR INFORMATION
3.1 Estimability Conditions
The absence of prior information about $g$ can be dealt with by setting $\mu_g = 0$ and letting $\Sigma_g \to \infty$ (or, equivalently, $\Sigma_g^{-1} \to O$), which corresponds to a very large uncertainty. Formula (10a) becomes
\[
(A^T A)\,\hat{g} = A^T r,
\]
which is the system of normal equations for computing the unique least squares solution of
\[
A g = r \tag{11}
\]
in the unknown $g$, provided that the matrix $A$ has full column rank. From a practical point of view, it should be noted that the proposed method simply requires checking the rank of matrices and solving least squares problems, for which efficient numerical tools are readily available (a numerical check along these lines is sketched after Proposition 1 below). Unfortunately, however, finding general estimability conditions in analytic form is a very complex task. The following is not difficult to prove:
Proposition 1. For a given $N \geq 1$, the estimates $\hat{p}_{|N}$ and $\hat{d}_{k|N}$ for $0 \leq k \leq N-1$ are unique if and only if the matrix $A$ in (11) has full column rank. Moreover, the uniqueness holds only if the following necessary conditions are satisfied:
\[
\text{(C1)}\quad \mathrm{rank}\begin{bmatrix}
E_0 & O & \cdots & O & \Psi_0 \\
O & E_1 & \cdots & O & \Psi_1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
O & O & \cdots & E_{N-1} & \Psi_{N-1}
\end{bmatrix} = fN + q \tag{12a}
\]
\[
\text{(C2)}\quad \mathrm{rank}\left(\sum_{k=1}^{N} \Upsilon_k^T C_k^T C_k \Upsilon_k\right) = q. \tag{12b}
\]
If $\mathrm{rank}(E_k) = f$ for all $k \geq 0$ and (C1) is true for a value $N = N_{\min}$, then it is satisfied for all values $N \geq N_{\min}$. Analogously, if (C2) is true for a value $N = N_{\min}$, then it is satisfied for all values $N \geq N_{\min}$.
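As noted above, in practice these conditions reduce to numerical rank tests; a minimal sketch, reusing a previously assembled $A$ and the sequence $\Upsilon_k$, could be:

```python
import numpy as np

def estimates_unique(Amat):
    """Proposition 1: the estimates are unique iff A has full column rank."""
    return np.linalg.matrix_rank(Amat) == Amat.shape[1]

def condition_C2(C_seq, Ups, N, q):
    """Necessary condition (C2): rank of sum_{k=1}^N Upsilon_k^T C_k^T C_k Upsilon_k equals q."""
    S = sum(Ups[k].T @ C_seq[k].T @ C_seq[k] @ Ups[k] for k in range(1, N + 1))
    return np.linalg.matrix_rank(S) == q
```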
3.2 Delayed Estimation
Consider first the case when there are no unknown parameters ($q = 0$). A sufficient (but not necessary) condition to ensure that $A$ has full column rank for all $N \geq 1$, hence the uniqueness of the estimates $\hat{d}_{k|N}$ for $0 \leq k \leq N-1$, is the following:
\[
\text{(C3)}\quad \mathrm{rank}(C_{k+1} E_k) = f \qquad \forall k \geq 0. \tag{13}
\]
However, when (C3) is not satisfied, it may still be possible to compute, for some delay $D > 0$, unique delayed estimates $\hat{d}_{k|N+D}$ for $0 \leq k \leq N-1$. To exemplify what has just been asserted, consider the case $C_{k+1} E_k = O$, so that (C3) is not satisfied (this situation typically happens when $C_{k+1}$ and $E_k$ both have some zero entries, for example $C_{k+1} = [1\ 0]$ and $E_k = [0\ 1]^T$). Then zero blocks appear in the term $Ag$ in (7) as shown in the following scheme (suppose, for example, that $N = 4$):
\[
\begin{array}{c}
5 \\ N=4 \\ 3 \\ 2 \\ 1
\end{array}
\begin{bmatrix}
\times & \times & \times & \star & O \\
\times & \times & \star & O & O \\
\times & \star & O & O & O \\
\star & O & O & O & O \\
O & O & O & O & O
\end{bmatrix}
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \\ d_4 \end{bmatrix}.
\]
It is evident that $d_{N-1}$ ($d_3$ in the example above) is not estimable from measurements collected up to time $N$ (in other words, $\hat{d}_{3|4}$ is not unique). However, if the blocks marked with a $\star$, i.e. the matrices $C_{k+2}\Phi^{k+2}_{k+1}E_k$ in (7), have full column rank, it is sufficient to add the measurements at time $N+1$ (at time 5 to continue the example) so that the unique estimates $\hat{d}_{k|N+1}$ for $k = 0, \ldots, N-1$ and, in particular, $\hat{d}_{N-1|N+1}$ ($\hat{d}_{3|5}$ in the example) can be computed. The above argument can be generalized as follows: if for some $D > 0$ the conditions
\[
\text{(C4a)}\quad \mathrm{rank}\!\left(C_{k+D+1}\Phi^{k+D+1}_{k+1}E_k\right) = f \qquad \forall k \geq 0 \tag{14a}
\]
\[
\text{(C4b)}\quad \begin{bmatrix}
C_{k+D}\Phi^{k+D}_{k+1}E_k \\
\vdots \\
C_{k+2}\Phi^{k+2}_{k+1}E_k \\
C_{k+1}E_k
\end{bmatrix} = O \tag{14b}
\]
are satisfied, then the estimates $\hat{d}_{k|N+D}$ for $0 \leq k \leq N-1$ are unique (even if $A$ does not have full rank). When there are unknown parameters ($q > 0$), the conditions in (13) or (14) are no longer sufficient and, in general, the rank of the matrix $A$ has to be checked numerically. However, note the following result:
Proposition 2. (a) Assuming that condition (C3) in (13) is satisfied, if the estimates $\hat{p}_{|N}$ and $\hat{d}_{k|N}$ for $0 \leq k \leq N-1$ are unique (i.e. the matrix $A$ has full column rank) for a value $N = N_{\min}$, then they are unique also for all $N \geq N_{\min}$.
(b) Analogously, assuming that conditions (C4) in (14) are satisfied, if the delayed estimates $\hat{p}_{|N+D}$ and $\hat{d}_{k|N+D}$ for $0 \leq k \leq N-1$ are unique for a value $N = N_{\min}$, then they are unique also for all $N \geq N_{\min}$.
3.3 Approximate Recursive Estimation
In order to compute the estimates from (11), a least squares problem of growing size has to be solved as $N$ increases. Observe, however, that the upper left blocks of the matrix $A$ tend to zero as $N$ grows, because the uniform observability and reachability assumption guarantees that the transition matrices $\Phi^k_h$ defined in (6) tend to the null matrix as the difference $k - h \to \infty$. Hence, it is natural to consider an approximate problem obtained by replacing $A$ with $A + E$, where $E$ annihilates the blocks $\Lambda_k^{-1/2}C_k\Phi^k_h E_{h-1}$ such that $k - h \geq L \geq L_{\min}$, where $L_{\min} \geq 1$ is the minimum value guaranteeing that $\mathrm{rank}(A) = \mathrm{rank}(A+E)$ for all $N$, so that the estimability properties of the original problem are preserved in the approximate one. Obviously, the accuracy of the approximate solution increases as $L \to \infty$. The system $(A+E)g = r$ thus has the banded structure shown in the following scheme (for $N = 5$ and $L = 3$):
\[
\begin{bmatrix}
 &  & \times & \times & \times & \times \\
 & \times & \times & \times &  & \times \\
\times & \times & \times &  &  & \times \\
\times & \times &  &  &  & \times \\
\times &  &  &  &  & \times
\end{bmatrix}
\cdot
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \\ d_4 \\ p \end{bmatrix}
=
\begin{bmatrix} r_5 \\ r_4 \\ r_3 \\ r_2 \\ r_1 \end{bmatrix}
\]
In the above, an initial data window has also been indicated (with a solid line box, in the original scheme). Using the numerical techniques described, for example, in (Björck, 1996, Chapter 6.2), this approximate least squares problem can then be solved recursively using a sliding window procedure.
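The sketch below illustrates only the banded approximation itself (zeroing the blocks with $k - h \geq L$) and a sparse least squares solve; a genuinely recursive sliding-window implementation would instead update a QR factorization of the active window as described in (Björck, 1996), which is not shown here.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import lsqr

def banded_approximation(Amat, N, f, l, L):
    """Replace A by A + E: zero the blocks Lambda_k^{-1/2} C_k Phi^k_h E_{h-1} with k - h >= L.
    Block rows are ordered k = N, ..., 1 and the column block of d_j corresponds to h = j + 1."""
    Ab = Amat.copy()
    for i, k in enumerate(range(N, 0, -1)):
        rows = slice(i * l, (i + 1) * l)
        for j in range(N):
            if k - (j + 1) >= L:
                Ab[rows, f * j:f * (j + 1)] = 0.0
    return Ab

def approximate_estimate(Amat, r, N, f, l, L):
    """Least squares solution of the banded approximate system (A + E) g = r, exploiting sparsity."""
    Ab = csr_matrix(banded_approximation(Amat, N, f, l, L))
    return lsqr(Ab, r)[0]
```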
3.4 Comparison with the Parity Space Approach
In the parity space method, the parameters and disturbances are estimated from a set of relations which can be cast in the form $\bar{A}g + w = \bar{r}$. The matrix $\bar{A}$ differs from $A$ in (7) only in that the transition matrices $\Phi^k_h$ defined in (6) are replaced by $\Gamma^k_h = A_{k-1}\cdots A_{h+1}A_h$. Moreover, the covariance of the noise term $w$ does not equal the identity matrix and the residuals $\bar{r}$ are built in a different way.
The approach proposed here is new in that it makes
explicit reference to the innovation representation of
the system (1), with the following advantages:
(a) The components of the noise term $e^*$ are independent and normalized, while an important drawback of the parity space approach is that the noise term $w$ has to be whitened before computing the least squares estimate, thus increasing the computational load, especially for large scale problems.
(b) If the matrices $A_k$ are not stable, as can typically happen in control problems, the matrix $\bar{A}$ can be severely ill-conditioned, thus making it numerically harder to compute the estimate reliably, especially for large window sizes.
(c) The initial condition $x_0$ affects the residuals $r$ through the sequence $\{z_k\}$. However, the transition matrices $\Phi^k_h$ are stable, hence the effect of the initial condition is asymptotically forgotten as $k \to +\infty$. As a consequence, when using the sliding window estimation procedure, one does not have to take care of the estimation or rejection of the state at the initial time of the window, as happens in the parity space approach (Törnqvist and Gustafsson, 2006).
REFERENCES
Björck, A. (1996). Numerical Methods for Least Squares Problems. SIAM.
Chow, E. Y. and Willsky, A. S. (1984). Analytical redundancy and the design of robust failure detection systems. IEEE T. Automat. Contr., AC-29(7):603-614.
Gevers, M. R. and Anderson, B. D. O. (1982). On jointly stationary feedback-free stochastic processes. IEEE T. Automat. Contr., AC-27(2):431-436.
Gustafsson, F. (2001). Adaptive Filtering and Change Detection. John Wiley & Sons, Ltd.
Kailath, T., Sayed, A. H., and Hassibi, B. (2000). Linear Estimation. Prentice Hall.
Perabò, S. and Zhang, Q. (2007). Adaptive observers for linear time-variant stochastic systems with disturbances. In Proceedings of the European Control Conference 2007, Kos, Greece. (accepted for publication).
Törnqvist, D. and Gustafsson, F. (2006). Eliminating the initial state for the generalized likelihood ratio test. In Proceedings of the 6th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, pages 643-648, Beijing, P. R. China.