ON THE JOINT ESTIMATION OF UNKNOWN PARAMETERS AND DISTURBANCES IN LINEAR STOCHASTIC TIME-VARIANT SYSTEMS
Stefano Perabò and Qinghua Zhang
IRISA, INRIA Rennes, Campus universitaire de Beaulieu, Avenue du General Leclerc, 35042 Rennes Cedex, France
Keywords: Fault detection, parameter estimation, linear stochastic time-varying systems, adaptive signal processing.
Abstract: Motivated by fault detection and isolation problems, we present an approach to the design of estimators of unknown parameters and disturbances for linear time-variant stochastic systems. The main features of the proposed method are: (a) the joint estimation of parameters and disturbances can be carried out; (b) it is a fully stochastic approach: the unknown parameters and disturbances are random quantities, and prior information, in terms of means and covariances, can easily be taken into account; (c) the estimator structure is not fixed a priori, but rather derived from the optimal infinite-dimensional one by means of a sliding window approximation. The advantages with respect to the widely used parity space approach are presented.
1 INTRODUCTION
The following discrete-time linear stochastic system is considered in this brief paper:
\[
x_{k+1} = A_k x_k + B_k u_k + \Psi_k p + E_k d_k + w_k \tag{1a}
\]
\[
y_k = C_k x_k + v_k \tag{1b}
\]
for $k \geq 0$, with $A_k \in \mathbb{R}^{n \times n}$, $B_k \in \mathbb{R}^{n \times m}$, $\Psi_k \in \mathbb{R}^{n \times q}$, $E_k \in \mathbb{R}^{n \times f}$ and $C_k \in \mathbb{R}^{l \times n}$ known time-variant matrices. The vector sequences $\{x_k\}$, $\{u_k\}$ and $\{y_k\}$ denote respectively the state, input and output stochastic processes. The sequences $\{w_k\}$ and $\{v_k\}$ are assumed to be zero mean, white and uncorrelated wide-sense stochastic processes, with $E[w_k w_k^T] = Q_k$ and $E[v_k v_k^T] = R_k \succ O$ (positive definite), where $E[\cdot]$ denotes the mathematical expectation operator. The initial condition $x_0$ has known mean $E[x_0] = \mu_0$ and covariance $E[(x_0 - \mu_0)(x_0 - \mu_0)^T] = P_0$. Both the initial condition $x_0$ and the input process $\{u_k\}$ are assumed uncorrelated with the noise sequences.
The term $E_k d_k$ accounts for unknown disturbances acting on the system, or for faults; hence the sequence $\{d_k\}$ is an unknown (and uncontrolled) input modeled as a wide-sense stochastic process, not necessarily stationary. The disturbances are further assumed uncorrelated with the initial state, the noise and the input processes, respectively $x_0$, $\{w_k\}$, $\{v_k\}$ and $\{u_k\}$. Finally, the term $\Psi_k p$ can account for the occurrence of parametric faults in the system (for instance with the meaning that when $p$ is zero no faults are present) or for constant parameters that need to be estimated on-line. Here $p$ is a random variable uncorrelated with the noise, input and disturbance processes.
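To make the setting concrete, the following minimal sketch simulates a realization of model (1); the specific dimensions, system matrices, noise levels and fault profile are illustrative assumptions only, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions): n states, m inputs, l outputs, f disturbances, q parameters.
n, m, l, f, q = 2, 1, 1, 1, 1

def system_matrices(k):
    """Return (A_k, B_k, C_k, Psi_k, E_k) of model (1); time-invariant here for simplicity."""
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    C = np.array([[1.0, 0.0]])
    Psi = np.array([[0.5], [0.0]])
    E = np.array([[0.0], [1.0]])
    return A, B, C, Psi, E

Q = 0.01 * np.eye(n)   # covariance of the process noise w_k
R = 0.04 * np.eye(l)   # covariance of the measurement noise v_k

def simulate(N, p, d, u, x0):
    """Simulate (1a)-(1b): returns y_0, ..., y_N given u_0..u_{N-1}, d_0..d_{N-1} and the parameter p."""
    x, ys = np.asarray(x0, dtype=float), []
    for k in range(N + 1):
        A, B, C, Psi, E = system_matrices(k)
        ys.append(C @ x + rng.multivariate_normal(np.zeros(l), R))   # output equation (1b)
        if k < N:
            w = rng.multivariate_normal(np.zeros(n), Q)
            x = A @ x + B @ u[k] + Psi @ p + E @ d[k] + w            # state equation (1a)
    return np.array(ys)

# Example: a constant parametric fault p = 0.2 and a disturbance step at k = 30.
N = 60
u = np.zeros((N, m))
d = np.array([[0.0] if k < 30 else [0.5] for k in range(N)])
y = simulate(N, np.array([0.2]), d, u, x0=np.zeros(n))
```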
The problem to be solved is the following: find, for each $N \geq 0$, the minimum variance unbiased linear estimators of the disturbance sequence $d_0^{N-1} = \{d_k : 0 \leq k \leq N-1\}$ and of the parameters $p$, given the input and output sequences $u_0^{N-1}$ and $y_0^N$, and the conditions guaranteeing the uniqueness of the corresponding estimates. These estimators will be denoted respectively by $\hat{d}_{k|N}$ and $\hat{p}_{|N}$ (since $p$ does not depend on time).
The following two related problems will also be discussed in this paper. First, how to weaken the uniqueness conditions by considering the quantities $\hat{d}_{k|N+D}$ for $0 \leq k \leq N-1$ and some appropriate delay $D > 0$, which will be called, with an abuse of terms, "delayed estimators". Second, how to recursively and reliably compute the estimates $\hat{p}_{|N+D}$ and $\hat{d}_{k|N+D}$ once sample paths (measurements) $u_0^{N+D-1}$ and $y_0^{N+D}$ of the input and output processes become available (by convention, italic characters will denote samples from the corresponding random variables, which will instead be denoted by roman characters).
The solution proposed in this work shares many similarities with the so-called parity space method (Chow and Willsky, 1984; Gustafsson, 2001), which finds wide application in fault detection problems. However, it has some advantageous features that will be presented at the end of the exposition.
Once the disturbance and parameter estimates have been computed, state estimation becomes straightforward and can also be easily performed on demand. This topic is discussed in (Perabò and Zhang, 2007).
2 BASIC EQUATIONS FOR ESTIMATION
Pretend for a while that the parameters and the disturbance sequence are known quantities, i.e. as if they were inputs of the system described by (1), and assume the following:
Assumption 1. $(A_k, C_k)$ is uniformly completely observable and $(A_k, Q_k^{1/2})$ is uniformly completely reachable.

Assumption 2. The parameters $p$ and the disturbance sequence $\{d_k\}$ are uncorrelated with the initial state $x_0$ and the noise sequences $\{w_k\}$ and $\{v_k\}$.
Hence there is no feedback from the output to the parameters and disturbances (see (Gevers and Anderson, 1982) for details) and, by applying well-known results of linear estimation theory (Kailath et al., 2000), the following innovation representation of the output process $\{y_k\}$ can be derived:
\[
\hat{x}^*_{k+1|k} = A_k \hat{x}^*_{k|k-1} + B_k u_k + \Psi_k p + E_k d_k + K_k e_k \tag{2a}
\]
\[
y_k = C_k \hat{x}^*_{k|k-1} + e_k, \tag{2b}
\]
the recursion being initiated by setting $\hat{x}^*_{0|-1} = x_0$, where $\hat{x}^*_{k+1|k}$ is the one-step minimum variance unbiased linear predictor of the state. Each term of the innovation sequence $\{e_k\}$ has zero mean and covariance $\Lambda_k$ given by the recursive solution of the same Riccati equation which is solved in the standard Kalman filter (i.e. with no disturbances and unknown parameters). With respect to the latter, however, the superscript $*$ in (2) emphasizes that the "estimates" $\{\hat{x}^*_{k+1|k}\}$ cannot be computed, because the realizations of $p$ and $\{d_k\}$ are not actually available. Also the gains $K_k$ are computed exactly as in the Kalman filter. By defining recursively the quantities
\[
\Upsilon_0 = O, \qquad \Upsilon_{k+1} = (A_k - K_k C_k)\Upsilon_k + \Psi_k \tag{3a}
\]
\[
s_0 = 0, \qquad s_{k+1} = (A_k - K_k C_k)s_k + E_k d_k \tag{3b}
\]
\[
z_0 = x_0, \qquad z_{k+1} = (A_k - K_k C_k)z_k + B_k u_k + K_k y_k \tag{3c}
\]
and by using (2b) it is not difficult to check that the following is true:
\[
C_k s_k + C_k \Upsilon_k p + e_k = y_k - C_k z_k. \tag{4}
\]
Note that a realization of the sequence $\{z_k\}$ can be computed from available data only, i.e. system matrices, input and output sequences. As a matter of fact, (3c) is exactly the Kalman filter equation that would be obtained if $p \equiv 0$ and $d_k \equiv 0$ for all $k$.
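As a sketch (under the same illustrative conventions as the earlier snippet, with the time-stamped matrices and noise covariances stored in Python lists of length $N+1$), the gains $K_k$, the innovation covariances $\Lambda_k$ and the computable recursions (3a) and (3c) could be implemented as follows. Note that $z_0$ is initialised here with the prior mean $\mu_0$, a practical choice when a realization of $x_0$ is not available.

```python
import numpy as np

def riccati_gains(A_seq, C_seq, Q_seq, R_seq, P0, N):
    """Standard (disturbance-free) Kalman predictor recursion: returns the gains K_k and the
    innovation covariances Lambda_k for k = 0, ..., N, as used in (2) and (3)."""
    Ks, Lambdas, P = [], [], np.asarray(P0, dtype=float)
    for k in range(N + 1):
        A, C, Q, R = A_seq[k], C_seq[k], Q_seq[k], R_seq[k]
        Lam = C @ P @ C.T + R                          # innovation covariance Lambda_k
        K = A @ P @ C.T @ np.linalg.inv(Lam)           # one-step predictor gain K_k
        P = A @ P @ A.T + Q - K @ Lam @ K.T            # Riccati update of the prediction error covariance
        Ks.append(K)
        Lambdas.append(Lam)
    return Ks, Lambdas

def recursions_upsilon_z(A_seq, B_seq, C_seq, Psi_seq, Ks, u, y, mu0, N):
    """Recursions (3a) and (3c): Upsilon_k and the data-driven sequence z_k for k = 0, ..., N."""
    n, q = A_seq[0].shape[0], Psi_seq[0].shape[1]
    Ups = [np.zeros((n, q))]                           # Upsilon_0 = O
    z = [np.asarray(mu0, dtype=float)]                 # z_0 (initialised with the prior mean mu_0 here)
    for k in range(N):
        F = A_seq[k] - Ks[k] @ C_seq[k]                # A_k - K_k C_k
        Ups.append(F @ Ups[k] + Psi_seq[k])            # (3a)
        z.append(F @ z[k] + B_seq[k] @ u[k] + Ks[k] @ y[k])   # (3c)
    return Ups, z
```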
It is possible to arrange in matrix form the set of equations obtained from (4) for $k = 1, 2, \ldots, N$. For example, for $N = 4$ one obtains
\[
\begin{bmatrix}
C_4\Phi^4_1 E_0 & C_4\Phi^4_2 E_1 & C_4\Phi^4_3 E_2 & C_4 E_3 & C_4\Upsilon_4 \\
C_3\Phi^3_1 E_0 & C_3\Phi^3_2 E_1 & C_3 E_2 & O & C_3\Upsilon_3 \\
C_2\Phi^2_1 E_0 & C_2 E_1 & O & O & C_2\Upsilon_2 \\
C_1 E_0 & O & O & O & C_1\Upsilon_1
\end{bmatrix}
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \\ p \end{bmatrix}
+
\begin{bmatrix} e_4 \\ e_3 \\ e_2 \\ e_1 \end{bmatrix}
=
\begin{bmatrix} y_4 - C_4 z_4 \\ y_3 - C_3 z_3 \\ y_2 - C_2 z_2 \\ y_1 - C_1 z_1 \end{bmatrix}, \tag{5}
\]
where the transition matrices $\Phi^k_h$ are defined by
\[
\Phi^h_h = I, \qquad \Phi^{k+1}_h = (A_k - K_k C_k)\Phi^k_h. \tag{6}
\]
For an arbitrary $N$, left multiply the above system by the block diagonal matrix $\mathrm{blkdiag}\{\Lambda_N^{-1/2}, \ldots, \Lambda_1^{-1/2}\}$ in such a way that the covariance of the zero mean vector $e^* = \mathrm{vec}[\Lambda_N^{-1/2} e_N \;\ldots\; \Lambda_1^{-1/2} e_1]$ is equal to the identity matrix. A system of the form
\[
A g + e^* = r \tag{7}
\]
is thus obtained, where the matrix $A \in \mathbb{R}^{lN\times(fN+q)}$ has the same structure as in (5), $g = \mathrm{vec}[d_0 \;\ldots\; d_{N-1}\; p]$ is the unknown term, and the vector $r = \mathrm{vec}[r_N \;\ldots\; r_1]$ contains the computable residuals
\[
r_k = \Lambda_k^{-1/2}(y_k - C_k z_k). \tag{8}
\]
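For illustration, the normalized matrix $A$ and the residual vector $r$ of (7)-(8) can be assembled block by block from the quantities computed in the previous snippet. The whitening factor used below is a Cholesky-based inverse square root of $\Lambda_k$ (any $W_k$ with $W_k \Lambda_k W_k^T = I$ serves the purpose); this is only a sketch, with block rows ordered $k = N, \ldots, 1$ as in (5).

```python
import numpy as np

def assemble_A_r(A_seq, C_seq, E_seq, Ks, Lambdas, Ups, z, y, N, f, q):
    """Build the matrix A (size lN x (fN + q)) and the residual vector r of (7)-(8)."""
    l = C_seq[0].shape[0]
    n = A_seq[0].shape[0]
    Amat = np.zeros((l * N, f * N + q))
    r = np.zeros(l * N)
    for i, k in enumerate(range(N, 0, -1)):                  # block rows k = N, N-1, ..., 1
        W = np.linalg.inv(np.linalg.cholesky(Lambdas[k]))    # whitening factor playing the role of Lambda_k^{-1/2}
        rows = slice(i * l, (i + 1) * l)
        r[rows] = W @ (y[k] - C_seq[k] @ z[k])               # residual r_k of (8)
        Amat[rows, f * N:] = W @ C_seq[k] @ Ups[k]           # parameter block: Lambda_k^{-1/2} C_k Upsilon_k
        Phi = np.eye(n)                                      # Phi^k_k = I, see (6)
        for j in range(k - 1, -1, -1):                       # disturbance blocks for d_j, j = k-1, ..., 0
            Amat[rows, f * j:f * (j + 1)] = W @ C_seq[k] @ Phi @ E_seq[j]
            Phi = Phi @ (A_seq[j] - Ks[j] @ C_seq[j])        # Phi^k_j = Phi^k_{j+1} (A_j - K_j C_j)
    return Amat, r
```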
If $d_k \equiv 0$ for each $k$ and $p \equiv 0$, then $r = e^*$, i.e. the vector of residuals has zero mean and its covariance equals the identity matrix. Any statistical test indicating a deviation from this condition can be used to detect the presence of non-null disturbances and/or parameters.
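The paper leaves the choice of statistical test open; a minimal example of one such test, assuming the residual vector has been built as above, is a chi-square test on its squared norm (which has $lN$ degrees of freedom under the nominal hypothesis).

```python
import numpy as np
from scipy.stats import chi2

def detect(r, alpha=0.01):
    """Flag a deviation from the nominal hypothesis r ~ N(0, I): reject when the squared norm
    of the residual vector exceeds the (1 - alpha) quantile of the chi-square distribution."""
    statistic = float(np.dot(r, r))
    threshold = chi2.ppf(1.0 - alpha, df=r.size)
    return statistic > threshold
```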
Since samples of $r$ are available, whereas $e^*$ cannot be observed, the most appealing approach to estimate $g$ is to compute its minimum variance linear estimator $\hat{g}$ given the random vector $r$. Thanks to Assumption 2, the following holds:
\[
E[e_k d_h^T] = O, \qquad E[e_k p^T] = O \qquad \forall k, h \geq 0. \tag{9}
\]
As a result, $g$ and $e^*$ in (7) are in fact uncorrelated. Provided that prior information on the random vector $g$ is given in terms of its mean $\mu_g$ and covariance $\Sigma_g$ (assume $\Sigma_g$ invertible and the factorization $\Sigma_g^{-1} = B^T B$), a straightforward application of linear estimation formulas shows that $\hat{g}$ and the covariance of the error $\tilde{g} = g - \hat{g}$ can be obtained from
\[
(A^T A + B^T B)(\hat{g} - \mu_g) = A^T(r - A\mu_g) \tag{10a}
\]
\[
\Sigma_{\tilde{g}} = (A^T A + B^T B)^{-1}. \tag{10b}
\]
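Numerically, it is preferable not to form the normal equations (10a) explicitly: since $\Sigma_g^{-1} = B^T B$, the same estimate solves the stacked least squares problem $\min_\delta \|A\delta - (r - A\mu_g)\|^2 + \|B\delta\|^2$ with $\hat{g} = \mu_g + \delta$. A sketch along these lines follows; the Cholesky-based choice of $B$ is an assumption, any factor with $B^T B = \Sigma_g^{-1}$ works.

```python
import numpy as np

def estimate_with_prior(Amat, r, mu_g, Sigma_g):
    """Minimum variance linear estimate of g given r and the prior (mu_g, Sigma_g), cf. (10a)-(10b)."""
    B = np.linalg.cholesky(np.linalg.inv(Sigma_g)).T        # B such that B^T B = Sigma_g^{-1}
    M = np.vstack([Amat, B])                                # stacked least squares matrix
    rhs = np.concatenate([r - Amat @ mu_g, np.zeros(B.shape[0])])
    delta, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    g_hat = mu_g + delta                                    # solves (10a)
    Sigma_err = np.linalg.inv(Amat.T @ Amat + B.T @ B)      # error covariance (10b)
    return g_hat, Sigma_err
```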
One could suspect, at this point, that the information about the unknown terms available from knowledge of the input and output sequences is not fully exploited if the only quantities used for the estimation of the disturbances and parameters are the residuals defined in (8). However, as long as linear estimators are considered, it is possible to prove that the proposed method is optimal, in the sense that estimating $g$ from (7) (instead of from a different linear relation with the measurable sequences $\{u_0^{N-1}, y_0^N\}$) in fact minimizes the estimation error variance.
When sample paths of the input and output sequences, say $\{u_0^{N-1}\}$ and $\{y_0^N\}$, are available, one is faced with the problem of numerically computing the estimate $\hat{g} = \mathrm{vec}[\hat{d}_{0|N} \;\ldots\; \hat{d}_{N-1|N}\; \hat{p}_{|N}]$ from the corresponding realization of the vector $r$. To this end, the availability or lack of prior information makes a difference. In the following the latter case is discussed.
3 NO PRIOR INFORMATION
3.1 Estimability Conditions
The absence of prior information about $g$ can be dealt with by setting $\mu_g = 0$ and letting $\Sigma_g \to \infty$ (or, equivalently, $\Sigma_g^{-1} \to O$), which corresponds to a very large uncertainty. Formula (10a) becomes
\[
(A^T A)\,\hat{g} = A^T r,
\]
which is the system of normal equations for computing the unique least squares solution of
\[
A g = r \tag{11}
\]
in the unknown $g$, provided that the matrix $A$ has full column rank. From a practical point of view, it should be noted that the proposed method simply requires checking the rank of matrices and solving least squares problems, for which efficient numerical tools are readily available (a numerical check along these lines is sketched after Proposition 1 below). Unfortunately, however, finding general estimability conditions in analytic form is a very complex task. The following is not difficult to prove:
Proposition 1. For a given $N \geq 1$, the estimates $\hat{p}_{|N}$ and $\hat{d}_{k|N}$ for $0 \leq k \leq N-1$ are unique if and only if the matrix $A$ in (11) has full column rank. Moreover, the uniqueness holds only if the following necessary conditions are satisfied:
\[
\text{(C1)}\quad \mathrm{rank}\begin{bmatrix}
E_0 & O & \cdots & O & \Psi_0 \\
O & E_1 & \cdots & O & \Psi_1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
O & O & \cdots & E_{N-1} & \Psi_{N-1}
\end{bmatrix} = fN + q \tag{12a}
\]
\[
\text{(C2)}\quad \mathrm{rank}\left(\sum_{k=1}^{N} \Upsilon_k^T C_k^T C_k \Upsilon_k\right) = q. \tag{12b}
\]
If $\mathrm{rank}(E_k) = f$ for all $k \geq 0$ and (C1) is true for a value $N = N_{\min}$, then it is satisfied for all values $N \geq N_{\min}$. Analogously, if (C2) is true for a value $N = N_{\min}$, then it is satisfied for all values $N \geq N_{\min}$.
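As noted above, in practice these conditions reduce to numerical rank tests; a minimal sketch, reusing a previously assembled $A$ and the sequence $\Upsilon_k$, could be:

```python
import numpy as np

def estimates_unique(Amat):
    """Proposition 1: the estimates are unique iff A has full column rank."""
    return np.linalg.matrix_rank(Amat) == Amat.shape[1]

def condition_C2(C_seq, Ups, N, q):
    """Necessary condition (C2): rank of sum_{k=1}^N Upsilon_k^T C_k^T C_k Upsilon_k equals q."""
    S = sum(Ups[k].T @ C_seq[k].T @ C_seq[k] @ Ups[k] for k in range(1, N + 1))
    return np.linalg.matrix_rank(S) == q
```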
3.2 Delayed Estimation
Consider first the case when there are no unknown parameters ($q = 0$). A sufficient (but not necessary) condition to ensure that $A$ has full column rank for all $N \geq 1$, hence the uniqueness of the estimates $\hat{d}_{k|N}$ for $0 \leq k \leq N-1$, is the following:
\[
\text{(C3)}\quad \mathrm{rank}(C_{k+1} E_k) = f \qquad \forall k \geq 0. \tag{13}
\]
However, when (C3) is not satisfied, it may still be possible to compute, for some delay $D > 0$, unique delayed estimates $\hat{d}_{k|N+D}$ for $0 \leq k \leq N-1$. To exemplify what has just been asserted, consider the case $C_{k+1} E_k = O$, so that (C3) is not satisfied (this situation typically happens when $C_{k+1}$ and $E_k$ both have some zero entries, for example $C_{k+1} = [1\ 0]$ and $E_k = [0\ 1]^T$). Then zero blocks appear in the term $Ag$ in (7) as shown in the following scheme (suppose, for example, that $N = 4$):
\[
\begin{array}{c}
5 \\ N=4 \\ 3 \\ 2 \\ 1
\end{array}
\begin{bmatrix}
\times & \times & \times & \star & O \\
\times & \times & \star & O & O \\
\times & \star & O & O & O \\
\star & O & O & O & O \\
O & O & O & O & O
\end{bmatrix}
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \\ d_4 \end{bmatrix}.
\]
It is evident that $d_{N-1}$ ($d_3$ in the example above) is not estimable from measurements collected up to time $N$ (in other words, $\hat{d}_{3|4}$ is not unique). However, if the blocks marked with a $\star$, i.e. the matrices $C_{k+2}\Phi^{k+2}_{k+1}E_k$ in (7), have full column rank, it is sufficient to add the measurements at time $N+1$ (at time 5 to continue the example) so that the unique estimates $\hat{d}_{k|N+1}$ for $k = 0, \ldots, N-1$ and, in particular, $\hat{d}_{N-1|N+1}$ ($\hat{d}_{3|5}$ in the example) can be computed. The above argument can be generalized as follows: if for some $D > 0$ the conditions
\[
\text{(C4a)}\quad \mathrm{rank}\!\left(C_{k+D+1}\Phi^{k+D+1}_{k+1}E_k\right) = f \qquad \forall k \geq 0 \tag{14a}
\]
\[
\text{(C4b)}\quad \begin{bmatrix}
C_{k+D}\Phi^{k+D}_{k+1}E_k \\
\vdots \\
C_{k+2}\Phi^{k+2}_{k+1}E_k \\
C_{k+1}E_k
\end{bmatrix} = O \tag{14b}
\]
are satisfied, then the estimates $\hat{d}_{k|N+D}$ for $0 \leq k \leq N-1$ are unique (even if $A$ does not have full rank). When there are unknown parameters ($q > 0$), the conditions in (13) or (14) are no longer sufficient and, in general, the rank of the matrix $A$ has to be checked numerically. However, note the following result:
Proposition 2. (a) Assuming that condition (C3) in (13) is satisfied, if the estimates $\hat{p}_{|N}$ and $\hat{d}_{k|N}$ for $0 \leq k \leq N-1$ are unique (i.e. the matrix $A$ has full column rank) for a value $N = N_{\min}$, then they are unique also for all $N \geq N_{\min}$.
(b) Analogously, assuming that conditions (C4) in (14) are satisfied, if the delayed estimates $\hat{p}_{|N+D}$ and $\hat{d}_{k|N+D}$ for $0 \leq k \leq N-1$ are unique for a value $N = N_{\min}$, then they are unique also for all $N \geq N_{\min}$.
3.3 Approximate Recursive Estimation
In order to compute the estimates from (11), a least squares problem of growing size has to be solved as $N$ increases. Observe, however, that the upper left blocks of the matrix $A$ tend to zero as $N$ grows, because the uniform observability and reachability assumption guarantees that the transition matrices $\Phi^k_h$ defined in (6) tend to the null matrix as the difference $k - h \to \infty$. Hence, it is natural to consider an approximate problem obtained by replacing $A$ with $A + E$, where $E$ annihilates the blocks $\Lambda_k^{-1/2}C_k\Phi^k_h E_{h-1}$ such that $k - h \geq L \geq L_{\min}$, where $L_{\min} \geq 1$ is the minimum value guaranteeing that $\mathrm{rank}(A) = \mathrm{rank}(A+E)$ for all $N$, so that the estimability properties of the original problem are preserved in the approximate one. Obviously, the accuracy of the approximate solution increases as $L \to \infty$. The system $(A+E)g = r$ thus has the banded structure shown in the following scheme (for $N = 5$ and $L = 3$):
\[
\begin{bmatrix}
 &  & \times & \times & \times & \times \\
 & \times & \times & \times &  & \times \\
\times & \times & \times &  &  & \times \\
\times & \times &  &  &  & \times \\
\times &  &  &  &  & \times
\end{bmatrix}
\cdot
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \\ d_4 \\ p \end{bmatrix}
=
\begin{bmatrix} r_5 \\ r_4 \\ r_3 \\ r_2 \\ r_1 \end{bmatrix}
\]
In the above, an initial data window has also been indicated (with a solid line box, in the original scheme). Using the numerical techniques described, for example, in (Björck, 1996, Chapter 6.2), this approximate least squares problem can then be solved recursively using a sliding window procedure.
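The sketch below illustrates only the banded approximation itself (zeroing the blocks with $k - h \geq L$) and a sparse least squares solve; a genuinely recursive sliding-window implementation would instead update a QR factorization of the active window as described in (Björck, 1996), which is not shown here.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import lsqr

def banded_approximation(Amat, N, f, l, L):
    """Replace A by A + E: zero the blocks Lambda_k^{-1/2} C_k Phi^k_h E_{h-1} with k - h >= L.
    Block rows are ordered k = N, ..., 1 and the column block of d_j corresponds to h = j + 1."""
    Ab = Amat.copy()
    for i, k in enumerate(range(N, 0, -1)):
        rows = slice(i * l, (i + 1) * l)
        for j in range(N):
            if k - (j + 1) >= L:
                Ab[rows, f * j:f * (j + 1)] = 0.0
    return Ab

def approximate_estimate(Amat, r, N, f, l, L):
    """Least squares solution of the banded approximate system (A + E) g = r, exploiting sparsity."""
    Ab = csr_matrix(banded_approximation(Amat, N, f, l, L))
    return lsqr(Ab, r)[0]
```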
3.4 Comparison with the Parity Space Approach
In the parity space method, the parameters and disturbances are estimated from a set of relations which can be cast in the form $\bar{A}g + w = \bar{r}$. The matrix $\bar{A}$ differs from $A$ in (7) only in that the transition matrices $\Phi^k_h$ defined in (6) are replaced by $\Gamma^k_h = A_{k-1}\cdots A_{h+1}A_h$. Moreover, the covariance of the noise term $w$ does not equal the identity matrix and the residuals $\bar{r}$ are built in a different way.
The approach proposed here is new in that it makes
explicit reference to the innovation representation of
the system (1), with the following advantages:
(a) The components of the noise term $e^*$ are independent and normalized, while an important drawback of the parity space approach is that the noise term $w$ has to be whitened before computing the least squares estimate, thus increasing the computational load, especially for large scale problems.
(b) If the matrices $A_k$ are not stable, as can typically happen in control problems, the matrix $\bar{A}$ can be severely ill-conditioned, thus making it numerically harder to compute the estimate reliably, especially for large window sizes.
(c) The initial condition $x_0$ affects the residuals $r$ through the sequence $\{z_k\}$. However, the transition matrices $\Phi^k_h$ are stable, hence the effect of the initial condition is asymptotically forgotten as $k \to +\infty$. As a consequence, when using the sliding window estimation procedure, one does not have to take care of the estimation or rejection of the state at the initial time of the window, as happens in the parity space approach (Törnqvist and Gustafsson, 2006).
REFERENCES
Björck, A. (1996). Numerical Methods for Least Squares Problems. SIAM.
Chow, E. Y. and Willsky, A. S. (1984). Analytical redundancy and the design of robust failure detection systems. IEEE T. Automat. Contr., AC-29(7):603-614.
Gevers, M. R. and Anderson, B. D. O. (1982). On jointly stationary feedback-free stochastic processes. IEEE T. Automat. Contr., AC-27(2):431-436.
Gustafsson, F. (2001). Adaptive Filtering and Change Detection. John Wiley & Sons, Ltd.
Kailath, T., Sayed, A. H., and Hassibi, B. (2000). Linear Estimation. Prentice Hall.
Perabò, S. and Zhang, Q. (2007). Adaptive observers for linear time-variant stochastic systems with disturbances. In Proceedings of the European Control Conference 2007, Kos, Greece. (accepted for publication).
Törnqvist, D. and Gustafsson, F. (2006). Eliminating the initial state for the generalized likelihood ratio test. In Proceedings of the 6th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, pages 643-648, Beijing, P. R. China.