Carreira-Perpiñán, M. (2002). Mode-finding for mixtures of
Gaussian distributions. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 22(11):1318–1323.
Erdogmus, D. and Principe, J. (2002). An error-entropy
minimization algorithm for supervised training of
nonlinear adaptive systems. IEEE Transactions on
Signal Processing, 50(7):1780–1786.
Gray, R. (2010). Entropy and information theory. Springer
Verlag.
Hastie, T., Tibshirani, R., Friedman, J., and Franklin, J.
(2005). The elements of statistical learning: data
mining, inference and prediction. The Mathematical
Intelligencer, 27(2):83–85.
Miller, K. (1964). Multidimensional Gaussian distributions.
Wiley, New York.
Parzen, E. (1962). On estimation of a probability density
function and mode. The Annals of Mathematical
Statistics, 33(3):1065–1076.
Popovic, D., Milosavljevic, V., Zekic, A., Macgearailt, N.,
and Daniels, S. (2009). Impact of low pressure plasma
discharge on etch rate of SiO2 wafer. In APS Meeting
Abstracts, volume 1, page 8037P.
Principe, J. (2010). Information Theoretic Learning:
Rényi's Entropy and Kernel Perspectives. Springer
Verlag.
Principe, J., Xu, D., Zhao, Q., and Fisher, J. (2000). Learn-
ing from examples with information theoretic criteria.
The Journal of VLSI Signal Processing, 26(1):61–77.
Rallo, R., Ferré-Giné, J., Arenas, A., and Giralt, F. (2002).
Neural virtual sensor for the inferential prediction of
product quality from process variables. Computers &
Chemical Engineering, 26(12):1735–1754.
Schölkopf, B. and Smola, A. (2002). Learning with Kernels.
MIT Press.
Silverman, B. (1986). Density Estimation for Statistics and
Data Analysis. Number 26 in Monographs on Statistics
and Applied Probability.
Wang, P. and Vachtsevanos, G. (2001). Fault prognostics
using dynamic wavelet neural networks. AI EDAM,
15(4):349–365.
Weber, A. (2007). Virtual metrology and your technology
watch list: ten things you should know about this
emerging technology. Future Fab International,
22:52–54.
APPENDIX A
Problem 1 is an unconstrained global optimization
problem. In order to derive a solution, we consider
the features of $c^*$ in the following

Proposition 5. Let $c^*$ be the global minimum of a
negatively weighted sum of Gaussian densities $J(c)$.
Thanks to the properties of $J(c)$, there exists a real $m$
such that
$$\|c^* - d_{ij}\|^2_{D_{ij}^{-1}} \leq m$$
for at least one mean vector $d_{ij}$. Furthermore,
$$m \leq \log \frac{C_0}{\tilde{J}}$$
where $C_0$ is a negative constant and
$$\tilde{J} = \min_{i,j} J(d_{ij})$$
That is, $m$ is bounded above by a decreasing
function of the minimum value of $J$ evaluated at the
mean vectors $d_{ij}$.
Proposition 5 has two notable implications: (i)
there is at least one mean vector $d_{ij}$ that serves as a
suitable starting point for a local optimization procedure,
and (ii) the global minimum gets closer to one of the
mean vectors as the computable quantity $\tilde{J}$ increases.
Using these results, $c^*$ is found by means of the
following
Algorithm 1: solution of Problem 1.
1. Set $c^* = 0_N$
2. For $i = 1, \ldots, N$
   (a) For $j = i+1, \ldots, N$
       • Use a Newton-Raphson algorithm to solve
         the local optimization problem
         $$c^*_{ij} = \arg\min_c J(c)$$
         using $d_{ij}$ as starting point.
       • If $J(c^*_{ij}) < J(c^*)$, set $c^* = c^*_{ij}$.
   (b) End for
3. End for
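The exhaustive search can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: `make_J` and `exhaustive_mode_search` are hypothetical names, and `scipy`'s BFGS minimizer stands in for the Newton-Raphson step.

```python
import numpy as np
from scipy.optimize import minimize

def make_J(means, covs, weights):
    """Build J(c): a negatively weighted sum of Gaussian densities
    with mean vectors d_ij and covariances D_ij (all weights < 0)."""
    inv_covs = [np.linalg.inv(S) for S in covs]
    consts = [w / np.sqrt((2 * np.pi) ** len(m) * np.linalg.det(S))
              for w, m, S in zip(weights, means, covs)]
    def J(c):
        total = 0.0
        for m, Sinv, k in zip(means, inv_covs, consts):
            diff = np.asarray(c) - m
            total += k * np.exp(-0.5 * diff @ Sinv @ diff)
        return total
    return J

def exhaustive_mode_search(means, covs, weights):
    """Algorithm 1: run a local minimization started from every mean
    vector and keep the best result (BFGS in place of Newton-Raphson)."""
    J = make_J(means, covs, weights)
    best_c, best_val = None, np.inf
    for d in means:
        res = minimize(J, d, method="BFGS")
        if res.fun < best_val:
            best_c, best_val = res.x, res.fun
    return best_c, best_val
```

Since J is smooth and each start lies near a candidate basin, every local run converges quickly; the cost comes from the number of starts, which grows quadratically with N.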
Algorithm 1 was originally proposed in (Carreira-
Perpiñán, 2002), and is guaranteed to find all the modes
of a mixture of Gaussian distributions. It should be
noted, however, that the exhaustive search performed
by Algorithm 1 might be computationally demanding,
and only the global minimum of J is of interest in the
present case. If an approximate optimal solution is
acceptable, it is convenient to perform convex
optimization using a reduced number of starting points.
In our experiments, the best performance was obtained
using the $d_{ij}$ associated with the smallest (1% to 5%)
values of $\{J(d_{ij})\}$. It should be noted that this reduced
version of Algorithm 1 is not guaranteed to reach
the global minimum (although it has been verified, via
simulation studies, that the global optimum is found
with a very high success rate). In order to set up the
Newton-Raphson algorithm used in Algorithm 1, it is
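The reduced-start variant can be sketched as follows. The sketch is self-contained and simplified to isotropic unit-covariance Gaussians; `reduced_search` and the `frac` parameter are illustrative names, and BFGS again stands in for Newton-Raphson.

```python
import numpy as np
from scipy.optimize import minimize

def J(c, means, weights):
    """Negatively weighted sum of isotropic (unit-covariance) Gaussian
    densities -- a simplified stand-in for the paper's J(c)."""
    d = means.shape[1]
    sq = np.sum((means - np.asarray(c)) ** 2, axis=1)
    return float(np.sum(weights * (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * sq)))

def reduced_search(means, weights, frac=0.05):
    """Start local optimization only from the means with the smallest
    values of J (the 1%-5% rule): cheaper than the exhaustive search,
    but not guaranteed to reach the global minimum."""
    vals = np.array([J(m, means, weights) for m in means])
    n_starts = max(1, int(np.ceil(frac * len(means))))
    best_c, best_val = None, np.inf
    for s in means[np.argsort(vals)[:n_starts]]:
        res = minimize(J, s, args=(means, weights), method="BFGS")
        if res.fun < best_val:
            best_c, best_val = res.x, res.fun
    return best_c, best_val
```

Ranking the means by J before starting the local runs is what makes the variant cheap: evaluating J at all N(N-1)/2 means costs far less than running a full local optimization from each of them.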
NONPARAMETRIC VIRTUAL SENSORS FOR SEMICONDUCTOR MANUFACTURING - Using Information
Theoretic Learning and Kernel Machines