Electrical Appliances Identification and Clustering using Novel Turn-on
Transient Features
Mohamed Nait Meziane¹, Abdenour Hacine-Gharbi², Philippe Ravier¹, Guy Lamarque¹, Jean-Charles Le Bunetel³ and Yves Raingeaud³
¹ PRISME Laboratory, University of Orléans, 12 rue de Blois, 45067 Orléans, France
² LMSE Laboratory, University of Bordj Bou Arréridj, Elanasser, 34030 Bordj Bou Arréridj, Algeria
³ GREMAN Laboratory, UMR 7347 CNRS - University of Tours, 20 avenue Monge, 37200 Tours, France
{mohamed.nait-meziane, philippe.ravier}@univ-orleans.fr, gharbi07@yahoo.fr, {lebunetel, yves.raingeaud}@univ-tours.fr
Keywords: Electrical Appliances Identification and Clustering, Energy Disaggregation, Non-Intrusive Load Monitoring (NILM), Sequential Forward Search (SFS) Algorithm, Supervised and Unsupervised Classification, Turn-on Transient Features, Wrappers Feature Selection.
Abstract: Due to the growing need for detailed consumption information in the context of energy efficiency, different energy disaggregation methods, also called Non-Intrusive Load Monitoring (NILM) methods, have been proposed. These methods may be subdivided into supervised and unsupervised approaches. Electrical appliance classification is one of the tasks a NILM system should perform. Depending on the chosen NILM approach, the classification task consists of either identifying the appliances or grouping them into clusters. In this paper, we present the results of appliance identification and clustering using the Controlled On/Off Loads Library (COOLL) dataset. We use novel features extracted from a recently proposed turn-on transient current model for both identification and clustering. The results show that the amplitude-related features of this model are the best suited for appliance identification (giving a classification rate (CR) of 98.57%), whereas the envelope-related features are the best suited for appliance clustering.
1 INTRODUCTION
In the context of energy efficiency, Non-Intrusive Load Monitoring (NILM) approaches aim to provide, in a non-intrusive manner, detailed energy consumption information. This detailed information helps increase consumers' awareness of their energy consumption behavior, along with other benefits. The benefits of such consumption feedback were discussed in several previous works (Fischer, 2008) (Darby, 2010) (Hancke et al., 2012). Interest in this field, pioneered by Hart's work from the mid-1980s onward (Hart, 1985) (Hart, 1989) (Hart, 1992), has grown rapidly since around 2010 (Parson, 2016).
Along with the appliance working periods and the
consumed energy, appliance class is a very important
output of a NILM system. NILM approaches may be
classified using different criteria (Zeifman and Roth,
2011). One possible taxonomy is subdividing the
approaches into supervised and unsupervised (Zoha
et al., 2012) depending on the chosen strategy for
appliance-related information inference.
Supervised NILM approaches are the most commonly found in the literature. These approaches need labeled data to train the appliance classifier. A major drawback is their lack of robustness to unseen appliances, especially when the training dataset is small.
To alleviate this problem, an alternative is the use of unsupervised NILM approaches (Bonfigli et al., 2015). These approaches try to solve the NILM problem (i.e. to obtain detailed consumption information) without a priori information (Zoha et al., 2012). They face several challenges (Goncalves et al., 2011). Nevertheless, they are better suited to solving the NILM problem in real-case scenarios where unseen and different appliance types may be encountered.
A midway approach worth mentioning is the semi-supervised approach (Barsim and Yang, 2015) (Gillis and Morsi, 2016). It combines the supervised and unsupervised approaches: a (supervised) training step helps in the prediction of the appliance type using an unsupervised approach.
Meziane, M., Hacine-Gharbi, A., Ravier, P., Lamarque, G., Bunetel, J-C. and Raingeaud, Y.
Electrical Appliances Identification and Clustering using Novel Turn-on Transient Features.
DOI: 10.5220/0006245706470654
In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2017), pages 647-654
ISBN: 978-989-758-222-6
Copyright © 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
According to the above-mentioned approaches, the electrical appliance classification problem can be subdivided into two sub-problems: identification and clustering. Identification is a supervised problem. Given a set of data (called the training dataset) labeled with appliance type, the task is to identify the class (appliance type) to which a new appliance belongs. Clustering, on the other hand, is an unsupervised problem. Given an unlabeled set of data (the appliance types are unknown), the task is to find clusters, or groups, that define classes corresponding to appliances with common characteristics.
The aim of this paper is to present the results
of both appliance identification and clustering us-
ing novel features extracted from a recently proposed
turn-on transient current model (Nait Meziane et al.,
2015). Conclusions on the usefulness of these fea-
tures for both tasks are drawn.
The paper is organized as follows: section 2 describes the features used for appliance classification and the corresponding turn-on transient current model. This section also discusses the relevance of these features for the classification task and motivates the use of some chosen features instead of all the estimated ones. Section 3 presents the identification and clustering results. It also gives a brief description of the dataset used. Section 4 concludes the paper and gives some possible directions for improving the presented work.
2 TURN-ON TRANSIENT
FEATURES
2.1 Turn-on Transient Model
The work presented in this paper is based on
modeling the turn-on transient current signal us-
ing a recently proposed parametric mathematical
model (Nait Meziane et al., 2015). The parameters
of this model are then estimated and used as fea-
tures for electrical appliances classification. One of
the goals of this work is to assess the usefulness of
these model parameters for classification. For sim-
plicity, we suppose herein stationary amplitudes and
phases in contrast to the more general model pre-
sented in (Nait Meziane et al., 2015).
According to this mathematical model, the turn-on transient current is an amplitude-modulated sum of sinusoids that can be written as:

$$ s(t) = e(t)\, s_s(t) + w(t), \tag{1} $$

where

$$ s_s(t) = \sum_{i=1}^{d} A_i \cos(2\pi f_i t + \phi_i), \tag{2} $$

$$ e(t) = \begin{cases} A_0 e^{\mathbf{b}^T \mathbf{t}} + 1, & \text{if } t \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{3} $$

and $w(t)$ is an additive white Gaussian noise (AWGN)¹. $s_s(t)$ is a sum of undamped sinusoids such that $A_i$ ($\ge 0$), $\phi_i \in [-\pi, \pi]$ and $f_i$ are their amplitudes, phases and frequencies, respectively. The number of sinusoids $d$ is supposed fixed and known a priori. The sinusoid frequencies are also known and are odd-order harmonics such that $f_i = (2i - 1) f_0$, $i = 1, \ldots, d$ ($f_0 = 50$ Hz is the fundamental frequency, also called mains frequency).
The amplitude modulation, or envelope, $e(t)$ describes the current amplitude variation from turn-on until the steady state is reached. $\mathbf{b} = [b_1, \ldots, b_n]^T$ is a vector of $n$ polynomial coefficients and $\mathbf{t} = [t, \ldots, t^n]^T$ is a time vector such that $\mathbf{b}^T \mathbf{t}$ is an $n$th-degree polynomial that allows the model amplitude variation to be tuned to that of the real signal. $A_0$ is a parameter that specifies the initial amplitude of $e(t)$, i.e. at $t = 0$, $e(0) = A_0 + 1$.
For the work presented in this paper, and after dif-
ferent tests on real signals, we chose d = 5 harmonics
and a polynomial order n = 3. These values provide a
good fit between the model and real signals.
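To make Eqs. (1) to (3) concrete, the following is a minimal sketch that synthesizes a turn-on transient with $d = 5$ odd harmonics and a polynomial order $n = 3$. The function names and all numerical parameter values are hypothetical and chosen only for illustration; they are not estimates from COOLL.

```python
import math
import random


def envelope(t, a0, b):
    """Envelope e(t) of Eq. (3): A0 * exp(b1*t + ... + bn*t**n) + 1 for t >= 0, else 0."""
    if t < 0:
        return 0.0
    poly = sum(bj * t ** (j + 1) for j, bj in enumerate(b))
    return a0 * math.exp(poly) + 1.0


def steady_state(t, amps, phases, f0=50.0):
    """Sum of sinusoids s_s(t) of Eq. (2) at the odd harmonics f_i = (2i - 1) * f0."""
    return sum(
        a * math.cos(2 * math.pi * (2 * i - 1) * f0 * t + p)
        for i, (a, p) in enumerate(zip(amps, phases), start=1)
    )


def transient_current(t, a0, b, amps, phases, noise_std=0.0, rng=random):
    """Model s(t) = e(t) * s_s(t) + w(t) of Eq. (1); w(t) is Gaussian noise."""
    w = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
    return envelope(t, a0, b) * steady_state(t, amps, phases) + w


# Hypothetical parameter values (assumptions, not values from the paper).
A0 = 4.0
B = [-5.0, -20.0, -10.0]              # b1, b2, b3 (n = 3)
AMPS = [1.0, 0.3, 0.1, 0.05, 0.02]    # A1..A5 (d = 5)
PHASES = [0.0, 0.5, -0.5, 1.0, -1.0]  # phi1..phi5, in radians
FS = 100_000                          # 100 kHz sampling frequency, as in COOLL

# First 20 ms (one mains cycle) of the noise-free model signal.
signal = [transient_current(k / FS, A0, B, AMPS, PHASES) for k in range(2000)]
```

At $t = 0$ the envelope equals $A_0 + 1$, which is the initial amplitude stated above; with negative polynomial coefficients it decays towards 1 as the steady state is reached.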
All the model parameters can be put inside one vector $\boldsymbol{\theta} = [A_0, b_1, b_2, b_3, A_1, A_2, A_3, A_4, A_5, \phi_1, \phi_2, \phi_3, \phi_4, \phi_5]^T$. The algorithm used to estimate $\boldsymbol{\theta}$ is based on a nonlinear least-squares optimization algorithm called trust-region reflective (Coleman and Li, 1994). A detailed description of the model, the theoretical limits of the variance of the estimated parameters (Cramér-Rao bounds) and the estimation algorithm will be considered in an upcoming work.
2.2 Relevance of the Turn-on Transient
Features for Classification
In this sub-section we will analyze the features in order to select the most relevant ones for appliance classification amongst the elements of the $\boldsymbol{\theta}$ vector. First, we do a pre-analysis of these features to get an insight and conclude on their usefulness for appliance classification. Then, we compare the conclusions with the results of a feature selection algorithm.
¹ This assumption is verified for the COOLL measurements, as shown in (Nait Meziane et al., 2016), where the measurement system used to create COOLL is described.
[Figure 1 plot, panel "Saw 1 of 8": estimated phases $\phi_1, \ldots, \phi_5$ (y-axis: parameter value, from −4 to 4) versus action delay in ms (x-axis).]
Figure 1: An example showing the instability of the estimated phase features $\phi_i$ across measurements of the same appliance (an electric saw from the COOLL dataset). The x-axis represents the different action delays (0 to 19 ms), i.e. different measurements of the same appliance. Between measurements, the mains frequency is likely to vary slightly, which affects the phase estimates.
Note that one of the main properties we look for in a useful feature is a low variability of its value across different measurement instances of the same appliance. Hence, we pay special attention to this property in the sequel.
2.2.1 Pre-analysis of Features for Selection
Phase-related Features. The phase features $\phi_i$, $i = 1, \ldots, d$, specify the position (in radians) of the sinusoids $\cos(2\pi f_i t + \phi_i)$ at $t = 0$ with respect to the $2\pi$ time-cycle. These features are subject to an ambiguity in their definition. For example, the solutions of $\cos(\phi_i) = 1$ are $\phi_i = 2\pi m$, with $m$ an integer. This means that the solution is not unique and a set of solutions exists. This problem can be alleviated by only keeping the solutions that are in the range $[-\pi, \pi]$. Still, we end up with two solutions to choose from (for example, the solutions of $\cos(\phi_i) = \frac{1}{2}$ are $\frac{\pi}{3}$ and $-\frac{\pi}{3}$).
Moreover, when working on real signals, the slightest nonstationarity (especially the very small variations of the mains frequency, which are usually less than 0.5 % of 50 Hz) seemed to negatively affect the estimated phase values (Figure 1).
For all the above-mentioned reasons, we chose not
to use the phase features for the classification.
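As a rough illustration of this sensitivity (a back-of-the-envelope sketch, not part of the original analysis), assume the true mains frequency is $f_0 + \Delta f$ while the model keeps $f_0 = 50$ Hz fixed. The $i$-th sinusoid then accumulates a phase error that grows with both time and harmonic order. The deviation $\Delta f = 0.25$ Hz corresponds to the 0.5 % bound quoted above; the observation time $t = 0.1$ s is an arbitrary assumption made for the example.

```latex
\Delta\phi_i(t) = 2\pi (2i - 1)\, \Delta f \, t

% Fundamental (i = 1), \Delta f = 0.25 Hz, t = 0.1 s:
\Delta\phi_1 = 2\pi \times 0.25 \times 0.1 = 0.05\pi \approx 0.157 \text{ rad} \;(9^\circ)

% Fifth sinusoid (i = 5, i.e. the 9th harmonic at 450 Hz):
\Delta\phi_5 = 9 \times 0.05\pi = 0.45\pi \approx 1.414 \text{ rad} \;(81^\circ)
```

Under these assumptions, even a deviation at the quoted bound shifts the highest harmonic by a large fraction of a cycle, which is consistent with the scatter of the phase estimates observed in Figure 1.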
Amplitude-related Features. Unlike the phase features, the amplitude features $A_i$, $i = 1, \ldots, d$, representing the amplitudes of the sinusoids (Eq. (2)), are much more stable (Figure 2). The estimated values also vary between different appliances, even between appliances of the same type. This suggests that these features are good candidates for appliance identification rather than clustering.

In the sequel, these features are kept for use in appliance classification.

[Figure 2 plot, panel "Saw 1 of 8": estimated amplitudes $A_1, \ldots, A_5$ (y-axis: parameter value, from 0 to 6) versus action delay in ms (x-axis).]
Figure 2: An example showing the stability of the estimated amplitude features $A_i$ across measurements of the same appliance (an electric saw from the COOLL dataset). The x-axis represents the different action delays (0 to 19 ms), i.e. different measurements of the same appliance.

[Figure 3 plot: $g_1(t)$ and $g_2(t)$ (y-axis: amplitude, from −1.5 to 0) versus time in s (x-axis, from 0 to 1).]
Figure 3: Functions $g_1(t)$ and $g_2(t)$.
Envelope-related Features. The envelope features are $A_0$ and $b_j$, $j = 1, \ldots, n$. Whereas we ideally seek time-independent features for appliance classification, all $b_j$ except $b_n$ depend on the time reference $t_0$. To illustrate this, we consider the following exponent of a simulated envelope with $n = 2$:

$$ g_1(t) = b_1 t + b_2 t^2, \quad t \in [0, 1]\ \text{s} \tag{4} $$

with $b_1 = 0.5$ and $b_2 = -2$ representing the polynomial coefficients. Our time reference is $t_0 = 0$ s. Suppose now that we shift our time reference from $t_0 = 0$ s to $0.3$ s (i.e. we define a new time interval starting at 0.3 s). We then obtain the function $g_2(t) = g_1(t + t_0)$, which is the portion of $g_1(t)$ on the newly defined interval (Figure 3). Estimating the parameters $b'_1$ and $b'_2$ of $g_2(t)$, we find that $b'_1 = -0.7$ and $b'_2 = -2$. This corresponds to:

$$ \begin{aligned} g_2(t) &= b_1 (t + t_0) + b_2 (t + t_0)^2 \\ &= (b_1 t_0 + b_2 t_0^2) + (b_1 + 2 b_2 t_0)\, t + b_2 t^2 \\ &= b'_0 + b'_1 t + b'_2 t^2. \end{aligned} \tag{5} $$
The parameter that is not affected by the time reference shift is $b_2$. A similar reasoning for $n > 2$ shows that the only time-reference-independent parameter is the last coefficient $b_n$, which is therefore suited for use in appliance classification.
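The effect of the time reference shift can be checked numerically with the binomial expansion of $(t + t_0)^j$. The sketch below uses $b_1 = 0.5$, $b_2 = -2$ and $t_0 = 0.3$ s; the negative sign of $b_2$ is an assumption, chosen because it reproduces the reported magnitude 0.7 of the shifted first-order coefficient. The function name `shift_polynomial` is hypothetical.

```python
from math import comb


def shift_polynomial(b, t0):
    """Return [b'_0, ..., b'_n], the coefficients of g(t + t0),
    where g(t) = b[0]*t + b[1]*t**2 + ... + b[n-1]*t**n (no constant term)."""
    n = len(b)
    shifted = [0.0] * (n + 1)
    for j, bj in enumerate(b, start=1):
        # (t + t0)**j = sum over m of comb(j, m) * t0**(j - m) * t**m
        for m in range(j + 1):
            shifted[m] += bj * comb(j, m) * t0 ** (j - m)
    return shifted


# b'_0 is about -0.03, b'_1 about -0.7, and b'_2 stays equal to b_2 = -2.
b_shifted = shift_polynomial([0.5, -2.0], 0.3)
```

Only the term $b_n t^n$ contributes to the highest power of $t$, so the last coefficient is unchanged for any $n$ and any $t_0$, while every lower-order coefficient picks up contributions that depend on $t_0$.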
Note also that this time reference shift generates a new term $b'_0$ that we can pull out of the exponent and multiply by $A_0$ to get $A'_0 = A_0 e^{b'_0}$. In practice, we always take the time reference as the time where the current is at its extremum (max or min). Therefore, the estimated $A_0$ (or $A'_0$) specifies the highest amplitude the current can reach and is important to keep.

Finally, we select for the appliance classification $A_0$ and $b_3$ along with $A_i$, $i = 1, \ldots, 5$.
2.2.2 Feature Selection using a Wrapper
Approach for Identification
Following the pre-analysis sub-section, and in order to check the soundness of the selected features, we propose hereafter to use a wrapper-based algorithm to perform the feature selection task.

The goal is to select the most relevant set of features from the set of all available features (in our case 14) for appliance identification. We propose to use the wrapper-based sequential forward search (SFS) algorithm (Kohavi and John, 1997). At each selection step, this algorithm adds the "relevant" feature that gives the highest possible classification rate (CR). This selection procedure was used in (Hacine-Gharbi et al., 2015) and is summarized in Algorithm 1.
Algorithm 1: Wrapper-based sequential forward search (SFS) algorithm.
1. $F = \{A_0, b_1, b_2, b_3, A_1, \ldots, A_5, \phi_1, \ldots, \phi_5\}$, $S = \{\}$, $I = 14$ (initial number of parameters), $j = 1$ (iteration index).
2. Evaluate the classification rate CR for each feature $f_l \in F$. Select the first feature $f_{\pi_1}$ such that: $f_{\pi_1} = \arg\max_{f_l \in F} \mathrm{CR}(f_l)$. $F = F \setminus \{f_{\pi_1}\}$, $S = \{f_{\pi_1}\}$.
3. $j = j + 1$. For each $f_l \in F$, evaluate CR using $S \cup \{f_l\}$. Select the feature $f_{\pi_j}$ such that: $f_{\pi_j} = \arg\max_{f_l \in F} \mathrm{CR}(S \cup \{f_l\})$. $F = F \setminus \{f_{\pi_j}\}$, $S = S \cup \{f_{\pi_j}\}$.
4. Repeat step 3 until $j = I$.
5. Give the output $S$ that yields the maximum CR.
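The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the implementation used for the reported results: the scoring function is passed in as a parameter, and the toy `toy_score` with its made-up gains is purely hypothetical. In practice the score would be the CR of a k-NN classifier trained on the candidate feature subset.

```python
def sequential_forward_search(features, score):
    """Greedy wrapper selection (Algorithm 1).

    features: list of feature names (the set F).
    score:    callable mapping a list of features to a classification rate.
    Returns (best_subset, best_score, selection_order, history), where
    history[j-1] is the score obtained after iteration j.
    """
    remaining = list(features)
    selected = []
    history = []
    while remaining:
        # Steps 2 and 3: add the feature giving the highest score with S.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append(score(selected))
    # Step 5: keep the prefix of the selection order with the maximum score.
    j_best = max(range(len(history)), key=history.__getitem__)
    return selected[: j_best + 1], history[j_best], selected, history


# Hypothetical additive "gains" standing in for a real classifier score.
GAINS = {"A1": 5, "A2": 3, "phi1": -2}


def toy_score(subset):
    return sum(GAINS[f] for f in subset)


best_subset, best_score, order, history = sequential_forward_search(
    ["phi1", "A2", "A1"], toy_score
)
# best_subset is ["A1", "A2"], order is ["A1", "A2", "phi1"], history is [5, 8, 6]
```

Note that the search is greedy: a feature that has been added is never removed, so the returned subset is not guaranteed to be the global optimum over all subsets.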
The chosen classifier is based on the k nearest
neighbors (k-NN) algorithm. In order to study the
effect of the value of k on the result, we propose to
apply the selection procedure for k = 1, 2, . . . , 10.
The feature selection procedure is conducted us-
ing the COOLL dataset (sub-section 3.1). COOLL
was divided by keeping the first 10 measurement in-
stances from each appliance for the training and the
remaining 10 instances for the test (each appliance
having 20 instances). This yielded 420 measurement
instances for training and 420 measurement instances
for the test.
Table 1 gives the corresponding indexes of the ini-
tial set of features that will be used to show the feature
selection result.
Table 1: Initial set of features and corresponding indexes.
Feature $f_l$   $A_0$  $b_1$  $b_2$  $b_3$  $A_1$  $A_2$  $A_3$
Index           1      2      3      4      5      6      7
Feature $f_l$   $A_4$  $A_5$  $\phi_1$  $\phi_2$  $\phi_3$  $\phi_4$  $\phi_5$
Index           8      9      10        11        12        13        14
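The $S$ matrix (Eq. (6)) gives the selection result, where each row $k$ corresponds to a specific choice of $k$ nearest neighbors and each column $j$ corresponds to the feature selected, by relevance, at iteration $j$ (Algorithm 1). The matrix elements are the indexes of the initial set of features (Table 1).

$$ S = \begin{bmatrix}
5 & 6 & 7 & 8 & 9 & 1 & 2 & 4 & 3 & 10 & 11 & 13 & 12 & 14 \\
5 & 6 & 7 & 8 & 9 & 1 & 2 & 4 & 3 & 10 & 11 & 13 & 12 & 14 \\
5 & 6 & 7 & 8 & 9 & 1 & 2 & 4 & 3 & 10 & 11 & 14 & 12 & 13 \\
5 & 6 & 9 & 8 & 7 & 1 & 2 & 3 & 4 & 11 & 10 & 14 & 13 & 12 \\
5 & 6 & 7 & 8 & 9 & 1 & 2 & 4 & 3 & 11 & 10 & 13 & 14 & 12 \\
5 & 6 & 7 & 8 & 9 & 2 & 1 & 4 & 3 & 11 & 10 & 14 & 13 & 12 \\
5 & 6 & 7 & 8 & 9 & 2 & 1 & 4 & 3 & 13 & 10 & 12 & 11 & 14 \\
5 & 6 & 7 & 8 & 9 & 1 & 2 & 3 & 4 & 11 & 10 & 13 & 12 & 14 \\
5 & 6 & 7 & 8 & 9 & 1 & 2 & 3 & 4 & 11 & 14 & 12 & 10 & 13 \\
5 & 6 & 7 & 8 & 9 & 1 & 2 & 3 & 4 & 11 & 10 & 12 & 13 & 14
\end{bmatrix} \tag{6} $$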
In Eq. (6), for each row, the element that corresponds to the maximum CR is highlighted. The selected features are then the set that corresponds to this element (following Algorithm 1, for a specific row, each element represents the last feature added to $S$). In the first row, for example, the highlighted element is the fourth element, indexed 8. Hence, the selected features are $A_1$, $A_2$, $A_3$ and $A_4$, with corresponding indexes 5, 6, 7 and 8, respectively. To better illustrate this, Figure 4 gives the CR corresponding to rows 1, 6 and 10 of matrix $S$.
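Reading a row of the matrix amounts to keeping its first $j$ entries, where $j$ is the iteration at which the CR is maximal, and mapping them to feature names through Table 1. A minimal sketch (the names `FEATURE_NAMES` and `selected_features` are hypothetical; the rows and the values of $j$ are those given for 1-NN, 6-NN and 10-NN):

```python
# Index-to-feature mapping of Table 1.
FEATURE_NAMES = {
    1: "A0", 2: "b1", 3: "b2", 4: "b3",
    5: "A1", 6: "A2", 7: "A3", 8: "A4", 9: "A5",
    10: "phi1", 11: "phi2", 12: "phi3", 13: "phi4", 14: "phi5",
}

# Rows 1, 6 and 10 of matrix S (Eq. (6)).
ROW_K1 = [5, 6, 7, 8, 9, 1, 2, 4, 3, 10, 11, 13, 12, 14]
ROW_K6 = [5, 6, 7, 8, 9, 2, 1, 4, 3, 11, 10, 14, 13, 12]
ROW_K10 = [5, 6, 7, 8, 9, 1, 2, 3, 4, 11, 10, 12, 13, 14]


def selected_features(row, j_best):
    """Features retained when the maximum CR is reached at iteration j_best."""
    return [FEATURE_NAMES[index] for index in row[:j_best]]


features_1nn = selected_features(ROW_K1, 4)    # ["A1", "A2", "A3", "A4"]
features_6nn = selected_features(ROW_K6, 5)    # ["A1", "A2", "A3", "A4", "A5"]
features_10nn = selected_features(ROW_K10, 3)  # ["A1", "A2", "A3"]
```

In all three cases the retained subset contains only amplitude features.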
The selection results (see highlighted elements of matrix $S$) indicate that the algorithm selected the amplitude-related features as the most relevant for the identification. This is in agreement with the result of sub-section 2.2.1, where the amplitude-related features were suspected to be good candidates for appliance identification.

[Figure 4 plots: classification rate CR (%) versus iteration index $j$ for 1-NN, 6-NN and 10-NN.]
(a) CR variation corresponding to rows $k = 1$, 6 and 10 (representing also the number $k$ of chosen nearest neighbors) of matrix $S$ (Eq. (6)).
(b) Zoom of Figure 4a. The maximum values of CR for 1-NN, 6-NN and 10-NN are found, respectively, for $j = 4$, 5 and 3.
Figure 4: Classification rate CR as a function of the iteration index $j$ (see Algorithm 1; $j$ is also the column index of matrix $S$ (Eq. (6))).
Note also that the phases are the least adapted for
appliance identification which, again, is in agreement
with the observations made in sub-section 2.2.1.
3 ELECTRICAL APPLIANCES
IDENTIFICATION AND
CLUSTERING
In this section we give the identification and clustering results right after giving a brief description of the COOLL dataset used.

In order to assess the usefulness of the model parameters for the classification, we do three tests for both the identification and the clustering. In the first test we only use the envelope-related features $A_0$ and $b_3$. In the second one, we use solely the amplitude-related features $A_i$, and in the last one we use all seven features.
3.1 COOLL Dataset Description
Controlled On/Off Loads Library (COOLL) is a dataset of electrical current and voltage measurements sampled at a high rate (840 current measurements and 840 voltage measurements) representing the consumption of individual appliances. The measurements were taken in June 2016 in the PRISME laboratory of the University of Orléans, France. The appliances are mainly controllable appliances (i.e. we can precisely control their turn-on/off time instants). 42 appliances of 12 types were measured at a 100 kHz sampling frequency (Table 2). A more detailed description of this dataset and its specificities can be found in (Picon et al., 2016).
Table 2: COOLL dataset summary. Source: (Picon et al., 2016).
N    Appliance type    # of appliances    # of current signals (20 per appliance)
1    Drill             6                  120
2    Fan               2                  40
3    Grinder           2                  40
4    Hair dryer        4                  80
5    Hedge trimmer     3                  60
6    Lamp              4                  80
7    Paint stripper    1                  20
8    Planer            1                  20
9    Router            1                  20
10   Sander            3                  60
11   Saw               8                  160
12   Vacuum cleaner    7                  140
     Total             42                 840
Note that we use the current measurements for the classification. The voltage measurements (also sampled at 100 kHz) vary little from one appliance to another; they are therefore less suited for classification and are discarded in this study.

The measurements of COOLL are 6 seconds long, with a pre-turn-on duration of 0.5 second and a post-turn-off duration of 1 second. Each appliance has 20 measurement instances. Each instance corresponds to a specific action delay (a turn-on delay with respect to the mains voltage time-cycle) ranging from 0 to 19 ms with a step of 1 ms (Picon et al., 2016).
3.2 Identification
For the appliance identification task presented hereafter, we use the supervised k-NN algorithm. This algorithm predicts the class of a new point (in the feature space representing an appliance) by computing the distances between this point and the points in its neighborhood. The appliance class is then decided by a majority vote among the classes of the $k$ nearest neighbors.

Figure 5: Confusion matrix for identification using $A_0$ and $b_3$. Classification rate CR = 82.98%. Rows give the predicted class and columns the true class, numbered as in Table 2.
Predicted class        1    2    3    4    5    6    7    8    9   10   11   12
 1 Drill              97    0    0    0    4    0    0    0    6    0    9    0
 2 Fan                 4   39    0    0    0    0    0    0    0    0    0    0
 3 Grinder             0    0   24    0    5    0    0    0    0    0    0    0
 4 Hair dryer          0    1    0   71    0    0    0    0    0    0   14    0
 5 Hedge trimmer       3    0    3    1   33    0    0    0    0    0    4    0
 6 Lamp                0    0    0    1    0   76    0    0    0    0    2    2
 7 Paint stripper      0    0    0    3    0    1   20    0    0    0    0    0
 8 Planer              0    0    0    0    0    0    0   20    0    0    0    0
 9 Router              1    0    0    0    2    0    0    0    8    0    3    0
10 Sander              0    0    0    0    0    0    0    0    0   58    1    3
11 Saw                15    0   13    4   16    1    0    0    6    0  117    1
12 Vacuum cleaner      0    0    0    0    0    2    0    0    0    2   10  134
The parameters to be fixed are the number $k$ of nearest neighbors to consider and the distance metric. For our tests, we chose the Euclidean distance and $k = 10$. A low $k$ value (< 5) may degrade the robustness of the classifier by increasing the risk of classifying using isolated points (the extreme case being $k = 1$). Since we already know, for our dataset, that a well-formed group should contain 20 points (the 20 measurement instances of the same appliance), we chose to compare each new point with half of the number of points we expect to find in its neighborhood, hence $k = 10$.
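The decision rule described above can be sketched in a few lines. This is a minimal illustration with a Euclidean distance and a majority vote, assuming Python 3.8 or later for `math.dist`; the function name `knn_predict` and the toy two-dimensional points are hypothetical, and the way ties are broken here (in favor of the tied class whose neighbor is encountered first) is an arbitrary choice.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=10):
    """Predict the class of `query` by majority vote among its k nearest
    training points, using the Euclidean distance."""
    neighbors = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors (made-up values, two features per measurement).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["lamp", "lamp", "lamp", "drill", "drill", "drill"]

near_lamps = knn_predict(points, labels, (0.5, 0.5), k=3)   # "lamp"
near_drills = knn_predict(points, labels, (5.5, 5.5), k=3)  # "drill"
```

With features of very different scales (for example $A_0$ and $b_3$), the Euclidean distance is dominated by the largest-scale feature, so some form of feature scaling may be worth considering; whether any scaling was applied in the reported experiments is not stated.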
Since our dataset is not big enough to have many measurement instances of all appliances in both the training and the test datasets, and in order to get more reliable test results, we chose to evaluate the identification performance using K-fold cross-validation with $K = 10$. K-fold cross-validation consists of randomly dividing the dataset into $K$ sets and running $K$ tests. For each test, a different set is held out for testing and the remaining $K - 1$ sets are used for training. The results of the $K$ tests are then averaged.
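A minimal sketch of this evaluation procedure is given below. The helper names (`k_fold_indices`, `cross_validated_rate`, `nearest_label`) and the toy data are hypothetical; a simple 1-nearest-neighbor rule stands in for the classifier, and any function with the same signature could be used instead.

```python
import math
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Randomly split the indices 0..n_samples-1 into n_folds disjoint sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validated_rate(points, labels, predict, n_folds=10, seed=0):
    """Average classification rate (in %) over the K held-out folds."""
    rates = []
    for test_idx in k_fold_indices(len(points), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(points)) if i not in held_out]
        train_x = [points[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        correct = sum(
            predict(train_x, train_y, points[i]) == labels[i] for i in test_idx
        )
        rates.append(100.0 * correct / len(test_idx))
    return sum(rates) / len(rates)


def nearest_label(train_x, train_y, query):
    """Stand-in classifier: label of the single nearest training point."""
    return min(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[1]


# Two well-separated toy classes with 20 points each (made-up values).
toy_points = [(0.01 * i, 0.0) for i in range(20)] + [(10 + 0.01 * i, 0.0) for i in range(20)]
toy_labels = ["a"] * 20 + ["b"] * 20
toy_rate = cross_validated_rate(toy_points, toy_labels, nearest_label)  # 100.0
```

With 840 measurements and $K = 10$, each fold holds 84 measurements, so averaging the per-fold rates is equivalent to pooling all predictions.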
Figure 5 gives the confusion matrix for the identification using the parameters $A_0$ and $b_3$. With a classification rate (CR) of 82.98%, the identification gives several bad results, which indicates that these features are not the best suited for appliance identification.

Figure 6 gives the confusion matrix for the identification using only the features $A_i$. We note the (very) good CR value of 98.57%. This suggests that these features are better suited to appliance identification than the envelope-related features.
Figure 7, on the other hand, shows the identification confusion matrix obtained using all seven selected features. With a CR of 90.95%, using all the features improves on the result of the envelope-related features alone but degrades the result obtained with the amplitude-related features alone.

Figure 6: Confusion matrix for identification using $A_i$. Classification rate CR = 98.57%. Rows give the predicted class and columns the true class, numbered as in Table 2.
Predicted class        1    2    3    4    5    6    7    8    9   10   11   12
 1 Drill             120    0    0    0    6    0    0    0    0    0    0    0
 2 Fan                 0   39    0    0    0    0    0    0    0    0    0    0
 3 Grinder             0    0   40    0    0    0    0    0    0    0    0    0
 4 Hair dryer          0    0    0   80    0    0    0    0    0    0    0    0
 5 Hedge trimmer       0    0    0    0   54    0    0    0    0    0    0    0
 6 Lamp                0    1    0    0    0   80    0    0    0    0    0    0
 7 Paint stripper      0    0    0    0    0    0   20    0    0    0    0    0
 8 Planer              0    0    0    0    0    0    0   20    0    0    0    0
 9 Router              0    0    0    0    0    0    0    0   20    0    0    0
10 Sander              0    0    0    0    0    0    0    0    0   55    0    0
11 Saw                 0    0    0    0    0    0    0    0    0    5  160    0
12 Vacuum cleaner      0    0    0    0    0    0    0    0    0    0    0  140

Figure 7: Confusion matrix for identification using $A_0$, $b_3$ and $A_i$. Classification rate CR = 90.95%. Rows give the predicted class and columns the true class, numbered as in Table 2.
Predicted class        1    2    3    4    5    6    7    8    9   10   11   12
 1 Drill             103    0    0    0    4    0    0    0    6    0    4    0
 2 Fan                 0   39    0    0    0    1    0    0    0    0    0    0
 3 Grinder             0    0   31    0    4    0    0    0    0    0    0    0
 4 Hair dryer          0    0    0   77    0    0    0    0    0    0    0    0
 5 Hedge trimmer       1    0    5    1   32    0    0    0    0    0    5    0
 6 Lamp                0    1    0    2    0   79    0    0    0    0    0    0
 7 Paint stripper      0    0    0    0    0    0   20    0    0    0    0    0
 8 Planer              0    0    0    0    0    0    0   20    0    0    0    0
 9 Router              0    0    0    0    2    0    0    0   13    0    0    0
10 Sander              0    0    0    0    0    0    0    0    0   60    1    0
11 Saw                16    0    4    0   18    0    0    0    1    0  150    0
12 Vacuum cleaner      0    0    0    0    0    0    0    0    0    0    0  140
In conclusion, the identification results confirm that the features $A_i$ are well suited to the identification task, whereas $A_0$ and $b_3$ are not.
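The reported classification rates can be recovered from the confusion matrices as the ratio of the diagonal sum (correctly classified measurements) to the total number of measurements. The sketch below does this for the counts shown in Figure 6; the variable and function names are hypothetical.

```python
# Counts read from Figure 6 (rows: predicted class, columns: true class,
# both in the order of Table 2).
FIG6 = [
    [120, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0],
    [0, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 40, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 80, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 54, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 80, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 20, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 20, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 20, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 160, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 140],
]


def classification_rate(matrix):
    """Percentage of measurements on the diagonal of a confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


cr_amplitudes = classification_rate(FIG6)  # 828 / 840, about 98.57 %
```

The same computation gives 697/840 (about 82.98 %) for Figure 5 and 764/840 (about 90.95 %) for Figure 7.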
3.3 Clustering
For the clustering, we use one of the best-known unsupervised algorithms, namely k-means. It requires the user to specify a priori the number of clusters to be formed. We chose $k = 12$ clusters based on the number of appliance types in the COOLL dataset.
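For reference, a minimal sketch of the standard k-means iteration (Lloyd's algorithm) is given below. It is an illustration under simple assumptions (random initial centroids drawn from the data, Euclidean distance, Python 3.8 or later for `math.dist`), not the implementation used for the reported results, whose initialization and stopping rule are not specified. The function name `k_means` and the toy points are hypothetical, and the toy example uses $k = 2$ rather than $k = 12$.

```python
import math
import random


def k_means(points, k, n_iter=100, seed=0):
    """Cluster `points` (a list of tuples) into k groups.

    Returns (assignment, centroids), where assignment[i] is the cluster
    index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    assignment = None
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


# Two well-separated toy groups (made-up values).
toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
toy_assignment, toy_centroids = k_means(toy, k=2)
```

The cluster indices returned by k-means are arbitrary, which is why the clustering results are read by comparing the found clusters with the true classes rather than by computing a classification rate.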
Figure 8 shows the results as a confusion matrix using $A_0$ and $b_3$. The algorithm formed, notably, one big cluster (cluster 1) and several smaller clusters. The smaller clusters contain only lamps whereas the big cluster contains mostly motor-driven appliances. This result is very interesting. The appliances of the COOLL dataset are mostly motor-driven appliances (i.e. appliances with a motor that is responsible for the main task the appliance is supposed to perform), and these were gathered inside that big cluster. This indicates that the envelope-related features are suitable for distinguishing appliances having different working principles.

Figure 8: Confusion matrix for clustering using $A_0$ and $b_3$.
Lamps have a different working principle from that of the motor-driven loads. One of the reasons that may explain the scatter of the remaining (non-motor-driven) appliances across different clusters is that the lamps are of different types with different working principles. Indeed, the four lamps of COOLL are, respectively, a 1.6 W light emitting diode (LED), a 15 W compact fluorescent lamp (CFL), a 105 W halogen lamp (HL) and a 100 W halogen lamp (HL).
The tests we did show that the feature $b_3$ is related to the envelope amplitude decrease rate. Negative values of larger magnitude indicate a faster amplitude decrease.

Note also (Figure 8) that 46 lamp measurements were grouped inside the big cluster. These seemingly come from lamps with no transient (Figure 9): the current goes almost directly from zero to the steady state and no transition is observed. They are most likely the LED and CFL lamps and, hence, differ from lamps with high-amplitude-variation transients (halogen lamps).
Figure 10 shows the clustering result using the $A_i$ features. Clearly, these features are not suited for clustering: there is no clear link between the found clusters and the true classes, and no pattern that would distinguish well-defined clusters is visible.

The result of Figure 11 shows that using all available features still allows us to retrieve the motor-driven cluster even with the $A_i$ features included, in contrast to what happened with identification.
[Figure 9 plot: current amplitude in A (y-axis, from −0.04 to 0.04) versus time in s (x-axis, from 0 to 6); legend: Lamp1:0ms.]
Figure 9: Turn-on transient current of a light emitting diode
(LED). The interval [0, 1] s represents the pre-turn-on pe-
riod whereas the interval [5, 6] s represents the post-turn-off
period.
Figure 10: Confusion matrix for clustering using $A_i$. Rows give the found cluster and columns the true class, numbered as in Table 2.
Found cluster     1    2    3    4    5    6    7    8    9   10   11   12
 1               60    0    0    0   40    0    0    0    0   20  100    0
 2                0    0    0   40    0    0    0    0    0    0    0    0
 3                0    0    0   20    0    0    0    0    0    0    0    0
 4                0    0    0    0    0    0    0    0    0    0    0   20
 5                0    0   20    0    0    0    0    0    0    0   20   20
 6                0    0   20    0   20    0    0   20   20    0    0    0
 7                0    0    0   20    0    0   20    0    0    0    0    0
 8               20   40    0    0    0   60    0    0    0    0    0    0
 9                0    0    0    0    0    0    0    0    0    0    0   60
10                0    0    0    0    0    0    0    0    0    0    0   20
11               20    0    0    0    0   20    0    0    0   40   20    0
12               20    0    0    0    0    0    0    0    0    0   20   20
Figure 11: Confusion matrix for clustering using $A_0$, $b_3$ and $A_i$. Rows give the found cluster and columns the true class, numbered as in Table 2.
Found cluster     1    2    3    4    5    6    7    8    9   10   11   12
 1              120   40   40   80   60   46   20   20   20   60  160  140
 2                0    0    0    0    0    1    0    0    0    0    0    0
 3                0    0    0    0    0    1    0    0    0    0    0    0
 4                0    0    0    0    0    1    0    0    0    0    0    0
 5                0    0    0    0    0    2    0    0    0    0    0    0
 6                0    0    0    0    0    1    0    0    0    0    0    0
 7                0    0    0    0    0    3    0    0    0    0    0    0
 8                0    0    0    0    0    2    0    0    0    0    0    0
 9                0    0    0    0    0    5    0    0    0    0    0    0
10                0    0    0    0    0   13    0    0    0    0    0    0
11                0    0    0    0    0    4    0    0    0    0    0    0
12                0    0    0    0    0    1    0    0    0    0    0    0
4 CONCLUSIONS
In this paper, novel features extracted from a recently proposed mathematical model of the turn-on transient current were presented and used to classify electrical appliances. These features were analyzed in order to select a set of features that is relevant for appliance classification. From a set of fourteen features, seven were selected. A sequential forward search (SFS) wrapper-based selection algorithm was also used and its results supported the soundness of the previously selected features.
The Controlled On/Off Loads Library (COOLL) was used for the classification. A comparison between the appliance identification and clustering results using the turn-on transient features was conducted. The results indicate that the amplitude-based features $A_i$, $i = 1, \ldots, 5$, are the most relevant for appliance identification whereas the envelope-based features $A_0$ and $b_3$ are the most relevant for appliance clustering.

Future work may further investigate the robustness of the obtained results by testing the classification on other datasets that are larger than COOLL and contain other families of appliance types (TVs, washing machines, refrigerators, etc.). Other problems like model selection (parameters $d$ and $n$) for the turn-on transient current model may also be addressed.
ACKNOWLEDGEMENTS
This study was supported by the Région Centre-Val de Loire (France) as part of the project MDE–MAC3 (Contract n° 2012 00073640).
REFERENCES
Barsim, K. S. and Yang, B. (2015). Toward a semi-
supervised non-intrusive load monitoring system for
event-based energy disaggregation. In 2015 IEEE
Global Conference on Signal and Information Pro-
cessing (GlobalSIP), pages 58–62.
Bonfigli, R., Squartini, S., Fagiani, M., and Piazza, F.
(2015). Unsupervised algorithms for non-intrusive
load monitoring: An up-to-date overview. In Environ-
ment and Electrical Engineering (EEEIC), 2015 IEEE
15th International Conference on, pages 1175–1180.
Coleman, T. F. and Li, Y. (1994). On the convergence of
interior-reflective newton methods for nonlinear min-
imization subject to bounds. Mathematical Program-
ming, 67(1):189–224.
Darby, S. (2010). Smart metering: what potential for house-
holder engagement? Building Research & Informa-
tion, 38(5):442–457.
Fischer, C. (2008). Feedback on household electricity con-
sumption: a tool for saving energy? Energy efficiency,
1(1):79–104.
Gillis, J. M. and Morsi, W. G. (2016). Non-intrusive load
monitoring using semi-supervised machine learning
and wavelet design. IEEE Transactions on Smart
Grid, PP(99):1–8.
Goncalves, H., Ocneanu, A., Berges, M., and Fan, R.
(2011). Unsupervised disaggregation of appliances
using aggregated consumption data. In The 1st KDD
Workshop on Data Mining Applications in Sustain-
ability (SustKDD).
Hacine-Gharbi, A., Petit, M., Ravier, P., and Némo, F. (2015). Prosody based automatic classification of the uses of French "oui" as convinced or unconvinced uses. In International Conference on Pattern Recognition Applications and Methods (ICPRAM), number ISBN 978-989-758-077-2, pages 349–354.
Hancke, G. P., Hancke Jr, G. P., et al. (2012). The role of
advanced sensing in smart cities. Sensors, 13(1):393–
425.
Hart, G. W. (1985). Prototype nonintrusive appliance load
monitor. In MIT Energy Laboratory Technical Report,
and Electric Power Research Institute Technical Re-
port.
Hart, G. W. (1989). Residential energy monitoring and
computerized surveillance via utility power flows.
Technology and Society Magazine, IEEE, 8(2):12–16.
Hart, G. W. (1992). Nonintrusive appliance load monitor-
ing. Proceedings of the IEEE, 80(12):1870–1891.
Kohavi, R. and John, G. H. (1997). Wrappers for feature
subset selection. Artificial intelligence, 97(1):273–
324.
Nait Meziane, M., Picon, T., Ravier, P., Lamarque, G.,
Le Bunetel, J.-C., and Raingeaud, Y. (2016). A
measurement system for creating datasets of on/off-
controlled electrical loads. In Conference on Envi-
ronment and Electrical Engineering (EEEIC), Pro-
ceedings of the 16th IEEE International, pages 2579–
2583.
Nait Meziane, M., Ravier, P., Lamarque, G., Abed-Meraim,
K., Le Bunetel, J.-C., and Raingeaud, Y. (2015). Mod-
eling and estimation of transient current signals. In
Signal Processing Conference (EUSIPCO), 2015 Pro-
ceedings of the 23rd European, pages 2005–2009.
Parson, O. (2016). Overview of the NILM field. http://blog.oliverparson.co.uk/2015/03/overview-of-nilm-field.html.
Picon, T., Nait Meziane, M., Ravier, P., Lamarque, G., Nov-
ello, C., Le Bunetel, J.-C., and Raingeaud, Y. (2016).
COOLL: Controlled on/off loads library, a public
dataset of high-sampled electrical signals for appli-
ance identification. arXiv preprint arXiv:1611.05803.
Zeifman, M. and Roth, K. (2011). Nonintrusive appli-
ance load monitoring: Review and outlook. Consumer
Electronics, IEEE Transactions on, 57(1):76–84.
Zoha, A., Gluhak, A., Imran, M. A., and Rajasegarar, S.
(2012). Non-intrusive load monitoring approaches
for disaggregated energy sensing: a survey. Sensors,
12(12):16838–16866.