Loads Estimation using Deep Learning Techniques in Consumer Washing Machines

Alexander Babichev², Vittorio Casagrande⁴, Luca Della Schiava², Gianfranco Fenu¹, Imola Fodor², Enrico Marson², Felice Andrea Pellegrino¹, Gilberto Pin², Erica Salvato¹, Michele Toppano² and Davide Zorzenon³

¹Department of Engineering and Architecture, University of Trieste, Italy
²Electrolux Italia S.p.A., Porcia, 33080, Italy
³Technische Universität Berlin, Control Systems Group, Einsteinufer 17, D-10587 Berlin, Germany
⁴Department of Electrical and Electronic Engineering, University College London, U.K.
{alexander.babichev, luca.della-schiava, imola.fodor, enrico.marson, gilberto.pin, michele.toppano}@electrolux.com
Keywords: Long Short-Term Memories, One-dimensional Convolutional Neural Networks, Virtual Sensing.
Abstract: Home appliances are nowadays present in every house. In order to ensure a suitable level of maintenance, manufacturers strive to design methods that estimate the wear of the individual electrical parts composing an appliance without equipping it with a large number of expensive sensors. With this in mind, our goal is to infer the status of the electrical actuators of a washing machine from measurements of electrical signals at the plug, which carry aggregate information. The approach is end-to-end, i.e., it does not require any feature extraction, and thus it can easily be generalized to other appliances. Two different techniques have been investigated: Convolutional Neural Networks and Long Short-Term Memories. These tools are trained and tested on data collected from four different washing machines.
1 INTRODUCTION

Nowadays every house is equipped with many appliances of common use. The quality of such machines rests not only on their efficacy and efficiency, but also on their reliability. Hence, the capability to ensure an adequate level of maintenance is something that companies strive for. In particular, predictive maintenance aims to schedule the replacement of components before their actual breakdown. One way to monitor an appliance is to equip it with many sensors that report wear. Even though this method can be effective, it has the drawback of being expensive. Hence, another monitoring method should be found. Electrical signals drawn from the grid are rather easy quantities to measure in electrical appliances, for instance by employing a metering device with computational capabilities able to host e-AI applications. Finding a reliable method that uses such information to infer the usage of an appliance is therefore of particular interest. In the present paper, a method based on deep learning is proposed to estimate the status of some loads inside a consumer washing machine. Here, by "load" we mean any internal component of the appliance (for instance, the heater) that draws electrical energy. To the best of our knowledge, such techniques have never been used for this specific application. However, deep learning techniques have been employed for similar tasks.
In (Susto et al., 2018), machine learning tools are used to estimate the weight of clothes inside a washing machine. The estimated weight is then used by the washing machine to improve the washing process. The approach is based on tools such as random forests and logistic regression, which require handcrafted features. Estimation and machine learning techniques have also been used extensively for Non-Intrusive Load Monitoring (NILM), that is, the estimation of home-appliance usage taking as input the electrical signals of the main panel. Different solutions have been proposed in the literature: decision trees (Maitre et al., 2015), harmonic analysis (Djordjevic and Simic, 2018), the empirical mode decomposition (EMD) principle (Huang et al., 2019) and k-Nearest Neighbor classifiers (Alasalmi et al., 2012). All of these methods require a feature design process
based on prior knowledge of the functioning of the appliance. An effective way to overcome this problem is proposed in (Mocanu et al., 2016) and in (Kim et al., 2017), where deep learning tools that do not require any feature design are employed. The aim is to detect the energy consumption of each appliance of a household from aggregate measurements of voltage and/or current in the distribution system.
In the present paper, we employ measurements of electrical signals of a single appliance, and we face the problem of load estimation, which encompasses both regression and classification, in a supervised learning fashion. In particular, a dataset of signals acquired during 502 washing cycles, corresponding to a total of 1002.4 hours of operation, has been collected, along with the actual load status. To deal with the estimation problem, the deep learning tools employed are Long Short-Term Memories (LSTMs) and Convolutional Neural Networks (CNNs) (Goodfellow et al., 2016). LSTMs deal natively with sequential data. As for the CNNs, we employ one-dimensional CNNs, which have recently proven effective in time series classification (Hannun et al., 2019). The remainder of the paper is organized as follows. Section 2 describes the dataset and the data preparation. Section 3 specifies the regression and classification problems and establishes the performance indices. The adopted solution is described in detail in Section 4 and the experimental results are reported in Section 5. Conclusions are drawn in Section 6.
2 DATASET DESCRIPTION

The dataset used for training and testing the networks has to contain enough information to allow the networks to generalize. Hence, to collect data, a measurement campaign has been performed on different washing machines and washing cycles. A total of 502 sequences have been collected from real appliances, trying to cover the largest possible number of operating settings; 402 of them are used for training and the remaining 100 for testing. In order to ensure that the training and testing sets contain a uniform level of information, the dataset splitting has been done once, by manually labelling each sequence as a training or testing sequence. Each recorded sequence contains measured values of the following electrical quantities:

- real and imaginary parts of the I, III and V current harmonics;
- real and imaginary parts of the I, III and V voltage harmonics;
- cumulative energy drawn from the grid;

and of the following loads:

- drum speed (absolute value in rpm);
- heater (boolean);
- drain pump (boolean);
- electrovalves (boolean).
The aim of this work is to provide a pointwise estimation of the drum speed and of the boolean (activation) status of the last three output variables (0 = OFF, 1 = ON). Considering that the loads are powered by electricity, we expect a causal effect (which can be non-linear and not easy to estimate without adequate knowledge of the appliance) of the variations of the state of the loads on the variations of the electrical quantities. Hence, a binary classifier has been trained for each of the three boolean loads, whereas the estimation of the drum speed has been treated as a regression problem. The main issue encountered with this dataset is that the loads are turned OFF during most of each recorded cycle. Table 1 shows the percentage of ON and OFF samples for each load.
Table 1: Percentage of ON and OFF samples for each load.

                 OFF samples   ON samples
  Heater         86%           14%
  Drain Pump     87%           13%
  Electrovalves  98%            2%
Class unbalance has a detrimental effect on training (Buda et al., 2018; Grangier et al., 2009). The methods employed to deal with class unbalance are presented in Section 4.
2.1 Data Preparation

Recorded sequences include heterogeneous physical quantities which may have different orders of magnitude. Hence, measurements have been normalized using z-score standardization, i.e., each input sample of channel $i$, $x_i$, is normalized using the following formula:

$$X_i = \frac{x_i - \mu_i}{\sigma_i},$$

where $\mu_i$ and $\sigma_i$ are the sample mean and standard deviation of the training sequences of the considered physical quantity (e.g., real part of the first current harmonic) and are referred to as normalization factors.
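As an illustration, the following is a minimal sketch of this normalization step, assuming each recorded sequence is stored as a NumPy array of shape (channels, time); the function names are ours and not part of the original implementation.

```python
import numpy as np

def fit_normalization_factors(train_sequences):
    """Compute per-channel mean and std over all training sequences.

    train_sequences: list of arrays, each of shape (n_channels, seq_len).
    """
    stacked = np.concatenate(train_sequences, axis=1)  # (n_channels, total_len)
    mu = stacked.mean(axis=1, keepdims=True)
    sigma = stacked.std(axis=1, keepdims=True)
    return mu, sigma

def z_score(sequence, mu, sigma):
    """Normalize one sequence with the training-set factors."""
    return (sequence - mu) / sigma
```

Note that the factors are fitted on the training set only and then reused on the test sequences, as the text prescribes.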
In order to make use of CNNs for load classification, the dataset needs to be processed by an operation which will be referred to as segmentation in this paper. This is due to the fact that Convolutional Neural Networks (in contrast to LSTMs) must be fed with fixed-size data. For this reason they have been extensively exploited for image classification (Krizhevsky et al., 2012), where each observation has a fixed length, height and number of channels (colors). On the contrary, in the considered scenario each recorded sequence has a different duration. Moreover, a pointwise load-status detection is required for each sample of the sequence (not for the whole sequence). Therefore, in order to meet the input constraints of CNNs, the dataset has been segmented. The segmentation operation requires specifying a window size (number of samples per segment) and a stride (number of samples separating the first samples of two consecutive segments). Hence, if the stride is lower than the window size, consecutive segments overlap. The resulting observation is then a fixed-size "image" of length equal to the window size, with one channel for each input feature and unitary height; this approach is the same employed in (Hannun et al., 2019). Finally, a single output label is assigned to each observation and the network is trained to associate the right label to each observation. There are various options for setting the segment label; however, since the segmented dataset will be used only for CNNs (which find it easier to classify elements in the centre of an image), the label is set to the load status in the middle of the considered segment. From an implementation point of view, the segmentation requires temporarily storing the values of the input electrical quantities inside each segment before computing the prediction of the load status. Precisely, $N_i \times WS$ values must be stored in memory, where $N_i$ is the number of inputs (13 in our case) and $WS$ is the segment window size. Therefore, to achieve online load estimation, $WS$ must be chosen large enough to capture sufficient information for the estimation, but small enough to avoid excessive memory requirements.
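The following is a minimal sketch of the segmentation just described, under the same array-layout assumption as before (inputs of shape (channels, time)); again, the names are ours.

```python
import numpy as np

def segment(inputs, labels, ws, stride):
    """Cut one sequence into fixed-size segments for the CNN.

    inputs: array of shape (n_channels, seq_len); labels: shape (seq_len,).
    Returns segments of shape (n_segments, n_channels, ws) and, for each
    segment, the load status at its central sample as the target label.
    """
    segments, targets = [], []
    seq_len = inputs.shape[1]
    for start in range(0, seq_len - ws + 1, stride):
        segments.append(inputs[:, start:start + ws])
        targets.append(labels[start + ws // 2])  # label = status at the centre
    return np.stack(segments), np.array(targets)
```

With a stride lower than `ws`, consecutive segments overlap, exactly as stated above.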
Dataset segmentation enables the employment of CNNs as well as class balancing. Class unbalance is a typical issue in classification problems and several strategies have been proposed in the literature to overcome it (Batista et al., 2004; Buda et al., 2018). Here, we explore two possible solutions to balance the classes, i.e., to obtain equal proportions of the ON and OFF classes (a sketch of the first option, later used for the CNNs, follows the list):

- oversampling: randomly copy segments belonging to the less common class. This random information-duplication procedure may lead to a huge increase of the dataset size and, in addition, to overfitting (Batista et al., 2004);
- undersampling: randomly delete segments corresponding to the most common class, thus reducing the dataset size at the cost of deleting information relevant to the training process (Buda et al., 2018).
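A minimal sketch of random oversampling, assuming segments and binary targets are NumPy arrays as produced by the segmentation above; the function name and seeding are ours.

```python
import numpy as np

def random_oversample(segments, targets, seed=0):
    """Duplicate random minority-class segments until both classes
    appear in equal proportion (may cause overfitting, see text)."""
    rng = np.random.default_rng(seed)
    minority = 1 if (targets == 1).sum() < (targets == 0).sum() else 0
    mask = targets == minority
    deficit = int((~mask).sum() - mask.sum())
    extra = rng.choice(np.flatnonzero(mask), size=deficit, replace=True)
    return (np.concatenate([segments, segments[extra]]),
            np.concatenate([targets, targets[extra]]))
```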
3 PROBLEM STATEMENT AND PERFORMANCE INDICES

3.1 Regression of the Drum Speed

The drum speed estimation is the first problem that has been faced. A trained network has to provide an accurate estimate of this value at each time instant, given the electrical input signals. Since the drum speed is a continuous value, it is natural to formulate a regression problem.
Different networks have been trained using different hyperparameter settings and each of them has been tested on the test set. In order to easily compare the performance of these different estimators, it is useful to have a single performance index for each of them. The performance index chosen for the regression problem is the Root Mean Square Error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\hat{y}_n - y_n\right)^2},$$

where:

- $N$ is the number of samples;
- $\hat{y}_n$ is the predicted output of the $n$-th sample;
- $y_n$ is the true output of the $n$-th sample.

Thus, firstly, the test sequences are stacked into one and, secondly, a single RMSE value is calculated for the whole test set (hence, $N$ is the sum of the lengths of all the test sequences). The network resulting in the lowest RMSE is then the one with the best performance.
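For concreteness, a minimal sketch of how this single test-set index can be computed, assuming predicted and true outputs are kept as lists of one-dimensional NumPy arrays (one per test sequence):

```python
import numpy as np

def test_set_rmse(predicted_sequences, true_sequences):
    """Stack all test sequences and compute one RMSE over the result."""
    y_hat = np.concatenate(predicted_sequences)  # N = sum of sequence lengths
    y = np.concatenate(true_sequences)
    return np.sqrt(np.mean((y_hat - y) ** 2))
```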
3.2 Load Status Detection

The estimation of the ON/OFF status of the loads has been treated as a classification problem, due to the boolean nature of the output variables considered. Given the measured electrical signals (voltage harmonics, current harmonics and energy), the trained network should be able to detect whether a load is turned ON or OFF. In particular, a single binary classifier is trained for each load. Again, a suitable performance index is required to assess the trained networks. The most commonly employed indices for evaluating a classifier are Accuracy, Precision and Recall; they are well described in (Goodfellow et al., 2016). Such indices are expressed in percentage and allow a numerical assessment of the performance of the network. Due to the class unbalance of this application, Accuracy can be misleading when used to evaluate the network performance (Jeni et al., 2013). Hence, in order to summarize the performance of a network, the F1 score is used:

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$

For each classifier, two F1 scores are computed: one for the ON class and one for the OFF class. The network for which both F1 scores are closest to 100% is the one with the best performance.
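The following is a minimal sketch of the per-class F1 computation, assuming predictions and ground truth are integer NumPy arrays (zero-division guards are omitted for brevity):

```python
import numpy as np

def f1_per_class(y_true, y_pred, positive):
    """F1 score treating `positive` (0 = OFF, 1 = ON) as the positive class."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Calling it once with `positive=1` and once with `positive=0` yields the two scores reported in the tables below.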
4 PROPOSED SOLUTION

In this section, the architecture of each developed network is outlined together with the employed training methods.

4.1 Regression of the Drum Speed

As explained in the previous section, since the value of the drum speed is not boolean, it is natural to formulate a regression problem. A suitable deep learning tool for time-sequence modeling is the Recurrent Neural Network (RNN), in particular the Long Short-Term Memory (LSTM) network (Hochreiter and Schmidhuber, 1997). An LSTM unit is composed of four different gates (named cell candidate, input gate, output gate and forget gate) which regulate the information flow through the unit. This particular architecture allows learning arbitrarily long-term dependencies in time series. For this reason, this kind of unit is used to solve the regression problem. The layers of the proposed LSTM network are:

- sequence input layer: inputs the time sequences to the ensuing layer;
- LSTM layer: learns long-term dependencies in time series (Figure 1);
- fully connected layer: maps the output of the previous layer to the output of the net;
- regression layer: computes the loss function required for the back-propagation process.
The sequence input layer has 13 input channels: 6 for current harmonics, 6 for voltage harmonics and 1 for energy. Each gate of the LSTM layer applies an element-wise nonlinearity to an affine transformation of the input and output values of the layer. The operation performed by each gate is:

$$y_t = \sigma\left(W x_t + R h_{t-1} + b\right),$$

where:

- $y_t$ is the output of an LSTM gate (e.g., the forget gate) at time $t$;
- $x_t$ is the input to the LSTM layer at time $t$ (e.g., the values of the voltage harmonics at time $t$);
- $h_t$ is the output of the LSTM layer at time $t$;
- $W$, $R$ and $b$ are, respectively, the input weights, recurrent weights and biases specific to an LSTM gate (learnable parameters);
- $\sigma$ is the nonlinear function that controls the information flow through the gate. In this work, the sigmoid function has been used for the input, forget and output gates, while the hyperbolic tangent has been chosen for the output of the cell candidate.
The flow of information through the LSTM cell is controlled by the system of gating units. The state and the output of the LSTM layer are calculated as:

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t,$$
$$h_t = o_t \odot \sigma_c(c_t),$$

where:

- $f_t$ is the output of the forget gate;
- $c_t$ is the LSTM state;
- $i_t$ is the output of the input gate;
- $g_t$ is the output of the cell candidate;
- $o_t$ is the output of the output gate;
- $\odot$ is the Hadamard product;
- $\sigma_c$ is the hyperbolic tangent (state activation function).
Figure 1: LSTM layer.
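The following is a minimal NumPy sketch of one LSTM time step implementing the gate and state equations above; the parameter layout (a dict mapping each gate name to its (W, R, b) triple) is our assumption, not the original implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: y_t = act(W x_t + R h_{t-1} + b) for each gate,
    then c_t = f ⊙ c_{t-1} + i ⊙ g and h_t = o ⊙ tanh(c_t)."""
    gate = lambda name, act: act(params[name][0] @ x_t
                                 + params[name][1] @ h_prev
                                 + params[name][2])
    f_t = gate("forget", sigmoid)      # forget gate
    i_t = gate("input", sigmoid)       # input gate
    g_t = gate("cell", np.tanh)        # cell candidate
    o_t = gate("output", sigmoid)      # output gate
    c_t = f_t * c_prev + i_t * g_t     # state update (Hadamard products)
    h_t = o_t * np.tanh(c_t)           # layer output
    return h_t, c_t
```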
The fully connected layer maps the output of the LSTM layer to the output of the net through the following operation:

$$z_t = V h_t + c,$$

where:

- $z_t$ is the output of the fully connected layer;
- $V$ and $c$ are the weights and biases of the fully connected layer (the learnable parameters).

The last layer of the network computes the loss function for each considered training sequence as:

$$L = \frac{1}{2S}\sum_{i=1}^{S}\left(\hat{y}_i - y_i\right)^2,$$

where:

- $S$ is the sequence length;
- $\hat{y}_i$ is the predicted output;
- $y_i$ is the target output.

The total loss is then the mean loss over the observations of the mini-batch.
4.2 Classification of Loads

The problem of classifying loads given the measured electrical signals has been faced using two different networks: LSTMs and CNNs. The LSTM-based network used for classification has a structure similar to the previous one:

- sequence input layer
- LSTM layer
- fully connected layer
- softmax layer
- weighted classification layer.

In particular, the regression layer is replaced by a softmax layer followed by a weighted classification layer. The softmax layer normalizes its input into a probability distribution over $K$ classes (in our case, $K = 2$). To do so, the softmax function is applied to the output of the fully connected layer. The output probability of class $i$ is calculated as:

$$y_i = \frac{e^{x_i}}{\sum_{k=1}^{K} e^{x_k}},$$

where:

- $K$ is the total number of classes;
- $x$ is the layer input.
The weighted classification layer is chosen in order to deal with the class unbalance problem. It computes the weighted cross entropy at each step as follows:

$$L = -\sum_{i=0}^{K-1} w_i\, T_i \log(Y_i),$$

where:

- $K$ is the number of classes ($K = 2$ for binary classification);
- $w_i$ is the weight corresponding to class $i$;
- $T_i$ is the target output of class $i$ (1 for the correct class and 0 for all the others);
- $Y_i$ is the output of the net.

The loss function is then the mean loss over all the observations of the mini-batch. Since LSTMs do not require data segmentation, the class-balancing methods described in Section 2.1 cannot be used here. However, the weights of the weighted cross entropy can be tuned such that the penalty associated with a wrong estimation of the least frequent class is greater than the one associated with a wrong estimation of the most frequent class.
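A minimal sketch of this loss for a single observation, assuming NumPy arrays; the function name is ours, and the example weights anticipate the electrovalves values reported in Section 4.3.

```python
import numpy as np

def weighted_cross_entropy(Y, T, w):
    """Weighted cross entropy for one observation.

    Y: predicted class probabilities (softmax output), shape (K,);
    T: one-hot target, shape (K,); w: per-class weights, shape (K,).
    """
    return -np.sum(w * T * np.log(Y))

# Example with the electrovalves weights of Section 4.3
# (index 0 = OFF, index 1 = ON):
w = np.array([0.02, 0.98])
loss = weighted_cross_entropy(np.array([0.7, 0.3]), np.array([0.0, 1.0]), w)
```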
The second approach used to estimate the binary loads is based on CNNs. In contrast to LSTMs, as previously described in Section 2.1, CNNs require input sequences of the same length, so the dataset needs to be segmented. Moreover, in order to cope with class unbalance, oversampling is performed on the training set. The chosen architecture for the CNN is:

- image input layer
- hidden layer 1
- max pooling layer 1
- hidden layer 2
- max pooling layer 2
- ...
- hidden layer $N_{hl}$
- fully connected layer
- softmax layer
- classification layer,

where $N_{hl}$ is the number of hidden layers, each of which is composed of:

- convolutional layer
- batch normalization layer
- ReLU layer.
The image input layer defines the dimension of the input observations; in our case, every observation is treated as an image of unitary height, with length corresponding to the segment window size and a number of channels equal to the number of input features, i.e., 13. Each of the $N_{hl}$ hidden layers has the same structure. First of all, the convolutional layer, consisting of a certain number of linear filters, processes the output of the previous layer. Parameters such as the number of filters, the filter lengths, the stride and the padding must be properly tuned in order to achieve good classification results; this will be discussed in the following section. Then, the batch normalization layer normalizes the output of the previous layer; it is commonly used between a convolutional layer and a nonlinear operation (such as the one performed by the ReLU layer) to speed up the training of CNNs. The ReLU layer applies the rectifier activation function ReLU(·) to each element $x$ of the output of the previous layer, as

$$\mathrm{ReLU}(x) = \max(x, 0).$$

The rectifier function has recently been widely adopted as the activation function for hidden layers (Glorot et al., 2011; Krizhevsky et al., 2012). Finally, the max pooling layer is used to reduce the complexity of the data flowing between the layers. In particular, it divides the input values into regions and performs a maximum operation among the values in each region. The fully connected layer, the softmax layer and the classification layer are similar to those presented for the LSTMs.
4.3 Hyperparameters Setting

In this section, the parameters used for defining the structure of the machine learning tools are presented and briefly commented. Tuning some hyperparameters required carrying out trial trainings. For the sake of brevity, only results regarding the electrovalves (which are the most difficult load to estimate) are reported in this section. For the parameter choice, two opposite goals were taken into consideration:

1. achieve high performance,
2. avoid excessive memory requirements.

In real-world applications, the second point is crucial: a high memory demand would result in expensive hardware. Considering that the aim is to monitor the appliance while reducing the costs associated with hardware sensors, this would frustrate our efforts.

Regarding LSTMs, the main hyperparameter that requires tuning is the number of hidden units (or state dimension). In order to optimally choose this hyperparameter, a Bayesian optimisation problem has been solved. The objective function is set to the RMSE (for the regression problem) and to the F1 score (for the classification problem), respectively to be minimised and maximised. In order to avoid networks with too many hidden units (which would overfit the training data and demand much memory), the state dimension has been restricted to the range from 10 to 50 hidden units. The outcome of the optimisation problem has been the same in the classification and regression cases:

$$n_h = 40.$$
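The paper does not specify the optimisation software; as one possible realization, the following sketch uses scikit-optimize's Gaussian-process minimiser, where `train_and_evaluate` is a hypothetical helper standing in for the (unshown) training pipeline.

```python
from skopt import gp_minimize
from skopt.space import Integer

def train_and_evaluate(n_h):
    """Placeholder: train an LSTM with n_h hidden units and return the
    validation RMSE (regression) or negative F1 score (classification),
    so that both problems are cast as minimisation."""
    raise NotImplementedError

def objective(params):
    return train_and_evaluate(params[0])

result = gp_minimize(objective, dimensions=[Integer(10, 50)],
                     n_calls=25, random_state=0)
best_n_h = result.x[0]  # the paper obtained n_h = 40 for both problems
```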
In addition to the state dimension, the weights associated with each class (used to compute the loss in the weighted classification layer) need to be set. Each class weight has been set according to the relative occurrence of the complementary class (refer to Table 1). For example, in the electrovalves case, since the relative occurrence of the two classes is 98% for class OFF and 2% for class ON, the weights are:

$$w_{\mathrm{ON}} = 0.98, \quad w_{\mathrm{OFF}} = 0.02.$$
Regarding CNNs, the following parameters were tuned after practical experiments: segment window size, segment stride, segment labelling method, number of hidden layers, and number and size of convolutional filters.

An increase of the window size leads to more information collected for the output estimation, but to a higher memory demand. Table 2 shows the CNN performance corresponding to different choices of the window size. A window of 50 samples per segment, which leads to the best results, was considered small enough to be stored on sufficiently cheap hardware.
Table 2: Results obtained using CNNs with different window sizes.

  Window Size   F1 class ON   F1 class OFF
  50            97.65%        99.82%
  40            48.06%        18.33%
  20            74.66%        97.41%
While the window size must be the same during training and testing, the stride can be chosen differently. On the one hand, at test time it is necessary to set the stride equal to 1 to get a load prediction for each time step; in this way we are also able to compare the performance of LSTMs and CNNs. On the other hand, this is not required in training; however, setting a higher value of the stride would result in a loss of information. Therefore, the stride was kept unitary during training too.

The network structure must be reset every time the window size is changed: in what follows we refer to
the case of a 50-sample window size. Since the input images are small, a shallow net was sufficient to achieve good results. Precisely, the number of hidden layers was set to 3. The number of filters of each hidden layer was fixed, using a formula similar to that of (Hannun et al., 2019), to grow exponentially with the layer position, i.e.,

$$N_f(i) = 2^{i+\alpha},$$

where $N_f(i)$ is the number of filters of the $i$-th hidden layer, for each $i \in \{1, 2, 3\}$, and $\alpha \in \mathbb{N}$ is a constant. Therefore, by choosing only $\alpha$, the number of filters of each layer is obtained. Table 3 collects some results obtained by varying $\alpha$. Considering both of the project objectives, the value $\alpha = 2$ was set. Moreover, it is clear from Table 3 that increasing $\alpha$ from 2 to 3 does not improve the network performance.
Table 3: Results obtained using CNNs with a different number of filters.

  α   F1 class ON   F1 class OFF
  1   89.87%        99.19%
  2   97.65%        99.82%
  3   97.68%        99.82%
Regarding the size of the convolutional filters, it was set after choosing the labelling method, with unitary stride and zero padding, such that the information contained in the central sample could flow through each layer. The value was set empirically for each filter after an extensive testing phase. In particular, the lengths of the filters of the three hidden layers have been set to $L_1 = 5$, $L_2 = 4$ and $L_3 = 3$, respectively. A possible realization of the resulting architecture is sketched below.
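The following PyTorch sketch is one possible realization of the described 1-D CNN, not the authors' implementation: the filter counts follow $N_f(i) = 2^{i+\alpha}$ with $\alpha = 2$ and the kernel sizes 5, 4, 3 above, while the padding and pooling choices are our assumptions.

```python
import torch.nn as nn

def build_cnn(n_channels=13, n_classes=2, alpha=2, kernel_sizes=(5, 4, 3)):
    """Sketch of the 3-hidden-layer 1-D CNN for 50-sample segments.

    Input shape: (batch, n_channels, window_size)."""
    layers, in_ch = [], n_channels
    for i, k in enumerate(kernel_sizes, start=1):
        out_ch = 2 ** (i + alpha)           # hidden layer i: 8, 16, 32 filters
        layers += [
            nn.Conv1d(in_ch, out_ch, kernel_size=k, stride=1, padding=k // 2),
            nn.BatchNorm1d(out_ch),         # batch normalization layer
            nn.ReLU(),                      # ReLU layer
            nn.MaxPool1d(kernel_size=2),    # max pooling: halves the length
        ]
        in_ch = out_ch
    layers += [
        nn.Flatten(),
        nn.LazyLinear(n_classes),           # fully connected layer
        nn.Softmax(dim=1),                  # softmax layer, as in the text
    ]
    return nn.Sequential(*layers)
```

Note that, in practice, PyTorch's `nn.CrossEntropyLoss` expects logits, so the final `Softmax` would be dropped during training; it is kept here to mirror the layer list of Section 4.2.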
5 EXPERIMENTAL RESULTS

After training, each network is tested on the test set. In this section, significant results are reported together with a brief discussion. The results reported here are obtained using the hyperparameters set as explained in the previous section. In addition, a fine tuning of the optimizer settings (in terms of mini-batch size, learning rate, etc.) has been done. The following results for the LSTMs are obtained using the Adam optimizer with an initial learning rate of 0.006, a learning rate drop factor of 0.8, a learning rate drop period of 20 epochs, a total number of 100 training epochs and a mini-batch size of 64. Regarding CNNs, the Adam optimizer has been used with the following parameters: a fixed learning rate of 0.01 for the whole training session, a total number of 20 training epochs and a mini-batch size of 4096.

The training times of the two approaches are very different: while the LSTMs are trained in about 8 hours, it takes only about 1 hour to train a CNN.
5.1 Regression of the Drum Speed

Results regarding the drum speed estimation are reported in terms of RMSE in Table 4.

Table 4: Results obtained for drum speed regression using a different number of hidden units.

  n_h   RMSE
  2     50.78 rpm
  10    41.38 rpm
  40    33.87 rpm
  50    38.60 rpm

In accordance with the outcome of the Bayesian optimization, the best result is obtained using 40 hidden units. A lower state dimension results in a network that is not complex enough to model the time sequence, hence the RMSE increases. On the contrary, using a model which is too complex for this task leads to overfitting, hence to worse results. An example of the time-domain result of the normalized drum speed estimation is shown in Figure 2.
Figure 2: Time domain result of normalized drum speed estimation (Normalized Drum Speed [-] vs. Sample [-]; curves: Predicted, Test). For clarity's sake, part of the plot has been zoomed on the left.
5.2 Classification of Loads

Results for each load are reported in Table 5.

Table 5: Results obtained for load classification using LSTM and CNN.

  Network                Heater    Drain Pump   Electrovalves
  LSTM   F1 class ON     99.82%    97.44%       37.61%
         F1 class OFF    99.68%    99.54%       87.81%
  CNN    F1 class ON     99.72%    98.57%       98.39%
         F1 class OFF    99.95%    99.75%       99.87%

The performances obtained for the heater status classification are good with both deep learning tools. This is due to the fact that the heater, when active, draws a large amount of energy from the grid. Hence, the real part of the first current harmonic increases markedly when this load is active, making the status detection easier for the net. Similar results are obtained for the drain pump. This load is quite easy to detect and hence both deep learning approaches provide good results. The same cannot be said for the electrovalves: for this load, it is clear that CNNs outperform LSTMs.
6 CONCLUSIONS

In this work, two different problems have been faced: the drum speed estimation of a washing machine and the activation-status classification of different loads of the same appliance. The first has been solved by training an LSTM network that estimates the speed at each time instant. Results on the test set show that good performance can be achieved with this network, especially if its state dimension is set by solving an optimization problem.

As for the second problem, two different approaches have been tested. The first consists of training an LSTM network (with an optimal number of hidden units), whereas the second makes use of CNNs. Good results have been achieved with both networks for two out of three loads (heater and drain pump). Conversely, it is clear that using only a weighted classification layer in the electrovalves-status classification is not enough to cope with class unbalance, so using CNNs leads to much better results. Hence, even though LSTMs are easier to train and test (since the only preprocessing operation required is normalization), CNNs are to be preferred, since they perform better in classifying all the loads.
REFERENCES
Alasalmi, T., Suutala, J., and Röning, J. (2012). Real-time non-intrusive appliance load monitor. In International Conference on Smart Grids and Green IT Systems, pages 203–208.
Batista, G. E., Prati, R. C., and Monard, M. C. (2004). A
study of the behavior of several methods for balancing
machine learning training data. ACM SIGKDD explo-
rations newsletter, 6(1):20–29.
Buda, M., Maki, A., and Mazurowski, M. A. (2018). A
systematic study of the class imbalance problem in
convolutional neural networks. Neural Networks,
106:249–259.
Djordjevic, S. and Simic, M. (2018). Nonintrusive identifi-
cation of residential appliances using harmonic anal-
ysis. Turkish Journal of Electrical Engineering and
Computer Sciences, 26.
Glorot, X., Bordes, A., and Bengio, Y. (2011). Deep sparse
rectifier neural networks. In Proceedings of the four-
teenth international conference on artificial intelli-
gence and statistics, pages 315–323.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep
learning. MIT press.
Grangier, D., Bottou, L., and Collobert, R. (2009). Deep
convolutional networks for scene parsing. In ICML
2009 Deep Learning Workshop, volume 3, page 109.
Citeseer.
Hannun, A. Y., Rajpurkar, P., Haghpanahi, M., Tison, G. H.,
Bourn, C., Turakhia, M. P., and Ng, A. Y. (2019).
Cardiologist-level arrhythmia detection and classifi-
cation in ambulatory electrocardiograms using a deep
neural network. Nature medicine, 25(1):65.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural computation, 9(8):1735–1780.
Huang, X., Yin, B., Zhang, R., and Wei, Z. (2019). Study
of steady-state feature extraction algorithm based on
emd. In IOP Conference Series: Materials Science
and Engineering, volume 490, page 062036. IOP Pub-
lishing.
Jeni, L. A., Cohn, J. F., and De La Torre, F. (2013). Fac-
ing imbalanced data–recommendations for the use of
performance metrics. In 2013 Humaine association
conference on affective computing and intelligent in-
teraction, pages 245–251. IEEE.
Kim, J., Le, T.-T.-H., and Kim, H. (2017). Nonintrusive
load monitoring based on advanced deep learning and
novel signature. Computational intelligence and neu-
roscience, 2017.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
agenet classification with deep convolutional neural
networks. In Advances in neural information process-
ing systems, pages 1097–1105.
Maitre, J., Glon, G., Gaboury, S., Bouchard, B., and
Bouzouane, A. (2015). Efficient appliances recog-
nition in smart homes based on active and reactive
power, fast fourier transform and decision trees. In
Workshops at the Twenty-Ninth AAAI Conference on
Artificial Intelligence.
Mocanu, E., Nguyen, P. H., Gibescu, M., and Kling, W. L.
(2016). Deep learning for estimating building energy
consumption. Sustainable Energy, Grids and Net-
works, 6:91–99.
Susto, G. A., Zambonin, G., Altinier, F., Pesavento, E., and
Beghi, A. (2018). A soft sensing approach for clothes
load estimation in consumer washing machines. In
2018 IEEE Conference on Control Technology and
Applications (CCTA), pages 1252–1257. IEEE.