Water Consumption Demand Pattern Analysis using Uncertain Smart

Water Meter Data

Milad Khaki

12 a

and Nasim Mortazavi

2 b

Electrical and Computer Engineering Department, University of Waterloo, Waterloo, Canada

Robarts Research Institute, Western University, London, Canada

Keywords:

Smart Water Meters, Data Mining, Big Data, Big Data Errors, Case-based Reasoning.

Abstract:

Wireless ‘smart’ water meters that allow functionalities such as demand response, leak alerts, identiﬁcation

of characteristic demand patterns, and detailed consumption analysis are becoming an essential part of water

infrastructure in many countries. To achieve these beneﬁts, the meter data needs to be error-free, which is

not necessarily available in practice due to ‘dirtiness’ or ‘uncertainty’ of data, which is mostly unavoidable.

Additionally, by analyzing the smart meter data and ﬁnding demand patterns, it is possible to provide insights

to the municipalities to improve their distribution network, better understand demand characteristics, identify

the consumers that are the main sources of shaping the high consumption peaks. This paper investigates

solutions to mine the uncertain data, ensures the validity of results, and evaluates the impact of dirty data on

data analysis results. Once the reliability of results is ensured, the evaluation results can be used for informed

decision-making on water planning strategies. Secondly, the consumption pattern of a city equipped with 25

thousand water consumers is analyzed, and weekly consumption proﬁles over an entire year are presented

for single-family residential consumers. Additionally, a systematic study of the errors existing in large-scale

smart water meter deployments is performed to better understand the nature of errors in such data sources,

particularly at the ﬁrst stages of implementation of smart metering infrastructure. Also, the sensitivity of the

results to various types of errors in a big data system is presented and investigated.

1 INTRODUCTION

As a cost-saving measure, many municipalities have

decided to install wireless ‘smart’ water meters that,

in addition to all other beneﬁts, primarily enable them

to read meters remotely. Toronto and Saskatoon, in

Canada, and Baltimore and Pittsburgh, in the United

States A substantial fraction of data obtained from

virtually all large-scale meter deployments can be

incorrect (such as examples in (Quilumba, F.L. and

Wei-Jen Lee and Heng Huang and Wang, D.Y. and

Szabados, R., 2014), (Liu et al., 2018), (Shishido,

Juan, 2012), (Kaisler, Stephen and Armour, Frank

and Espinosa, J Alberto and Money, William, 2013),

(Sivarajah et al., 2017), (Chen et al., 2013), (Lon

House, 2011), and (Courtney, 2014)).

The focus of this paper is to highlight the detri-

mental effects of data errors in reducing the beneﬁts

of using the concept of big data. The impact of uncer-

tain data on the identiﬁcation of customers contribut-

https://orcid.org/0000-0003-0566-727X

https://orcid.org/0000-0002-3257-2463

ing to a peak load is examined to evaluate the data

quality. The proposed progressive approach helps to

determine errors, their origins and ﬁnd solutions to

remove them. Essentially, data cleaning or data qual-

ity evaluation must precede any data analysis from

smart meter data. The contributions of this work can

be summarized as (1) a systematic study of the errors

existing in large-scale smart water meter deployments

and water literature, (2) proposing a progressive data

cleaning approach to the problem of ﬁnding errors in

smart meter data (3) a careful study of the impact of

dirty data on peak load attribution, and (4) introduc-

ing and classiﬁcation of techniques available for re-

moving errors from dirty data; including those meth-

ods applied in this study. The remainder of the paper

is structured as follows. Section 2 provides a gen-

eralized model of smart water meter infrastructure.

The progressive data cleaning approach is presented

in Section 3, the data quality issues mainly encoun-

tered in the current study, together with the adopted

or produced solutions. As the ﬁnal part of the case

study, the results of using the cleaned dataset are pre-

436

Khaki, M. and Mortazavi, N.

Water Consumption Demand Pattern Analysis using Uncertain Smart Water Meter Data.

DOI: 10.5220/0010834900003116

In Proceedings of the 14th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2022) - Volume 3, pages 436-443

ISBN: 978-989-758-547-0; ISSN: 2184-433X

sented in Section 4, and the sensitivity of these results

to errors is also examined.

2 PROBLEM DEFINITION

The ﬁrst part of the current section describes a top-

down architecture of a smart water metering infras-

tructure. In the second part, various errors that can be

encountered in such a system (based on experts’ ex-

perience and reports in the literature) are discussed.

Finally, the approaches adopted previously are pro-

vided in several distinct categories, and similar stud-

ies in smart electrical energy generation and transmis-

sion systems are also compared and analyzed.

2.1 Smart Metering Infrastructure

Figure 1 shows a general conﬁguration of a smart

meter infrastructure in the case of water supply net-

works. The proposed ﬁgure is based on the current

case study. In addition, to keep it generalized, it is in-

ﬂuenced by the diagrams suggested by the following

articles: (Stewart, Rodney A and Willis, Rachelle and

Giurco, Damien and Panuwatwanich, Kriengsak and

Capati, Guillermo, 2010), (Makki, A.A. and Stew-

art, R.A. and Panuwatwanich, K. and Beal, C., 2013),

(Quilumba, F.L. and Wei-Jen Lee and Heng Huang

and Wang, D.Y. and Szabados, R., 2014), (Hsia, S.C.

and Hsu, S.W. and Chang, Y.J., 2012), (Leeds, 2009),

(Zhang et al., 2017), and (Farhangi, H., 2010)). The

block diagram in Figure 1 is composed of the follow-

ing parts:

Block (A), wireless smart meters distributed

around the city measure water consumption in a stan-

dard uniﬁed unit, e.g. [m

]. Block (B), wireless

data collectors are hardware-speciﬁc data collection

servers responsible for collecting the readings from

meters at every interval and transferring them wire-

lessly/wired to the data warehouse Block (C) is the

control centre of the utility infrastructure. Commands

to re-conﬁgure the meters or collectors are relayed

through this block. Block (D) is the Temporary Mea-

surement Data Storage; it receives the raw measure-

ment data from the collectors and provides outputs for

blocks E, and F. Block (E) is the long-term storage or

archive of the network and stores the data for future

analyses. Any further access or modiﬁcation to the

archived data is provided through Block C. Block (F)

is the billing system and can join the raw meter read-

ings received from the meters with the meter-speciﬁc

unit information.

2.2 Data Analysis Difﬁculties

As we are currently in a worldwide installation phase

of the SMI, the focus of most studies is the immediate

advantages, such as time-of-user pricing (Lon House,

2011), efﬁcient automatic billing instead of the man-

ual process (Khalifa, T. and Naik, K. and Nayak, A.,

2011), and early fault detection in the network (Hsia,

S.C. and Hsu, S.W. and Chang, Y.J., 2012). Although

various SMIs provide various beneﬁts, validity veri-

ﬁcation of the measurement data is essential. How-

ever, several reports of smart water meter measure-

ment errors among the growing body of studies, such

as (Mukheibir et al., 2012). The factors that inﬂuence

the data quality of water meter readings are discussed

by (Mukheibir et al., 2012) and (Arregui, Francisco

and Cabrera, E and Cobacho, Ricardo and Garc

ıa-

Serra, Jorge, 2005): 1) noisy communication chan-

nels that would lead to corruption of the incoming

data messages 2) minor inconsistencies in the meter

data input result in signiﬁcant uncertainty in the re-

sults too (Aijun et al., 1996).

Data quality challenges are introduced in the re-

mainder of this section, and some possible starting

points will be suggested concerning Figure 1.

Duplicate Records, because of the communica-

tion channel problems, Paths (P) or (N), the server

might ask the collector or the meter to retransmit

the data. Missing Records, similarly, because of

the communication channel issues, some recordings

would be irreversibly lost. Any communication chan-

nel problems between Blocks A and B or an inter-

rupt in the storage services of Blocks D, E, or F can

cause this issue. Measurement Granularity Errors,

in some cases, a meter can have coarse grain res-

olution and cause this error, which is restricted to

Block A (i.e. [m

] instead of litres). As a result,

the accuracy of the meter would be virtually reduced.

Block C should ensure that the temporary data stored

at Block D do not have such problems. Spikes are

abrupt and short-duration changes in the consump-

tion pattern that are not a valid representation of the

actual consumption. The sources of spikes could be

mechanical faults of the meter or storage of multiple

inconsistent readings for the same timestamp Meter

Unit Inconsistencies. This error can be originated

by meter unit changes that are not back-propagated

in the archived records. In such cases, Block C’s de-

cisions are affecting Block A’s conﬁguration. How-

ever, this error type would not necessarily change

Block E’s billing records, as, at the time of calculat-

ing corresponding billing values, there is no discrep-

ancy between meter readings and its respective unit.

Meter Counter Resets, the smart meters usually ac-

Water Consumption Demand Pattern Analysis using Uncertain Smart Water Meter Data

437

SWM

(Cumulative Int erval Readings)

Wireless Data

Collectors

CCUs

(Approximately one

per 1000

Smart Meter)

Current Billable

Readings

(D to F)

GPRS

Ethernet

Billings

For

Consumers

(F to G)

Consumer

Concerns

And complaints

(G to f)

SWM

(Cumulative Int erval Readings)

SWM

(Cumulative Int erval Readings)

Infrastructure

Administration

and Planning

Firewall

Primary Meter and Collector

configuration requests (F to C)

And Archived Billing Records

(C to F)

Transferring

Meter and Collector

Change Requests

(C to B and A )

Billing

Software and

GUI

Long Term

Storage for

Measurements

Temporary

Measurement

Data Storage

Customers

Smart Water Meter Smart Water Meter

Smart Water Meter

Transformed

(or Raw)

Reading and

Interval Data

Exports

for Archive

(D to E)

Gathered

Data

From all

Meters

(A and B to D)

Consumption

Monitoring

Updates to

Archived Data

(C to E)

Receive Archive

Data for Analysis

and Data Mining

(E to C)

Figure 1: Block diagram of the wireless water metering infrastructure.

commodate a counter that registers the consumption

at every interval cumulatively. In general, the me-

ter only communicates these cumulative readings to

the server. Therefore, if the server re-conﬁgures the

meter, it can also cause a reset on its register with

a faulty command. In Figure 1, this inconsistency

is caused by Block C and affects Block A. Meter

Under/Non-Registration Errors; a popular belief is

that a smart meter has high precision and would not be

prone to measurement errors. As smart meters are the

next generation of traditional ones, the accuracy prob-

lems existing in the traditional meters also occur in

them (Khalifa, T. and Naik, K. and Nayak, A., 2011).

Analysis of the current literature in SMIs for Water

systems shows that most studies do not evaluate data

quality against the mentioned errors. However, data

quality errors have impeded gaining the expected re-

sults in most of these studies. In addition, few pa-

pers in electrical engineering-based smart meter in-

frastructures have focused on these errors either. Only

Quilumba et al. and Shishido, a technical report, have

acknowledged the existence of some of the mentioned

errors in their study and provided some solutions for

handling them ((Quilumba, F.L. and Wei-Jen Lee and

Heng Huang and Wang, D.Y. and Szabados, R., 2014)

and (Shishido, Juan, 2012)).

2.3 Related Works

The water meters are prone to data quality errors,

such as over- and under-registration, which are di-

rectly proportional to length and amount of us-

age (Mukheibir et al., 2012). As one of the contri-

butions of the current paper, a summary of the state-

of-art methods for evaluating and improving the data

quality of water meter data in the literature is pro-

vided. In general, three approaches to dealing with

data quality issues are presented, outlined in the re-

mainder of this section. The ﬁrst approach to dealing

with errors is simplifying the problem and discard-

ing the detrimental effect of errors because of the low

proportion of errors to clean data. For example, (Beal

et al., 2011) and (Beal, Cara and Stewart, Rodney A.

and Huang, T. and Rey, E., 2011) provide consider-

able detail about the procedures for installing smart

meters and gathering data. However, as the data qual-

ity is not discussed, it is assumed that the collected

data is error-free.

The second approach is to discard the datastreams

that are highly suspected of having errors. For exam-

ple, (Heinrich, Matthias, 2007) performed a study us-

ing twelve household datastreams, of which two had

some missing data points because of various meter

failure issues and were removed from further analysis.

Similarly, Fielding et al. recognized the adverse effect

of excessive missing data on the results and removed

17% of the streams, which had insufﬁcient valid data.

Makki et al. encountered the problem of missing data

while using smart water data and removed the affected

household measurements (Makki, A.A. and Stewart,

R.A. and Panuwatwanich, K. and Beal, C., 2013). De-

spite the reported problems, neither the nature of er-

rors is discussed nor any solutions to remove them is

provided in all cases above. Fielding et al. have only

suggested using more accurate hardware to improve

future data (Fielding et al., 2013). The advantage of

using the above approach is its simplicity, and it can

merely be used for instances where a negligible per-

centage of data is affected by errors. In these cases,

the omission of erroneous data would not cause the

ICAART 2022 - 14th International Conference on Agents and Artiﬁcial Intelligence

438

loss of valuable information.

The third approach is to approximate the missing

or corrupted data based on the readings in the tempo-

ral proximity of that speciﬁc point. The replacement

candidate is calculated using either a predeﬁned de-

fault value or an average over the previously valid data

points or replacing the value from a similar location

of another datastream (Machell, J. and Mounce, SR.

and Boxall, JB., 2010; Umapathi et al., 2013). An-

other approach that has gained popularity during the

past decade is crowdsourcing of the cleaning process.

Traditionally, the cleaning process was performed by

domain and database experts. If the errors are sim-

ple errors such as typos or optical character recog-

nizer (OCR) issues, an untrained operator is capable

of checking the records for error (Chen et al., 2013).

However, if the data requires expert knowledge or it

would not be possible to share it with a third party,

this method is not possible.

The data quality issues in smart grids exist in the

electricity supply systems. They have gained more

in-depth analysis because of the effect that electri-

cal energy cannot be easily stored. Therefore, the

electricity industry has always been more forthcom-

ing in investment for research and implementation of

smart meters (Alquthami et al., 2019). The major-

ity of the efforts in Smart Electrical Energy Gener-

ation and Transmission Systems are done by the in-

dustries involved in this ﬁeld. For example, Albert et

al., Shishido, and Quilumba et al. mention concerns

about errors occurring in the measurement data that

affect data quality that is quite similar to the current

study (such as missing data, reading errors, lack of

demographic survey data, zero readings, spikes and

duplicate readings) and provide preliminary analysis

for them ((Shishido, Juan, 2012) and (Quilumba,

F.L. and Wei-Jen Lee and Heng Huang and Wang,

D.Y. and Szabados, R., 2014)). In both Shishido and

Quilumba et al., these errors can propagate results and

deteriorate them. Moreover, Quilumba et al. present

more details of the errors’ nature and discuss an appli-

cation of consumer proﬁle classiﬁcation by k-means

clustering with the semi-cleaned data as training and

test inputs.

3 PROGRESSIVE DATA

CLEANING

Essentially, the goals of a smart infrastructure are to

analyze various states of the system, make it more

optimized in many aspects, and have a bi-directional

communication channel with the meter. As com-

promised data quality would directly affect analy-

sis results, the main concern is ﬁnding out how data

quality issues could impact them and how to avoid

them. Jia et al. have studied the results of bad data

on smart electrical energy generation and transmis-

sion systems and demonstrated how it would affect

decision-making results. They hypothesize that the

error in data comes in the nature of noise or misread-

ing of the actual measurement values. In addition,

a metric is deﬁned to quantify the effect of bad data

on real-time price, which is called Average Relative

Price Perturbation. The authors have concluded that

errors in topographical data are more detrimental for

the pricing schemes than the measurement data (Jia,

L. and Kim, J. and Thomas, R.J. and Tong, L., 2014).

3.1 Filter-based Progressive Data

Inspection

Depending on the nature of data being processed and

previous experiences dealing with such systems, the

types and extent of errors in the dataset could be dif-

ferent. The current approach is an experimental error

detection technique that ensures that most of the de-

tectable errors by the applied ﬁlters are found. The

procedure consists of applying the ﬁlter to the most

updated date state and evaluating the results to ensure

its quality. If the data quality does not meet the re-

quirements, additional iterations might be required to

achieve the minimum required accuracy.

3.2 Pre-mining Issues

In general, smart meter data is acquired in two ways:

modifying existing infrastructure with equipment to

gather data or collaborating with an already imple-

mented metering infrastructure to use their data. The

former has the advantage of monitoring data acqui-

sition thoroughly, and data integrity can be validated

on each step. However, surveillance coverage is lim-

ited to the budget and customers’ willingness to par-

ticipate in the study. In contrast, the latter approach

mostly provides access to the entire infrastructure,

while the authorities in charge allow this access and

a great opportunity for the large-scale study of the as-

pects of the big data in the smart grid. Two main is-

sues encountered while dealing with large-scale smart

water meter data will be introduced and analyzed in

detail in the next two parts.

3.2.1 Primary Composite Key

As a part of the importing smart meter data, each me-

ter is required to be identiﬁed uniquely across all ta-

bles; therefore, as the original primary key was not

Water Consumption Demand Pattern Analysis using Uncertain Smart Water Meter Data

439

provided, a JOIN operation was required. Ideally, the

join should be performed on a single primary key or a

composite one constructed by combining more ﬁelds.

In theory, the main key used by the server, Blocks D,

E, and F in Figure 1, would unify all datasets. How-

ever, personal information can be disclosed, which is

a breach of customer information conﬁdentiality, and

this primary key was not provided; one possible solu-

tion is to redeﬁne the primary composite key.

Three individual ﬁelds shared among imported

datasets and were the most probable candidates for re-

constructing the primary key are Account ID, Meter

ID, and Recording Device ID. The join process was

changed to accept the strings with partial matches as

well as the complete ones.

3.3 Filter: Peak Deﬁnition and Peak

Contributors

“Peak Consumption” is a valuable character of WSS

that provides means to examine the network’s capa-

bility to handle the volume of water at any period of

peak consumption. Additionally, the system should

be designed for long-term peak consumption of the

entire network for water planning purposes. To ﬁnd

the actual peak contributors, the highest consumption

over a period should be identiﬁed after ﬁnding the

temporal location of the peak period, a top-k query

analysis to identify the main contributors. After peak

consumers are narrowed down, their raw consumption

proﬁles are inspected to verify the validity of peaking

behaviour. Essentially, the errors in these records can

cause inaccurate calculation and, consequently, incor-

rect decision-making, which will be discussed in the

remainder of the paper.

After importing data in a correct format, it is re-

quired to adopt a ﬁlter with predictable outputs to

evaluate data quality. The peak contribution analysis

is important as it enables us to ﬁnd the proﬁles that are

the worst candidates for being affected by errors and

are the focus of the current study. The peak contri-

bution ﬁlter is a starting point for more complex data

quality analyzes. Because of the inherent characteris-

tics of water supply systems, instantaneous peak con-

sumption does not have a signiﬁcant practical value.

Thus, in the context of such large-scale systems, the

peak value is described as the maximum average con-

sumption of a consumer (or group of consumers), dur-

ing a speciﬁc time range R (in hours or days), for a

predeﬁned constant window size (W in hours or days).

The peak averaging window (W ) can take values of a

few hours to a few weeks, depending on the natural

lag and physical size of the water transmission net-

work in question.

3.4 Evaluation Tool: Ranked List

Deﬁnition and Comparison

A ranking metric to evaluate their correlation requires

two lists of peak demand contributors calculated from

both clean and dirty data. The evaluation would quan-

tify the effect of each meter error on data quality

by comparing the corresponding ranked lists. The

ranking algorithm proposed by Kendal et al. is ex-

tensively used to compare an erroneous permuted or

partially permuted list with a given (correct) refer-

ence (Van Doorn et al., 2018). A variant of the al-

gorithm that permits weights for each rank is used in

the current paper that is proposed by (D’Alberto and

Dasdan, 2010).

4 EXPERIMENTAL RESULTS

AND SENSITIVITY ANALYSIS

This section analyzes the city’s smart meter data to

determine peak contributors and how their order and

ranking would respond to different errors.

4.1 Peak Contribution Results

It was reported that the highest peak consumption

record occurred on July 24, 2013. To ﬁnd those con-

sumers who most contributed to this peak date, a peak

length is required to accommodate the natural lag in

water supply networks. Therefore, two peak win-

dow periods are selected for the current study: 24-

hours and one week (168 hours), as representatives

of short and medium-term consumption peaks. Ad-

ditionally, results are generated using clean and dirty

datasets to emphasize the effect of noise and data er-

rors. Dirty data contains errors described previously;

while, clean data is generated by removing the errors,

performed semi-automatically under expert supervi-

sion.

Table 1 compares the results of calculating peak

windows of length 24 and 168 hours and shows that

the peak event (in 24-hours) occurred on July 16,

2013, at 3:00 pm. However, at midnight, the respec-

tive peak event for the dirty original dataset started on

Feb 19, 2013. The detected time does not match the

correct peak, which exactly overlaps with the value

reported by the city, and no justiﬁable reason exists

for a peak occurring in winter. Similarly, consider-

able inconsistency is observable in the weekly peak

caused by enlargement and deformation of records by

associating high consumption to a small set of cus-

tomers.

ICAART 2022 - 14th International Conference on Agents and Artiﬁcial Intelligence

440

Table 1: Comparison of the top six peak contributors of

data for the peak window lengths of 168 hours. Categories

(CAT) are abbreviated as: Agricultural (AGR), Commercial

(COM), Industrial (IND), Institutional (INS), Multi-Family

Residences (MFR), and Single-Family Residence (SFR).

Clean Data (7 Day Peak) Dirty Data (7 Day Peak)

Start: Jul 19,2013,7:00pm Start: Feb 18,2013,4:00pm

End: Jul 26,2013,7:00pm End: Feb 25,2013,4:00pm

Rank

Clean Cons. Dirty Cons. Rank in Real

Data in Data in Clean Cons.

Cat. [m

] Cat. [m

] Data [m

]

1st IND 8,367.0 SFR 511,531.0 2832 5

2nd IND 7,539.0 COM 20,000.0 1170 20

3rd IND 4,569.1 MFR 17,000.0 1105 22

4th COM 4,480.0 IND 11,738.0 1 8367

5th AGR 4,373.7 MFR 9,748.0 497 96.24

6th IND 4,030.0 MFR 8,500.0 441 110

The table also provides the top ten consumers and

categories and the correct ranking of dirty data candi-

dates. Only two consumers in the clean top ten are de-

tected correctly in dirty data but with the wrong order,

and the remaining are not valid. Another unexpected

observation is that the ﬁrst peak contributor in dirty

data for 24-hour window size, Table 1, has real con-

sumption of zero. It can be explained by the fact that

the peak period of dirty data is in a different season,

which explains that the consumer has high consump-

tion in one season and none in another one.

In comparison with current results, the reported

highest consumption day (Jul 24, 2013) falls exactly

into the range of the results of the seven-day peak con-

tribution, which conﬁrms the cleaned dataset results.

4.2 The Single-family Residence

Consumption Proﬁle

An important contribution of this work, which was

initially asked by the city providing the data, was to

predict the consumption patterns of different types

of customers. An accurately calculated consump-

tion proﬁle can provide valuable information on how

the demand is distributed and correctly predict future

consumption values. A major hurdle in calculating

the demand pattern of water consumption is the het-

erogeneity of the consumers in a water distribution

system (Avni et al., 2015). The existence of the con-

sumption proﬁle of a city that the validity and in-

tegrity of the data are established can shed light on

different aspects of this problem. Of the 25,000 ac-

tive consumers of the city, 85% or roughly 21,000

of them were single-family residences (SFRES), and

the hourly consumption data of such volume of con-

sumers can provide a highly reliable weekly con-

sumption pattern. The results of calculating the av-

erage consumption proﬁle of the SFRES are shown

in Figures 3, and 2. Figure 2 shows the daily con-

sumption average changes during a year. The sea-

sonal effect is visible in the average proﬁle. Like

the previous analysis of peak contributors, this anal-

ysis led to the ﬁnding and removal of different errors

in the datastreams (from Meter Unit inconsistencies

to spikes and missing data). Additionally, Figure 3

shows the hourly consumption proﬁle average over

all single-family households. To better visualize the

changes in consumption during different weeks of the

year, the proﬁles are shown in three percentiles of 10,

50 and 90.

Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug

Month

200

400

600

800

1000

1200

1400

Average Weekly Consumption per Customer [Litres/Day]

10th Percentile

50th Percentile

90th Percentile

Fall

Winter

Spring

Summer

Figure 2: Annual changes of average daily consumption

percentiles of the city (per customer), using hourly con-

sumption proﬁles of 21,000 streams of smart meter read-

ings.

Sat Sun Mon Tue Wed Thu Fri

Day of Week

Consumption [Litres/Hr]

Figure 3: Average weekly consumption percentiles of the

city (per customer), using hourly consumption proﬁles of

21,000 streams of smart meter readings (Green 10, Blue 50,

Red 90 Percentiles.

5 CONCLUSIONS AND FUTURE

WORK

To perform valuable data analysis tasks on smart me-

ter data, measurement data needs to be error-free as

Water Consumption Demand Pattern Analysis using Uncertain Smart Water Meter Data

441

an essential part of the process. Studies have found

that in a majority of the cases, data is not in the de-

sired condition, and measurements mixed with vari-

ous kinds of errors are generated by the meters.

This paper was focused on the progressive clean-

ing of data while analyzing the impact of data errors

on the performance of a speciﬁc ﬁlter, namely, peak

consumer identiﬁcation and SFRES consumption pro-

ﬁles. During the progressive cleaning process, vari-

ous sources of errors, such as mistakes made by op-

erators, hardware failures, and context-dependent er-

rors, were identiﬁed. In addition, systematic ways

of removing the main contributing errors (meter unit

inconsistencies, the meter resets, spikes, duplicated

records, and duplicated datastreams) were provided

and more complex errors were characterized, as well.

The results of cleaning data and application of

the ﬁlter (performing peak detection tasks) were pre-

sented, and the cleaning process’s signiﬁcance was

demonstrated. Also, the sensitivity of the outputs to

the errors in the data and the parameters of the peak

detection ﬁlter was examined.

To conclude, data cleaning is an essential part of

big data application in smart meter measurement anal-

ysis. However, prior knowledge of the state of data

quality and the sensitivity of the results to different

types of error is required. Smart meter data analysis

is still in its early stages and can beneﬁt considerably

from further research. Some possible extensions of

the work were presented in this paper. The data qual-

ity should be evaluated using other physical charac-

teristics of the water supply infrastructure, assuming

feasibility of acquiring them, such as pressure infor-

mation of various key nodes, mass balancing of the

consumption and production, using bulk meter data of

the network. Many possible errors in the datastreams

have been detected in this work; however, other ﬁlters

can detect other potential errors. Examples of such

ﬁlters can be: “does the hourly consumption proﬁle

of different customer categories follow the expected

minimum and maximum load?”

The other extension is to examine the effect of

quantized meters on data quality and devise clean-

ing methods that can deal with such error types more

effectively. In addition, missing data points, an in-

evitable aspect of every smart system, were analyzed,

compensating their effects. As a future project, sim-

ilar to the procedure performed for errors in this pa-

per, missing data can be characterized more system-

atically.

REFERENCES

Aijun, A., Ning, S., Chan, C., Cercone, N., and Ziarko, W.

(1996). Discovering rules for water demand predic-

tion: An enhanced rough-set approach. Engineering

Applications of Artiﬁcial Intelligence, 9:645–653.

Alquthami, T., Alsubaie, A., and Anwer, M. (2019). Impor-

tance of smart meters data processing – case of saudi

arabia. In 2019 International Conference on Elec-

trical and Computing Technologies and Applications

(ICECTA), pages 1–5.

Arregui, Francisco and Cabrera, E and Cobacho, Ricardo

and Garc

ıa-Serra, Jorge (2005). Key factors affecting

water meter accuracy. In Leakage 2005, pages 1–10,

Portugal. Leakage 2005.

Avni, N., Fishbain, B., and Shamir, U. (2015). Water con-

sumption patterns as a basis for water demand model-

ing. Water Resources Research, 51(10):8165–8181.

Beal, C., Stewart, R. A., Huang, T., and Rey, E. (2011).

SEQ residential end use study. Australian Water As-

sociation, 38(1):80–84.

Beal, Cara and Stewart, Rodney A. and Huang, T. and Rey,

E. (2011). South East Queensland residential end use

study: Final Report. Journal of the Australian Water

Association, 38(1):80–84.

Chen, J., Chen, Y., Du, X., Li, C., Lu, J., Zhao, S., and

Zhou, X. (2013). Big data challenge: a data man-

agement perspective. Frontiers of computer Science,

7(2):157–164.

Courtney, M. (2014). How utilities are proﬁting from Big

Data analytics. Engineering and Technology Maga-

zine.

D’Alberto, P. and Dasdan, A. (2010). On the Weakenesses

of Correlation Measures used for Search Engines’ Re-

sults. Cornell University Library. Access on: 2014-

12-15.

Farhangi, H. (2010). The path of the smart grid. IEEE

power and energy magazine, 8(1).

Fielding, K. S., Spinks, A., Russell, S., McCrea, R., Stew-

art, R. A., and Gardner, J. (2013). An experimental

test of voluntary strategies to promote urban water de-

mand management. Journal of Environmental Man-

agement, 114(0):343–351.

Heinrich, Matthias (2007). Water End Use and Efﬁciency

Project (WEEP): Final Report. Technical report,

BRANZ Ltd., Judgeford, New Zealand. BRANZ

Study Report 159.

Hsia, S.C. and Hsu, S.W. and Chang, Y.J. (2012). Remote

monitoring and smart sensing for water meter system

and leakage detection. IET Wireless sensor systems,

2(4):402–408.

Jia, L. and Kim, J. and Thomas, R.J. and Tong, L.

(2014). Impact of data quality on real-time locational

marginal price. IEEE Transactions on Power Systems,

29(2):627–636.

Kaisler, Stephen and Armour, Frank and Espinosa, J Al-

berto and Money, William (2013). Big data: Issues

and challenges moving forward. In System Sciences

(HICSS), 46th International Conference on, pages

995–1004, Hawaii. IEEE.

ICAART 2022 - 14th International Conference on Agents and Artiﬁcial Intelligence

442

Khalifa, T. and Naik, K. and Nayak, A. (2011). A survey of

communication protocols for automatic meter reading

applications. IEEE Communications Surveys & Tuto-

rials, 13(2):168–182.

Leeds, D. (2009). The smart grid in 2010: market seg-

ments, applications and industry players. Gtm Re-

search, pages 1–145.

Liu, F., He, Q., Hu, S., Wang, L., and Jia, Z. (2018). Es-

timation of smart meters errors using meter reading

data. In 2018 Conference on Precision Electromag-

netic Measurements (CPEM 2018), pages 1–2.

Lon House (2011). Time of Use Water Meter Impacts on

Customer Water Use. Technical report, California En-

ergy Commission.

Machell, J. and Mounce, SR. and Boxall, JB. (2010). On-

line modelling of water distribution systems: a uk

case study. Drinking Water Engineering and Science,

3:21–27.

Makki, A.A. and Stewart, R.A. and Panuwatwanich, K.

and Beal, C. (2013). Revealing the determinants of

shower water end use consumption: enabling better

targeted urban water conservation strategies. Journal

of Cleaner Production, 60:129–146.

Mukheibir, P., Stewart, R. A., Giurco, D., and O’Halloran,

K. (2012). Understanding non-registration in domes-

tic water meters: Implications for meter replacement

strategies. Australian Water Association.

Quilumba, F.L. and Wei-Jen Lee and Heng Huang and

Wang, D.Y. and Szabados, R. (2014). An overview of

AMI data preprocessing to enhance the performance

of load forecasting. In Industry Applications Soci-

ety Annual Meeting, pages 1–7, Vancouver, Canada.

IEEE.

Shishido, Juan (2012). Smart meter data quality insights.

ACEEE Summer Study on Energy Efﬁciency in Build-

ings, 12:277–288.

Sivarajah, U., Kamal, M., Irani, Z., and Weerakkody, V.

(2017). Critical analysis of big data challenges and

analytical methods. Journal of Business Research,

70:263–286.

Stewart, Rodney A and Willis, Rachelle and Giurco,

Damien and Panuwatwanich, Kriengsak and Capati,

Guillermo (2010). Web-based knowledge manage-

ment system: linking smart metering to the future of

urban water planning. Australian Planner, 47(2):66–

74.

Umapathi, S., Chong, M. N., and Sharma, A. K. (2013).

Evaluation of plumbed rainwater tanks in households

for sustainable water resource management: a real-

time monitoring study. Journal of Cleaner Produc-

tion, 42(0):204–214.

Van Doorn, J., Ly, A., Marsman, M., and Wagenmakers,

E.-J. (2018). Bayesian inference for kendall’s rank

correlation coefﬁcient. The American Statistician,

72(4):303–308.

Zhang, G., Wang, G. G., Farhangi, H., and Palizban, A.

(2017). Data mining of smart meters for load cate-

gory based disaggregation of residential power con-

sumption. Sustainable Energy, Grids and Networks,

10:92–103.

Water Consumption Demand Pattern Analysis using Uncertain Smart Water Meter Data

443