A Probabilistic Approach for Detecting Real Concept Drift
Sirvan Parasteh (https://orcid.org/0000-0001-7642-2654) and Samira Sadaoui (https://orcid.org/0000-0002-9887-1570)
Computer Science Department, University of Regina, Regina, Canada
Keywords:
Concept Drift Detection, Real Concept Drift, Data Stream, Synthetic Datasets, Probability Distributions.
Abstract:
Concept Drift (CD) is a significant challenge in real-world data stream applications, as its presence requires
predictive models to adapt to data-distribution changes over time. Our paper introduces a new algorithm,
Probabilistic Real-Drift Detection (PRDD), designed to track and respond to CD based on its probabilistic
definitions. PRDD utilizes the classifier’s prediction errors and confidence levels to detect specifically the
Real CD. In an exhaustive empirical study involving 16 synthetic datasets with Abrupt and Gradual drifts,
PRDD is compared to well-known CD detection methods. PRDD performs strongly and runs in O(1) time
per data point, ensuring its computational efficiency in high-velocity environments.
1 INTRODUCTION
Nowadays, many real-world applications come with
high-velocity and high-volume data, spanning sectors
such as e-commerce, healthcare, and finance. The
impossibility of storing the entire data for processing
necessitates that Machine Learning (ML) algorithms
can only view samples once. ML algorithms assume
that both training and unseen samples follow the same
distribution. However, the underlying data distribu-
tions may shift over time in today’s evolving data-
generating sources. For example, user purchasing be-
havior may change due to unpredictable events like
the COVID pandemic or new types of products being
introduced in the e-commerce market over time. This
discrepancy between training and testing data distri-
butions, called Concept Drift (CD), is a significant
challenge for researchers (Gama et al., 2014; Webb
et al., 2016), because CD degrades prediction qual-
ity substantially. The ML models may have learned
patterns no longer relevant to the new incoming data.
Therefore, in these non-stationary environments, con-
tinuous monitoring of the ML models’ performance
is essential, along with frequent updates to accommo-
date the newly detected concept. Indeed, the pres-
ence of CD in the data stream will majorly impact the
predictive models and decision-making tasks. Con-
sequently, understanding and detecting those unpre-
dictable data changes is vital to develop robust adap-
tation mechanisms. The CD is a complex notion involving several features (Lu et al., 2018), such as the
type of change (Real or Virtual) and the transition
speed (Abrupt, Gradual, or Incremental). This paper
focuses on the Real CD and considers both Abrupt
and Gradual shifts.
Several studies examined the performance of nu-
merous CD detection algorithms, such as those con-
ducted in (Gonçalves Jr et al., 2014) and (Barros and
Santos, 2018). These studies showed that no single
approach consistently excels in all scenarios. Select-
ing a CD detection method is tied to the application’s
requirements, including the datasets’ characteristics
and the ML models being used. Moreover, these stud-
ies outlined several limitations, such as sensitivity to
the tuning of the parameters, a considerable computa-
tional cost, and challenges with complex data. Conse-
quently, there is still a need for more efficient methods
capable of handling diverse data types and dynami-
cally adapting to rapidly evolving concepts.
Our paper proposes an efficient solution for de-
tecting Real CD in data streams, named the Proba-
bilistic Real-Drift Detection (PRDD) algorithm. The
latter capitalizes on two key aspects of a classifier’s
performance: (1) the prediction errors and (2) the
confidence level in these predictions. The PRDD al-
gorithm is grounded in the formal definition of Real
CD, tracking changes in the posterior probability dis-
tribution P(y|x) over time to detect instances of drift
promptly. PRDD retains a fixed-size moving win-
dow of the most recent data, enabling continuous
data stream monitoring and updating critical statis-
tical metrics. These metrics include the real drift
threshold (T_real) and the ratio of real drift instances,
empowering PRDD to track and adapt to evolving
data streams. Our empirical study highlights the high
performance of PRDD when compared to five well-
established CD detection methods across 16 diverse
synthetic stream datasets. Furthermore, PRDD stands
out for its computational efficiency. With a time com-
plexity of O(1) per data point, PRDD’s computational
cost remains invariant with the size of the data stream,
making it especially suited for real-time applications
that deal with vast volumes of data. Our study's contributions are two-fold:
We developed the PRDD algorithm based on the
formal definition of Real CD. In Real CD scenar-
ios, classifiers often make incorrect predictions,
yet exhibit high confidence in these predictions.
This behavior aligns with the Bayesian defini-
tion of Real CD, where the decision boundary
becomes ineffective even though the input data
distribution, P(x), remains unchanged. PRDD
harnesses the classifier’s prediction probabilities
as an indicator of its confidence, ensuring a rapid
response to CD. The main parameters of the
PRDD algorithm were fine-tuned through rig-
orous experimental testing. Importantly, PRDD
has a consistent execution time of O(1) per data
sample, emphasizing its suitability for real-time
stream processing scenarios.
We performed a comprehensive empirical study to
compare PRDD’s performance against traditional
CD detection methods (in total five) across 16 di-
verse synthetic stream datasets, handling Grad-
ual and Abrupt drifts. We also assess the base
learner (devoid of any CD detection mechanism)
to demonstrate the necessity of detecting CD. The
evaluation is carried out as follows: (1) Perfor-
mance (Accuracy and F1-score) of the seven drift
detection algorithms on eight Abrupt datasets, (2)
Performance of the seven drift detection algo-
rithms on eight Gradual drift datasets, (3) Ag-
gregated performance analysis using a rank-based
statistical test, and (4) Analysis of the average ex-
ecution time of each CD detection algorithm.
This paper is organized as follows. Section 2 provides a formal definition of the Real CD with Gradual and Abrupt shifts. Section 3 describes well-established CD detection algorithms. Section 4 introduces our Probabilistic Real-Drift Detection (PRDD) algorithm, elaborating on its design and underlying principles. Section 5 conducts an extensive empirical study to validate the performance of PRDD, comparing it with existing CD detection methods across various synthetic stream datasets.
2 REAL CONCEPT DRIFT
We utilize the probabilistic definitions of CD given
in (Gama et al., 2014; Hoens et al., 2012; Webb
et al., 2017). Based on these definitions, two different
CD types have been recognized in the literature: (1)
Real drift, where only the learner’s decision boundary
changes, and (2) Virtual drift, where only the input-
feature distribution changes. The Bayesian approach
is a popular choice for developing CD detection meth-
ods, as it captures the changes in the joint distribution
of the features and class labels (Hoens et al., 2012).
In this paper, we focus only on the Real CD, indicat-
ing that the statistical properties of the target variable
change over time. More precisely, Real drift means
that the conditional distribution of the target variable
P(y | x) changed, while there is no change in the dis-
tribution of the input features P(x) (Lu et al., 2018):
$$P_t(y \mid x) \neq P_u(y \mid x) \quad \text{and} \quad P_t(x) = P_u(x) \qquad (1)$$
where x represents a set of feature vectors and y its
corresponding target variable, and time u, which is af-
ter t, denotes when the data distribution has changed.
The posterior probability distribution is computed
using Bayes’ theorem as follows:
$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)} \qquad (2)$$
where P(x|y) is the likelihood of the features given
the target class, P(y) is the prior probability of the
class label and P(x) is the marginal probability of
the features. In the presence of CD, the prior distribution P(y) and the likelihood distribution P(x|y) may change, leading to a change in the posterior distribution (Gama and Castillo, 2006).
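To make this concrete, consider a hedged numeric illustration (the probabilities are invented for exposition, not taken from the paper): if the class-conditional likelihoods swap between two time steps, the posterior P(y|x) flips while the marginal P(x) stays identical, which is precisely the Real-drift situation formalized in Equation (1).

```python
def posterior(p_y1, p_x1_given_y1, p_x1_given_y0):
    """Return (P(y=1|x=1), P(x=1)) via Bayes' theorem for binary x and y."""
    p_x1 = p_x1_given_y1 * p_y1 + p_x1_given_y0 * (1 - p_y1)  # marginal P(x=1)
    return p_x1_given_y1 * p_y1 / p_x1, p_x1

post_t, marg_t = posterior(0.5, 0.8, 0.2)  # concept at time t
post_u, marg_u = posterior(0.5, 0.2, 0.8)  # concept at time u: likelihoods swapped
print(post_t, marg_t)  # 0.8 0.5
print(post_u, marg_u)  # 0.2 0.5 -> P(y|x) changed while P(x) did not
```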
Considering the Real CD definition, a learned
concept can remain stable for a period of time and
Table 1: Error-based Concept-Drift Detection Methods.
Method                                      Implementation         Year  #citations
PH (Page, 1954)                             MultiFlow, River       1954  6563
DDM (Gama et al., 2004)                     MultiFlow, River       2004  1596
EDDM (Baena-García et al., 2006)            MultiFlow, River       2006  868
STEPD (Nishida and Yamauchi, 2007)          Git                    2007  286
ADWIN (Bifet and Gavalda, 2007)             MultiFlow, River       2007  1549
ECDD (Ross et al., 2012a)                   Git                    2012  341
EWMA (Ross et al., 2012b)                   Git                    2012  382
SeqDrift (Pears et al., 2014)               Git                    2014  46
HDDM (Frias-Blanco et al., 2014)            MultiFlow, River, Git  2014  271
SEED (Huang et al., 2014)                   Link                   2014  68
FHDDM (Pesaranghader and Viktor, 2016)      Git                    2016  134
RDDM (Barros et al., 2017)                  Git                    2017  130
FTDD (de Lima Cabral and de Barros, 2018)   Git (Fisher test)      2018  90
MDDM (Pesaranghader et al., 2018)           Git                    2018  55
KSWIN (Raab et al., 2020)                   MultiFlow              2020  58
then change into another concept in the data stream:
$$\mathrm{Concept}_t \neq \mathrm{Concept}_u \iff P_t(x \mid y) \neq P_u(x \mid y) \qquad (3)$$
In addition to the drift types (Real vs. Virtual),
CD is characterized by its transition (i.e., speed of
change), which has often been categorized as Abrupt,
Gradual, or Incremental to express how quickly the change unfolds. These aspects
carry essential information that can be utilized to de-
velop drift-handling mechanisms. Our study focuses
on the Abrupt and Gradual changes for Real CD:
Abrupt Drift. In Abrupt drifts, a known concept C_t switches suddenly to another concept C_u, and the progression of change is very rapid. This shift can happen for several reasons, such as a service outage or the failure of a sensor or equipment.
Gradual Drift. The transition between concepts happens slowly, following a Gradual progression of tiny changes. For instance, a slowly degrading part of factory equipment can result in a Gradual drift in the quality of the output parts, or inflation over a period of time can impact models dealing with pricing (Tsymbal, 2004). (Gama et al., 2014) introduced "intermediate concepts" that help to illustrate the transition between the old concept C_i and the new concept C_{i+1}. The intermediate concept can be one of C_t or C_u, which means each sample appearing during the drift period belongs to one of the concepts involved.
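To make the intermediate-concept view concrete, the following is a small hedged sketch (our own illustration; the linear mixing schedule is an assumption, and common stream generators use a sigmoid schedule instead) in which every sample emitted during the transition window is drawn from either the old or the new concept:

```python
import random

def sample_concept(i, drift_start=10_000, drift_width=1_000):
    """Return which concept the i-th sample is drawn from during a Gradual drift."""
    if i < drift_start:
        return "C_t"                          # old concept only
    if i >= drift_start + drift_width:
        return "C_u"                          # new concept only
    p_new = (i - drift_start) / drift_width   # chance of the new concept, rising 0 -> 1
    return "C_u" if random.random() < p_new else "C_t"

print([sample_concept(i) for i in (9_999, 10_500, 11_000)])
```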
3 RELATED WORKS
Since the mid-1990s, researchers have shown great
interest in CD (Widmer and Kubat, 1996; Klinken-
berg and Renz, 1998), and several methods for detect-
ing changes in the data stream have been developed.
Among these methods, error-based methods (super-
vised) have gained significant attention. These meth-
ods utilize the predictive performance as input and
apply statistical distribution tests to capture any sig-
nificant change in the learner’s performance. Thus,
any fluctuation in the error rate can signal a drift.
These methods return the CD locations/timestamps in
the data stream. We explored numerous CD detectors and report the most utilized ones in Table 1, including their (1) implementation, (2) year of publication, and (3) popularity based on citation counts. The implementations come from three primary sources: (1) GitHub for the public repositories, (2) Link for a provided link to the source code, and (3) MultiFlow or River for implementations in Scikit-multiflow or River, which are Python-based ML packages for streaming environments.
Examining various error-based CD detection
methods reveals a chronological progression in their
development. The Page-Hinkley (PH) method (Page,
1954) marked the beginning of this progression,
which has since evolved to include increasingly ad-
vanced approaches, such as DDM (Gama et al.,
2004), EDDM (Baena-Garcıa et al., 2006), and AD-
WIN (Bifet and Gavalda, 2007). While some tech-
niques have garnered significant attention regarding
the citation count, others have remained less influ-
ential. Nonetheless, the diverse landscape of error-
based methods offers researchers a wide array of tech-
niques to choose from based on their specific applica-
tion needs.
Recent trends in the field of CD detection have
seen an increased interest in taking advantage of prob-
abilistic methods. For instance, the study (Parasteh
and Sadaoui, 2023) introduced a new supervised
probabilistic CD detection algorithm called SPNCD.
The latter utilizes the Sum-Product Network to learn
the joint probability distribution of incoming data in
a tractable way. More specifically, SPNCD leverages
the predicted probabilities from the SPN model and
combines them with the base ML model’s predictions
to effectively detect drifts (Real and Virtual). How-
ever, the SPNCD’s dependence on the SPN as an ad-
ditional model added computational demands.
In this paper, for comparison purposes, we choose the following five methods (a usage sketch follows the list):
ADWIN (Adaptive Windowing): This is a detector
and estimator that efficiently adapts the length of
a window of observations to detect changes in the
observable process. The adaptation is based on
an online algorithm that maintains the statistical
properties of the data stream, allowing a prompt
reaction to changes. ADWIN is more efficient for
Gradual drifts (Bifet and Gavalda, 2007).
EDDM (Early Drift Detection Method): This is a supervised detection method that monitors the distribution of distances between consecutive classification errors. EDDM can detect Gradual and Abrupt changes while maintaining low false positive rates and is particularly designed to detect early signs of drifts (Baena-García et al., 2006).
KSWIN (Kolmogorov-Smirnov Windowing): This is a drift detection technique based on the Kolmogorov-Smirnov statistical test. It compares the distributions of two samples from a window of recent observations and triggers alarms upon significant distribution changes, thereby indicating CD (Raab et al., 2020).
HDDM (Hoeffding's-Bounds Drift Detection Method): This detector applies Hoeffding's inequality to the stream of performance indicators to detect changes in data streams. The method offers two variations, HDDM_A and HDDM_W, with the former (based on simple moving averages) more sensitive to Abrupt changes and the latter (based on weighted moving averages) designed to identify Gradual changes (Frias-Blanco et al., 2014).
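As a point of reference, the snippet below is a minimal usage sketch of these detectors, assuming the scikit-multiflow drift_detection API; ADWIN is shown, and EDDM, KSWIN, HDDM_A, and HDDM_W follow the same add_element/detected_change pattern. The toy error_stream is hypothetical.

```python
from skmultiflow.drift_detection import ADWIN

error_stream = [0] * 500 + [1] * 500  # toy 0/1 prediction-error sequence with one shift

adwin = ADWIN()
for i, err in enumerate(error_stream):
    adwin.add_element(err)         # feed the learner's error indicator
    if adwin.detected_change():    # True when ADWIN flags a change
        print(f"drift signalled at sample {i}")
```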
4 PROBABILISTIC REAL-DRIFT
DETECTION APPROACH
The new approach for detecting real drift operates
based on an adaptive probabilistic mechanism that
continuously monitors the incoming data stream,
capturing and reacting to drift. This mechanism
aligns with the formal definition of CD, which sig-
nifies changes in the posterior probability distribu-
tion, P(y|x), while the data distribution in the input space, P(x), remains consistent over time. The
approach, termed Probabilistic Real-Drift Detection
(PRDD), emphasizes two key aspects of a classifier’s
performance: (1) prediction error and (2) confidence
in its predictions. The underlying rationale is that dur-
ing real drift, a classifier’s decision boundary may be-
come outdated, leading to an increase in prediction
errors. Thus, even with consistent input features, the
classifier, confident in its predictions, may misclassify
drifted samples due to an irrelevant decision bound-
ary. This misclassification arises when there is a shift
in the target variable distribution, despite the input
distribution remaining unchanged.
While processing the data stream sample by sample (online learning), PRDD maintains a fixed-size moving window for calculating the drift-detection factors, including an adaptive real-drift threshold (T_real), the real drift rate, computed as the proportion of drifting candidates within the moving window, and a comparison of the real drift rate with a drift alarm threshold (T_alarm). The pseudocode of the PRDD approach is given in Algorithm 1. Below, we explain the PRDD parameters and variables:
T_real ∈ [0, 1]: T_real serves as a threshold marking the learner's confidence in misclassified samples. Any sample misclassified with a confidence exceeding T_real is flagged as a potential drift candidate. To dynamically adjust this threshold in response to the most recent changes in the data stream, we employ the Exponential Moving Average of the classifier's prediction probabilities. This method strikes a balance between the classifier's recent and historical prediction performance. For every prediction error, the threshold T_real is recalibrated according to the formula T_real = 0.7 × P_avg + 0.3 × T_real, where P_avg denotes the average prediction probability within the current window. Such an adaptive approach ensures that T_real consistently mirrors the data stream's prevailing conditions (a code transcription of this update is sketched after the parameter list).
T_alarm ∈ [0, 1]: T_alarm is a predefined alarming threshold utilized to ascertain if a drift is manifesting within the moving window. Grounded on our empirical assessments across 16 datasets, T_alarm is established at 0.47. This threshold assists in pinpointing situations where the real drift rate within the window exceeds a limit that necessitates the recognition of an actual CD. T_alarm guarantees a balance, ensuring sensitivity to legitimate drifts while maintaining resilience against incidental noise and minor variations.
Warmup Threshold: This parameter is introduced to allow the model to warm up, or acclimate to the recent state of the data stream. The warmup threshold has been set empirically to let the model observe sufficient data and calibrate its parameters without actively triggering any drift detection.
Sample Count: This variable counts the number of processed samples since the beginning of the new concept. When the sample count passes the warmup threshold, the detection mechanism is activated; when drift is detected, this variable is reset to 0.
Window Size: The window size determines the
number of recent samples included in the calcu-
lations for T
real
and the drift rate. Given a fixed
drift alarm threshold, such as 0.5, the window size
has a direct influence on the algorithm’s ability to
detect changes. A smaller window can make the
model overly sensitive, reacting to slight changes
or noise as if they were significant drifts. Con-
versely, a larger window might result in slower
detection of rapid drifts. Therefore, selecting the
right window size is crucial to ensure timely drift
detection while avoiding false alarms caused by
noise.
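The T_real recalibration rule referenced above translates directly into code; the helper below (our own naming, not the paper's) is a faithful transcription of the update formula:

```python
def update_t_real(t_real, window_probs, alpha=0.7):
    """Recalibrate T_real after a prediction error.

    window_probs: moving window of prediction probabilities P(y_pred|x).
    """
    p_avg = sum(window_probs) / len(window_probs)  # P_avg over the current window
    return alpha * p_avg + (1 - alpha) * t_real    # T_real = 0.7*P_avg + 0.3*T_real
```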
In conclusion, the PRDD method presents a robust solution for real drift detection in streaming data. It capitalizes on the classifier's prediction probabilities and maintains a quick adaptation rate to CD, thereby ensuring reliable performance in dynamic environments. Utilizing an adaptive learner's confidence threshold (T_real), a static drift alarm threshold (T_alarm), a warm-up period, and an optimal window size, PRDD forms a high-quality mechanism for real drift detection.
5 VALIDATION
To validate the detection capability of our approach,
we conduct experiments on a comprehensive set
of synthetic stream datasets designed explicitly for
evaluating CD detection algorithms. These public
datasets were generated using the scikit-multiflow
framework to simulate the occurrence of various types of drifts. As these datasets precisely mark the locations of the induced drifts, they serve as ground truth for measuring the performance of different CD detection methods. In the following subsections, we describe the experimental setup, the synthetic datasets, and the obtained performance results in detail.
Algorithm 1: Real CD Detection Algorithm.
Require: dataStream (continuous), windowSize = 20, T_alarm = 0.47, warmupThreshold = 20
Ensure: Drift detection and classifier update
 1: Initialize classifier and window parameters
 2: sampleCount = 0, T_real = 0, realDriftCount = 0
 3: for each sample in dataStream do
 4:   sampleCount += 1
 5:   (*Perform prequential prediction and training*)
 6:   Predict the label y_pred for the current sample x
 7:   Calculate the probability P(y_pred|x) associated with the prediction
 8:   Update the classifier using the true label y
 9:   if sampleCount > warmupThreshold then
10:     Update window of probabilities with P(y_pred|x)
11:     if label is incorrect and P(y_pred|x) ≥ T_real then
12:       realDriftCount += 1
13:       Update window of real drifts
14:     end if
15:     (*Calculate average probability*)
16:     P_avg = (1 / windowSize) × Σ P(y_pred|x)
17:     if label is incorrect then
18:       (*Update T_real*)
19:       T_real = 0.7 × P_avg + 0.3 × T_real
20:     end if
21:     (*Calculate real drift rate*)
22:     RD_rate = realDriftCount / windowSize
23:     if window is full and RD_rate > T_alarm then
24:       Signal drift
25:       Reset stats and re-initialize classifier
26:       sampleCount = 0
27:     end if
28:   end if
29: end for
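For readers who prefer an executable form, the following Python sketch mirrors Algorithm 1 under stated assumptions: a scikit-multiflow-style classifier exposing predict, predict_proba, and partial_fit, and our own helper names (prdd_run, classifier_factory). It is an illustration of the algorithm, not the authors' reference implementation.

```python
from collections import deque

def prdd_run(stream, classifier_factory, window_size=20, t_alarm=0.47, warmup=20):
    """Run PRDD over an iterable of (x, y) pairs; return indices of signalled drifts."""
    clf = classifier_factory()
    probs = deque(maxlen=window_size)    # window of prediction probabilities
    drifts = deque(maxlen=window_size)   # window of real-drift indicators (0/1)
    t_real, sample_count, detections = 0.0, 0, []

    for i, (x, y) in enumerate(stream):
        sample_count += 1
        # Prequential step: test first, then train.
        y_pred = clf.predict([x])[0]
        p_pred = max(clf.predict_proba([x])[0])  # confidence P(y_pred|x)
        clf.partial_fit([x], [y])

        if sample_count <= warmup:               # warm-up: observe only
            continue
        probs.append(p_pred)
        is_error = (y_pred != y)
        # Misclassified with high confidence => real-drift candidate.
        drifts.append(1 if is_error and p_pred >= t_real else 0)
        if is_error:                             # adaptive threshold update
            p_avg = sum(probs) / len(probs)
            t_real = 0.7 * p_avg + 0.3 * t_real
        rd_rate = sum(drifts) / window_size      # real drift rate over the window
        if len(drifts) == window_size and rd_rate > t_alarm:
            detections.append(i)                 # signal drift
            clf = classifier_factory()           # re-initialize the learner
            probs.clear(); drifts.clear()
            sample_count = 0                     # restart warm-up for the new concept
    return detections
```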
5.1 Diverse Drift Datasets
To ensure the robustness of our findings, we em-
ploy 16 synthetic datasets from the publicly available
collection hosted by Harvard Dataverse (López Lobo, 2020). These datasets are designed specifically for
2020). These datasets are designed specifically for
CD detection research, encapsulating Abrupt and
Gradual drift scenarios. Each dataset consists of
40,000 observations with a balanced binary class dis-
tribution and is devoid of any noise. Specifically, these
datasets manifest four distinct concepts separated by
three drifts at predetermined time steps. These tran-
sitions span over 1000 instances for Gradual drift
datasets, simulating a slow adaptation to the new con-
cepts. The datasets originated from four different
stream generators: Sine, Random Tree, Mixed, and
Stagger. Each generator contributes to Abrupt and
Gradual drifts, enhancing the diversity of our dataset
collection. The datasets bear unique characteristics
regarding drift types, the number of features, and un-
derlying generating functions.
Mixed: Constructs datasets with four numerical
features and adopts two distinct function orders
defined as F1 = [0, 1, 0, 1] and F2 = [1, 0, 1, 0].
Sine: Generates datasets with two numerical fea-
tures using two function orders: F1 = [0, 1, 2, 3]
and F2 = [3, 2, 1, 0].
Stagger: Produces datasets with three numerical
features based on two function orders: F1 = [0, 1,
2, 0] and F2 = [2, 1, 0, 2].
Random Tree (RT): Creates datasets with two
numerical features using two function orders: F1
= [8873, 9856, 7896, 2563] and F2 = [2563, 7896,
9856, 8873].
Each generator produces four different datasets.
For example, the Mixed generator constructs
MixedF1Abrupt, MixedF2Abrupt, MixedF1Gradual
and MixedF2Gradual. Using this diverse range of datasets, our study thoroughly evaluates the proposed algorithm across varying types of CD scenarios.
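For illustration, such streams can be produced along the following lines, assuming scikit-multiflow's generator API (SineGenerator and ConceptDriftStream); the position and width values are chosen to echo the description above, with width=1 giving an Abrupt switch and width=1000 a Gradual transition.

```python
from skmultiflow.data import SineGenerator, ConceptDriftStream

old_concept = SineGenerator(classification_function=0)
new_concept = SineGenerator(classification_function=1)

# width=1 -> Abrupt switch; width=1000 -> Gradual 1000-instance transition.
stream = ConceptDriftStream(stream=old_concept, drift_stream=new_concept,
                            position=10_000, width=1_000)

X, y = stream.next_sample(5)  # draw five (features, label) samples
print(X.shape, y)
```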
5.2 Experiment Setup
Our evaluation strategy encompasses a compar-
ative study with five contemporary drift detec-
tion algorithms, namely ADWIN, EDDM, KSWIN,
HDDM A, and HDDM W. We benchmark these
models against our real drift detection algorithm on
the 16 synthetic datasets. Performance is quantified
using three key metrics: Accuracy, F1-score, and exe-
cution time, providing a comprehensive overview of
each model’s predictive capability, the balance be-
tween precision and recall, and computational effi-
ciency. With regard to KSWIN, it possesses a de-
gree of nondeterminism stemming from its built-in
sampling process. To accommodate this variability,
we conduct a series of 10 independent runs for each
dataset when testing with KSWIN. The reported re-
sults for this method represent the average outcomes
of these multiple runs, offering a more reliable mea-
sure of its performance.
We adopt the Hoeffding Tree Classifier as the un-
derlying learner for the six drift detection algorithms.
This classifier, renowned for its adaptability to high-
speed data streams, is a decision tree designed specif-
ically to process data items arriving at fast rates. It
serves as an appropriate choice given the dynamic na-
ture of CD and the real-time processing requirements
of streaming data.
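A minimal prequential (test-then-train) loop with this learner could look as follows, assuming the scikit-multiflow class name HoeffdingTreeClassifier (exposed as HoeffdingTree in older releases) and reusing the stream object sketched in Section 5.1:

```python
from skmultiflow.trees import HoeffdingTreeClassifier

clf, correct, seen = HoeffdingTreeClassifier(), 0, 0
while stream.has_more_samples() and seen < 40_000:
    X, y = stream.next_sample()
    if seen > 0:                                  # test first ...
        correct += int(clf.predict(X)[0] == y[0])
    clf.partial_fit(X, y)                         # ... then train
    seen += 1
print(f"prequential accuracy: {correct / (seen - 1):.4f}")
```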
To better understand the contribution of the drift
detection component, we also include a baseline sce-
nario in our experimental setup. This scenario con-
sists solely of the Hoeffding Tree Classifier, devoid
of any drift detection mechanism. This baseline al-
lows us to gauge the added value of integrating a
drift detector with the classifier. While we expect that
combining a classifier with a drift detector generally
outperforms a standalone classifier, we focus on ana-
lyzing how effectively the proposed method enhances
the performance. The performance evaluation of the
seven algorithms is structured along four distinct seg-
ments:
1. Performance metrics on the eight Abrupt drift
datasets.
2. Performance metrics on the eight Gradual drift
datasets.
3. Aggregated performance analysis using rank-
based statistics.
4. Analysis of the average execution time to evaluate
the computational efficiency.
Our comprehensive approach to performance
evaluation sheds light on the algorithm’s behavior un-
der both Abrupt and Gradual drift scenarios and pro-
vides a comparative view of its performance against
other algorithms, including computational aspects.
The choice to employ a rank-based analysis alongside
traditional performance metrics stems from the de-
sire for a well-rounded assessment of the algorithm’s
capabilities. While direct metrics such as accuracy
or F1-score can offer valuable insights, they might
sometimes be swayed by the unique properties of in-
dividual datasets. For instance, an algorithm could
excel on specific datasets because those datasets in-
herently match the algorithm’s assumptions. More-
over, the variability in performance outcomes across
diverse datasets can make averaged comparisons less
statistically relevant. In contrast, a rank-based analy-
sis provides a better comparative perspective by mea-
suring how often our proposed algorithm outperforms
other algorithms, regardless of the absolute perfor-
mance metrics.
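The rank computation itself is straightforward; the sketch below shows the idea with pandas, using a handful of Accuracy values copied from Table 2 purely for illustration (rank 1 = highest Accuracy on a dataset):

```python
import pandas as pd

acc = pd.DataFrame(
    {"PRDD": [91.19, 82.70], "HDDM_A": [91.18, 81.79], "ADWIN": [89.51, 78.45]},
    index=["MixedF1", "RTF2"],   # one row per dataset, one column per model
)
ranks = acc.rank(axis=1, ascending=False)  # rank 1 = best Accuracy per dataset
print(ranks.mean())  # mean rank per model
print(ranks.std())   # standard deviation of the ranks
```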
By exploring these different evaluation scenarios,
we aim to provide a comprehensive and robust as-
sessment of our proposed algorithm’s effectiveness
and utility across various drift dynamics and compu-
tational constraints.
5.3 Performance Evaluation on Abrupt
Drift
Our proposed real-drift detection algorithm under-
went an extensive comparative evaluation across eight
datasets that exhibit Abrupt drift. The evaluation was
set up to ensure a level playing field where all par-
ticipating algorithms, including our proposed model,
were coupled with the Hoeffding Tree Classifier.
5.3.1 Performance Across Abrupt Datasets
Table 2 offers a comparative analysis of several al-
gorithms, focusing on their Accuracy and F1-score
metrics across Abrupt drift datasets. At a glance,
our proposed PRDD algorithm emerges as a top
performer, leading in 6 out of the 8 datasets in-
cluding MixedF1, MixedF2, RTF2, SineF1, SineF2,
and StaggerF2. This consistent performance under-
scores its strength and reliability in real-drift detec-
tion. However, it faced challenges on the other two
datasets. On the RTF1 dataset, PRDD is closely ri-
valed by HDDM A. While PRDD posts an Accu-
racy of 80.15%, HDDM A slightly surpasses it with
80.34%. Also, The StaggerF1 dataset presents a more
pronounced deviation. PRDD’s Accuracy dips to
91.44%, placing it fifth among the algorithms. Such
a result underscores that while PRDD is generally
robust, there exist scenarios where its assumptions
might not align perfectly with the dataset’s charac-
teristics. It’s also noteworthy that on this dataset,
EDDM shines brightest with an impressive Accu-
racy of 96.12%, higher than HDDM A which showed
higher competency with PRDD. Other methods like
HDDM W, KSWIN, EDDM and ADWIN showed
better performance compare to the base learner, with
no detection algorithm, HoeffdingTree. In many
cases, HoeffdingTree often showd significantly lower
Accuracy compared to other methods which high-
lithed the importance of implying CD detection algo-
rithms.
Conclusively, while PRDD delivers dominant per-
formance in most settings, there are specific instances
requiring further investigation. The overall results,
combined with its computational efficiency, position
PRDD as a qualified choice. However, understanding
the nuances of each dataset and scenario will further
enhance its applicability and effectiveness.
5.3.2 Rank Statistics for Abrupt Drifts
In addition to the learner performance metrics, we
also analyzed the rank statistics of each algorithm’s
performance in terms of Accuracy and F1-score
across the Abrupt drift datasets. Rank-based eval-
uation can offer a different perspective, as it aggre-
gates model performance across multiple datasets and
illustrates algorithmic consistency and relative per-
formance across diverse datasets. The rank statis-
tics for Accuracy and F1-score across the Abrupt drift
datasets are shown in Table 3. For each model, the
mean rank and standard deviation (Std. Dev.) are
computed. The rank of an algorithm on a dataset is
determined by its position in the sorted list of algo-
rithm performances, with rank 1 being the best.
Our proposed algorithm stands out in this rank-
based evaluation, securing the top position with a
mean rank of 1.75 in terms of Accuracy and 1.69 for
F1-score-based ranking. The low standard deviations
of 1.39 for both metrics further emphasize PRDD’s
consistently high performance across the board. In
the ranking hierarchy, HDDM_A and HDDM_W trail
closely with mean ranks of 2.13 and 3.25, respec-
tively. Following them are KSWIN and EDDM. Ho-
effdingTree, on the other hand, consistently ranks at
the bottom with a mean rank of 7 for both metrics,
highlighting the evident advantage of specialized drift
detection techniques over more generic methods in
stream learning scenarios.
Overall, these results highlight the robustness and
superior performance of our proposed real-drift detec-
tion algorithm in handling Abrupt CDs. The ensuing
sections will further explore the algorithm’s perfor-
mance on Gradual drift datasets and its computational
efficiency.
5.4 Performance Evaluation on
Gradual Drift
Within this section, we delve into the comparative
analysis of algorithmic performance on datasets char-
acterized by gradual drifts, which is a complex and
subtle challenge in the domain of data stream learn-
ing.
5.4.1 Performance on Gradual Drifts
The performance of the algorithms, when confronted
with Gradual real drifts, is a crucial aspect to con-
sider, given the subtlety of these drifts and the con-
sequent difficulty in their detection. To that end, we
assessed the seven algorithms across the eight Grad-
ual drift datasets, with respect to Accuracy and F1-
Table 2: Model Performance Metrics (Accuracy and F1-score) Across Abrupt Drift Datasets.
PRDD ADWIN KSWIN EDDM HDDM_W HDDM_A HoeffdingTree
Acc. F1 Acc. F1 Acc. F1 Acc. F1 Acc. F1 Acc. F1 Acc. F1
MixedF1 91.19 91.20 89.51 89.54 90.65 90.65 90.43 90.43 91.03 91.03 91.18 91.19 83.35 84.37
MixedF2 91.22 91.22 89.00 89.00 90.82 90.83 89.91 89.94 91.18 91.17 90.94 90.96 80.84 79.30
RTF1 80.15 77.97 77.44 75.09 79.85 77.76 76.67 74.54 78.15 75.65 80.34 78.56 73.46 72.05
RTF2 82.70 81.10 78.45 75.77 80.18 78.28 75.68 73.46 79.30 77.38 81.79 79.89 72.16 71.09
SineF1 93.04 93.05 91.96 91.96 92.95 92.94 90.58 90.55 93.00 93.00 93.05 93.05 63.85 59.68
SineF2 93.19 93.19 91.31 91.34 92.90 92.90 89.93 90.01 93.17 93.15 92.94 92.95 56.29 58.39
StaggerF1 91.44 91.78 93.19 93.50 91.13 91.56 96.12 96.23 92.02 92.56 94.34 94.59 88.82 89.47
StaggerF2 97.70 97.73 94.95 94.99 92.65 93.10 97.32 97.35 95.36 95.52 96.44 96.51 92.62 92.57
Table 3: Rank Statistics for Accuracy and F1-score Across
Abrupt Drift Datasets.
Accuracy Rank F1-score Rank
Model Mean Std. Dev. Mean Std. Dev.
HoeffdingTree 7.00 0.00 7.00 0.00
ADWIN 5.00 0.93 5.00 0.93
EDDM 4.63 2.00 4.63 2.00
KSWIN 4.25 1.17 4.25 1.17
HDDM_W 3.25 0.89 3.25 0.89
HDDM_A 2.13 0.84 2.19 0.75
PRDD 1.75 1.39 1.69 1.39
score metrics compiled in Table 4. Our proposed al-
gorithm maintained its superior performance, again
dominating in 6 out of 8 datasets, including MixedF1,
MixedF2, RTF2, SineF1, SineF2, and StaggerF2.
These results provide compelling evidence of the pro-
posed model’s proficiency at detecting and respond-
ing to Gradual real drifts.
While PRDD exhibited superior performance
across most datasets, there were instances when other
algorithms surged ahead. For instance, in the RTF1 dataset, HDDM_A marginally surpassed PRDD in both Accuracy and F1-score. In the StaggerF1 dataset, HDDM_A achieved the highest Accuracy and F1-score, with PRDD trailing behind. As observed in the Abrupt scenario, no single algorithm consistently dominates in all scenarios, underscoring the potential for future research to delve into algorithmic intricacies and potential areas of enhancement.
HDDM_W, on the other hand, delivered com-
mendable results, securing a solid third position.
However, the performance of other methods such as
EDDM, ADWIN, and KSWIN was more variable on
Gradual datasets. As anticipated, the model without a
detector lagged significantly behind its counterparts.
To sum up, our proposed PRDD method demon-
strated remarkable adeptness at managing Gradual
drifts. Yet, the nuanced variations in performance
across different datasets emphasize the significance
of tailoring algorithm selection to the specific dataset
in question. Subsequent sections will provide a more
detailed, rank-based comparative analysis to further
illuminate these observations.
5.4.2 Rank Statistics for Gradual Drifts
In our efforts to offer a consolidated perspective on
the performance of each algorithm under Gradual real
drifts, rank statistics were computed across the eight
Gradual drift datasets. Rankings were ascertained
based on Accuracy and F1-score, where a lower rank
implies enhanced performance. The findings are en-
capsulated in Table 5.
In alignment with our previous observations,
PRDD, our proposed model, firmly holds the top po-
sition. It secures an admirable mean rank of 1.63
(with a standard deviation of 1.07) for both Accu-
racy and F1-score metrics. These data further solidify PRDD's prowess in detecting and adeptly handling Gradual real drifts. The second-best in accuracy is HDDM_A, which registers a mean rank of 2.13. Following closely, HDDM_W clinches the third spot with
an average rank of 2.75 for Accuracy. The other al-
gorithms, EDDM, ADWIN, and KSWIN, display a
more varied rank distribution, echoing their incon-
sistent performance across different datasets. Pre-
dictably, the HoeffdingTree model, devoid of any de-
tector, languishes at the bottom with a mean rank of 7
for both Accuracy and F1-score.
The results of our rank-based analysis align with
and underscore our previous discussions, emphasiz-
ing the effectiveness and robustness of PRDD against
gradual real drifts in data streams. Figure 1 depicts the
rank distribution for each method in the 16 datasets.
As highlighted, PRDD consistently achieves the top
rank, closely followed by HDDM_A.
5.5 Execution Time Analysis
The processing speed and efficiency of a real-drift de-
tection algorithm are fundamental, equating in im-
portance to its predictive accuracy. Swiftly address-
ing and adjusting to ongoing data stream alterations
are essential features of a leading CD detection algo-
rithm.
Figure 2 reveals that our proposed model consis-
tently demonstrates computational efficiency, clock-
Table 4: Model Performance Metrics (Accuracy and F1-score) Across Gradual Drift Datasets.
PRDD ADWIN KSWIN HDDM_W EDDM HDDM_A HoeffdingTree
Acc. F1 Acc. F1 Acc. F1 Acc. F1 Acc. F1 Acc. F1 Acc. F1
MixedF1 88.92 88.90 86.42 86.43 86.35 86.41 88.14 88.17 87.97 87.97 88.61 88.61 83.53 83.96
MixedF2 88.79 88.83 86.53 86.61 88.01 88.06 88.12 88.09 87.90 87.94 88.37 88.40 81.94 81.41
RTF1 79.33 76.61 77.18 74.25 76.01 74.34 78.61 75.97 78.43 75.89 79.35 76.80 72.11 70.60
RTF2 80.19 77.98 78.02 74.76 76.79 75.05 79.04 76.52 76.35 74.59 79.39 76.34 71.45 71.65
SineF1 90.38 90.34 88.97 88.91 77.82 77.21 90.33 90.26 88.32 88.33 90.03 90.01 64.68 60.06
SineF2 91.00 91.01 88.70 88.69 74.72 75.35 90.51 90.52 89.57 89.60 90.49 90.50 58.40 58.90
StaggerF1 93.13 93.34 91.55 92.04 92.97 93.22 94.68 94.83 94.46 94.63 94.78 94.96 87.13 88.14
StaggerF2 97.66 97.67 92.01 92.34 84.88 83.75 96.86 96.87 95.29 95.37 96.93 96.93 91.08 91.15
Table 5: Rank Statistics for Accuracy and F1-score Across
Gradual Drift Datasets.
Accuracy Rank F1-score Rank
Model Mean Std. Dev. Mean Std. Dev.
HoeffdingTree 7.00 0.00 7.00 0.00
KSWIN 5.13 0.84 5.38 0.75
ADWIN 4.88 1.73 4.63 1.67
EDDM 4.50 0.92 4.50 0.92
HDDM_W 2.75 0.71 2.63 0.74
HDDM_A 2.13 0.83 2.25 0.89
PRDD 1.63 1.07 1.63 1.07
Figure 1: Ranking distribution (F1-score rank) of the algorithms (PRDD, ADWIN, KSWIN, HDDM_W, HDDM_A, EDDM, Hoeffding Tree) across multiple datasets.
ing an average execution time of approximately 5 sec-
onds across the tested datasets. This level of effi-
ciency is maintained even when compared to a base-
line model devoid of a drift detection feature. The
careful design of our model, with an O(1) time
complexity per data point, is essential to its resilient
efficiency. PRDD employs a moving window to cal-
culate essential metrics such as mean probabilities
and drift ratios. A naive approach would require re-
calculating these measures for every point within the
window with each new data addition, incurring sig-
nificant computing costs.
In contrast, our model includes an optimization in
the form of a ”running sum” technique. The opera-
tional dynamics of the ”running sum” technique are
as follows:
Data Ingestion: As new data points are received,
they are immediately incorporated into the run-
ning sum, ensuring real-time updates.
Window Saturation: Once the moving window
reaches its capacity, for every subsequent data
point, the model seamlessly updates the running
sum. This is achieved by subtracting the value of
the oldest data point (the one that exits the win-
dow) and adding the value of the incoming data
point.
This methodological approach eliminates the need
for recalculating the sum for the entire window with
each incoming data point, considerably mitigating
computational overhead. Such an optimized mech-
anism not only assures prompt updates but also po-
sitions our model as a standout choice for real-time
applications necessitating immediate responsiveness.
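A compact sketch of this optimization (our own minimal version, not the paper's code) maintains the window total in O(1) per update by adding the newcomer and subtracting the evicted value:

```python
from collections import deque

class RunningMeanWindow:
    """Fixed-size window whose mean is updated in O(1) via a running sum."""

    def __init__(self, size):
        self.size, self.buf, self.total = size, deque(), 0.0

    def add(self, value):
        self.buf.append(value)
        self.total += value
        if len(self.buf) > self.size:          # window saturated:
            self.total -= self.buf.popleft()   # subtract the evicted oldest point

    def mean(self):
        return self.total / len(self.buf) if self.buf else 0.0
```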
Additionally, updating base learners, such as the
HoeffdingTree, to accommodate data changes can be
computationally burdensome, especially in the face
of drift. Detecting drift early and swiftly adapting
to the emerging data patterns is, therefore, paramount
to achieving improved computational efficiency. Our
model’s ability to proactively detect and manage
drifts is a significant advantage. It cuts down the high
computational costs that come with constantly adjust-
ing to evolving data. Since our model has an O(1) time complexity, it responds quickly to changes and
maintains consistent performance. This efficiency is
retained regardless of how large the incoming data
stream becomes.
A comparative analysis with benchmark models
is also instructive. For instance, models like HDDM
and ADWIN, despite their constant time complexity,
register average execution times close to 6 seconds.
The EDDM model clocks in at about 7 seconds on
average. In contrast, the KSWIN model, bearing a
time complexity of O(w) where w is the window size,
Figure 2: Execution time of models on all the datasets.
records an average execution time of nearly 10 sec-
onds, almost twice that of our proposition.
The differences in these execution times, when all
models use a consistent base learner, offer a clear
efficiency contrast among the drift detection algo-
rithms. This comparison underscores our method’s
dual strength in both prediction and computation.
Consequently, our proposed model stands out as
an optimal choice for handling high-velocity data
streams.
6 CONCLUSION AND FUTURE
WORK
Predictive models based on historical patterns are sus-
ceptible to performance degradation in non-stationary
environments where the underlying data distributions
shift over time. Therefore, devising algorithms that
can effectively capture and adapt to Concept Drift
(CD) is crucial. Our proposed Probabilistic Real-Drift
Detection (PRDD) algorithm demonstrates excellent
performance in identifying real CD with high sen-
sitivity, rendering it a practical and reliable tool for
real-world data-stream applications. The PRDD’s ro-
bustness and adaptability are further evidenced by its
consistent performance under various drift dynamics,
including Gradual drifts.
Future work presents numerous research direc-
tions. Firstly, we plan to investigate CD in real-world
applications, an area that currently lacks sufficient ex-
ploration in the literature. Specifically, we will focus
on credit card fraud, an ever-evolving field. Our aim
is to understand the nature and characteristics of CD
in this application, considering that CD can occur in
both normal data (changes in users’ spending habits
or e-payment channels) and fraud data (fraudsters up-
dating their strategies in response to new technolo-
gies). Such insights will be invaluable in devising
even more effective predictive models to tackle CD.
Also, we aim to compare our active adaptive learn-
ing method to the passive learning method (Sadreddin
and Sadaoui, 2022).
REFERENCES
Baena-García, M., del Campo-Ávila, J., Fidalgo, R., Bifet, A., Gavalda, R., and Morales-Bueno, R. (2006). Early
drift detection method. In Fourth international work-
shop on knowledge discovery from data streams, vol-
ume 6, pages 77–86.
Barros, R. S., Cabral, D. R., Gonc¸alves Jr, P. M., and Santos,
S. G. (2017). Rddm: Reactive drift detection method.
Expert Systems with Applications, 90:344–355.
Barros, R. S. M. and Santos, S. G. T. C. (2018). A large-
scale comparison of concept drift detectors. Informa-
tion Sciences, 451:348–370.
Bifet, A. and Gavalda, R. (2007). Learning from time-
changing data with adaptive windowing. In Proceed-
ings of the 2007 SIAM international conference on
data mining, pages 443–448. SIAM.
de Lima Cabral, D. R. and de Barros, R. S. M. (2018). Con-
cept drift detection based on fisher’s exact test. Infor-
mation Sciences, 442:220–234.
Frias-Blanco, I., del Campo-Ávila, J., Ramos-Jimenez, G.,
Morales-Bueno, R., Ortiz-Diaz, A., and Caballero-
Mota, Y. (2014). Online and non-parametric drift de-
tection methods based on hoeffding’s bounds. IEEE
Transactions on Knowledge and Data Engineering,
27(3):810–823.
Gama, J. and Castillo, G. (2006). Learning with local
drift detection. In Advanced Data Mining and Ap-
plications: Second International Conference, ADMA
2006, Xi’an, China, August 14-16, 2006 Proceedings
2, pages 42–55. Springer.
Gama, J., Medas, P., Castillo, G., and Rodrigues, P. (2004).
Learning with drift detection. In Brazilian symposium
on artificial intelligence, pages 286–295. Springer.
Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., and
Bouchachia, A. (2014). A survey on concept
drift adaptation. ACM computing surveys (CSUR),
46(4):1–37.
Gonçalves Jr, P. M., de Carvalho Santos, S. G., Barros,
R. S., and Vieira, D. C. (2014). A comparative study
on concept drift detectors. Expert Systems with Appli-
cations, 41(18):8144–8156.
Hoens, T. R., Polikar, R., and Chawla, N. V. (2012). Learn-
ing from streaming data with concept drift and imbal-
ance: an overview. Progress in Artificial Intelligence,
1:89–101.
Huang, D. T. J., Koh, Y. S., Dobbie, G., and Pears, R.
(2014). Detecting volatility shift in data streams. In
2014 IEEE International Conference on Data Mining,
pages 863–868. IEEE.
Klinkenberg, R. and Renz, I. (1998). Adaptive information
filtering: Learning in the presence of concept drifts.
Learning for text categorization, pages 33–40.
López Lobo, J. (2020). Synthetic datasets for concept drift
detection purposes. Harv. Dataverse.
Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., and Zhang,
G. (2018). Learning under concept drift: A review.
IEEE transactions on knowledge and data engineer-
ing, 31(12):2346–2363.
Nishida, K. and Yamauchi, K. (2007). Detecting concept
drift using statistical testing. In International confer-
ence on discovery science, pages 264–269. Springer.
Page, E. S. (1954). Continuous inspection schemes.
Biometrika, 41(1/2):100–115.
Parasteh, S. and Sadaoui, S. (2023). A Novel Probabilistic
Approach for Detecting Concept Drift in Streaming
Data. In International Conference on Deep Learning Theory and Applications (DeLTA), Springer Nature, pages 173–188.
Pears, R., Sakthithasan, S., and Koh, Y. S. (2014). Detect-
ing concept change in dynamic data streams. Machine
Learning, 97(3):259–293.
Pesaranghader, A. and Viktor, H. L. (2016). Fast hoeffd-
ing drift detection method for evolving data streams.
In Joint European conference on machine learning
and knowledge discovery in databases, pages 96–111.
Springer.
Pesaranghader, A., Viktor, H. L., and Paquet, E. (2018).
Mcdiarmid drift detection methods for evolving data
streams. In 2018 International Joint Conference on
Neural Networks (IJCNN), pages 1–9. IEEE.
Raab, C., Heusinger, M., and Schleif, F.-M. (2020). Re-
active soft prototype computing for concept drift
streams. Neurocomputing, 416:340–351.
Ross, G. J., Adams, N. M., Tasoulis, D. K., and Hand,
D. J. (2012a). Exponentially weighted moving aver-
age charts for detecting concept drift. Pattern recog-
nition letters, 33(2):191–198.
Ross, G. J., Adams, N. M., Tasoulis, D. K., and Hand,
D. J. (2012b). Exponentially weighted moving aver-
age charts for detecting concept drift. Pattern recog-
nition letters, 33(2):191–198.
Sadreddin, A. and Sadaoui, S. (2022). Chunk-based in-
cremental feature learning for credit-card fraud data
stream. Journal of Experimental & Theoretical Artifi-
cial Intelligence, 0(0):1–19.
Tsymbal, A. (2004). The problem of concept drift: defi-
nitions and related work. Computer Science Depart-
ment, Trinity College Dublin, 106(2):58.
Webb, G. I., Hyde, R., Cao, H., Nguyen, H. L., and Petit-
jean, F. (2016). Characterizing concept drift. Data
Mining and Knowledge Discovery, 30(4):964–994.
Webb, G. I., Lee, L. K., Petitjean, F., and Goethals, B.
(2017). Understanding concept drift. arXiv preprint
arXiv:1704.00362.
Widmer, G. and Kubat, M. (1996). Learning in the pres-
ence of concept drift and hidden contexts. Machine
learning, 23(1):69–101.