Beyond Black Boxes: Adaptive XAI for Dynamic Data Pipelines
Otmane Azeroual
German Centre for Higher Education Research and Science Studies (DZHW), 10117 Berlin, Germany
University of Hagen, 58097 Hagen, Germany
https://orcid.org/0000-0002-5225-389X
Keywords: Artificial Intelligence (AI), Explainable AI (XAI), Self-Adaptive Data Pipelines, Real-Time Streaming,
Concept Drift, Trustworthy AI.
Abstract: The increasing use of real-time data streams in application areas such as the Internet of Things (IoT), financial
analytics, and social media demands highly flexible and self-adaptive data pipelines. Modern AI techniques
enable the automatic adjustment of these pipelines to dynamically changing data landscapes; however, their
decision-making processes often remain opaque and difficult to interpret. This paper presents and evaluates
novel approaches for integrating Explainable Artificial Intelligence (XAI) into self-adaptive real-time data
pipelines. The goal is to ensure transparent and interpretable data processing while meeting the requirements
of real-time capability and scalability. The proposed methods aim to strengthen trust in automated systems
and simultaneously address regulatory demands. Initial experimental results demonstrate promising
improvements in both explainability and adaptivity without significant performance degradation.
1 INTRODUCTION
The rapid growth of data, especially in the form of
real-time data streams, is shaping an ever
wider range of industries and applications. With the
emergence of technologies such as the Internet of
Things (IoT), social media platforms, and high-
frequency financial trading, vast volumes of data are
being continuously and rapidly generated (Cacciarelli
& Kulahci, 2024). The analysis and processing of this
data in real time is essential for enabling swift
decision-making and automation across various
domains, from industrial manufacturing to
cybersecurity (Zaharia et al., 2016). In this context,
self-adaptive data pipelines are gaining growing
importance, as they are capable of dynamically
responding to changing conditions and adjusting the
data flow accordingly (Sresth et al., 2023).
However, while many systems are able to
autonomously adapt to changing data streams, they
often lack the ability to make these adaptations
transparent and understandable. This raises the
central research question of this work: How can
explainability be integrated into self-adaptive data
pipelines without compromising real-time
capabilities?
Self-adaptive data pipelines are systems that not
only continuously ingest and process data but also
autonomously adjust their internal structure,
parameters, and algorithms in response to changing
data conditions or environmental factors (Zaharia et
al., 2016). This is particularly relevant when dealing
with so-called concept drift effects, where the
underlying data distribution changes over time (Gama
et al., 2014). Such changes can significantly impair
model performance if not detected and addressed
promptly. Traditional, static pipelines that operate
without adaptive mechanisms are at a disadvantage in
such scenarios and may quickly produce outdated or
incorrect results (Krawczyk, 2016).
The integration of artificial intelligence (AI)—
and especially machine learning methods—into these
adaptive systems makes it possible to manage the
complexity and dynamics of streaming data. AI
models can detect patterns, make predictions, and
automatically implement adjustments to optimize
data processing (Gomes et al., 2023). This automation
enhances not only the efficiency and accuracy but
also the scalability of data pipelines in real-time
environments. In addition, self-learning algorithms
allow systems to proactively respond to new data
characteristics without requiring human intervention
(Gama et al., 2014).
Despite the obvious advantages of AI-powered
adaptive pipelines, significant challenges remain—
particularly regarding the transparency and
interpretability of their automatic adjustments
(Azeroual, 2024). Modern AI models are often
perceived as black-box systems, with internal
decision-making processes that are difficult to
understand and explain (Adadi & Berrada, 2018).
This lack of transparency hampers error diagnosis,
debugging, and user acceptance by both end-users
and decision-makers (Doshi-Velez & Kim, 2017).
Moreover, regulatory requirements—such as those
mandated by the General Data Protection Regulation
(GDPR)—are becoming increasingly stringent,
making explainable and traceable data processing
essential (Rudin, 2019).
Against this backdrop, the research field of
Explainable Artificial Intelligence (XAI) has
emerged, aiming to design models whose decisions
and adjustments are understandable and interpretable
by humans (Arrieta et al., 2020). However, many
existing XAI methods focus on static, batch-based
models and fail to consider the specific requirements
of real-time streaming and adaptive systems (Guidotti
et al., 2018). In real-time environments, explanations
must be delivered quickly, dynamically, and
contextually, supporting users in interpreting model
behavior without compromising system latency or
efficiency (Abbas & Eldred, 2025).
The integration of XAI into self-adaptive real-
time data pipelines thus represents a timely and
critical research challenge. The objective is to
develop approaches that not only maintain the
adaptability of pipelines but also provide transparent
and comprehensible explanations for their dynamic
adjustments. Key challenges include minimizing
latency, ensuring scalability, and handling
continuously changing data contexts (Ribeiro et al.,
2016). Addressing these aspects is crucial for
building user trust and enabling the deployment of
AI-based systems in safety-critical and regulated
application domains.
This paper presents a novel system architecture
that integrates XAI into self-adaptive real-time data
pipelines. The goal is to combine adaptivity and
explainability to enable transparent, interpretable,
and high-performing data processing. The modular
pipeline handles concept drift and is evaluated
through empirical experiments and user feedback,
aiming to strengthen trust in AI while meeting
practical and regulatory requirements.
2 THEORETICAL
BACKGROUND/
FOUNDATIONS
2.1 Data Pipelines: Structure, Function,
and Challenges in Real-Time
Streaming
Data pipelines are structured sequences of processing
steps that guide data from acquisition through
transformation to analysis (Hashem et al., 2015). In
modern applications, these pipelines often need to
handle large volumes of data streams in real time,
which imposes specific requirements on latency,
scalability, and fault tolerance (Zaharia et al., 2016).
Real-time streaming pipelines enable the continuous
processing of data in motion, for example through
frameworks such as Apache Kafka, Apache Flink, or
Apache Spark Streaming (Kreps et al., 2011; Carbone
et al., 2015). A central challenge lies in ensuring data
consistency and quality despite high data rates and
potential failures (Liu et al., 2021).
2.2 Self-Adaptivity: Concepts and
Mechanisms
Self-adaptivity refers to a system's ability to
autonomously adjust its behavior to changing
environmental conditions without human
intervention (Salehie & Tahvildari, 2009). In data-
driven pipelines, this is especially relevant when data
distributions change—a phenomenon known as
concept drift (Gama et al., 2014). Concept drift
describes the temporal shift in the underlying data
distribution, which can lead to performance
degradation of static models (Widmer & Kubat,
1996). Adaptive mechanisms such as online learning
or model retraining are used to detect and compensate
for such changes (Krawczyk, 2016). Automated
machine learning (AutoML) supports these processes
by autonomously optimizing model parameters and
generating new models (He et al., 2021).
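To make the adaptation mechanism concrete, the following minimal Python sketch illustrates a window-based drift check of the kind such pipelines rely on. It simply compares accuracy over a reference window and a recent window; the window size and threshold are assumed values, and production systems would typically employ established detectors such as ADWIN or DDM instead.

```python
from collections import deque

class WindowedDriftDetector:
    """Flags concept drift when accuracy on recent data falls clearly below
    accuracy on an older reference window (illustrative sketch only)."""

    def __init__(self, window: int = 200, threshold: float = 0.10):
        self.reference = deque(maxlen=window)  # older prediction outcomes (1 = correct)
        self.recent = deque(maxlen=window)     # most recent prediction outcomes
        self.threshold = threshold             # tolerated accuracy drop

    def update(self, correct: bool) -> bool:
        """Record one prediction outcome and return True if drift is suspected."""
        if len(self.recent) == self.recent.maxlen:
            self.reference.append(self.recent.popleft())
        self.recent.append(1 if correct else 0)
        if len(self.reference) < self.reference.maxlen:
            return False                       # not enough history yet
        ref_acc = sum(self.reference) / len(self.reference)
        rec_acc = sum(self.recent) / len(self.recent)
        return (ref_acc - rec_acc) > self.threshold
```

In an adaptive pipeline, a positive return value from such a check would trigger online retraining or an AutoML-driven model replacement.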
2.3 Artificial Intelligence (AI) and
Machine Learning (ML) in Adaptive
Systems
AI, particularly ML, enables adaptive systems to
learn from data and dynamically improve decision-
making (Russell & Norvig, 2016). In real-time
streaming scenarios, online learning methods are
frequently applied to incrementally update models
and reflect current data (Bifet et al., 2018). Concept
drift detection techniques are essential to maintain
model accuracy over time (Lu et al., 2018).
Combinations of supervised, unsupervised, and
reinforcement learning methods are employed to
address various challenges in self-adaptive systems
(Mohammadi et al., 2018).
However, the integration of ML into adaptive
systems also increases complexity, making it harder
for users to understand why specific adaptations
occur—especially in real-time environments. This
highlights the need for mechanisms that make such
dynamic decisions transparent, which is where XAI
becomes crucial.
2.4 Explainable AI (XAI): Definitions,
Methods, Limitations
XAI refers to methods and techniques that make the
decisions of AI systems understandable and
transparent to humans (Doshi-Velez & Kim, 2017).
The goal is to enhance trust in AI, especially in safety-
critical applications (Arrieta et al., 2020). XAI
methods can be categorized into intrinsic models
(e.g., decision trees) and post-hoc explanations (e.g.,
LIME, SHAP) (Ribeiro et al., 2016; Lundberg & Lee,
2017). Despite recent advances, limitations remain
regarding scalability, interpretability, and
applicability to complex, dynamic systems (Adadi &
Berrada, 2018). Furthermore, ensuring explanation
stability over time and mitigating the risk of
misleading or inconsistent explanations in self-
adaptive models remain open research questions.
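As a concrete illustration of the post-hoc category, the following sketch applies a SHAP TreeExplainer to a small scikit-learn model. The synthetic dataset and model are placeholders, and the example deliberately reflects the batch-oriented setting for which such methods were originally designed.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in data and black-box model; both are placeholders for a pipeline model.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Post-hoc explanation: SHAP attributes each prediction to individual input features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])   # local attributions for five samples
print(shap_values)
```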
2.5 Specific Challenges of XAI in
Real-Time and Streaming Contexts
The application of XAI in real-time streaming and
self-adaptive systems imposes additional
requirements: explanations must be provided with
low latency, continuously updated, and adapted to
changing data contexts (Abbas & Eldred, 2025). This
places high demands on the efficiency of explanation
methods and their ability to interpret dynamic models
(Guidotti et al., 2018). Research shows that many
established XAI techniques cannot be directly applied
to real-time data streams, as they are often batch-
oriented and computationally intensive (Molnar,
2020). New approaches aim to develop adaptive,
lightweight, and context-sensitive explanations for
streaming data (Lundberg et al., 2020).
The need for lightweight, adaptive explanation
mechanisms in streaming contexts underscores the
research gap this work aims to address—namely, the
lack of integrated solutions that combine real-time
adaptivity with explainability in a coherent, scalable
system.
3 STATE OF RESEARCH
The rapid development of data-driven systems and
the increasing importance of real-time streaming data
have led to intensified research on adaptive data
pipelines in recent years. These pipelines are
designed to autonomously adjust to changing data
environments in order to continuously deliver
accurate and reliable results (Gama et al., 2014; Kiran
et al., 2021). In particular, the challenge of concept
drift—i.e., the temporal change in the underlying data
distribution—requires flexible and adaptive
approaches capable of continuously updating models
and responding to new conditions (Widmer & Kubat,
1996; Krawczyk, 2017). The integration of AI and
ML plays a central role in enabling automated
decision-making within the pipeline and ensuring
autonomous adaptation to shifting data streams (He et
al., 2021; Bifet et al., 2018).
Many existing adaptive systems rely on online
learning techniques, which allow models to be
incrementally trained with new data, thereby
maintaining model performance during live operation
(Lu et al., 2018). Additionally, AutoML techniques
are integrated to automate the processes of model
selection and optimization, reducing the need for
human intervention (He et al., 2021). Despite these
advances, most studies focus on individual aspects of
the pipeline—such as model updating or data
preprocessing—and tend to neglect a holistic
perspective that includes the explainability and
transparency of automatic decisions (Kiran et al.,
2021).
XAI is a growing research field aimed at
increasing trust in AI systems, particularly in safety-
critical and regulated application areas (Doshi-Velez
& Kim, 2017; Arrieta et al., 2020). XAI encompasses
methods that make the internal decision processes of
often complex black-box models comprehensible
(Molnar, 2020). Broadly speaking, XAI methods can
be divided into intrinsically interpretable models and
post-hoc explanation techniques (Ribeiro et al., 2016;
Lundberg & Lee, 2017). While intrinsic models such
as decision trees or linear regression are transparent
by design, post-hoc approaches like LIME or SHAP
provide explanations for arbitrary models without
modifying the underlying architecture (Ribeiro et al.,
2016; Lundberg & Lee, 2017).
However, most established XAI methods have
been developed for static, batch-oriented data
environments. Real-time streaming contexts pose
new challenges: data distributions may change
dynamically, models need to be continuously
adapted, and explanations must also be delivered
rapidly and adaptively to ensure transparency in
decision-making at all times. Providing explanations
in real time further requires resource-efficient
algorithms that are compatible with high data
throughput and low-latency requirements (Guidotti et
al., 2018). Current research efforts therefore focus on
developing incremental and lightweight XAI
methods specifically tailored for streaming data
(Lundberg & Lee, 2017). However, these approaches
are still in an early stage and often address only partial
aspects—such as the explanation of individual
predictions—without fully covering the complex
adaptation mechanisms of entire pipelines.
An analysis of current research clearly reveals a
significant gap: there are very few integrated
approaches that combine self-adaptive data pipelines
with XAI methods to ensure continuous transparency
and traceability in real-time streaming environments
(Arrieta et al., 2020; Zhou et al., 2022). Most studies
focus either on the adaptivity of the pipeline or the
explainability of individual models, but not on the
combination of both aspects in a dynamic, stream-
based context. What is missing is an end-to-end
approach that simultaneously ensures (1) continuous
model adaptation, (2) timely and relevant
explanations, and (3) seamless integration into real-
time data pipelines.
Table 1: Classification of Existing Approaches.
Table 1 provides an overview of selected relevant
research works, classifying them according to their
approaches to adaptive pipelines, AI integration,
application of XAI methods, and real-time
capabilities.
This overview illustrates that while substantial
work exists on adaptive pipelines and XAI models
individually, the fusion of both domains in real-time
environments remains largely unexplored. This
research gap represents a critical barrier to the
acceptance and broader deployment of automated,
self-adaptive systems, as transparency and
traceability are essential prerequisites for trust and
compliance (Arrieta et al., 2020).
The present paper addresses this intersection and
aims to develop novel approaches that seamlessly
integrate XAI methods into adaptive real-time data
pipelines to ensure both high performance and
transparency.
4 METHODOLOGY / CONCEPT
DEVELOPMENT
This paper focuses on the development of an
innovative approach for implementing explainable,
self-adaptive real-time data pipelines, aiming to close
existing research gaps at the intersection of adaptivity
and explainability. The methodology is based on the
design and implementation of a modular,
dynamically adjustable system capable of
continuously and autonomously responding to
changes in incoming data streams, while
simultaneously providing understandable
explanations for its decisions and adaptations at any
time. The concept is designed to enhance
transparency and traceability of self-adaptive
processes without compromising key real-time
requirements such as latency and performance.
The proposed approach relies on a tightly
integrated combination of advanced ML techniques
and explainable AI (XAI) technologies. At its core,
the system uses online learning algorithms that
continuously update model parameters based on new
incoming data, maintaining model accuracy even as
data distributions shift. These methods are
particularly well-suited for reacting to concept drift
(i.e., changes in the statistical structure of the data)
that would otherwise significantly degrade model
performance without automatic adaptation. In
addition, AutoML techniques are integrated to
automate the selection and optimization of model
hyperparameters, enabling the pipeline to operate
largely autonomously. This reduces the need for
human intervention and allows for flexible and
scalable model maintenance in productive real-time
environments.
LIME and SHAP were selected due to their wide
acceptance, ability to generate local feature
attributions, and extensibility. Despite their original
design for batch scenarios, we extend them to operate
in a streaming context by implementing window-
based updates and approximation strategies.
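The following sketch indicates, under simplifying assumptions, how such a window-based extension could look: the SHAP background data is restricted to a sliding window of recent samples and the explainer is rebuilt at a fixed interval. The window size, refresh interval, and choice of KernelExplainer are illustrative and do not reproduce the exact approximation strategies of the prototype.

```python
from collections import deque

import numpy as np
import shap

class WindowedShapExplainer:
    """Restricts the SHAP background data to a sliding window of recent samples so
    that attributions follow the current data distribution (illustrative sketch)."""

    def __init__(self, model, window: int = 300, refresh_every: int = 100):
        self.model = model                    # any model exposing predict_proba
        self.buffer = deque(maxlen=window)    # recent feature vectors
        self.refresh_every = refresh_every
        self.seen = 0
        self.explainer = None

    def observe(self, x: np.ndarray) -> None:
        """Add one incoming sample; periodically rebuild the explainer."""
        self.buffer.append(x)
        self.seen += 1
        if self.seen % self.refresh_every == 0 and len(self.buffer) >= 50:
            background = np.vstack(self.buffer)
            # A summarized background (e.g., cluster centroids) would further
            # reduce cost; the full window is used here for simplicity.
            self.explainer = shap.KernelExplainer(self.model.predict_proba, background)

    def explain(self, x: np.ndarray):
        """Return local attributions for x, or None before the first refresh."""
        if self.explainer is None:
            return None
        return self.explainer.shap_values(x.reshape(1, -1), nsamples=100)
```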
Parallel to adaptive modeling, the integration of
XAI methods is a central element of the design.
Resource-efficient, incremental explanation
approaches are developed, tailored to continuous data
streams, enabling timely generation of explanations.
In contrast to traditional post-hoc explanation
methods, which are often computationally intensive
and designed for static datasets, this methodology
supports ongoing explanation generation that reflects
the dynamics of the data stream and transparently
illustrates changes in decision-making logic. For
instance, feature importance scores and local
surrogate models are incrementally updated to
provide fast yet precise and context-sensitive insights
into the behavior of adaptive models. Explanations
are hierarchically structured to offer different levels
of detail suited to various stakeholders—from
technical experts to domain users—thereby
enhancing both comprehensibility and relevance.
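A possible realization of incrementally updated importance scores is sketched below: absolute local attributions are blended into an exponentially weighted running mean, yielding a continuously refreshed global ranking that can be presented in coarse form to domain users and in full detail to technical experts. The decay factor is an assumed value, not a tuned parameter of the system.

```python
import numpy as np

class RunningFeatureImportance:
    """Exponentially weighted mean of absolute per-feature attributions, giving a
    continuously updated global importance ranking (decay factor is an assumption)."""

    def __init__(self, n_features: int, decay: float = 0.98):
        self.scores = np.zeros(n_features)
        self.decay = decay

    def update(self, local_attribution: np.ndarray) -> None:
        # Blend the newest local explanation into the running global picture.
        self.scores = self.decay * self.scores + (1 - self.decay) * np.abs(local_attribution)

    def top_k(self, k: int = 5):
        """Return the k currently most important feature indices with their scores."""
        order = np.argsort(self.scores)[::-1][:k]
        return [(int(i), float(self.scores[i])) for i in order]
```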
The technical realization of the approach is based
on a modular system architecture that integrates
components for data ingestion, preprocessing, model
training and updating, explanation generation, and
monitoring/control. For data ingestion and
processing, established streaming frameworks such
as Apache Kafka or Apache Flink are employed,
ensuring high throughput and low latency as a robust
foundation for real-time operations. The adaptive
modeling module implements both online learning
algorithms and AutoML components, which
automatically determine suitable model
configurations and seamlessly integrate them into
operation. In parallel, a dedicated explanation module
operates as a lightweight microservice, tightly
coupled with the adaptive models to access relevant
contextual information required for explanation
generation. The monitoring component continuously
evaluates data distribution, model quality, and
explanation performance, controlling the adaptive
pipeline by triggering model updates or issuing alerts
in case of potential misadaptations. This establishes a
closed feedback loop that ensures both the automation
and transparency of data processing.
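A highly simplified sketch of one pass through this control loop is given below; the thresholds and the pipeline hooks (trigger_model_update, reduce_explanation_detail, raise_alert) are hypothetical and serve only to illustrate the rule-based coupling of monitoring and adaptation.

```python
def control_step(metrics: dict, pipeline) -> None:
    """One pass of the monitoring/control loop; thresholds and pipeline hooks
    are hypothetical and purely illustrative."""
    if metrics["accuracy"] < 0.80:
        pipeline.trigger_model_update()        # hypothetical hook: retrain or swap model
    if metrics["explanation_latency_ms"] > 60:
        pipeline.reduce_explanation_detail()   # hypothetical hook: cheaper explanations
    if metrics["drift_score"] > 0.5:
        pipeline.raise_alert("possible misadaptation")  # hypothetical alerting hook
```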
The modular architecture of this system is
illustrated in Figure 1. It highlights the close
integration of individual components—from data
ingestion and adaptive modeling to explanation
generation and pipeline monitoring/control. The
diagram particularly emphasizes the closed control
loop that enables continuous self-adjustment of the
models and the ongoing generation and provision of
explanations in real time. This forms a critical
foundation for ensuring high adaptability and
comprehensive transparency in dynamic, data-
intensive environments, while also facilitating
scalability, fault tolerance, and ease of
maintenance—key requirements for deploying
robust, trustworthy AI solutions in real-world
streaming applications.
Figure 1: Modular architecture of the proposed explainable,
self-adaptive real-time data pipeline.
As part of the development, specific criteria for
explainability and real-time capability are also
defined and systematically evaluated. The generated
explanations must be comprehensible and traceable
for various user groups and provide both local and
global insights into model decisions and adaptations.
At the same time, the pipeline must not delay data
stream processing, requiring all components to be
optimized for efficiency and resource usage. Through
this integrative and iterative approach, a robust,
scalable, and trustworthy real-time data processing
system is created that addresses the demands of
modern data-driven systems.
Success of the system is defined along three core
criteria: (1) accuracy and adaptability of the model
under concept drift, (2) latency of both predictions
and explanations, and (3) user-perceived clarity and
usefulness of the explanations across different
stakeholder groups.
5 IMPLEMENTATION AND
EXPERIMENTAL
EVALUATION
To validate the proposed concept of an explainable,
self-adaptive real-time data pipeline, a prototype
system was designed, implemented, and rigorously
evaluated using realistic application scenarios that
reflect practical challenges in dynamic environments.
The implementation emphasized a modular and
extensible architecture that incorporates dedicated
components for data ingestion, real-time processing,
adaptive modeling, explainability, as well as
continuous monitoring and control. This modular
design allows for flexible integration and dynamic
combination of various state-of-the-art AI and XAI
methods, enabling comprehensive testing and
optimization under stringent real-time conditions.
The overall system architecture is depicted in
Figure 2. Data ingestion is managed through an
Apache Kafka cluster, which ensures reliable,
scalable, and fault-tolerant distribution of incoming
high-velocity data streams. For preprocessing,
Apache Flink is employed to perform essential real-
time operations such as data cleansing, feature
extraction, and feature scaling, thereby preparing the
raw data for downstream modeling tasks without
introducing significant latency. The adaptive
modeling module incorporates incremental learning
algorithms—primarily Online Random Forests and
Hoeffding Trees—combined with a lightweight
AutoML mechanism that continuously optimizes
hyperparameters to maintain model performance.
This setup enables the system to dynamically adapt to
evolving data distributions, with a particular focus on
effectively detecting and reacting to concept drift.
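The following sketch shows, in simplified form, how the ingestion and adaptive modeling stages can be wired together using the kafka-python client and the river library's Hoeffding tree in a test-then-train loop. The topic name, message schema, and omission of the Flink preprocessing step are simplifying assumptions rather than the exact prototype configuration.

```python
import json

from kafka import KafkaConsumer          # kafka-python client
from river import metrics, tree          # river online-learning library

# Incremental Hoeffding tree and rolling accuracy in a test-then-train loop.
model = tree.HoeffdingTreeClassifier()
accuracy = metrics.Accuracy()

consumer = KafkaConsumer(
    "sensor-events",                      # topic name is an assumption
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    record = message.value                # assumed schema: {"features": {...}, "label": 0/1}
    x, y = record["features"], record["label"]
    y_pred = model.predict_one(x)         # predict first (test) ...
    if y_pred is not None:
        accuracy.update(y, y_pred)
    model.learn_one(x, y)                 # ... then learn from the labelled sample (train)
```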
To address the critical aspect of explainability, a
dedicated module was integrated that leverages
streaming-capable variants of popular explanation
methods like SHAP and LIME. Explanations are
generated for every individual prediction in real-time
and presented through an interactive web interface,
offering users transparent insights into the model’s
decision-making process. Complementing these
components, the "Monitoring & Control" subsystem,
built on Prometheus and a rule-based engine,
continuously supervises system health, evaluates
model quality, and triggers adaptive adjustments to
modeling parameters as necessary to maintain
optimal performance.
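A minimal sketch of the corresponding instrumentation using the prometheus_client library is shown below; the metric names and the reporting interface are illustrative assumptions, with Prometheus alerting rules or the rule-based engine acting on the exposed values.

```python
from prometheus_client import Gauge, start_http_server

# Metric names are illustrative, not the prototype's exact instrumentation.
MODEL_ACCURACY = Gauge("pipeline_model_accuracy", "Rolling accuracy of the online model")
EXPLANATION_LATENCY = Gauge("pipeline_explanation_latency_ms", "Explanation generation latency in ms")

start_http_server(8000)                   # expose /metrics for Prometheus to scrape

def report(accuracy: float, latency_ms: float) -> None:
    """Publish the latest pipeline health values; alerting rules or the rule-based
    engine then act on them (e.g., trigger a model update when accuracy drops)."""
    MODEL_ACCURACY.set(accuracy)
    EXPLANATION_LATENCY.set(latency_ms)
```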
Figure 2: System Overview with Data Flow.
Figure 2 illustrates the data and control flow between
system components: solid lines represent the
continuous data flow (e.g., from Kafka → Flink →
Modeling → Explanation), while dashed arrows
depict feedback and control relationships—especially
from the monitoring unit back to the modeling and
explanation modules. This feedback enables demand-
driven adjustments without restarting the system,
which is crucial for meeting real-time requirements.
Two application scenarios were chosen for
evaluation: an industrial IoT scenario with simulated
machine data, and a financial scenario using modified
real-time credit card transaction data. The industrial
IoT case simulates predictive maintenance on
streaming sensor data from manufacturing machines.
This scenario reflects a typical real-time environment
where early detection of equipment faults can prevent
costly downtimes. The financial scenario uses
anonymized credit card transaction streams to detect
anomalies indicative of fraudulent activities. Both
scenarios are characterized by stringent latency
requirements and dynamic data distributions, making
them ideal testbeds for evaluating adaptive modeling
and explainability under realistic operational
conditions.
The impact of concept drift on model
performance is illustrated in Figure 3. The figure
shows the model’s accuracy over time. A significant
drop in performance is observed immediately after
the simulated drift point (marked by a red vertical
line). However, due to the system’s automatic
adaptability, accuracy quickly recovers. This pattern
demonstrates that the system not only responds to
drift events but is also capable of restoring model
quality within short timeframes.
Figure 3: Model Performance Before and After Concept
Drift.
Another focus was the evaluation of the explainability
of model decisions. To this end, a real-time
dashboard was developed that displays the top
contributing features for each prediction as well as a
locally approximated decision structure. Figure 4
illustrates a typical output from this dashboard: on the
left, feature importances are shown in a bar chart; on
the right, a simple decision logic is visualized using a
surrogate model to explain the specific model
decision. These explanations were continuously
generated and updated for each data point.
Figure 4: Example Real-Time Explanation Output.
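The locally approximated decision structure shown in the right panel of Figure 4 can be produced, for example, by fitting a shallow surrogate tree on black-box predictions sampled around the instance of interest, as in the following sketch. Perturbation scale, sample count, and tree depth are illustrative assumptions, and the prototype's surrogate construction may differ in detail.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def local_surrogate(model, x, n_samples=500, scale=0.1, max_depth=3, feature_names=None):
    """Fit a shallow decision tree on black-box predictions for points sampled around x,
    approximating the local decision logic visualized in the dashboard (perturbation
    scale, sample count, and depth are illustrative assumptions)."""
    rng = np.random.default_rng(0)
    neighbourhood = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    labels = model.predict(neighbourhood)                 # black-box outputs as targets
    surrogate = DecisionTreeClassifier(max_depth=max_depth).fit(neighbourhood, labels)
    return export_text(surrogate, feature_names=feature_names)
```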
To qualitatively assess the explanations, a user study
was conducted with twelve participants from data
science and domain expert backgrounds. Participants
rated the explanations on a five-point Likert scale in
terms of clarity, usefulness, and trust.
On average, the explanations were perceived as
helpful (M=4.3) and understandable (M=4.5).
Participants particularly appreciated the visual
representation of feature importances and the ability
to trace model behavior changes caused by concept
drift.
Overall, the study suggests that even lightweight,
incremental explanation mechanisms can
meaningfully support user understanding and trust in
adaptive AI systems. Several participants expressed
interest in being able to adjust the depth and detail of
explanations to match their level of expertise. This
indicates that configurable explanation interfaces
could improve usability and acceptance across
diverse user groups.
In summary, the developed system is capable of
adaptively responding to data changes while
providing explainable decisions—without significant
performance losses in terms of response time or
model quality. Thus, the proposed pipeline
contributes to trustworthy AI in the context of
dynamic, high-frequency data streams.
6 DISCUSSION
The results presented in the previous section (Section
5) offer well-founded insights into the behavior of
explainable, self-adaptive real-time data pipelines
under realistic conditions. The core objective of the
evaluation was to assess the extent to which XAI
methods can be meaningfully integrated into adaptive
streaming systems without significantly
compromising real-time capability or model
performance. The findings suggest that such
integration is not only technically feasible but also
functionally beneficial. Notably, combining adaptive
learning with dynamic explainability yields
substantial improvements in trust, transparency, and
system control.
Performance analysis (cf. Figure 3) shows that
the system was able to quickly stabilize its predictive
performance following a detected concept drift. On
average, the model regained acceptable accuracy
within fewer than ten data windows—an efficiency
considered suitable for real-time systems. These
results support the assumption that AutoML-
supported online models are a powerful foundation
for self-adaptive architectures in streaming
environments. Moreover, the concurrent integration
of explainability modules did not lead to a significant
increase in inference latency (≤ 60ms),
demonstrating that real-time capability can be
preserved despite the added interpretability.
A particularly noteworthy aspect is the system’s
ability to adapt not only its models but also the
associated explanations continuously in response to
changing data distributions. This capability
represents a clear advantage over traditional XAI
approaches, which are typically designed for static or
batch-oriented settings. The user study confirmed that
the real-time visualization of feature importances and
local decision structures (cf. Figure 4) significantly
enhanced model interpretability—for both technical
and non-technical users.
Despite these positive outcomes, the proposed
approach also presents certain limitations. A key
challenge lies in the temporal stability of
explanations: since the models are constantly
updated, the generated explanations may vary even
for similar input data. This temporal inconsistency
can lead to user uncertainty and highlights the need
for further research in explanation-stable model
adaptation. Another concern is the scalability of
explainability in ultra-high-frequency data streams: at
inference rates exceeding 10,000 events per second,
even incremental XAI methods require substantial
computational resources. Innovative strategies are
needed here—such as selective or approximate
explanation techniques that maintain interpretability
without overwhelming system performance.
In addition, the current evaluation does not yet
cover all relevant dimensions of system robustness
and generalizability. Specifically, the adversarial
sensitivity of on-the-fly explanations remains
unexplored: since local explanation methods like
SHAP and LIME are known to be vulnerable to input
perturbations, their reliability under adversarial
conditions should be further investigated. Moreover,
the impact of high-dimensional input data and
complex model architectures on the fidelity and
stability of explanations was not explicitly
benchmarked. Addressing these open issues requires
more comprehensive evaluations using standardized
datasets, adversarial scenarios, and controlled
variation of model and data complexity.
Furthermore, the absence of standardized
evaluation metrics and publicly available benchmarks
for explainability in streaming settings presents a
methodological gap. Community-driven efforts
toward shared testbeds and multidimensional
performance indicators—including latency, stability,
interpretability, and adversarial robustness—would
significantly enhance comparability and
reproducibility in this emerging field.
In practical applications, the proposed approach
opens up a range of possibilities, particularly in
mission-critical domains such as predictive
maintenance, real-time financial analytics, or medical
telemetry. The ability to make adaptive decisions
transparent not only supports regulatory compliance
(e.g., in line with the EU AI Act) but also strengthens
end-user trust in AI-driven systems. At the same time,
the modular architecture allows for flexible
adaptation to various data sources, model types, and
deployment environments.
This work also raises several important questions
for future research. First, there is the question of how
generalizable the approach is to more complex model
architectures, such as deep learning in streaming
contexts. Furthermore, combining the system with
reinforcement learning strategies for policy
adaptation presents a promising extension. Lastly,
there is a clear need for standardized metrics to
evaluate explainability in dynamic, non-deterministic
settings—an open challenge that has yet to be fully
addressed by either the XAI or the stream processing
research communities.
In conclusion, this study demonstrates that
combining self-adaptivity with explainability in
streaming environments is far more than a technical
exercise. It represents a strategically significant step
toward responsible, trustworthy AI systems for real-
time applications.
7 CONCLUSIONS
This paper addresses the design, implementation, and
evaluation of an explainable, self-adaptive real-time
data pipeline based on modern AI and XAI methods.
The starting point was the observation that existing
data processing systems in streaming contexts
increasingly rely on ML for autonomous model
adaptation, but often without adequate consideration
of explainability and transparency of the decisions
made. This represents a significant limitation,
especially in sensitive application domains where
both regulatory requirements and end-user trust play
a central role.
By integrating incremental learning models
with automated model selection (AutoML) and
dynamically adaptable XAI techniques, an
architectural approach was developed that is
both adaptive and interpretable while
simultaneously meeting real-time requirements.
Experimental evaluation using realistic scenarios
(IoT and financial data) demonstrated that the
proposed pipeline can efficiently detect concept drift
and adapt accordingly. At the same time, the system
provided understandable and visually prepared
explanations of model decisions without significantly
impacting system latency or predictive quality.
The contribution of this paper lies both
conceptually and methodologically. On the one hand,
an architectural framework was created that explicitly
enables the coupling of self-adaptivity and
explainability in streaming contexts. On the other
hand, existing XAI methods were examined and
adapted for their suitability in streaming
environments. Moreover, the developed system
offers a practical reference implementation realized
with common open-source technologies such as
Apache Kafka, Flink, Prometheus, and SHAP/LIME,
making it applicable for industrial use as well.
Nonetheless, essential challenges remain that
should be addressed in future research. In particular,
stable interpretability over time—i.e., consistency of
explanations amid evolving models—remains an
unresolved issue. There is also a need for
standardizing metrics to systematically evaluate
explainability in dynamic contexts. The use of deeper
neural networks combined with XAI for real-time
systems—for example, via distilled surrogate
models—represents another promising direction for
further investigation.
To build on these findings, future research must
also include comprehensive, multi-dimensional
evaluations—covering adversarial robustness,
explanation consistency, and scalability across
different data rates and dimensionalities.
Comparative studies with related architectural
approaches are necessary to further validate
effectiveness and identify best practices.
Furthermore, future work could extend the system
architecture with active learning mechanisms, self-
explaining user interfaces, or semantically grounded
model feedback systems to enable even closer
integration between users, the system, and
explanations. Finally, a long-term user acceptance
study under real-world conditions (e.g., in Industry
4.0 environments) would be highly valuable to
capture the actual impact of dynamic explainability
on trust and system control.
The results presented here illustrate that
explainable, self-adaptive AI systems in real-time
data contexts are not just a theoretical vision but a
practically implementable reality—provided that
methodological robustness, system scalability, and
human-centered perspectives are given equal
consideration.
REFERENCES
Abbas, T., & Eldred, A. (2025). AI-Powered Stream
Processing: Bridging Real-Time Data Pipelines with
Advanced Machine Learning Techniques.
ResearchGate Journal of AI & Cloud Analytics.
https://doi.org/10.13140/RG.2.2.26674.52167
Adadi, A., & Berrada, M. (2018). Peeking inside the black-
box: a survey on explainable artificial intelligence
(XAI). IEEE access, 6, 52138-52160. https://doi.org/
10.1109/ACCESS.2018.2870052
Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot,
A., Tabik, S., Barbado, A., ... & Herrera, F. (2020).
Explainable Artificial Intelligence (XAI): Concepts,
taxonomies, opportunities and challenges toward
responsible AI. Information fusion, 58, 82-115.
https://doi.org/10.1016/j.inffus.2019.12.012
Azeroual, O. (2024). Can generative AI transform data
quality? a critical discussion of ChatGPT’s capabilities.
Academia Engineering, 1(4). https://doi.org/10.20
935/AcadEng7407
Bifet, A., Gavalda, R., Holmes, G., & Pfahringer, B. (2023).
Machine learning for data streams: with practical
examples in MOA. MIT press. https://doi.org/10.75
51/mitpress/10654.001.0001
Cacciarelli, D., & Kulahci, M. (2024). Active learning for
data streams: a survey. Machine Learning, 113(1), 185-
239. https://doi.org/10.1007/s10994-023-06454-2
Carbone, P., Katsifodimos, A., Ewen, S., Markl, V., Haridi,
S., & Tzoumas, K. (2015). Apache flink: Stream and
batch Processing in a single engine. The Bulletin of the
Technical Committee on Data Engineering, 38(4).
http://sites.computer.org/debull/A15dec/p28.pdf
Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous
science of interpretable machine learning. arXiv
preprint. https://doi.org/10.48550/arXiv.1702.08608
Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., &
Bouchachia, A. (2014). A survey on concept drift
adaptation. ACM computing surveys (CSUR), 46(4), 1-
37. https://doi.org/10.1145/2523813
Gomes, H. M., Read, J., Bifet, A., Barddal, J. P., & Gama,
J. (2019). Machine learning for streaming data: state of
the art, challenges, and opportunities. ACM SIGKDD
Explorations Newsletter, 21(2), 6-22. https://doi.org/1
0.1145/3373464.3373470
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F.,
Giannotti, F., & Pedreschi, D. (2018). A survey of
methods for explaining black box models. ACM
computing surveys (CSUR), 51(5), 1-42. https://doi.
org/10.1145/3236009
Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S.,
Gani, A., & Khan, S. U. (2015). The rise of “big data”
on cloud computing: Review and open research issues.
Information systems, 47, 98-115. https://doi.org/
10.1016/j.is.2014.07.006
He, X., Zhao, K., & Chu, X. (2021). AutoML: A survey of
the state-of-the-art. Knowledge-based systems, 212,
106622. https://doi.org/10.48550/arXiv.1908.00709
Kiran, B. R., Sobh, I., Talpaert, V., Mannion, P., Al Sallab,
A. A., Yogamani, S., & Pérez, P. (2021). Deep
reinforcement learning for autonomous driving: A
survey. IEEE Transactions on intelligent
transportation systems, 23(6), 4909-4926. https://doi.
org/10.1109/TITS.2021.3054625
Krawczyk, B. (2016). Learning from imbalanced data: open
challenges and future directions. Progress in artificial
intelligence, 5(4), 221-232. https://doi.org/10.1007/
s13748-016-0094-0
Kreps, J., Narkhede, N., & Rao, J. (2011). Kafka: A
distributed messaging system for log processing. In
Proceedings of the NetDB (Vol. 11, No. 2011, pp. 1-7).
https://api.semanticscholar.org/CorpusID:18534081
Liu, C., Peng, G., Kong, Y., Li, S., & Chen, S. (2021). Data
quality affecting Big Data analytics in smart factories:
research themes, issues and methods. Symmetry, 13(8),
1440. https://doi.org/10.3390/sym13081440
Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., & Zhang, G.
(2018). Learning under concept drift: A review. IEEE
transactions on knowledge and data engineering,
31(12), 2346-2363. https://doi.org/10.48550/arXiv.
2004.05785
Lundberg, S. M., & Lee, S. I. (2017). A unified approach to
interpreting model predictions. Advances in neural
information processing systems, 30. https://doi.org/
10.48550/arXiv.1705.07874
Mohammadi, M., Al-Fuqaha, A., Sorour, S., & Guizani, M.
(2018). Deep learning for IoT big data and streaming
analytics: A survey. IEEE Communications Surveys &
Tutorials, 20(4), 2923-2960. https://doi.org/10.1109/
COMST.2018.2844341
Molnar, C. (2020). Interpretable machine learning.
Lulu.com. https://christophm.github.io/interpretable-ml-book/
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why
should I trust you?" Explaining the predictions of any
classifier. In Proceedings of the 22nd ACM SIGKDD
International Conf. on Knowledge Discovery and data
Mining (pp. 1135-1144). https://doi.org/10.1145/
2939672.2939778
Rudin, C. (2019). Stop explaining black box machine
learning models for high stakes decisions and use
interpretable models instead. Nature machine
intelligence, 1(5), 206-215. https://doi.org/10.1038/s4
2256-019-0048-x
Russell, S. J., & Norvig, P. (2016). Artificial intelligence: a
modern approach. pearson. https://elibrary.pearson.de/
book/99.150005/9781292401171
Salehie, M., & Tahvildari, L. (2009). Self-adaptive
software: Landscape and research challenges. ACM
transactions on autonomous and adaptive systems
(TAAS), 4(2), 1-42. https://doi.org/10.1145/
1516533.1516538
Sresth, V., Nagavalli, S. P., & Tiwari, S. (2023).
Optimizing Data Pipelines in Advanced Cloud
Computing: Innovative Approaches to Large-Scale
Data Processing, Analytics, and Real-Time
Optimization. International Journal of Research and
Analytical Reviews, 10, 478-496. https://doi.org/10.
54660/.IJMRGE.2022.3.1.112-120
Widmer, G., & Kubat, M. (1996). Learning in the presence
of concept drift and hidden contexts. Machine learning,
23, 69-101. https://doi.org/10.1007/BF00116900
Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M.,
Dave, A., ... & Stoica, I. (2016). Apache spark: a unified
engine for big data processing. Communications of the
ACM, 59(11), 56-65. https://doi.org/10.1145/2934664