XAIVIER the Savior: A Web Application for Interactive Explainable AI in Time Series Data
Ilija Šimić, Christian Partl and Vedran Sabol
Know-Center GmbH, Graz, Austria
Keywords:
Explainable AI, Interactive Systems, Deep Learning, Attribution Methods, Visualization, Time Series,
Recommender.
Abstract:
The rising popularity of black-box deep learning models directly led to an increased interest in eXplainable
AI - a field concerned with methods that explain the behavior of machine learning models. However, different
types of stakeholders interact with XAI, all of which have different requirements and expectations of XAI
systems. Moreover, XAI methods and tools are mostly developed for image, text, and tabular data, while
explainability methods and tools for time series data - which is abundant in high-stakes domains - are in
comparison fairly neglected. In this paper, we first contribute with a set of XAI user requirements for the
most prominent XAI stakeholders, the machine learning experts. We also contribute with a set of functional
requirements, which should be fulfilled by an XAI tool to address the derived user requirements. Based on
the functional requirements, we have designed and developed XAIVIER, the eXplainable AI VIsual Explorer
and Recommender, a web application for interactive XAI in time series data. XAIVIER stands out with its
explainer recommender that advises users which explanation method they should use for their dataset and
model, and which ones to avoid. We have evaluated XAIVIER and its explainer recommender in a usability
study, and demonstrate its usage and benefits in a detailed user scenario.
1 INTRODUCTION
The remarkable results of deep learning (DL) models have led to an increased interest in these models
coming from high-stakes domains, such as industry,
medicine, or finance. However, due to the severe im-
pact that a model’s predictions may have in such set-
tings, it is essential that each model prediction can be
justified as well.
The demand for explainable models contributed to
the increased attention to eXplainable AI (XAI) - a
field that revolves around methods for explaining the
behavior of machine learning models. The number of
XAI methods is vast and in this paper we focus on XAI
methods that explain model predictions by assigning
a relevance to the input features, so-called attribu-
tion methods (Sundararajan et al., 2017). Attribution
methods are particularly relevant for tasks where the
understanding of specific predictions is of great im-
portance. Given that in this paper we focus solely
on attribution methods, we mean attribution methods
when we refer to explainers.
Many attribution methods have been recently in-
troduced, which rely on different approaches for pro-
ducing explanations. As shown in Figure 1, these dif-
ferent approaches can, and often do, lead to different
explanations for the same model and sample. Given
that the selection of a bad attribution method may
lead to no insights, or even to wrong ones, a major obstacle
is the selection of an attribution method that faithfully
shows what aspects of the input were truly relevant
for a model to make its prediction.
Due to the increased attention to XAI, different
types of user groups (stakeholders) get in touch with
XAI. However, the different user groups also have
different background knowledge and requirements for
XAI. Thus, when developing XAI tools, the require-
ments and needs of the target user must be taken into
consideration. With this paper, we aim to support the
largest XAI user group - the machine learning experts
- with a visual tool tailored to their requirements.
Additionally, XAI methods are mostly developed
and evaluated with image, text or tabular data as high-
lighted by Guidotti et al. (Guidotti et al., 2018) and
supported by additional XAI literature reviews, which
simply omit XAI methods for time series data (Adadi
and Berrada, 2018; Hohman et al., 2018). While time
series data is abundant and important in high-stakes
domains, it did not receive as much attention when compared to XAI for other data types, and explainability for time series classification models consists mostly in the application of XAI methods developed for other data types (Šimić et al., 2021; Rojat et al., 2021). Most importantly, there are few works which investigate how well existing XAI methods work with time series data (Schlegel et al., 2019; Ismail et al., 2020), and it has been shown that existing evaluations are flawed (Šimić et al., 2022).

Figure 1: XAIVIER’s explanation comparison - Two explainers identifying different time steps of the same time series as important (red), given the same model prediction. The explanation on top was created using DeepLIFT, and the one on the bottom using Guided Backpropagation.
Therefore, our visual XAI tool focuses on time se-
ries data with an included explainer recommender as,
to our best knowledge, such a tool is not available yet.
Contributions: The scientific contributions of
this paper are three-fold:
1. We provide a summary of user requirements to
support machine learning experts to understand
and improve their machine learning models. The
requirements were collected through a literature
survey on XAI requirements and were reinforced
by our own user survey.
2. We employ state-of-the-art metrics for evaluating faithfulness (Selvaraju et al., 2017) of XAI methods on time series data (Šimić et al., 2022) to deliver an explainer recommender, which, given a specific dataset and model, suggests the most faithful explainers.
3. Based on the collected user requirements and the explainer recommender, we design and evaluate a
novel tool. XAIVIER, the eXplainable AI VIsual
Explorer and Recommender, is a web application
for interactive XAI in time series data for machine
learning experts.
Due to the lack of time series XAI UIs, we de-
veloped XAIVIER primarily with this data type in
mind. XAIVIER supports exploration of time series
datasets, as well as inspection of their corresponding
models. It also allows users to explain a model’s pre-
dictions with many prominent explainers. XAIVIER
also contributes with the inclusion of an explainer
recommender, which advises users which explainer
they should use, and which ones they should avoid.
Of course, our intention for the future is to extend
XAIVIER to other relevant data types as well.
2 RELATED WORK
Depending on the scope of the explanation, XAI
methods (or simply explainers) can be divided into
global or local explainers (Guidotti et al., 2018).
While global explainers try to clarify a model’s be-
havior as a whole, local explainers provide informa-
tion about why a model made a specific prediction
for a single sample. Attribution methods (Sundarara-
jan et al., 2017) are the most popular type of local
explainers, which explain a model’s prediction by as-
signing a relevance to each individual input feature,
representing how much it contributed to the model’s
prediction.
A multitude of attribution methods have been re-
cently introduced, which utilize different approaches
to compute the relevances. For example, attribu-
tion methods can rely on gradients (Simonyan et al.,
2014), relevance backpropagation (Bach et al., 2015),
feature occlusion (Zeiler and Fergus, 2014), or surro-
gate models (Ribeiro et al., 2016).
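To illustrate the general idea, the following minimal sketch (our own illustration, not code from any of the cited works) computes a simple gradient-based saliency attribution for a PyTorch time series classifier; the model, sample and target_class arguments are hypothetical placeholders.

    # Minimal sketch of gradient-based attribution (saliency) for a univariate
    # time series classifier; model, sample and target_class are hypothetical.
    import torch

    def saliency_attribution(model: torch.nn.Module,
                             sample: torch.Tensor,
                             target_class: int) -> torch.Tensor:
        # Relevance of each time step = |d score(target_class) / d input|.
        x = sample.clone().requires_grad_(True)   # shape: (1, 1, n_time_steps)
        score = model(x)[0, target_class]         # logit of the class to explain
        score.backward()                          # gradients w.r.t. the input
        return x.grad.abs().squeeze()             # one relevance per time step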
Feature attributions are typically visualized as bar
charts (Ribeiro et al., 2016; Lundberg and Lee, 2017)
(for tabular data), heat maps (Selvaraju et al., 2017;
Sundararajan et al., 2017) (for image, text and time
series data) or line charts (Goodfellow et al., 2018)
(time series specific). In XAIVIER, we opted for
the heat map representation, as it is a more com-
pact visual representation of feature relevance than
line charts, and more commonly used as shown in re-
cent visual analytics for XAI reviews (Hohman et al.,
2018; Alicioglu and Sun, 2022).
Many libraries (Alber et al., 2019; Kokhlikyan
et al., 2020; Klaise et al., 2021; Arya et al., 2019),
SDKs (Wexler et al., 2019) (e.g., Vertex Explainable AI, https://github.com/GoogleCloudPlatform/explainable_ai_sdk), and command line tools (e.g., Modelstudio, https://github.com/ModelOriented/modelStudio) offer implementations of prominent explainers.
These tools are a great starting point for advanced
users to apply different explainers on their own mod-
els. However, even though some of these tools may
generate interactive user interfaces (UIs) to simplify
analysis, all of them require programming knowledge
and strong familiarity with the specific framework the
models were developed with. This makes them unsuitable for a broader audience. Even for experienced
users, having to sift through the documentation and to
set up an appropriate environment just to simply gen-
erate a few explanations poses a great entry barrier.
Fiddler.ai (https://www.fiddler.ai/) is a model monitoring application that also offers model explainability support. However, explanations are only supported for models trained on tabular and natural language data. Fiddler.ai also supports only three explainers, without any indication of how well they are suited for the used dataset and model. Moreover, the explanations are generated not by using the original model, but by a surrogate model which approximates the original model. This in itself can be a problem, since it cannot be guaranteed that the surrogate model relies on the exact same features to make its prediction as the original model.
Superwise.ai (https://www.superwise.ai/), another model monitoring application, also offers model explainability. However, they do not support local explanation methods, but offer model explainability through various model monitoring metrics. Additionally, they only support tabular datasets and are not suited for time series data.
H2O Driverless AI (https://h2o.ai/h2o-driverless-ai), an automated machine learning platform, offers interpretability functionalities and supports time series data. However, time series models can only be trained for forecasting and not for classification. Additionally, even though they offer a variety of models, they do not support deep learning models for time series data. Moreover, the feature importance that is computed for individual predictions is based on automatically extracted features of the original time series, since the model is not trained on the raw data in an end-to-end fashion.
Given that these tools are designed for automated
model training, or model monitoring through various
metrics, it is not surprising that they do not provide
XAI as a core functionality that targets the require-
ments of machine learning experts as XAI practition-
ers. This is also reflected in the fact that none of the
tools offer the option to explore explanations of mul-
tiple samples at once, or to compare explanations of
different explainers for one sample.
Moreover, they either leave the crucial task of ex-
plainer selection completely to the user, or do not of-
fer any information about the quality of the explana-
tions. This is especially problematic, given the poten-
tial disagreement between explainers on what features
are actually important. Some well-performing ex-
plainers may also be limited to specific model types,
requiring the selection of an alternative explainer.
While various visual analytics solutions for XAI
have also been proposed in the literature (Spinner
et al., 2019; Li et al., 2020; Krause et al., 2017;
Collaris and van Wijk, 2020), none provide an ex-
plainer recommender that identifies the most faithful
explainer, nor have they been developed for time se-
ries data.
With XAIVIER, we aimed to design and develop
a tool that is tailored to the requirements of machine
learning experts as XAI practitioners, specifically for
explaining the predictions of time series classifiers.
XAIVIER allows its users to rapidly explore and an-
alyze the explanations of many time series samples
at once, as well as to compare the explanations gen-
erated by a multitude of explainers. Additionally,
XAIVIER provides an explainer recommender that
pre-selects the best explainer for the provided dataset
and model and provides information why an explainer
is recommended or to be avoided.
The problem of recommending appropriate visual
representations for specific data has been thoroughly
explored (Mutlu et al., 2016; Zhu et al., 2020). In
contrast to these visualization recommendations, our
proposed explainer recommender does not provide a
ranking for different visual data representations. In-
stead, it validates the correctness of different XAI al-
gorithms and ranks them depending on how faithfully
the explanations capture what was actually important
to a model to make its predictions. The visual repre-
sentation of the explanation, a heat map, is the same
for all XAI methods.
3 XAI REQUIREMENTS
ANALYSIS
We derived the user requirements for the main user
group of XAI by two means. First, we conducted
a literature review to identify XAI stakeholders and
their user requirements. Second, based on the results
from this review, we focused on the main XAI user
group and performed our own survey to confirm and
complement user requirements found in the literature.
In the following, we summarize the results of the lit-
erature review and our own survey, and describe the
derived user requirements.
3.1 Literature Review
Different overlapping XAI stakeholder categoriza-
tions have been proposed. For example, Preece et
al. (Preece et al., 2018) differentiate between: i)
developers - who are concerned with building AI ap-
plications, ii) theorists - who want to understand and
advance AI theory, iii) ethicists - who are concerned
with fairness, accountability and transparency of AI
systems, and iv) users - people who use AI systems.
A more commonly used XAI stakeholder catego-
rization was proposed by Arrieta et al. (Arrieta et al.,
2020), who identify the following stakeholders: i)
model users - make decisions based on model output,
ii) regulatory entities - regulate the process of using
AI systems, iii) managers - decide where to employ
AI, iv) developers - build, train and optimize models,
and v) affected users - people in the scope of an AI
decision.
Bhatt et al. (Bhatt et al., 2020) interviewed em-
ployees from companies that employ XAI to identify
the main user groups relying on explainability and
how they are using it. They concluded that in prac-
tice the most common explanation consumers were
machine learning engineers (including data scientists
and researchers), which corresponds to the developer
stakeholder according to Arrieta et al. They utilized
explanations to identify prediction relevant aspects
of the input, debug and improve faulty models, and
to verify the predictions before deployment. Overwhelmingly, the most commonly employed XAI methods were local explainability methods such as attribution methods. Moreover, they also interviewed
data scientists who are currently not using any XAI
tools to understand their expectations of XAI. The
majority of them wanted to employ XAI methods to
debug and monitor (verify) machine learning models.
Langer et al. (Langer et al., 2021) identified
XAI stakeholders’ main desiderata regarding XAI,
and used the stakeholder categorization as defined
by Arrieta et al. (Arrieta et al., 2020). The main
desiderata identified by Langer et al. for the devel-
opers were verification and performance. Verification
entails checking if the model functions as intended,
while performance implies the ability to increase the
model’s accuracy, e.g., identifying underrepresented
training data with the help of explainable methods.
This is also supported by other important desiderata
for this stakeholder class, such as debuggability (abil-
ity to debug a model using XAI), effectiveness (ability
to produce the desired result) and efficiency (reduce
effort with the help of XAI).
Moreover, when originally introducing the developer stakeholder, Arrieta et al. stated that XAI methods have to be informative to developers, and that developers have to feel confident when relying on the employed methods.
Throughout the rest of this paper, we will use the
term machine learning experts to refer to the devel-
oper stakeholder of Arrieta et al. and the machine
learning engineer stakeholder of Bhatt et al. The ma-
chine learning experts are stakeholders who are pro-
ficient in training, developing and maintaining ma-
chine learning models.
3.2 User Requirements Survey
To validate the machine learning experts’ XAI re-
quirements from the literature, we have performed a
survey as part of an XAI workshop with 20 partici-
pants of the target group from an AI research insti-
tute. In the survey, the machine learning experts were
asked to provide requirements and desiderata that they
want or expect from XAI. Most requirements were in
agreement with the previously proposed ones, espe-
cially regarding debugging and verification. The most
important additional requirements were related to ex-
plainer faithfulness (Selvaraju et al., 2017), where the
experts were interested in reliable and repeatable ex-
planations that they can trust. This is in agreement
with the confidence requirement proposed by Arrieta
et al. Moreover, the participants deemed it important
that the explanations are easily understandable and
interpretable, and that they can have a comparable
overview of the explanations. The participants also deemed it important that the explanations can be computed quickly (related to efficiency), and valued the applicability of XAI methods to any model.
3.3 User Requirements
It is evident that currently XAI is mainly employed
by machine learning experts. By combining the re-
quirements identified in the literature review with the
feedback from machine learning experts in our sur-
vey, we have derived the following machine learning
expert user requirements for an XAI tool:
UR1: Model Improvement. An XAI tool has to
support the user in improving a model, i.e., by support-
ing the user in debugging a faulty model or further in-
creasing the performance of an already good model.
(Bhatt et al., 2020; Langer et al., 2021)
UR2: Model Verification. An XAI tool has to
support the user in verifying a model’s predictions.
This may be done by confirming that the model relies on
input characteristics that align with human expecta-
tions, or are plausible to humans. (Bhatt et al., 2020;
Langer et al., 2021)
UR3: Effectiveness & Efficiency. An XAI tool
should make the user more effective and efficient.
The user should be able to produce the desired result
with reduced effort. (Langer et al., 2021)
Table 1: Mapping of functional requirements to user requirements.

        UR1   UR2   UR3   UR4   UR5
FR1      x     x     x
FR2      x     x     x     x
FR3      x     x     x
FR4                  x           x
FR5                        x     x
FR6      x     x           x
FR7                        x     x
UR4: Informativeness & Understandability.
An XAI tool should provide understandable and con-
cise explanations without being overwhelming. (Ar-
rieta et al., 2020)
UR5: Faithfulness. Users should be confident
that the employed XAI methods work reliably and ex-
planations are trustworthy. (Arrieta et al., 2020)
All of the listed user requirements have also been
confirmed through our own user requirements survey.
4 XAIVIER
Starting from the user requirements described in the
previous section, we translated them into concrete
functional requirements for an interactive XAI tool.
These functional requirements served as the basis for the
design and development of XAIVIER. In the follow-
ing, we will first introduce the functional require-
ments and subsequently elaborate on the design and
development process of XAIVIER. Finally, we will
provide details about the implementation.
4.1 Functional Requirements
With XAIVIER we aimed to develop an XAI tool tai-
lored to the user requirements of machine learning
experts. With this goal in mind, we defined a set of
functional requirements that address the user require-
ments. Table 1 provides an overview about the map-
ping of functional requirements to user requirements.
FR1: Dataset and Model Selection. The first
step in understanding a model trained on a dataset is
the selection of the dataset and model itself. The se-
lection and loading of the dataset and model should
be straightforward. (UR1, UR2, UR3)
FR2: Dataset Inspection. Users should be
enabled to interactively explore datasets and exam-
ine groups of samples or individual samples in detail.
(UR1, UR2, UR3, UR4)
FR3: Model Quality Estimation. Users should
be able to easily judge the performance of the whole
model, as well as easily identify problematic samples.
(UR1, UR2, UR3)
FR4: Explainer Recommendation. Given that
users cannot know which explainer works best for
their dataset and model, they have to be supported in
their explainer choice. Therefore, explainers suitable
for a given model and dataset should be suggested and
pre-selected for the user. (UR3, UR5)
FR5: Recommendation Transparency. Ex-
plainer recommendations should not be a black-box
themselves. Users should be supported in getting an
understanding of why some explainers are recom-
mended while others are not. (UR4, UR5)
FR6: Model Understanding. The system
should support users in gaining an understanding of
model predictions and behavior. Explanations of indi-
vidual predictions should be presented in a clear and
easy to understand way. (UR1, UR2, UR4)
FR7: Explanation Comparison. The sys-
tem should provide facilities to compare explanations
from different explainers in order to enable users to
judge differences in explanation quality themselves.
Since well-performing explainers should be recom-
mended, a comparison of explanations can help to
build trust in the recommendations. (UR4, UR5)
4.2 Application Design
For the development of XAIVIER, we focused our at-
tention on supporting XAI for univariate time series
classification, due to the abundance of this data type in
high-stakes domains, and the lack of tools support-
ing this data type for the classification task. Specifi-
cally, due to the fact that most XAI methods were not
originally developed with time series data in mind, we
considered that a tool that can help the user in select-
ing the best XAI method for this data type would be
highly beneficial, especially when considering that the evaluation of XAI methods for time series was not adequately supported until recently (Šimić et al., 2022).
We started by designing the initial UI mock-ups
based on the functional requirements. The mock-ups were improved iteratively after multiple discussions among machine learning experts regarding the interaction path they would prefer in such an application.
It was clear that FR1 had to be addressed first,
since the selection of a dataset and model precedes
all other FRs. Therefore, we decided to dedicate the
whole landing page (Figure 2) solely to the selection
of the dataset and to its corresponding model. After
loading the data, the user would be presented with a
data analysis page, which addresses all other FRs.
We decided to separate the data analysis page
(Figure 3) into two core functionalities: model inves-
tigation and explainer recommendation. Each core
functionality is accessible through a corresponding
tab in the page’s navigation bar (Figure 3 (a)).
The model investigation tab is designed to allow
users to interactively explore and analyze data sam-
ples (FR2), model predictions (FR3), and prediction
explanations (FR6). All samples of the time series
data set are shown in the result view located in the
center of the page (Figure 3 (b)). In addition to plot-
ting the data in a line chart, we show relevant meta-
data per sample, including the sample id, the sample’s
ground truth, the model prediction, and the prediction
confidence. The samples may be sorted according to
any of these properties.
In order to enable users to identify relevant sam-
ples quickly, we added a filtering pane (Figure 3
(d)), which contains two complementary filtering in-
terfaces. A list-based interface is provided at the
top, which allows users to include or exclude sam-
ples based on their ground truth and model prediction
classes by checking the corresponding list entries. A confusion matrix is also provided at the
bottom, which gives a compact overview about the
performance of the model (FR3). The matrix can also
be directly used to filter the data by selecting individ-
ual cells, rows, or columns.
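The filtering described above can be sketched roughly as follows (an illustration of the concept only, not XAIVIER's actual code; the Sample structure and its field names are hypothetical):

    # Sketch of list-based and confusion-matrix filtering over the sample
    # metadata shown in the results view; Sample and its fields are hypothetical.
    from dataclasses import dataclass
    from typing import List, Set

    @dataclass
    class Sample:
        sample_id: int
        ground_truth: str     # e.g., "normal" or "abnormal"
        prediction: str
        confidence: float

    def filter_by_classes(samples: List[Sample],
                          keep_truth: Set[str], keep_pred: Set[str]) -> List[Sample]:
        # List-based filter: keep samples whose classes are checked in the lists.
        return [s for s in samples
                if s.ground_truth in keep_truth and s.prediction in keep_pred]

    def filter_by_cell(samples: List[Sample],
                       true_class: str, pred_class: str) -> List[Sample]:
        # Confusion-matrix filter: keep samples falling into one selected cell.
        return [s for s in samples
                if s.ground_truth == true_class and s.prediction == pred_class]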
To address FR6, we allow users to choose from a
range of attribution methods as explainers in the con-
figuration interface shown in Figure 3 (c). To address
FR4, XAIVIER can benchmark the available explain-
ers on the selected dataset and model. If the benchmark has already been completed, the best explainer will be pre-selected (Figure 3 (c)).
Every explainer assigns a score to each time point,
which is indicated by the background color at the cor-
responding positions in the line charts of each time
series sample. Depending on the explainer, the score
either represents the relevance of a time point for the
predicted class, or, whether the time point confirms
the prediction or conflicts with it. If an explainer pro-
duces only positive scores, it implies that the score
shows how relevant a feature was for the model’s pre-
diction. A score of 0 means the feature was irrelevant
and is indicated by a white color, while a score of 1
means the feature was relevant and is indicated with
a red color. If an explainer assigns both positive and
negative scores to the features, it implies that the fea-
ture was confirming the prediction (score = 1, red), or
conflicting with it (score = -1, blue). A score of 0 (white)
in this case would mean that the feature had no im-
portance for the prediction.
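The mapping from attribution scores to heat map colors can be sketched as follows (an illustrative re-implementation of the scheme described above, not XAIVIER's frontend code):

    # Sketch of the diverging score-to-color mapping: white at 0, red towards +1
    # (relevant / confirming), blue towards -1 (conflicting).
    def score_to_rgb(score: float) -> tuple:
        s = max(-1.0, min(1.0, score))        # clamp to [-1, 1]
        if s >= 0:
            fade = int(255 * (1 - s))         # 255 -> white, 0 -> pure red
            return (255, fade, fade)
        fade = int(255 * (1 + s))             # 255 -> white, 0 -> pure blue
        return (fade, fade, 255)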
To allow for easy comparison between explana-
tions (FR7), a secondary explainer can be selected on
demand. As shown in Figure 1, this causes a copy of the line chart to be displayed below the original.

Figure 2: Landing page - Consists of: dataset selection dropdown (left), and model selection dropdown (right).
Given the limited display space, samples may only
be investigated on a rather coarse level of detail in the
results view. For further detail, single selected sam-
ples and their attributions can be shown in an enlarged
detail view, as illustrated in Figure 4 (FR2 and FR6).
In the detail view’s line plot, users can zoom along the
time axis and read the exact data values and catego-
rization of the explainer per time point. Below the line
plot, a list of heatmap previews provides an overview
of all available explanations for that sample and al-
lows for comparison (FR7). To quickly switch what
explanation is shown, we allow the active explainer to
be selected directly from this list.
The explainer recommendation tab is designed to
advise users which of the supported explainers to
use for a given dataset and model, and which ones to
avoid. It additionally provides more detailed informa-
tion about the explainer evaluation and their results to
make the explainer recommendation more transparent
(FR5). As shown in Figure 5, we provide a recom-
mendation summary, which includes a recommended
explainer (highest score in evaluation), as well as rec-
ommendations for explainers to avoid (scores close to
0). This summary is followed by a grouped bar chart,
which shows the scores of all explainers. For each
explainer we display the total score, by which the ex-
plainers are ranked, as well as the individual metric
scores that are used to compute the total score. A de-
tailed description of the used metrics is provided in
Section 4.3. Right below the chart, we provide further
information in textual form about how to interpret the
results and how the score is computed.
Figure 3: Model investigation UI - Consists of: a) navigation bar, b) results view, c) configuration interface, and d) filtering
pane.
Figure 4: Detail view - Sample with explanation of the ac-
tive explainer DeepLIFT (top). Preview of all explanations
(bottom).
4.3 Explainer Recommendation
As shown in Figure 1, different explainers can produce
explanations that are not in agreement, which in turn
raises the question as to which explainer can correctly
identify the most important aspects of the input. To
find the answer, it is necessary to estimate the qual-
ity of the generated explanations, depending on how
well they identify the features that were truly relevant
for a model to make its prediction, also referred to as
faithfulness (Selvaraju et al., 2017).
One of the most prominent approaches for mea-
suring the quality of explainers is region perturba-
tion (Samek et al., 2016), which progressively per-
turbs the input features according to their estimated
relevance, either in the most relevant (MoRF) or
least relevant first (LeRF) order. After each pertur-
bation, the model makes a prediction for the perturbed sample. Plotting the change in predictions produces a perturbation curve. Perturbing truly relevant features should degrade the model’s performance quickly, causing a small Area Under the Perturbation Curve (AUPC_MoRF), while perturbing the least relevant features should barely affect the model, causing a large AUPC_LeRF.

Figure 5: Main components of the Explainer Recommendation UI: recommendation summary and detailed results.
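A minimal sketch of this region perturbation procedure for a single sample is given below (our own illustration of the approach of Samek et al. (2016); the predict and perturb callbacks are hypothetical placeholders, e.g., predict returning the probability of the originally predicted class and perturb replacing one time step with noise or the series mean):

    # Sketch of region perturbation: degrade time steps in MoRF or LeRF order and
    # accumulate the area under the resulting perturbation curve (AUPC).
    import numpy as np

    def aupc(x, relevances, predict, perturb, order="MoRF"):
        idx = np.argsort(relevances)              # least relevant first (LeRF)
        if order == "MoRF":
            idx = idx[::-1]                       # most relevant first
        curve = [predict(x)]                      # prediction on the intact sample
        x_pert = np.array(x, copy=True)
        for i in idx:
            x_pert = perturb(x_pert, i)           # perturb one time step
            curve.append(predict(x_pert))
        return float(np.trapz(curve)) / len(curve)   # normalized AUPC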
However, Šimić et al. (Šimić et al., 2022) have shown that relying exclusively on either the AUPC_MoRF or the AUPC_LeRF is unreliable for time series data. Consequently, they introduced two metrics, namely the Perturbation Effect Size (PES) and the Decaying Degradation Score (DDS). Both the PES and DDS utilize the AUPC_MoRF and AUPC_LeRF together
to compute how consistently an explainer can separate relevant from irrelevant features (PES), and to what extent (DDS). Above all, Šimić et al. have shown that these two metrics work well with time series data.
XAIVIER estimates the faithfulness of an ex-
plainer by combining PES and DDS into a single total score, the sum of the two, on which the recommendations are based. The total score lies in the range [-2, 2], where a
high score indicates that the explainer separates im-
portant features most consistently and to the greatest
extent. On the other hand, any explainer for which ei-
ther PES or DDS is close to zero should be avoided.
Moreover, caution is advised when using explainers with a high negative score, since they systematically mistake relevant features for irrelevant ones and vice versa.
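The recommendation logic built on this total score can be sketched as follows (assuming that PES and DDS each lie in [-1, 1], consistent with the reported total range of [-2, 2]; the near-zero threshold eps is an illustrative assumption, not the exact value used in XAIVIER):

    # Sketch of ranking explainers by total score = PES + DDS and deriving advice.
    def rank_explainers(scores, eps=0.1):
        # scores: {explainer_name: (pes, dds)}; eps is an assumed threshold.
        ranked = sorted(scores.items(),
                        key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        advice = {}
        for name, (pes, dds) in ranked:
            if abs(pes) < eps or abs(dds) < eps:
                advice[name] = "avoid"              # cannot separate features
            elif pes + dds < 0:
                advice[name] = "use with caution"   # systematically inverted
            else:
                advice[name] = "candidate"
        recommended = ranked[0][0]                  # highest total score
        return recommended, advice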
4.4 Implementation Details
XAIVIER consists of a separate frontend web inter-
face and a backend service. The backend is imple-
mented in Python and serves the frontend with re-
quested information such as datasets, models, pre-
dictions, and explanations. This information is pro-
vided via a REST API, which is implemented using the FastAPI (https://fastapi.tiangolo.com/) framework and hosted by the Uvicorn (https://www.uvicorn.org/) web
server. New datasets and trained models can be in-
tegrated in the backend by providing data and model
files, and some meta-data that describes which model
to use for which dataset. Currently, only PyTorch (https://pytorch.org/) models are supported. For explaining the model pre-
dictions, XAIVIER provides the following explain-
ers: Saliency (Simonyan et al., 2014), DeepLIFT
(Shrikumar et al., 2017), Guided Backpropagation
(Springenberg et al., 2014), InputXGradient (Shriku-
mar et al., 2016), IntegratedGradients (Sundarara-
jan et al., 2017), KernelSHAP (Lundberg and Lee,
2017), LIME (Ribeiro et al., 2016), GradCAM (Sel-
varaju et al., 2017), Feature Ablation (https://captum.ai/api/feature_ablation.html), GuidedGradCAM (Selvaraju et al., 2017) and Deconvolution (Zeiler and Fergus, 2014). All explainers are provided by Captum (https://captum.ai/), except GradCAM, for which we
use our own implementation. As previously men-
tioned, the combined scores for explainer faithfulness
estimation proposed by Šimić et al. (Šimić et al., 2022) are used to calculate the explainer recommen-
dations. Once computed, model predictions, explana-
tions and explainer recommendations are stored in an
ArangoDB (https://www.arangodb.com/) database for quick retrieval.
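To make the backend architecture concrete, the sketch below shows how an endpoint of this kind could serve attributions with FastAPI and Captum; the endpoint path, the model loading, and the load_sample helper are hypothetical illustrations, not XAIVIER's actual API.

    # Sketch of a FastAPI endpoint returning per-time-step relevances computed
    # with Captum's IntegratedGradients for a PyTorch time series classifier.
    import torch
    from captum.attr import IntegratedGradients
    from fastapi import FastAPI

    app = FastAPI()
    model = torch.load("model.pt")            # hypothetical trained classifier
    model.eval()

    def load_sample(sample_id: int) -> torch.Tensor:
        # Placeholder: fetch the time series from the dataset store,
        # shaped (1, 1, n_time_steps) as expected by the model.
        raise NotImplementedError

    @app.get("/explanations/{sample_id}")
    def explain(sample_id: int):
        x = load_sample(sample_id)
        target = int(model(x).argmax(dim=1))  # explain the predicted class
        attributions = IntegratedGradients(model).attribute(x, target=target)
        return {"sample_id": sample_id,
                "target_class": target,
                "relevances": attributions.squeeze().tolist()}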
The frontend is implemented in Angular (https://angular.io/) using HTML, CSS and TypeScript.
5 USER SCENARIO
To highlight the usefulness of XAIVIER, we will de-
scribe a user scenario that involves debugging a neural
time series classification model using XAI. Moreover,
this user scenario will showcase the importance of all
introduced user requirements.
Problem Setting: Martin, a data scientist, re-
ceived an assignment from a semiconductor company
to detect if an error occurred during the production
of individual wafers. The company only recently
equipped a machine with a sensor to perform mea-
surements during the production. Soon, the company
created a dataset that consisted of imbalanced univari-
ate time series data, where each sample should be
classified as either normal or abnormal. More easily than expected, Martin managed to train a deep learning
model which achieved 100% accuracy on the training
and validation data. Meanwhile, the company cre-
ated an additional dataset to validate the model, on
which the model performed very badly. Many abnor-
mal samples were not correctly classified. To find the
issue, Martin is confronted with two common prob-
lems in model debugging with XAI: i) explainer se-
lection, and ii) problem identification and correction.
Scenario: Using XAIVIER, Martin first wants to confirm that the model indeed performs as badly on the company’s test data as they claim.
Therefore, Martin first loads the company’s test
dataset and his model into XAIVIER using the land-
ing page (Figure 2) and then proceeds to the model
inspection page (Figure 3). By looking at the confu-
sion matrix (Figure 3 (d), bottom), Martin is able to
quickly confirm that indeed the model mistakes many
of the abnormal samples as normal (UR2, UR3).
Clearly, the model has not learned the right charac-
teristics of abnormal samples. Therefore, Martin has
to identify the data characteristics in the training data
that the model deems important when it classifies a
sample as abnormal, and verify that these characteris-
tics are sensible.
To switch to the training dataset, Martin clicks on
the home button in the top-right corner of the model
inspection page, which redirects him back to the land-
ing page. Here, he can choose the training dataset
and the problematic model. When he arrives at the
model inspection page, he can see again in the confu-
sion matrix that on the training data the model per-
forms perfectly. Since Martin is interested only in the
abnormal samples, he applies a filter by selecting the
top-left cell in the confusion matrix. To be able to
reason about the data characteristics that lead to the
prediction as abnormal in these samples, Martin has
to select an explainer. Luckily, the explainers have
been evaluated, which is why IntegratedGradients is
already pre-selected. Martin is informed via a pop-up
that this is the best explainer for his data and model.
Martin is not yet sure if he should trust the ex-
plainer recommendation, since he is unfamiliar with
this method and he has already heard about Ker-
nelSHAP and its good performance on tabular data.
Therefore, Martin wants to see how IntegratedGradi-
ents compares to KernelSHAP in the explainer rec-
ommendation page, which he can access by clicking
on the link in the pop-up or on the respective tab
(Figure 3 c)). In the detailed results (Figure 5) it is
evident that IntegratedGradients performs most con-
sistently and separates the most relevant features from the least relevant ones to the greatest extent, while
KernelSHAP performs rather poorly. To be 100%
sure, he switches back to the model inspection tab
and compares the explanations of IntegratedGradients
and KernelSHAP visually. He enables the display of
explanations via the "show explanation" toggle, se-
lects KernelSHAP as secondary explainer, and en-
ables comparison. Martin can now see that Integrat-
edGradients provides clear and precise explanations
of what time points were crucial to the prediction,
while KernelSHAP provides very noisy explanations
(Figure 6). After seeing this, Martin decides to trust
the explainer recommendations and use the explainer
that is pre-selected for his further analysis (UR5).
Now Martin inspects the explanations produced
with IntegratedGradients in further detail to see what
the model is "looking for" when identifying abnormal
samples. By looking at the time steps highlighted in
the heatmaps (UR4), he realizes that the model is al-
ways reacting to a sharp spike at the end of the time
series to detect an anomaly (UR1), as seen in Figure 6.
However, after consulting with an expert, Martin un-
derstood that anomalous patterns occur earlier in the
time series, and that these sharp spikes at the end are
actually an artifact in the data. Besides, sharp spikes
can also occur throughout normal time series.
To fix the issue, Martin decides to remove the
last few time points from the training data outside of
XAIVIER and to re-train the model. The new model
also achieved an accuracy of 100% on the training
and validation data. However, this time, Martin veri-
fied with XAIVIER that the explanations seem sensible to him, since the model considers various aspects of the input and not only sharp spikes. Before sending the model to the customer, Martin also tested the new model on the company’s test data, on which it fortunately also performed well.

Figure 6: Comparison of explanations for multiple abnormal samples generated with IntegratedGradients (precise upper explanation per sample) and KernelSHAP (noisy lower explanation per sample).
6 USER STUDY
To evaluate the usability of XAIVIER and investigate
if the target group, the machine learning experts, are
using XAIVIER as it was intended, an initial thinking
aloud study was performed. During the study, the par-
ticipants were given a number of tasks, which encour-
aged them to explore all functionalities of XAIVIER.
6.1 Study Design
Two machine learning experts from an AI research
institution were hand-picked for the study. The pre-
requisites for participating in the study were that the
participants had to have experience in applying ma-
chine learning, that they previously worked with time
series data, and that they had never used XAIVIER before.
Protocol: First, the participants received an expla-
nation about the study procedure and were informed
that during the study the contents of the screen and
the audio would be recorded. Following this, they had to
sign a consent form, after which they received de-
tailed instructions about how to participate and be-
have in a thinking aloud study. Before starting with
the actual tasks, the participants performed a warm-
up task, where they were instructed to mentally per-
form a 2-digit multiplication and think aloud. Prior to
starting with the tasks, we enabled the screen and au-
dio recording. The participants were then given five
tasks to solve using XAIVIER. After completing all
tasks, the participants had to answer a set of prepared
questions regarding their thoughts on XAIVIER and its applicability, as well as provide any additional feedback
regarding the application.
Task Design: We designed five tasks (T1-T5) to encourage the participants to utilize all avail-
able functionalities of XAIVIER in order to get feed-
back regarding the usability of the whole application.
T1: The participants were first instructed to load
a time series dataset and classification model. After
that, they could freely explore the user interface, with
the goal of collecting first impressions of the UI.
T2: The participants were asked to use the recom-
mended explainer and to identify the characteristics
that the model is looking for to identify a sample as
abnormal. Here we wanted to understand if the par-
ticipants are able to navigate the UI with a concrete
goal in mind, and most importantly, if they are able to
interpret the explanations.
T3: To evaluate the comprehensibility of the Ex-
plainer Recommendation page, the participants were
asked to explain why an explainer was recommended
to them. Also, the participants had to point out which
explainers should be avoided and why.
T4: In this task, the participants had to utilize
the explainer comparison functionality to compare
the recommended explainer with ones that should be
avoided. After comparing the explainers, the partici-
pants were asked if they could trust the recommender.
T5: In the final task, the participants were asked
to identify the exact time points which were relevant
for the prediction, which is only possible in the Detail
View. They were also asked to look at the preview of
all explanations, and were again asked if they agree
with the recommendations.
6.2 Results and Discussion
During the free exploration phase (T1) both partici-
pants - individually referred to as P1 and P2 - eas-
ily selected and loaded the dataset and model. More-
over, both participants explored all functionalities that
XAIVIER has to offer. They also immediately under-
stood the filtering and sorting mechanics, as well as
that a recommended explainer is pre-selected. Also,
both participants interpreted the sample metadata cor-
rectly. However, it was also evident that while P2 was
able to immediately understand the explanations after
turning them on, P1 understood them only during T2.
It was also noteworthy that P1 almost overlooked the
existence of the sample detail view.
During T2, both participants managed to easily se-
lect the best (recommended) explainer, and to identify
the characteristic that the model is looking for to iden-
tify samples as abnormal. P1 relied on the list-based
filtering to view only the abnormal samples, while P2
used the confusion matrix filter. Also, both partici-
pants relied on the detail view to solve the task, and
hovered over the time series to inspect the relevance
of the individual time points.
Both participants managed to easily identify the
best explainer in T3. They relied both on the short
summary and the detailed results. However, both par-
ticipants assumed that the explainer with the lowest
(negative) score was the worst, while the worst were actually the ones closest to zero, even though they had read how to
interpret the results. The participants also stated that
the metrics description should be more visible.
During T4 there were substantial differences be-
tween the participants. P1 first used the compare ex-
plainer functionality and performed pairwise compar-
isons of the recommended explainer with the ones to
be avoided. Following that, P1 opened the detail view
and compared multiple explanations at once. P1 de-
cided to trust the recommender, mainly since the ex-
plainers that are ranked closer to each other produce
clear, more similar explanations, and because the not
recommended explainers produce explanations that
diverge from each other, appear noisier, and were
therefore found to be harder to interpret. P2 on the
other hand did not complete this task, due to a misun-
derstanding of the task’s objective.
Both participants solved T5 smoothly. They fil-
tered for the abnormal samples and purposefully
opened the detail view and examined the highlighted
time points. Both participants also used other explain-
ers to see if there are any differences, and both of them
stated that the explanations provided by the explain-
ers which should be avoided are difficult to interpret.
P1 also remarked that the explanation of the explainer labeled as one to be cautious with looks somewhat like those of the recommended explainers (due to inverted relevances). However, P1 assumed that it is intended that these explanations can be partly relied on, since the explainer is labeled as one to be cautious with and not as one to be avoided. Moreover, after additionally observing
the bad explanations of the explainers that should be
avoided, P2 stated that they were confident that the explainer
recommender works correctly.
Summary: Through the thinking aloud study we
could verify that all of the user requirements were ad-
dressed with XAIVIER. Both participants managed to
interpret and understand the heat map explanations in-
tuitively, and correctly solved the tasks without assis-
tance (UR4). However, one of the participants needed
some time to understand the heat map visualization
correctly. Thus, we believe that additional support
for understanding the explanations (e.g., textual de-
scription, visual cues) would be beneficial when heat
maps are used to indicate feature relevance. Both
participants managed to identify the flaw in the data
quite rapidly (UR3), demonstrating the usefulness for
both debugging (UR1) and verification (UR2) pur-
poses, which was also stated by both participants as
additional feedback after they completed all tasks.
The participants stated that they trusted the explainer
recommender more, given that they could verify by
comparison that the explanations of recommended
explainers were more precise. Therefore, we con-
clude that explanation comparison can help to build
trust in explanations and in explainer recommenda-
tions (UR5).
7 CONCLUSION AND FUTURE
WORK
In this paper, we introduced a set of user requirements
for the main user group of XAI, the machine learning
experts. Subsequently, we derived a set of functional re-
quirements, which should be fulfilled by an XAI ap-
plication to address the user requirements. Based on
these functional requirements we designed and devel-
oped XAIVIER, a web application for interactive XAI
focused on univariate time series classifiers, which
stands out through its explainer recommender. Fi-
nally, we evaluated the usability of XAIVIER and its
explainer recommender with the target group.
Through the evaluation, we could confirm that
all user requirements were addressed with XAIVIER.
Moreover, XAIVIER left a positive impression on the
target group users, especially regarding the clean user
interface and the fact that all functionalities are easy to find
and access. Additionally, we could observe that the
participants tended to trust the explainer recommender,
especially if they can also compare the explanations
of the different explainers by themselves.
In the future, we plan to improve upon the defi-
ciencies of XAIVIER that have been identified through
the thinking aloud test, as well as add support for
other data types, i.e., multivariate time series and im-
ages. After that, we intend to perform extended com-
parative experiments under controlled conditions to
quantify the benefits delivered by XAIVIER, such as
reduced time and effort for obtaining explanations,
the delivery of faithful explanations by the explainer
recommender independently of the dataset and model,
and its contribution to user trust.
Finally, we plan to investigate the user require-
ments for non machine learning expert users such as
domain experts, another important XAI stakeholder,
and examine how XAIVIER can be adapted to support
both machine learning and domain experts. We expect that the group-specific adaptations to XAIVIER will
involve both the visual presentation of explanations as
well as the way the explanations are generated.
ACKNOWLEDGEMENTS
This work was supported by the "DDAI" COMET
Module within the COMET Competence Centers
for Excellent Technologies Programme, funded by
the Austrian Federal Ministry for Transport, Innova-
tion and Technology (bmvit), the Austrian Federal
Ministry for Digital and Economic Affairs (bmdw),
the Austrian Research Promotion Agency (FFG), the
province of Styria (SFG) and partners from industry
and academia. The COMET Programme is managed
by FFG.
REFERENCES
Adadi, A. and Berrada, M. (2018). Peeking inside the black-
box: A survey on explainable artificial intelligence
(xai). IEEE Access, 6:52138–52160.
Alber, M., Lapuschkin, S., Seegerer, P., Hägele, M., Schütt, K. T., Montavon, G., Samek, W., Müller, K.-R., Dähne, S., and Kindermans, P.-J. (2019). iNNvestigate neural networks! Journal of Machine Learning Research, 20(93):1–8.
Alicioglu, G. and Sun, B. (2022). A survey of visual an-
alytics for explainable artificial intelligence methods.
Computers & Graphics, 102:502–520.
Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., García, S., Gil-López, S., Molina, D., Benjamins, R., et al. (2020). Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58:82–115.
Arya, V., Bellamy, R. K. E., Chen, P.-Y., Dhurandhar, A.,
Hind, M., Hoffman, S. C., Houde, S., Liao, Q. V.,
Luss, R., Mojsilović, A., Mourad, S., Pedemonte,
P., Raghavendra, R., Richards, J., Sattigeri, P., Shan-
mugam, K., Singh, M., Varshney, K. R., Wei, D., and
Zhang, Y. (2019). One explanation does not fit all: A
toolkit and taxonomy of ai explainability techniques.
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller,
K.-R., and Samek, W. (2015). On pixel-wise explana-
tions for non-linear classifier decisions by layer-wise
relevance propagation. PloS one, 10(7):e0130140.
Bhatt, U., Xiang, A., Sharma, S., Weller, A., Taly, A., Jia,
Y., Ghosh, J., Puri, R., Moura, J. M., and Eckersley, P.
(2020). Explainable machine learning in deployment.
In Proceedings of the 2020 conference on fairness, ac-
countability, and transparency, pages 648–657.
Collaris, D. and van Wijk, J. J. (2020). Explainexplore:
Visual exploration of machine learning explanations.
In 2020 IEEE Pacific Visualization Symposium (Paci-
ficVis), pages 26–35. IEEE.
Goodfellow, S. D., Goodwin, A., Greer, R., Laussen, P. C.,
Mazwi, M., and Eytan, D. (2018). Towards under-
standing ecg rhythm classification using convolutional
neural networks and attention mappings. In Machine
learning for healthcare conference, pages 83–101.
PMLR.
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Gian-
notti, F., and Pedreschi, D. (2018). A survey of meth-
ods for explaining black box models. ACM computing
surveys (CSUR), 51(5):1–42.
Hohman, F., Kahng, M., Pienta, R., and Chau, D. H. (2018).
Visual analytics in deep learning: An interrogative
survey for the next frontiers. IEEE transactions on vi-
sualization and computer graphics, 25(8):2674–2693.
Ismail, A. A., Gunady, M., Corrada Bravo, H., and Feizi, S.
(2020). Benchmarking Deep Learning Interpretability
in Time Series Predictions. In Larochelle, H., Ran-
zato, M., Hadsell, R., Balcan, M. F., and Lin, H.,
editors, Advances in Neural Information Processing
Systems, volume 33, pages 6441–6452. Curran Asso-
ciates, Inc.
Klaise, J., Looveren, A. V., Vacanti, G., and Coca, A.
(2021). Alibi explain: Algorithms for explaining ma-
chine learning models. Journal of Machine Learning
Research, 22(181):1–7.
Kokhlikyan, N., Miglani, V., Martin, M., Wang, E., Alsal-
lakh, B., Reynolds, J., Melnikov, A., Kliushkina, N.,
Araya, C., Yan, S., and Reblitz-Richardson, O. (2020).
Captum: A unified and generic model interpretability
library for pytorch.
Krause, J., Dasgupta, A., Swartz, J., Aphinyanaphongs, Y.,
and Bertini, E. (2017). A workflow for visual diag-
nostics of binary classifiers using instance-level expla-
nations. In 2017 IEEE Conference on Visual Analyt-
ics Science and Technology (VAST), pages 162–172.
IEEE.
Langer, M., Oster, D., Speith, T., Hermanns, H., Kästner,
L., Schmidt, E., Sesing, A., and Baum, K. (2021).
What do we want from explainable artificial intel-
ligence (xai)?–a stakeholder perspective on xai and
a conceptual model guiding interdisciplinary xai re-
search. Artificial Intelligence, 296:103473.
Li, Y., Fujiwara, T., Choi, Y. K., Kim, K. K., and Ma, K.-
L. (2020). A visual analytics system for multi-model
comparison on clinical data predictions. Visual Infor-
matics, 4(2):122–131.
Lundberg, S. M. and Lee, S.-I. (2017). A Unified Approach
to Interpreting Model Predictions. Advances in Neural
Information Processing Systems, 30:4765–4774.
Mutlu, B., Veas, E., and Trattner, C. (2016). Vizrec:
Recommending personalized visualizations. ACM
Transactions on Interactive Intelligent Systems (TiiS),
6(4):1–39.
Preece, A., Harborne, D., Braines, D., Tomsett, R., and
Chakraborty, S. (2018). Stakeholders in explainable
AI. arXiv preprint arXiv:1810.00184.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any
classifier. In Proceedings of the 22nd ACM SIGKDD
international conference on knowledge discovery and
data mining, pages 1135–1144.
Rojat, T., Puget, R., Filliat, D., Del Ser, J., Gelin, R., and
Díaz-Rodríguez, N. (2021). Explainable artificial intelligence (XAI) on timeseries data: A survey. arXiv
preprint arXiv:2104.00950.
Samek, W., Binder, A., Montavon, G., Lapuschkin, S., and
Müller, K.-R. (2016). Evaluating the visualization
of what a deep neural network has learned. IEEE
transactions on neural networks and learning systems,
28(11):2660–2673.
Schlegel, U., Arnout, H., El-Assady, M., Oelke, D., and
Keim, D. (2019). Towards A Rigorous Evaluation Of
XAI Methods On Time Series. 2019 IEEE/CVF In-
ternational Conference on Computer Vision Workshop
(ICCVW), pages 4197–4201.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., and Batra, D. (2017). Grad-CAM: Vi-
sual Explanations From Deep Networks via Gradient-
Based Localization. In Proceedings of the IEEE Inter-
national Conference on Computer Vision (ICCV).
Shrikumar, A., Greenside, P., and Kundaje, A. (2017).
Learning important features through propagating ac-
tivation differences. In International Conference on
Machine Learning, pages 3145–3153. PMLR.
Shrikumar, A., Greenside, P., Shcherbina, A., and Kundaje,
A. (2016). Not Just a Black Box: Learning Important
Features Through Propagating Activation Differences.
CoRR, abs/1605.0.
Simonyan, K., Vedaldi, A., and Zisserman, A. (2014).
Deep Inside Convolutional Networks: Visualising Im-
age Classification Models and Saliency Maps. CoRR,
abs/1312.6.
Spinner, T., Schlegel, U., Schäfer, H., and El-Assady, M. (2019). explAIner: A visual analytics framework for
interactive and explainable machine learning. IEEE
transactions on visualization and computer graphics,
26(1):1064–1074.
Springenberg, J. T., Dosovitskiy, A., Brox, T., and Ried-
miller, M. (2014). Striving for simplicity: The all con-
volutional net. arXiv preprint arXiv:1412.6806.
Sundararajan, M., Taly, A., and Yan, Q. (2017). Axiomatic
attribution for deep networks. In International Confer-
ence on Machine Learning, pages 3319–3328. PMLR.
Šimić, I., Sabol, V., and Veas, E. (2022). Perturbation effect:
A metric to counter misleading validation of feature
attribution. In Proceedings of the 31st ACM Interna-
tional Conference on Information & Knowledge Man-
agement, CIKM ’22, page 1798–1807, New York, NY,
USA. Association for Computing Machinery.
Wexler, J., Pushkarna, M., Bolukbasi, T., Wattenberg, M.,
Viégas, F., and Wilson, J. (2019). The what-if tool:
Interactive probing of machine learning models. IEEE
transactions on visualization and computer graphics,
26(1):56–65.
Zeiler, M. D. and Fergus, R. (2014). Visualizing and under-
standing convolutional networks. In European confer-
ence on computer vision, pages 818–833. Springer.
Zhu, S., Sun, G., Jiang, Q., Zha, M., and Liang, R. (2020).
A survey on automatic infographics and visualization
recommendations. Visual Informatics, 4(3):24–40.
Šimić, I., Sabol, V., and Veas, E. (2021). XAI methods for
neural time series classification: A brief review.