Visual Analytics for Industrial Sensor Data Analysis

Tristan Langer

and Tobias Meisen

Chair of Technologies and Management of Digital Transformation, University of Wuppertal,

Rainer-Gruenter-Str. 21, 42119 Wuppertal, Germany

Keywords:

Time Series Analysis, Industrial Sensor Data, Visual Analytics, Analytic Provenance.

Abstract:

Due to the increasing digitalization of production processes, more and more sensor data is recorded for sub-

sequent analysis in various use cases (e.g. predictive maintenance). The analysis and utilization of this data

by process experts raises optimization potential throughout the production process. However, new analysis

methods are usually ﬁrst published as non-standardized Python or R libraries and are therefore not available

to process experts with limited programming and data management knowledge. It often takes years before

those methods are used in ERP, MES and other production environments and the optimization potential re-

mains idle until then. In this paper, we present a visual analytics approach to facilitate the inclusion of process

experts into analysis and utilization of industrial sensor data. Based on two real world exemplary use cases,

we deﬁne a catalog of requirements and develop a tool that provides dedicated interactive visualizations along

methods for exploration, clustering and labeling as well as classiﬁcation of sensor data. We then evaluate the

usefulness of the presented tool in a qualitative user study. The feedback given by the participants indicates

that such an approach eases access to data analysis methods but needs to be integrated into a comprehensive

data management and analysis process.

1 INTRODUCTION

With the visions of Industry 4.0 and the Smart Fac-

tory, manufacturers have begun equipping their pro-

duction facilities with more sensors. The analysis

and utilization of this data by process experts offers

great potential for optimizing decisions, product qual-

ity and maintenance cycles throughout the produc-

tion process (Jeschke et al., 2017). Process experts

play an important role, because they have the neces-

sary expertise to interpret the recorded data as well as

recognize and label conspicuous patterns in the sen-

sor curves. Thus, they are needed to transform data

into information that is required for the application

of further methods like supervised machine learning

(Holmes et al., 1994; Hu et al., 2019). However, the

integration and utilization of modern analysis meth-

ods for their production process usually poses great

challenges for manufacturing companies. The reason

for this is that sensor data analysis falls into the area

of time series analysis, whereas time series analysis

itself is a well established research area that has de-

veloped many methods over the past years e.g. (Sakoe

https://orcid.org/0000-0002-4946-0388

https://orcid.org/0000-0002-1969-559X

and Chiba, 1978; Yeh et al., 2016; Hamilton, 2020).

These methods are then ﬁrst implemented in Python

or R and made available as non-standardized libraries

to the community e.g. (Law, 2019; L

oning et al.,

2019; Tavenard et al., 2020). Hence, they are reserved

for those companies who have the necessary exper-

tise to write a program to apply these libraries to their

own data and evaluate the results. Especially the code

to produce visual representations, which help to in-

terpret the results, is complex (Batch and Elmqvist,

2018). Usually, Manufacturing companies do not pro-

cess this kind of expertise and process experts do not

have the necessary programming knowledge to imple-

ment new methods on their own. This leads us to the

research question: How can we involve process ex-

perts and their knowledge in the rapid visualization

and analysis of new sensor data?

In this paper, we present a Visual Analyt-

ics (VA) approach, a proven method that combines the

strengths of human cognitive abilities with analysis

methods (Keim et al., 2008), to build a tool for inter-

active analysis and labeling of industrial sensor data.

Using two industrial exemplary use cases, where pro-

fessional data scientists worked with domain experts

to analyze and label industrial sensor data, we in-

vestigate how the use of dedicated interactive visu-

584

Langer, T. and Meisen, T.

Visual Analytics for Industrial Sensor Data Analysis.

DOI: 10.5220/0010399705840593

In Proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS 2021) - Volume 1, pages 584-593

ISBN: 978-989-758-509-8

alizations along state of the art methods for explo-

ration, clustering and labeling as well as classiﬁca-

tion compares to detailed data analysis by data scien-

tists. We provide the following contributions: 1. a

requirements catalog for visual analytics tools to sup-

port industrial sensor data analysis and labeling; 2.

our interactive VA system that enables process experts

to perform sensor analysis and labeling without any

programming knowledge; 3. a qualitative user study

that compares the results of the analyses performed by

data scientists in the original projects with the results

that study participants achieved using our tool.

2 RELATED WORK

In this section, we describe the state of the art of the

application of VA tools in industrial settings and give

an overview of recent advances in time series analysis

frameworks and knowledge generation by interactive

labeling.

2.1 Visual Analytics in Industrial

Settings

Visual Analytics has proven in various scenarios to be

a helpful method to involve people in industrial set-

tings in decision-making processes (Xu et al., 2016;

Wu et al., 2018). A recent survey on the use of VA in

manufacturing scenarios show the successful applica-

tion of specialized VA applications in diverse indus-

trial settings (Zhou et al., 2019). The authors high-

light the extensive need for professional domain spe-

ciﬁc knowledge and see human-in-the-loop analysis

as one of the major ongoing key challenges of VA

systems. These ﬁndings coincide with our efforts to

involve process experts directly in the analysis of sen-

sor data.

2.2 Advances in Time Series Analysis

Methods

Analysis of time series has a long history and plays

an important role in various disciplines like medicine,

meteorology and economics (Granger and Newbold,

2014). That lead to the development of many dif-

ferent methods and new advances of the past years.

In the following, we present some of the most rele-

vant research for industrial sensor data analysis. Ef-

forts are underway to ﬁnd a uniform ﬁle format for

time series to facilitate the management of time se-

ries data (Pfeiffer et al., 2012). Dynamic Time Warp-

ing (DTW) (Sakoe and Chiba, 1978) allows to mea-

sure similarity of two time series that vary in speed

by ﬁnding the best alignment between them. Soft-

DTW (Cuturi and Blondel, 2017) and FastDTW (Sal-

vador and Chan, 2007) are two notable enhancements

to the original approach. Yeh et al. (Yeh et al., 2016)

developed the Distance Proﬁle and the Matrix Pro-

ﬁle for time series. The Distance Proﬁle is a vector of

pairwise distances of a single window to all other win-

dows of a speciﬁc length in a time series. The Matrix

Proﬁle is a vector that stores distances between sub-

sequences of a speciﬁed window length and its near-

est neighbor. Both allow quick human evaluation of

time series characteristics (e.g. outliers and trends).

There are further advances in the uses of clustering to

ﬁnd similar time series and classiﬁcation algorithms

(Ali et al., 2019) as well as forecasting for time se-

ries (Hamilton, 2020). The variety of different meth-

ods lead to the development of machine learning tool-

boxes that offer uniﬁed application programming in-

terfaces (APIs) for the use of those methods. Two

notable comprehensive toolboxes are sktime (L

oning

et al., 2019) and tslearn (Tavenard et al., 2020).

Furthermore, there are many research projects on

the application of deep learning methods to time se-

ries data e.g. (Ronao and Cho, 2016; Psuj, 2018).

However, at the moment these methods usually have

a long computing time and require a large number of

labeled data. Therefore, in their current state they are

not suitable for the initial interactive analysis by a pro-

cess expert, but are predestined as subsequent appli-

cations that make use of analyzed and labeled data

created in our tool.

2.3 Interactive Labeling

Accurate machine learning models, which are used

in the industry to support decision-making processes

and optimize production environments, require a

comprehensive set of labeled data. However, since

these are difﬁcult to obtain, Sacha et al. (Sacha et al.,

2014) suggest involving the user in this process of

knowledge generation. One of the great strengths

of VA tools is to support human experts in ﬁnding

anomalies in data. In order to mark ﬁndings and

make them usable for further analysis methods, sys-

tem designs and labeling strategies were researched

that allow interactive labeling of data during analysis

(Bernard et al., 2018). Eirich et al. (Eirich et al., 2020)

show the usefulness of interactive labeling for classi-

ﬁcation problems by presenting VIMA - a specialized

VA tool that supports technicians from the automo-

tive sector in the analysis and labeling of produced

parts. The labels are then used to train a quality pre-

dictor for further production. The workﬂow of our

tool uses these insights and also implements a form

Visual Analytics for Industrial Sensor Data Analysis

585

of interactive labeling and estimation for the quality

of a classiﬁer trained on the existing labels.

3 VISUAL ANALYTICS FOR

INDUSTRIAL SENSOR DATA

In this section, we describe two exemplary use cases

for the development of our system. Both use cases

were actual real world industrial data analysis use

cases that were originally performed by professional

data scientists in cooperation with domain experts.

We use these use cases to extract requirements for

our interactive VA tool and use the original labels and

classiﬁers as benchmark during our user study (Sec-

tion 4).

3.1 Deep Drawing Use Case

Figure 1: Exemplary data of the deep drawing use case.

Top: good stroke of the deep drawing tool - a smooth pro-

gression of strain gauge sensor data. Bottom: bad stroke

of the tool - sudden drop of the values of the gauge sensor

caused by a crack.

Deep drawing is a sheet metal forming process in

which a sheet metal blank is radially drawn into a

forming die by the mechanical action of a punch (DIN

8584-3, 2003). Our data comes from a deep drawing

tool that is equipped with a strain gauge sensor at the

blank holder of the tool. The sensor contains a strain

sensitive pattern that measures the mechanical force

exerted by the punch to the bottom part of the tool.

The deformation of the metal during a stroke some-

times causes cracks to appear, which cause a brief loss

of pressure to the bottom of the tool and thus a sudden

drop in the values of the strain gauge sensor. Figure

1 shows two example sensor value curves: a curve

of a good stroke at the top where the progression of

the sensor values is smooth and no crack occur and a

curve of a bad stroke at the bottom where there ap-

pears a sudden drop in sensor values. Data scientists

analyzed the sensor data and identiﬁed three different

classes of curves: clean, small crack and large crack.

They trained a classiﬁer on the labeled data set and

achieved an accuracy of 94.11% (Meyes et al., 2019).

3.2 Conveyor Switch Use Case

Figure 2: Exemplary data of the conveyor switch use case.

Black: normal power consumption with slight noise at the

beginning and end of the switch’s setting cycle. Red: nor-

mal power consumption in the beginning but high power

consumption in the end which indicates that one of the lock-

ing mechanisms of the switch is jammed.

The second use case considers the course of the power

consumption of several electric motors which adjust

switches in a production line to directs components

to their predetermined workstation. A setting cycle

of a switch operates in three steps: ﬁrst the switch is

unlocked, then the motor moves the switch, and ﬁ-

nally the switch is locked in the new position. Since

the base power consumption of the motors is differ-

ent for each individual motor, delta values to the ref-

erence curve of the respective motor are considered

here. The delta power consumption curves can be

used to determine the pressure that the respective mo-

tor exerts on the mechanical switch. Thereby, slug-

gish or jammed switches can be detected and main-

tenance cycles can be optimized. Figure 2 shows two

example curves: a black curve that shows usual power

consumption with slight noise in the beginning and

end of the switch’s setting cycle and a red curve that

shows normal power consumption at the start but high

power consumption deviation at the end of the setting

cycle which indicates that one of the locking mecha-

nisms of the switch is jammed and needs to be main-

tained. Data scientists analyzed the sensor data and

identiﬁed ﬁve different classes of curves: okay, slug-

gishness during unlocking, sluggishness during lock-

ICEIS 2021 - 23rd International Conference on Enterprise Information Systems

586

Figure 3: Initial raw data view with 1 navigation of all methods, 2 history, 3 raw view of sensor data, 4 x-axis zoom

and 5 legend.

ing, sluggishness during locking and unlocking and

jammed switch. They trained a classiﬁer on the la-

beled data set and achieved an accuracy of 84.86%.

3.3 Requirements

We collected the requirements for our tool from inter-

views with experts from both use cases presented in

the previous sections. During the development pro-

cess, feedback from experts was repeatedly sought as

purposed by the user-centered design principle (Abras

et al., 2004). The following is a list of the most im-

portant requirements for our system prioritized by the

experts:

1. General. The system shall be capable of visu-

alizing a sensor data set containing an arbitrary

number of curves.

2. Exploration. The system shall provide methods

to explore the data set to ﬁnd interesting patterns

and trends. The system needs to provide means to

ﬁlter data ranges and select subsets of the whole

data set to analyze speciﬁc parts of the data set.

3. Labeling. The system shall offer methods to ﬁnd

similar series e.g. that have a speciﬁc discovered

pattern and allow its user to label those efﬁciently

i.e. not force the user to label all series individu-

ally.

4. Classiﬁcation. At each step of the labeling pro-

cess the system shall provide functions to run

a classiﬁcation algorithm and review its perfor-

mance.

5. Navigation. The system must enable its users to

compare and revisit previously visited views i.e.

method, parametrization and ﬁltering.

3.4 Visual Analytics Tool

In this section we discuss the visualizations and meth-

ods provided by our VA tool

. Our implementation

mainly uses the echarts

library in the frontend for

the presentation of the charts, partly extended by own

components, and in the backend the Python libraries

STUMPY (Law, 2019), sktime (L

oning et al., 2019),

scikit-learn (Pedregosa et al., 2011) and tslearn (Tave-

nard et al., 2020).

3.4.1 Exploration

Our tool initially only supports uploading data in ts

format, which is used by the sktime library. This for-

mat allows to handle any number of already labeled

time series as speciﬁed by our requirements. After

uploading, the user ﬁrst lands on the view shown in

Figure 3. This view contains a sidebar to navigate be-

tween methods, the history described later in Section

3.4.5 and a raw view of all sensor curves in the up-

loaded data set. To explore parts of the data, the value

range of the x-axis can be restricted and individual

sensor curves can be ﬁltered via the legend. The nav-

igation menu and the history can be toggled or min-

imized and are hidden for better visibility of method

Latest version available at: http://sensor-analysis.com/

https://echarts.apache.org/

Visual Analytics for Industrial Sensor Data Analysis

587

Figure 4: Combined Matrix Proﬁle (top) and Distance Proﬁle (bottom) view with 1 concatenated sensor signal, 2 Matrix

Proﬁle, 3 window selection tools, 4 selected window, 5 Distance Proﬁle 6 connected axis zooms and 7 method

parameters.

views in the following ﬁgures.

Furthermore, a visualization for the Matrix Pro-

ﬁle and Distance Proﬁle has been implemented for

further exploration of the data. Both methods have

proven to be effective for the discovery of anoma-

lies and recurring patterns in time series and require

in their initial implementation only one parameter to

specify the used window length. In its initial form,

the view shows the original signal by concatenating

all sensor data into a single time series. Below the sig-

nal the calculated matrix proﬁle is plotted. The x-axis

can again be restricted to arbitrary ranges. Both charts

are connected with each other, so that the restriction is

always applied synchronously to both charts and the

x-axis values always align. To display the Distance

Proﬁle of a speciﬁc window we have integrated the

possibility to select a window either with the mouse

or by entering an interval. The chart below the signal

will then change to the Distance Proﬁle of the selected

window. Figure 4 shows examples of Matrix and Dis-

tance Proﬁle for the deep drawing use case. Notice

how a human can visually identify sections (e.g. x:

ICEIS 2021 - 23rd International Conference on Enterprise Information Systems

588

Figure 5: Clustering view example for DBSCAN with 1 sensor data colored by cluster, 2 x-axis zoom, 3 method param-

eters and 4 custom legend to select and label single series and clusters.

21k - 22k) where abnormal strain gauge sensor pat-

terns occur.

3.4.2 Clustering and Labeling

For our third requirement of a method to identify and

label similar sensor curves, we have designed a view

for clustering and labeling sensor data. As cluster-

ing methods we have implemented DBSCAN as rep-

resentative for density-based clustering and k-means

as representative for distance-based clustering. The

respective views differ only in the parameterization

of the methods (DBSCAN: eps and minSamples; k-

means: k). In addition, for both methods the used

distance metric can be selected from a list (euclidean,

DTW, soft-DTW, FastDTW). Furthermore, during the

development of this view we have found that it is

sometimes advantageous to limit clustering to a cer-

tain area of the sensor data set. For example, by set-

ting the window parameters you can limit the cluster-

ing in the deep drawing use case to the area where

cracks normally occur (i.e. [350, 415]) which im-

proves the resulting clusters. To enable users to label

the data efﬁciently, we have developed a custom leg-

end. The legend shows the sensor curves aggregated

by clusters. All as well as individual clusters can be

toggled completely. The colors in the legend corre-

spond to the colors in the sensor data display. Next to

the clusters and the individual sensor curve we have

added a label icon. Clicking on this icon opens a small

dialog where the user can assign a label. Labels that

have already been assigned to other curves are sug-

gested in an auto-completion. If the label is assigned

to a cluster, all sensor curves within the cluster will

be labeled, otherwise only the selected curve will be

labeled. The assigned label is then displayed in text

form in the legend. In addition, the name of the cluster

changes based on the majority assigned labels within

the cluster. With this design, we intend to enable a

user to quickly label large parts of the data by assign-

ing labels to clusters and then to reﬁne labels of indi-

vidual sensor curves if clustering assignment and se-

ries class do not match. Figure 5 shows an example

clustering and labeling view for the deep drawing use

case.

3.4.3 Classiﬁcation

To see how well a classiﬁer works on the already la-

beled data according to requirement four, we have im-

plemented a random forest adaptation for time series

(Deng et al., 2013) and k-nearest neighbors (KNN)

classiﬁcation methods. As input data the respective

algorithm takes all already labeled data. These are di-

vided into training and test data set by setting the test

size. After the classiﬁer has been trained and predic-

tions for the test data set were created, the user can

evaluate the result in the view. The view shows the

sensor data of the test data, a legend with true and

predicted labels to ﬁlter the data and a confusion ma-

trix. This allows the user to evaluate which classes are

difﬁcult to distinguish from each other and to decide

if further labels for these classes are needed. Further-

more the current accuracy score is displayed above

Visual Analytics for Industrial Sensor Data Analysis

589

Figure 6: Classiﬁcation view example for k-nearest neighbors (KNN) with 1 sensor test data set, 2 legend with true and

predicted labels, 3 method parameters and 4

confusion matrix with accuracy score.

the matrix. Figure 6 shows an example classiﬁcation

view for the deep drawing use case.

3.4.4 Forecasting

Besides classiﬁcation, forecasting is a frequently used

method in the ﬁeld of time series analysis. Although

this is not indicated by the requirements, we have im-

plemented a view for forecasting sensor values be-

cause we wanted to test if this method was still useful

for our two classiﬁcation use cases during our user

study (Section 4). In this view, the sensor curves are

concatenated similar to the Matrix Proﬁle view and an

area is selected for forecasting. The forecasted values

are then plotted against the actual values.

3.4.5 History

To complete the last requirement of the navigation

options, we have implemented an interaction history

so that users can easily retrace their previous steps

and, if necessary, jump back to views they have al-

ready visited. For this purpose, our tool records all

interactions that have been performed, which are di-

vided into the categories navigation (e.g. navigate to

a method view), layout (e.g. ﬁlter via legend or ap-

ply x-axis zoom) and parameterization (e.g. change

parameters of method). The underlying data model

also contains information about the time of execution

in the changed parameters, so that the time between

interactions can be tracked. The history also stores

the settings of the resulting views (method, layout and

parameters) and a preview image. Figure 7 shows an

Figure 7: Analysis history with 1 history overview and

navigation, and 2 parameters and preview of the currently

hovered history item.

example of a recorded history. Hovering the list of

recorded interactions will show parameters and a pre-

view of the resulting view, whereas clicking an item

will navigate the user to the resulting view.

4 EVALUATION

To evaluate the usefulness of our tool in comparison

to detailed data analyses performed by data scien-

tists, we performed a qualitative user study to explore

strengths and weaknesses of the current implementa-

tion. In this section we describe the study design and

discuss the results.

4.1 Study Design

For this study we used a subset of 60 data points

per use case described in Section 3.1 to reduce the

time and complexity for the study. The deep draw-

ICEIS 2021 - 23rd International Conference on Enterprise Information Systems

590

ing data set contained an equal amount of data for all

classes (20 clean / 20 small cracks / 20 large cracks).

The conveyor switch data set was reduced to contain

only data of a single switch to reduce the complex-

ity of the use case and, due to the number of data

contained, an unequal distribution between classes (7

okay / 7 sluggishness during unlocking / 6 sluggish-

ness during locking / 31 sluggishness during locking

and unlocking / 10 jammed switch). Both data sets

were made available to the participants without la-

bels. The labels, which were originally created by

professional data scientists, were later used as refer-

ence values to assess the participant’s performance

with the tool. For our study, we recruited 6 partic-

ipants with technical background. Before the study,

we asked them which tools they were regularly using

for data analysis and if they were familiar with the

distance metrics (euclidean, DTW) and methods (Ma-

trix/Distance Proﬁle, clustering, classiﬁcation, fore-

casting) that our tool provides. All participants were

familiar with distance metrics and methods with ex-

ception of Matrix/Distance Proﬁle that only three out

of six participants knew and all of them would typi-

cally use pure Python for sensor data analysis. Then,

participants worked both use cases, starting with the

deep drawing use case. During the study we used the

think aloud method (Van Someren et al., 1994) and

tracked interactions performed by the participants us-

ing the history described in Section 3.4.5. For each

use case we ﬁrst described the use case and asked the

participants to work the following tasks using our VA

tool while thinking aloud:

1. Describe the characteristics of the data. How

many classes do you think exist in the data set?

2. Label the data according to the classes.

3. What is the expected performance of a classiﬁer

trained on the labeled data?

After each session, we asked the participants to give

us detailed feedback on their experience with the tool

along the following questions:

1. How do you estimate your own performance with

the tool?

2. If you have experience with Python libraries for

data analysis: Would you prefer Python libraries

over the tool? For which of the tasks?

3. Did you miss any functionality?

4. Which features did you ﬁnd particularly useful?

4.2 Results

Figure 8 shows a summary of the results of our study.

Below each use case, the number of the respective

Figure 8: Summary of study results.

participant and the number of classes identiﬁed by

him or her for the use case can be seen. For the

deep drawing use case all participants were able to

produce the original three classes. For the conveyor

switch use case the participants identiﬁed between 4

and 5 classes. To evaluate the assigned labels, we

have mapped semantically matching labels to each

other. Accordingly, the chart in the ﬁgure shows the

match of the assigned class labels with the labels orig-

inally assigned by the data scientists. The deep draw-

ing use case shows small deviations from 1 to 3 curves

that were labeled differently. The conveyor switch use

case shows 5 to 15 differences. Furthermore, the val-

ues of the achieved accuracy of the tested classiﬁers

can be seen. The participants each used three quarters

of the data to train the classiﬁer and one quarter of the

existing data as a test data set. For the deep drawing

use case the participants achieved an average accu-

racy of 88.88% (orig. 94.11%) and for the conveyor

switch use case 84.47% (orig. 84.86%). This also cor-

responds with the self-assessment of the participants,

who were more convinced of their performance in the

deep drawing use case than in the conveyor switch use

case.

To solve the tasks, participants mostly used the

raw view for exploration, DBSCAN for clustering and

labeling and random forest classiﬁer to test classiﬁca-

tion. Matrix Proﬁle and Distance Proﬁle were only

used by half of the participants and the forecasting

methods were not used at all. Through the history of

our tools we were also able to track frequent inter-

action patterns of the participants. A typical interac-

tion pattern is that the participants zoom into individ-

ual sections of the data during exploration and look at

those section for a longer time. For clustering and la-

beling, they ﬁrst select the method and then adjust the

layout and parameters in quick iterations. The classi-

ﬁer was always tested only at the very end.

Overall the participants liked our tool. When they

were asked which tasks they would prefer to perform

Visual Analytics for Industrial Sensor Data Analysis

591

with our tool instead of Python, they mentioned data

exploration and clustering and labeling. No function-

ality was missed for the tasks either. There were

many suggestions to add to the existing functional-

ity. The participants especially wanted the integration

of the assigned labels into the views for exploration,

especially Matrix Proﬁles, in order to use them for

the reﬁnement of the labels. In addition, participants

wanted to be able to mark individual sensor curves in

the clustering view to make it easier to track which

cluster they are assigned to when the parameters are

changed. The participants especially liked the possi-

bility to retrace their own steps by using the history.

Nevertheless, none of the participants could imagine

to do without Python. The biggest concern was that

our tool does not integrate well into the workﬂow with

other systems and currently only supports the use of

data in ts format, for example.

4.3 Discussion

Our study indicates that our tool is well suited for

interactive exploration, labeling and classiﬁcation of

industrial sensor data. From this we deduce that pro-

viding dedicated interactive views can be a useful ad-

dition to new analysis algorithms and provides bet-

ter accessibility in industrial environments. Overall,

our participants were able to achieve good results that

were only slightly less accurate than the results of the

original analyses of the data scientists. The results

for the conveyor switch use case differ more from the

original results. We attribute this to the higher com-

plexity of the use case which also shows in the dif-

ferent number of classes identiﬁed. The fact that the

corresponding classiﬁers performed well despite the

supposedly less accurate labels is probably also due

to the fact that the original labels, that we used as ref-

erence, were not 100% accurate. This is also indi-

cated by the lower accuracy of the reference classiﬁer

trained by a data scientist.

On the basis of the analysis of the interactions per-

formed during the study, we also found that the fast

iteration of parameterizations shows the strengths of

interactive visual analytics tools, since here the qual-

ity of the results can be quickly interpreted visually

by humans. This is particularly evident in clustering

methods, where a visual interpretation of the clus-

ter results and the subsequent labeling were consid-

ered very useful by the participants. We expected the

classiﬁers to be tested more often during the label-

ing phase, but it seems that the participants worked

through the tasks one by one. Matrix and Distance

Proﬁle were used less than we expected. We attribute

this mainly to the fact that not all study participants

were familiar with the methods. The forecasting view

was not used at all but we also have not included a

corresponding task in our study design. Therefore its

usefulness remains an open research question. The

extensive feedback shows that the participants are in-

terested in the further development of the tool, but

also that the development is not yet ﬁnished and that

there are many possible improvements. We hypothe-

size that our tool is also generally applicable to other

domains with time series data. However, we have not

tested this. Though, we ﬁnd that it is more difﬁcult

to integrate such a tool well into the whole process

of data analysis (i.e. from integration to making the

data usable e.g. through deep learning), as it means a

change between different applications and workﬂows

for the user.

5 CONCLUSION AND OUTLOOK

In this paper we presented a Visual Analytics ap-

proach for interactive analysis of industrial sensor

data by process experts. We ﬁrst recorded require-

ments for our tool using two real world examples, de-

signed and implemented the tool itself, and evaluated

the current version using a qualitative user study. The

results of the study indicate that our tool is well suited

for the interactive exploration, analysis, labeling and

estimation of a classiﬁer. The strengths of the interac-

tive analysis showed up in the study especially in the

fast parameterization of methods and visualization of

the corresponding results. The participants saw the

biggest challenge for the tool in integrating well with

other systems.

Therefore, we plan to extend our tool to cover a

complete data analysis process from data integration

and analysis up to utilization of applied labels by deep

learning methods. To do so, we will research solu-

tions for the accessible integration of sensor data and

for the direct connection of deep learning processes

through e.g. deep learning toolboxes with our tool. In

addition, we received much valuable feedback from

the study participants for the further development of

the already existing functionality. We also see great

potential in the utilization of the data recorded by the

interaction history, which process experts record dur-

ing their analysis. This interaction data provides us

with information about the experts’ sensemaking pro-

cess and thus also indirectly about their understanding

of the analyzed data.

ICEIS 2021 - 23rd International Conference on Enterprise Information Systems

592

REFERENCES

Abras, C., Maloney-Krichmar, D., Preece, J., et al. (2004).

User-Centered Design. Bainbridge, W. Encyclope-

dia of Human-Computer Interaction. Thousand Oaks:

Sage Publications, pages 445–456.

Ali, M., Alqahtani, A., Jones, M. W., and Xie, X. (2019).

Clustering and Classiﬁcation for Time Series Data in

Visual Analytics: A Survey. IEEE Access, pages

181314–181338.

Batch, A. and Elmqvist, N. (2018). The Interactive Vi-

sualization Gap in Initial Exploratory Data Analy-

sis. IEEE Transactions on Visualization and Com-

puter Graphics, pages 278–287.

Bernard, J., Zeppelzauer, M., Sedlmair, M., and Aigner, W.

(2018). Vial: a uniﬁed process for visual interactive

labeling. The Visual Computer, pages 1189–1207.

Cuturi, M. and Blondel, M. (2017). Soft-DTW: a differen-

tiable loss function for time-series. In 34th Interna-

tional Conference on Machine Learning, pages 894–

903.

Deng, H., Runger, G., Tuv, E., and Vladimir, M. (2013). A

time series forest for classiﬁcation and feature extrac-

tion. Information Sciences, pages 142 – 153.

DIN 8584-3 (2003). Manufacturing processes forming un-

der combination of tensile and compressive conditions

- Part 3: Deep drawing; Classiﬁcation, subdivision,

terms and deﬁnitions. Beuth Verlag, Berlin.

Eirich, J., J

ackle, D., Schreck, T., Bonart, J., Posegga,

O., and Fischbach, K. (2020). VIMA: Modeling and

Visualization of High Dimensional Machine Sensor

Data Leveraging Multiple Sources of Domain Knowl-

edge. IEEE Transactions on Visualization and Com-

puter Graphics.

Granger, C. W. J. and Newbold, P. (2014). Forecasting eco-

nomic time series. Academic Press.

Hamilton, J. D. (2020). Time Series Analysis. Princeton

university press.

Holmes, G., Donkin, A., and Witten, I. H. (1994). Weka:

A machine learning workbench. In IEEE Australian

New Zealnd Intelligent Information Systems Confer-

ence, pages 357–361.

Hu, R., Granderson, J., Auslander, D., and Agogino, A.

(2019). Design of machine learning models with do-

main experts for automated sensor selection for en-

ergy fault detection. Applied Energy, pages 117–128.

Jeschke, S., Brecher, C., Meisen, T.,

Ozdemir, D., and Es-

chert, T. (2017). Industrial internet of things and cy-

ber manufacturing systems. In Industrial internet of

things, pages 3–19. Springer.

Keim, D., Andrienko, G., Fekete, J.-D., G

org, C., Kohlham-

mer, J., and Melanc¸on, G. (2008). Visual analytics:

Deﬁnition, process, and challenges. In Information

visualization, pages 154–175. Springer.

Law, S. M. (2019). STUMPY: A Powerful and Scalable

Python Library for Time Series Data Mining. The

Journal of Open Source Software, page 1504.

oning, M., Bagnall, A., Ganesh, S., Kazakov, V., Lines, J.,

and Kir

aly, F. J. (2019). sktime: A Uniﬁed Interface

for Machine Learning with Time Series. In Workshop

on Systems for ML at NeurIPS 2019.

Meyes, R., Donauer, J., Schmeing, A., and Meisen, T.

(2019). A Recurrent Neural Network Architecture for

Failure Prediction in Deep Drawing Sensory Time Se-

ries Data. Procedia Manufacturing, pages 789 – 797.

47th North American Manufacturing Research Conf.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,

Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P.,

Weiss, R., Dubourg, V., et al. (2011). Scikit-learn:

Machine learning in Python. the Journal of machine

Learning research, pages 2825–2830.

Pfeiffer, A., Bausch-Gall, I., and Otter, M. (2012). Proposal

for a Standard Time Series File Format in HDF5. 9th

International MODELICA Conf., pages 495–505.

Psuj, G. (2018). Multi-sensor data integration using deep

learning for characterization of defects in steel ele-

ments. Sensors, page 292.

Ronao, C. A. and Cho, S.-B. (2016). Human activity recog-

nition with smartphone sensors using deep learning

neural networks. Expert systems with applications,

pages 235–244.

Sacha, D., Stoffel, A., Stoffel, F., Kwon, B. C., Ellis, G., and

Keim, D. A. (2014). Knowledge Generation Model

for Visual Analytics. IEEE Transactions on Visual-

ization and Computer Graphics, pages 1604–1613.

Sakoe, H. and Chiba, S. (1978). Dynamic programming

algorithm optimization for spoken word recognition.

IEEE transactions on acoustics, speech, and signal

processing, pages 43–49.

Salvador, S. and Chan, P. (2007). FastDTW: Toward Ac-

curate Dynamic Time Warping in Linear Time and

Space. Intelligent Data Analysis, pages 561–580.

Tavenard, R., Faouzi, J., Vandewiele, G., Divo, F., Androz,

G., Holtz, C., Payne, M., Yurchak, R., Rußwurm, M.,

Kolar, K., and Woods, E. (2020). Tslearn, A Machine

Learning Toolkit for Time Series Data. Journal of Ma-

chine Learning Research, pages 1–6.

Van Someren, M., Barnard, Y., and Sandberg, J. (1994). The

think aloud method: a practical approach to modelling

cognitive. London: AcademicPress.

Wu, W., Zheng, Y., Chen, K., Wang, X., and Cao, N. (2018).

A visual analytics approach for equipment condition

monitoring in smart factories of process industry. In

2018 IEEE Paciﬁc Visualization Symposium, pages

140–149.

Xu, P., Mei, H., Ren, L., and Chen, W. (2016). ViDX:

Visual diagnostics of assembly line performance in

smart factories. IEEE transactions on visualization

and computer graphics, pages 291–300.

Yeh, C.-C. M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y.,

Dau, H. A., Silva, D. F., Mueen, A., and Keogh, E.

(2016). Matrix proﬁle I: all pairs similarity joins for

time series: a unifying view that includes motifs, dis-

cords and shapelets. In IEEE 16th international con-

ference on data mining, pages 1317–1322.

Zhou, F., Lin, X., Liu, C., Zhao, Y., Xu, P., Ren, L., Xue,

T., and Ren, L. (2019). A survey of visualization for

smart manufacturing. Journal of Visualization, pages

419–435.

Visual Analytics for Industrial Sensor Data Analysis

593