Making Sense: Experiences with Multi-Sensor Fusion in Industrial Assistance Systems
Benedikt Gollan¹, Michael Haslgruebler², Alois Ferscha² and Josef Heftberger³
¹Pervasive Computing Applications, Research Studios Austria FG mbH, Thurngasse 8/20, 1090 Vienna, Austria
²Institute of Pervasive Computing, Johannes Kepler University, Altenberger Strasse 69, Linz, Austria
³Fischer Sports GmbH, Fischerstrasse 8, Ried am Innkreis, Austria
Keywords: Sensor Evaluation, Industrial Application, Multi-Sensor Fusion, Stationary Sensors, Wearable Sensors, Challenges.
Abstract: This workshop paper discusses the application of various sensors in an industrial assembly scenario, in which multiple sensors are deployed to enable the detailed monitoring of worker activity, task progress and also cognitive and mental states. The described and evaluated sensors include stationary (RGBD cameras, stereo vision depth sensors) and wearable devices (IMUs, GSR, ECG, mobile eye tracker). Furthermore, this paper discusses the associated challenges, mainly related to multi-sensor fusion, real-time data processing and semantic interpretation of data.
1 INTRODUCTION
The increasing digitalization of industrial production processes goes hand in hand with the increased application of all kinds of sensors, whereby the majority of these sensors are exploited for automated machine-to-machine communication only. However, in all human-in-the-loop processes which involve manual or semi-manual labor, physiological sensors are on the rise, assessing the behavioral and somatic states of the human workers in order to derive activity and task analyses as well as estimates of human cognitive states.
The observable revival of human labor as an opposing trend to the predominant tendency towards full automation (Behrmann and Rauwald, 2016) is associated with the requirement of industrial processes to become more and more adaptive to dynamically changing product requirements. Combining the strengths of humans and machines working together yields the best possible outcome for industrial production: humans provide creativity and adaptability, while machines ensure process constraints such as quality or security.
In the light of these changes towards human-machine collaboration, it is essential for machines or computers to have a fundamental understanding of their users - their ongoing activities, intentions, and attention distributions. The creation of such a high level of awareness requires not only (i) the selection of suitable sensors but also solutions to fundamental problems regarding (ii) the handling of large amounts of data, (iii) the correct fusion of different sensor types, and (iv) the adequate interpretation of complex psycho-physiological states.
This work introduces the industrial application scenario of an aware assistance system for a semi-manual assembly task, introduces and evaluates the employed sensors, and discusses the challenges derived from the associated multi-sensor fusion task.
1.1 Related Work
With the ever-increasing number of sensors, the fusion of data from multiple, potentially heterogeneous sources is becoming a non-trivial task that directly impacts application performance. When addressing physiological data, such sensor collections are often referred to as Body Sensor Networks (BSNs), with applications in many domains (Gravina et al., 2017). Such physiological sensor networks usually cover wearable accelerometers, gyroscopes, and pressure sensors for body movements and applied forces, skin/chest electrodes (for electrocardiogram (ECG), electromyogram (EMG), galvanic skin response (GSR), and electrical impedance plethysmography (EIP)), photoplethysmography (PPG)
sensors, microphones (for voice, ambient, and heart sounds) and scalp-placed electrodes for electroencephalogram (EEG) (Gravina et al., 2017). These wearable sensor types can also be enriched with infrastructural, remote sensor systems such as traditional (RGB) and depth cameras.
Sensor networks are investigated in and employed by industrial applications (Li et al., 2017), specifically in domains such as the automotive industry (Marabelli et al., 2017), (Otto et al., 2016), healthcare IoT (Baloch et al., 2018), (Chen et al., 2017) or the food industry (Kröger et al., 2016), and in industrial use cases such as welding (Gao et al., 2016) or CNC machining (Jovic et al., 2017).
1.2 Contribution of this Work
This work introduces an industrial assistance system which is based on the integration of various sensors, which have been applied and evaluated regarding their applicability and suitability in an industrial application. In this context, this work presents an overview of the investigated sensors with reviews and experiences regarding data quality, reliability, etc. Furthermore, this work reports on the key challenges and opportunities, which are (i) the handling of large amounts of data in real time, (ii) ensuring interoperability between different systems, (iii) handling the uncertainty of sensor data, and (iv) the general issues of multi-sensor fusion.
While Section 2 describes the industrial application scenario, Sections 3 and 4 introduce the respective sensors. Section 5 puts the focus on the discussion of challenges and opportunities, and Section 6 provides a summary and addresses future work.
2 INDUSTRIAL APPLICATION
SCENARIO
The industrial application scenario is an industrial assistance system which is employed in a semi-manual industrial application, the complex assembly of premium alpine sports products, where it is supposed to ensure the high quality requirements by providing adaptive worker support.
The work task consists of manually manipulating and arranging multiple parts, whereby errors can occur regarding workflow order, object orientation, or omission of parts. These errors manifest as unacceptable product quality differences regarding usage characteristics (e.g. stability, stiffness), thus increasing rejects and inefficiency.
Figure 1: Ski assembly procedure and environment.
Full automation of the process is not feasible due to (i) the required high flexibility (minimal lot sizes, changing production schedules), (ii) the characteristics of the used materials (highly sticky materials) and (iii) human-in-the-loop production principles, which enable the optimization of product quality and production processes.
In this context, the sensor-based assistance system is designed to enable the realization of an adaptive, sensitive assistance system that provides guidance only if needed, thus minimizing obtrusiveness and enabling the assistance system to seamlessly disappear into the background. Furthermore, the adaptivity of the feedback design enables the education of novices in training-on-the-job scenarios, integrating novices directly into the production process during their one-month training period without occupying productive specialists.
The assistance system is supposed to observe the task execution, identify the associated step in the workflow and identify errors or uncertainty (hesitation, deviation from the work plan, etc.) in order to support the operator via different levels of assistance (Haslgrübler et al., 2017). The selection of assistance depends on operator skill (i.e. day-one trainee vs. 30-year-in-the-company worker), cognitive load and perception capability, to provide the best possible assistance with the least necessary disruption. Such supportive measures range from laser-based markers for part placement
or visual highlighting of upcoming work steps in case of uncertainty, to video snippets visualizing the correct execution of a task in case of doubt.
3 ACTIVITY SENSING
The most common application of activity and behavior analysis in industrial settings is the monitoring of task progress for documentation or assistance applications. The main kinds of sensors and technologies that can be exploited for activity tracking are (i) stationary (visual) sensors and (ii) wearable motion sensors. The different fields of application are introduced in the following; for an overview, please refer to Table 1.
3.1 Skeleton Tracking
Mainly stationary visual sensors are employed to identify body joints and the resulting associated skeleton pose. Depending on the application, these sensors address the full skeleton or sub-selections of body joints.
3.1.1 Full Skeleton Tracking
Sensor Description - Kinect v2. The Microsoft Kinect v2 combines an infrared and an RGB camera to track up to six complete skeletons, each consisting of 25 joints. The Kinect uses infrared time-of-flight technology to build a 3D map of the environment and the objects in view. Skeleton data is provided by the associated Microsoft SDK, which is restricted to Microsoft Windows platforms.
In the described application scenario, two Kinect cameras have been installed on opposing sides of the work environment - as a frontal positioning was not possible - to avoid obstructions and enable an encompassing perception of the scene. Based on a manual calibration of the two sensors, the data is combined into a single skeleton representation via a multi-sensor fusion approach as described in Section 5.4. The calibration is achieved via a two-step process: (1) real-world measurement of the placement and orientation angle of the sensors in the application scenario, obtaining the viewpoints of the two sensors in a joint coordinate system, and (2) fine adjustment based on skeleton joints that are observed at the same time at different positions. For this purpose, the head joint was chosen as, in our experience, it represents the most stable joint of the Kinect tracking approach. The overall result of the calibration approach is the localization and orientation of the two sensors in a joint coordinate system, thus enabling the overlay and fusion of the respective sensor input data.
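As an illustration of this two-step calibration, the following sketch (not the authors' exact implementation; names such as SensorPose and to_world are hypothetical) maps joints reported by each Kinect into a shared world coordinate system and applies a simple offset-based fine adjustment using simultaneously observed head joints.

```python
# Illustrative sketch, assuming a manually measured yaw angle and position per sensor.
import numpy as np

class SensorPose:
    """Pose of one Kinect in the world frame: yaw (rad) and position (m)."""
    def __init__(self, yaw, position):
        c, s = np.cos(yaw), np.sin(yaw)
        # rotation about the vertical (y) axis, Kinect camera-space convention assumed
        self.R = np.array([[c, 0.0, s],
                           [0.0, 1.0, 0.0],
                           [-s, 0.0, c]])
        self.t = np.asarray(position, dtype=float)

    def to_world(self, joint_xyz):
        """Map a joint position from this sensor's camera space into world space."""
        return self.R @ np.asarray(joint_xyz, dtype=float) + self.t

# Step (1): coarse poses from real-world measurement of placement and orientation.
kinect_front = SensorPose(yaw=np.deg2rad(35.0), position=[0.0, 1.2, 0.0])
kinect_back  = SensorPose(yaw=np.deg2rad(215.0), position=[3.1, 1.2, 2.4])

# Step (2): fine adjustment, e.g. by reducing the residual between head joints
# observed simultaneously by both sensors (simple offset correction shown here).
def fine_adjust(pose, head_world_a, head_world_b):
    pose.t += (np.asarray(head_world_a) - np.asarray(head_world_b)) / 2.0
    return pose
```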
Evaluation. Kinect-like sensors provide unique opportunities for skeleton tracking, thus overcoming many problems associated with professional motion tracking systems by enabling (i) markerless tracking, (ii) fast and simple setup and (iii) low-cost tracking results. However, due to the infrared technology, the depth sensors do not perform well in outdoor settings with high infrared background noise. Furthermore, the cameras require a good view of the scene, with a full view of the worker, for best tracking results.
Overall, the application of Kinect sensors in industrial applications requires careful handling and substantial data post-processing. With the Kinect skeleton data showing large amounts of fluctuations, the Kinect represents a cheap, yet not per se reliable, sensor for skeleton tracking.
3.1.2 Sub-Skeleton Tracking
Sensor Description - Leap Motion. Aiming only at tracking the hands of a user, specifically in Virtual Reality applications, the Leap Motion controller represents an infrared, stereo-vision-based gesture and position tracking system with sub-millimeter accuracy (Weichert et al., 2013). Suitable both for mobile and stationary application, it has been specifically developed to track hands and fingers at a close distance of up to 0.8 m, enabling highly accurate hand gesture control of interactive computer systems.
In the introduced industrial application scenario, the Leap Motion controllers are installed in the focus areas of the assembly tasks to monitor the detailed hand movements.
Evaluation. The Leap Motion controller shows high accuracy and also high reliability. Unfortunately, however, the sensor shows a high latency in the initial registration of hands (up to 1-2 s). In a highly dynamic application such as the presented use-case scenario, this latency prevented the applicability of the Leap Motion sensor, as the hands were often already leaving the area of interaction by the time they were detected. For this reason, this highly accurate sensor could not be applied in the final assistance setup, yet it represents a very interesting sensor choice when addressing a very stationary industrial task.
3.1.3 Joint Tracking
Mobile, wearable sensors are used to extract the movement of single body joints, most commonly the wrists, for inference on hand movement activity. The vast majority of these sensors are based on accelerometers and gyroscopes that provide relative changes in motion and orientation for behavior analysis.
Figure 2: Scheme of the introduced industrial multi-sensor assistance system with the various levels of abstraction: Perception, Understanding, Reasoning, Acting. Data from the sensors are processed individually and in aggregated form to perform activity, work-step, skill and cognitive recognition. Reasoning models are then used to select appropriate assistance measures via different actors.
Sensor Description - Shimmer. The Shimmer sensors have already been validated for use in academic and industrial research applications (Burns et al., 2010), (Gradl et al., 2012), (Srbinovska et al., 2015). Shimmer Research also offers several tools and APIs for manipulation, integration and easy data access. Due to their small size and lightweight (28 g) wearable design, they can be worn on any body segment for the full range of motion during all types of tasks, without affecting the movements, techniques, or motion patterns. Built-in inertial measurement sensors are able to capture kinematic properties, such as movement in terms of (i) acceleration, (ii) rotation, and (iii) magnetic field.
The updated module features a 24 MHz CPU with a precision clock subsystem and provides three-axis acceleration and gyroscope data. We applied a Shimmer sensor on each of the worker's hands to obtain expressive manual activity data. The Shimmer sensors provide their data at a frame rate of 50 Hz. In the current scope of the implementation, hand activity data is parsed from respective text/CSV files in which the recorded data has been stored. This accumulates to 6 features per iteration per sensor (3x gyroscope, 3x accelerometer) every 20 ms.
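A minimal sketch of how such stored recordings could be parsed into per-frame feature vectors is given below; the column names and file layout are assumptions, not the actual Shimmer export format.

```python
# Hedged sketch: each row is assumed to hold a timestamp plus 3-axis accelerometer
# and 3-axis gyroscope values (6 features per sensor every 20 ms at 50 Hz).
import csv

def read_shimmer_frames(path):
    """Yield (timestamp, [ax, ay, az, gx, gy, gz]) tuples from a Shimmer CSV export."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            features = [float(row[k]) for k in
                        ("acc_x", "acc_y", "acc_z", "gyro_x", "gyro_y", "gyro_z")]
            yield float(row["timestamp"]), features

# Usage: iterate over the left- and right-hand sensor files, e.g.
# left_hand = list(read_shimmer_frames("shimmer_left_hand.csv"))
```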
Evaluation. Shimmer sensors provide reliable and accurate tracking data, also in rough industrial environments. Real-time analysis requires a smartphone as a transmission device, yet works reliably. Overall, when aiming for raw accelerometer data, the Shimmer sensor platforms have proven their suitability.
3.2 Gesture Detection
The introduced Kinemic sensor is closely related to the previously described accelerometer sensors placed on the wrist of the worker. Yet, it does not provide access to the raw accelerometer data but directly provides only higher-level gesture detections as results to the system. For this reason, the distinction between general joint tracking and hand gesture detection is made.
Sensor Description - Kinemic. The Kinemic wristband sensor for hand gesture detection is a new sensor for which almost no official information is available. It is based on - presumably - a 3-axis accelerometer and gyroscope and connected to a mobile computation platform (Raspberry Pi) which carries out the gesture detection processes. Currently, 12 gestures are supported, with the goal of expanding to customizable gestures, air writing, etc.
Evaluation. The sensors are easily initiated and integrated into a multi-sensor system. The recognition of the gestures works well for the majority of existing
gestures. In summary, this sensor with the associated SDK provides a useful solution for people looking for high-level off-the-shelf gesture interaction, without requiring access to raw accelerometer data.
3.3 Behavior Analysis
3.3.1 Gaze-based Task Segmentation
The analysis of gaze behavior also provides interesting insights into the execution of activities, especially the segmentation of subsequent tasks in a work process. Recent work shows that the gaze feature Nearest Neighbour Index (Camilli et al., 2008), which describes the spatial distribution of fixations, can be used to segment tasks in a dynamic environment (Amrouche et al., 2018). Employing a wearable Pupil Labs eye tracker, this gaze behavior feature was capable of successfully segmenting and recognizing tasks. For the sensor discussion, please refer to Section 4.1.1.
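For illustration, a minimal computation of the Nearest Neighbour Index over a window of fixations might look as follows; the window handling and the definition of the observed area are assumptions.

```python
# Minimal sketch following the NNI definition of Camilli et al. (2008): observed mean
# nearest-neighbour distance divided by the expected one for a random distribution.
import numpy as np

def nni(fixations, area):
    """fixations: (N, 2) array of fixation coordinates; area: size of the observed region."""
    pts = np.asarray(fixations, dtype=float)
    n = len(pts)
    if n < 2:
        return np.nan
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    observed = d.min(axis=1).mean()         # mean distance to the nearest neighbour
    expected = 0.5 * np.sqrt(area / n)      # expectation under complete spatial randomness
    return observed / expected

# NNI < 1 indicates clustered fixations (focused inspection), NNI > 1 a dispersed pattern.
```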
4 SENSING OF COGNITIVE
STATES
4.1 Visual Attention
Generally, the human eye gaze represents the most efficient and fastest consciously controlled form of information acquisition, with the unique capability to bridge large distances. Intuitively, the human eye is mainly responsible for the positioning of eye gaze and thus represents an expression of stimulus selection; yet, fine details of gaze behavior also show connections to conscious and subconscious information processing mechanisms that allow inferences on internal attention processes.
4.1.1 Gaze Behavior
Sensor Description - Pupil Labs. The Pupil Labs mobile eye tracker is realized as a modular and open-source solution, providing direct access to all sensors and data streams (gaze position, gaze orientation, saccade analysis, pupil dilation, etc.), rendering the device more suitable for academic research applications. The Pupil Labs eye tracker enables direct real-time access to all parameters and tracking results. The device provides the eye tracking data for each eye with a distinct timestamp, requiring additional synchronization of the obtained data frames.
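One simple way to perform this synchronization is to pair each left-eye sample with the temporally closest right-eye sample, as sketched below; the field names are assumptions and not the Pupil Labs API.

```python
# Illustrative sketch of aligning per-eye data frames that arrive with separate timestamps.
def pair_eyes(left, right, tolerance=0.01):
    """left, right: lists of dicts with a 'timestamp' key, sorted by time."""
    pairs, j = [], 0
    for sample in left:
        # advance the right-eye cursor to the timestamp closest to the left-eye sample
        while j + 1 < len(right) and \
                abs(right[j + 1]["timestamp"] - sample["timestamp"]) <= \
                abs(right[j]["timestamp"] - sample["timestamp"]):
            j += 1
        if abs(right[j]["timestamp"] - sample["timestamp"]) <= tolerance:
            pairs.append((sample, right[j]))
    return pairs
```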
Evaluation. The Pupil Labs eye tracker provides rather simple and encompassing access to the basic data streams. As a consequence, it is a suitable, low-cost device for ambitious developers who want to develop algorithms based on the raw sensor data. However, the sensor fails in outdoor environments when exposed to scattered infrared light. In the proposed application scenario, the Pupil Labs eye tracker is employed for associating gaze orientation with objects in space (hands, task-relevant objects, etc.) via object recognition in the first-person video. However, the achieved results are always situated in user-specific coordinates, which, to be associated with an overall world space of the industrial shop floor, requires a complex and detailed localization of the worker regarding both head location and orientation.
4.1.2 Visual Focus of Attention
The general spatial allocation of attention can also be assessed on a less fine-grained level via external, infrastructural sensors. The so-called visual focus of attention has found sustained application in human-computer interaction. These approaches differ in application and tracking technology but all use head orientation as the key information for attention orienting (Asteriadis et al., 2009), (Smith et al., 2006), (Leykin and Hammoud, 2008).
Sensor Description - Kinect v2. As described above, the Kinect provides quite reliable skeleton information on a low-cost platform. It also provides joint orientations, yet not the head orientation. To exploit the available data for the estimation of the visual focus of attention, an approximation based on the shoulder axis and the neck-head axis can be employed.
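A rough sketch of such an approximation is given below; it is an assumption of how a facing direction could be derived from the joints, not the exact method used in the system.

```python
# Coarse facing direction: the direction perpendicular to the shoulder axis and the
# neck-head ("up") axis, derived from Kinect joint positions.
import numpy as np

def facing_direction(shoulder_left, shoulder_right, neck, head):
    sl, sr = np.asarray(shoulder_left, float), np.asarray(shoulder_right, float)
    neck, head = np.asarray(neck, float), np.asarray(head, float)
    shoulder_axis = sr - sl
    up = head - neck                          # neck-head axis approximates "up"
    forward = np.cross(up, shoulder_axis)     # perpendicular to both axes
    return forward / np.linalg.norm(forward)

# The resulting ray, cast from the head joint, can be intersected with known object
# positions to obtain a coarse visual focus of attention.
```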
Evaluation. The visual focus of attention data derived from this approach can only provide very rough information on the actually perceived objects and areas in space. However, it directly provides the spatial context which is missing in the assessment via wearable eye trackers, as described above. Hence, the combination of the two sensors, wearable and infrastructural, may help in providing substantial advances in the task of 3D mapping of visual attention in industrial environments - a task which will be pursued in future work.
4.2 Arousal
In the literature, arousal is defined by Kahneman
(Kahneman, 1973) as general activation of mind, or
as general operation of consciousness by Thatcher
and John (Thatcher and John, 1977).
Psychophysiological measures exploit these physical reactions of the human body in the preparation of, execution of, or as a reaction to cognitive activities. In contrast to self-reported or performance measures, psychophysiological indicators provide continuous data, thus allowing a better understanding of user-stimulus interactions as well as non-invasive and non-interruptive analysis, possibly even outside the scope of the user's consciousness. While these measures are objective representations of ongoing cognitive processes, they are often highly contaminated by reactions to other triggers, e.g. physical workload or emotions.
4.2.1 Cognitive Load
Besides light incidence control, the pupil is also sensitive to psychological and cognitive activities and mechanisms, as the musculus dilatator pupillae is directly connected to the limbic system via sympathetic control (Gabay et al., 2011); hence, the human eye also represents a promising indicator of cognitive state. Currently existing approaches towards the analysis of cognitive load from pupil dilation - Task-Evoked Pupil Response (TEPR) (Gollan and Ferscha, 2016) and Index of Cognitive Activity (ICA) (Kramer, 1991) - both find application mainly in laboratory environments due to their sensitivity to changes in environment illumination.
Sensor Description - Pupil Labs. The employed Pupil Labs mobile eye tracker provides the pupil diameter as raw measurement data, both in relative (pixel size) and in absolute (mm) units thanks to the freely positionable IR eye cameras. The transformation is achieved via a 3D model of the eyeball and thus an adaptive scaling of the pixel values to absolute mm measurements.
Evaluation. The assessment of pupil dilation works as reliably as the gaze localization, with the same lack of official accuracy measures from comparative studies. Hence, it is difficult to evaluate the sensor regarding data quality. Overall, the assessment of pupil dilation with the mobile Pupil Labs eye tracker provides reliable data, both for laboratory studies and field application. Erroneous data such as blinks needs to be filtered in post-processing of the raw data.
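The following sketch illustrates one possible form of this blink filtering, dropping low-confidence or zero-diameter samples and interpolating across the resulting gaps; the thresholds are assumptions.

```python
# Minimal sketch of blink removal from a pupil-diameter stream.
import numpy as np

def filter_blinks(diameter, confidence, min_confidence=0.6):
    d = np.asarray(diameter, dtype=float)
    c = np.asarray(confidence, dtype=float)
    valid = (c >= min_confidence) & (d > 0)
    if valid.sum() < 2:
        return d
    idx = np.arange(len(d))
    # linear interpolation across blink gaps
    return np.interp(idx, idx[valid], d[valid])
```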
4.2.2 Cardiac Indicators
The cardiac function, i.e. the heart rate, represents another fundamental somatic indicator of arousal and thus of attentional activation, as a direct physiological reaction to phasic changes in the autonomic nervous system (Graham, 1992). Heart Rate Variability (HRV), heart rate response (HRR) and T-wave amplitude analysis are the most expressive physiological indicators of arousal (Suriya-Prakash et al., 2015), (Lacey, 1967).
The stationary and mobile assessment of cardiac data is very well established in medical as well as consumer products via diverse realizations of ECG sensors. The different sensors are based on two main independent measurement approaches: (i) measuring the electric activity of the heart over time via electrodes that are placed directly on the skin and which detect minimal electrical changes from the heart muscle's electro-physiologic pattern of depolarizing during each heartbeat; and (ii) measuring the blood volume peak of each heartbeat via optical sensors (pulse oximeters) which illuminate the skin and measure the changes in light absorption to capture volumetric changes of the blood vessels (photoplethysmography, PPG).
Sensor Description - Shimmer. Shimmer sensors use a photoplethysmogram (PPG) sensor which detects the change in blood volume by illuminating the skin with the light from a light-emitting diode (LED) and then measuring the amount of light transmitted or reflected towards a photodiode. From these volume changes an estimate of the heart rate can be obtained.
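As an illustration of this principle, a heart-rate estimate can be derived from a PPG signal by simple peak detection, as sketched below; a production system would apply more robust filtering, and the thresholds are assumptions.

```python
# Hedged sketch: beat detection on a PPG trace and conversion to beats per minute.
import numpy as np
from scipy.signal import find_peaks

def heart_rate_bpm(ppg, fs=64.0):
    """ppg: 1-D PPG samples; fs: sampling frequency in Hz (64 Hz for the E4)."""
    ppg = np.asarray(ppg, dtype=float)
    ppg = ppg - ppg.mean()
    # peaks at least 0.4 s apart (below 150 bpm) and above one standard deviation
    peaks, _ = find_peaks(ppg, distance=int(0.4 * fs), height=ppg.std())
    if len(peaks) < 2:
        return None
    rr_intervals = np.diff(peaks) / fs        # seconds between consecutive beats
    return 60.0 / rr_intervals.mean()
```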
Sensor Description - Empatica E4. The E4 wristband allows two modes of data collection: (i) in-memory recording and (ii) live streaming of data. Accessing in-memory recorded data requires a USB connection to a Mac or Windows PC running the Empatica Manager software for a posteriori analysis. For accessing streaming data for real-time analysis of somatic data, the Empatica Real-time App can be installed from the Apple App Store or Google Play onto a smartphone, which receives the data via Bluetooth and on which the data can be processed or forwarded. Additionally, a custom application can be implemented for Android and iOS systems.
Sensor Description - Microsoft Band 2. The Microsoft Band 2 is equipped with an optical PPG sensor for pulse analysis. With the Microsoft Band representing an end-user product, the focus of the provided functionality is not on offering the most accessible interfaces for academic purposes; still, the available SDK enables access to raw sensor data in real time. For data access, the sensor needs to be paired with a smartphone device and data can be transferred
via a Bluetooth connection for either direct processing on the mobile device or further transmission to a general processing unit.
Evaluation. The Microsoft Band is highly restricted in sensor placement, as the sensor is integrated into the wristband of the device and thus measures the skin response on the bottom surface of the wrist. In experiments, the Microsoft Band sensor showed large drops in measurement data, most probably due to a change of contact between the sensor and the skin during device shifts. In contrast, the Shimmer sensing platform allows much more freedom in the placement of the sensor with the help of external sensing modules, e.g. pre-shaped for mounting on fingers, which represent the most promising locations for reliable GSR measurements.
Accessing real-time data with the E4 wristband offers similar comfort levels as the Microsoft Band, as the device needs to be paired with a smartphone and data can be transferred via a Bluetooth connection for either direct processing on the mobile device or further transmission to a general processing unit. Being designed for research and academic purposes, the Shimmer platform provides the easiest and fastest access via open and intuitive interfaces. Overall, the data from all devices can be accessed in real time, yet the intended applications of the products are reflected in their applicability in research and development settings.
4.2.3 Galvanic Skin Response
Since the very early 1900s, the Galvanic Skin Response (GSR) has been a focus of academic research. The skin is the only organ that is purely innervated by the sympathetic nervous system (and not affected by parasympathetic activation). GSR analysis addresses the electrodermal activity (EDA) of the human skin, which represents an automatic reflection of sympathetic arousal, as increased skin conductance shows significant correlations with neuronal activities (Frith and Allen, 1983), (Critchley et al., 2000). Hence, the GSR acts as an indicator of arousal and increases monotonically with attention in task execution (Kahneman, 1973).
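As a simple illustration of how such arousal-related responses can be extracted, the following sketch separates the slowly varying tonic skin conductance level from the phasic responses using a moving-average baseline; the window length is an assumption.

```python
# Hedged sketch of a tonic/phasic split of a skin conductance signal.
import numpy as np

def phasic_component(gsr, fs=4.0, window_s=10.0):
    """gsr: skin conductance samples (microsiemens); fs: sampling rate in Hz."""
    gsr = np.asarray(gsr, dtype=float)
    win = max(1, int(window_s * fs))
    kernel = np.ones(win) / win
    tonic = np.convolve(gsr, kernel, mode="same")   # moving-average baseline (tonic level)
    return gsr - tonic                              # phasic responses ride on top of it
```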
Sensorial Assessment. The accessibility of the raw and real-time data depends on the respective development environment which is provided to support these sensors, ranging from a general limitation to statistical information up to access to true real-time data.
The GSR can be assessed via mobile, wearable sensors worn on the bare skin, e.g. integrated into activity trackers, smartwatches or scientific activity and acceleration sensors. These sensors measure the skin conductance, i.e. skin resistivity, via small integrated electrodes. The skin conductance response is measured from the eccrine glands, which cover most of the body and are especially dense in the palms and the soles of the feet. In the following, three wearable sensors which provide the analysis of the skin conductance response are explored:
Evaluation
E4 Wristband: a hand-worn wireless device designed for continuous, real-time data acquisition during daily-life activities. It is specifically designed in an extremely lightweight (25 g) watch-like form factor that allows hassle-free, unobtrusive monitoring in- or outside the lab. With the built-in 3-axis accelerometer the device is able to capture motion-based activities. Additionally, the device is able to capture the following physiological features: (i) galvanic skin response, (ii) photoplethysmography (heart rate), (iii) infrared thermopile (peripheral skin temperature). The employed Empatica E4 wristband has already found application in various academic research applications and publications (van Dooren et al., 2012), (Fedor and Picard, 2014).
Microsoft Band 2: offers an affordable means for tracking a variety of parameters of daily living. Besides 11 advanced sensors for capturing movement kinematics, physical parameters and environmental factors, the device also offers various channels for providing feedback. A 1.26 x 0.5-inch curved screen with a resolution of 320 x 128 pixels can be used to display visual messages. Additionally, a haptic vibration motor is capable of generating private vibration notifications.
Shimmer: these sensors have already been validated for use in biomedical-oriented research applications. Due to their small size and lightweight (28 g) wearable design, they can be worn on any body segment for the full range of motion during all types of tasks, without affecting the movements, techniques, or motion patterns. Built-in inertial measurement sensors are able to capture kinematic properties, such as movement in terms of (i) acceleration, (ii) rotation, and (iii) magnetic field.
Table 1: Overview of the introduced sensors, grouped according to their sensing category and analysis type, listing the associated technologies and sensor parameters.

| Category | Type | Sensor Name | Technology | Accuracy / Range |
| --- | --- | --- | --- | --- |
| Activity | Skeleton - Full Skeleton | Microsoft Kinect v2 | Time-of-flight infrared | Depth: 512x424 @ 30 Hz, FOV 70° x 60°; RGB: 1920x1080 @ 30 Hz, FOV 84° x 53°; acc.: 0.027 m (SD: 0.018 m); depth range: 4 m |
| Activity | Skeleton - Sub-Skeleton | Leap Motion (Potter et al., 2013) | Stereo-vision infrared, hand tracking | FOV 150° x 120°; avg. error < 0.0012 m (Weichert et al., 2013); depth range: 0.8 m |
| Activity | Skeleton - Joint Tracking (wrist) | Shimmer | 3-axis accelerometer, gyroscope | Range: ±16 g; Sensitivity: 1000 LSB/g at ±2 g; Resolution: 16 bit |
| Activity | Gesture - Hand Gesture | Kinemic | 3-axis accelerometer, gyroscope | not available |
| Activity | Behavior Analysis - Gaze Behavior | Pupil Labs (Kassner et al., 2014) | Mobile eye tracker; gaze feature analysis for task segmentation | accuracy 91% (Amrouche et al., 2018) |
| Cognitive States | Visual Attention - Gaze Behavior | Pupil Labs (Kassner et al., 2014) | Mobile eye tracker; fixations, saccades, gaze features | Gaze acc.: 0.6°; Sampling rate: 120 Hz; Scene camera: 30 Hz @ 1080p, 60 Hz @ 720p, 120 Hz @ VGA; Calibration: 5-point, 9-point |
| Cognitive States | Visual Attention - Visual Focus of Attention | Microsoft Kinect v2 | Head orientation from skeleton tracking | not available |
| Cognitive States | Arousal - Cognitive Load | Pupil Labs | Mobile eye tracker; pupil dilation | pupil size in pixels or mm via 3D model; acc. not available |
| Cognitive States | Arousal - Heart Rate (HRV, HRR) | Microsoft Band 2 | PPG | avg. error rate: 5.6% (Shcherbina et al., 2017) |
| Cognitive States | Arousal - Heart Rate (HRV, HRR) | Empatica E4 Wristband (Poh et al., 2012) | PPG | sampling frequency: 64 Hz; error rate: 2.14% |
| Cognitive States | Arousal - Galvanic Skin Response | Microsoft Band 2 | - | data rate: 0.2/5 Hz; acc. not available |
| Cognitive States | Arousal - Galvanic Skin Response | Empatica E4 Wristband | Empatica E3 EDA proprietary design | data rate: 4 Hz; mean corr. to reference: 0.93, p < 0.0001 (Empatica, 2016) |
5 CHALLENGES &
OPPORTUNITIES
5.1 Summary
In the previous sections, several sensors have been described regarding their underlying technology, the access to sensor data and their suitability for academic or industrial exploitation. As an overview, a short summary of this information is collected in Table 1, including further numerical data regarding the accuracy and range of the sensors, where available.
5.2 Handling Amounts of Data
The first challenge in the analysis of multi-sensor applications is the handling of the amounts of data, usually under real-time requirements. This applies both to the required levels of computational performance and to further hardware assets such as bus bandwidth or hard drive access speed.
But also the offline handling of the data may represent a problem for the design of interactive systems, as especially raw video data - when stored - quickly exceeds gigabytes of data. These amounts of data need to be managed, if possible in suitable database structures, to enable efficient further processing of the recorded data.
Beyond data transfer and storage, the human resources for post-processing of the data also represent a substantial challenge. This implies checking and filtering the data, extracting relevant segments, etc. Especially when aiming for supervised machine learning tasks, the manual labeling of activities represents an effort which often substantially exceeds the actual duration of the collected data and needs to be considered in the application setup. This labeling can be improved via suitable software solutions that enable the review and direct labeling of multimodal data streams.
5.3 Interoperability, Interfaces,
Operating Platforms
Besides the pure amount of data, the different sources and interfaces represent a further source of problems. Depending on the producer, the analysis of the sensors requires specific supported frameworks and development environments. Mobile sensors are often associated with Android apps for mobile data collection and transfer, the Microsoft Kinect sensors require Microsoft Windows platforms for operation, etc.
Creating a multi-sensor industrial application requires multi-platform capability of the development staff and often the creation of distributed systems operating on different native platforms. In the presented industrial application, such a distributed set of platforms is employed, inter-connected with a cross-platform messaging solution, thus overcoming the interoperability issue.
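The paper does not name the messaging solution used; as a hedged illustration, a lightweight publish/subscribe broker such as MQTT can decouple sensor nodes running on different native platforms, as sketched below (topic names and the broker address are assumptions, paho-mqtt 1.x client API).

```python
# Illustrative sketch only: publishing sensor frames as JSON over MQTT so that nodes
# on Windows, Linux or Android can consume the same streams.
import json
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("broker.local", 1883)   # hypothetical broker on the shop-floor network

def publish_frame(sensor_id, timestamp, features):
    """Publish one sensor frame as JSON so any platform can consume it."""
    payload = json.dumps({"sensor": sensor_id, "t": timestamp, "data": features})
    client.publish(f"assembly/sensors/{sensor_id}", payload)

publish_frame("shimmer_left", 1528972800.02, [0.01, -0.98, 0.12, 1.5, -0.3, 0.0])
```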
5.4 Multi-sensor Fusion
In many industrial applications, no single sensor is suitable to cover the overall complexity of a situation. Furthermore, no sensor provides perfect data, so redundant sensor designs enable the compensation of sensor failure. However, the handling of parallel, multi-modal data streams raises several issues regarding data processing and system design, as discussed in the following paragraphs.
5.4.1 Synchronization & Subsampling
The synchronization of different sensor types represents a substantial problem, especially for non-visual sensors (accelerometers, etc.). It is advisable to introduce a synchronizing activity which is unambiguously identifiable in the diverse data representations. In the introduced industrial application, a single hand clap has proved to provide useful data for synchronization, as it shows explicit peaks in motion sensing and can also be precisely timed in visual and auditory sensors.
However, a single synchronization is usually not sufficient. Different sampling rates of the diverse sensor types require a sub- or re-sampling of the data to combine single data snippets into collected data frames which are able to provide an overall representation of the scene across the various available sensors. Furthermore, when recording long sessions (>1 hour), differences in the internal clocks of the sensors may cause significant shifts in the data, making periodic re-synchronization advisable.
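A minimal sketch of such re-sampling onto a common frame clock, using a zero-order hold (last value before the frame time), is given below; the stream layout is an assumption.

```python
# Hedged sketch: align heterogeneous sensor streams onto a shared set of frame times.
import bisect

def resample(stream, frame_times):
    """stream: sorted list of (timestamp, value); frame_times: target timestamps."""
    times = [t for t, _ in stream]
    frames = []
    for ft in frame_times:
        i = bisect.bisect_right(times, ft) - 1
        frames.append(stream[i][1] if i >= 0 else None)   # None before the first sample
    return frames

# Usage: align a 50 Hz Shimmer stream and a 4 Hz GSR stream onto 30 Hz camera frames:
# fused = list(zip(resample(shimmer, cam_t), resample(gsr, cam_t)))
```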
5.4.2 Dealing with the Uncertainty of Sensor
Data
One of the most critical and difficult aspects of multi-modal sensor applications is the evaluation of data quality, as this directly affects the fusion of the different data types. Some sensors directly provide confidence measures for their data, while others require hand-made post-processing for the evaluation of data quality. These can range from rule-based evaluation criteria such as application-based plausibility checks (e.g. avoiding jitter in hand tracking data by limiting the maximal
distance between consecutive data frames) to statistical measures (checking whether the data lies within the standard value range) or the comparison of actual data with predictions from previous frames.
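A minimal sketch of such a rule-based plausibility check, and of deriving a confidence value from it for the later weighted fusion, is shown below; the displacement threshold is an assumption.

```python
# Hedged sketch: reject or down-weight hand-joint positions that imply an implausible jump.
import numpy as np

def plausible(prev_pos, new_pos, max_jump=0.5):
    """Return True if the frame-to-frame displacement stays below max_jump metres."""
    if prev_pos is None:
        return True
    return np.linalg.norm(np.asarray(new_pos) - np.asarray(prev_pos)) <= max_jump

def confidence(prev_pos, new_pos, max_jump=0.5):
    """Map the displacement onto a [0, 1] confidence used later for weighted fusion."""
    if prev_pos is None:
        return 1.0
    jump = np.linalg.norm(np.asarray(new_pos) - np.asarray(prev_pos))
    return float(max(0.0, 1.0 - jump / max_jump))
```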
Such an evaluation of data quality is required to dynamically select the sensors with the currently best and most reliable sensor data, and hence is the main prerequisite for the fusion of redundant sensor data.
5.4.3 Fusion of Redundant Data
Based on an evaluation of the incoming sensor data quality, the different data types can be merged with different weights based on the respective sensor data confidence. In the proposed application scenario, a Kalman filter was used to combine skeleton data from two Kinect sensors and an RGB image sensor to calculate a merged, stabilized user skeleton for the subsequent behavior analysis approach.
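The following simplified, per-coordinate sketch illustrates confidence-weighted fusion in the spirit of such a Kalman filter (constant-position model); the noise values derived from the sensor confidences are assumptions, not the parameters of the actual system.

```python
# Hedged sketch: one predict/update cycle fusing several measurements of the same
# joint coordinate, weighting each by a confidence-derived measurement noise.
def fuse_joint(estimate, variance, measurements, confidences, process_noise=1e-3):
    variance += process_noise                       # predict: position assumed constant
    for z, conf in zip(measurements, confidences):
        r = 1e-4 / max(conf, 1e-3)                  # low confidence -> high measurement noise
        k = variance / (variance + r)               # Kalman gain
        estimate = estimate + k * (z - estimate)    # update with this measurement
        variance = (1.0 - k) * variance
    return estimate, variance

# e.g. fuse the head-joint x-coordinate seen by both Kinects:
x, p = fuse_joint(1.20, 0.05, measurements=[1.22, 1.17], confidences=[0.9, 0.6])
```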
6 CONCLUSION AND FUTURE
WORK
In this paper, various sensors for the analysis of activities and cognitive states are introduced for the specific case of an industrial, semi-manual assembly scenario. The proposed sensors range from image- and depth-image-based infrastructural sensors to body-worn sensors of somatic indicators of behavior and cognitive state. For all sensors, a general description and an evaluation of the experiences in the described industrial use case have been provided, in order to help other researchers in their selection of suitable sensors for their specific research questions.
The sensor discussion is followed by a general description of the issues and challenges of sensors in industrial application scenarios, with a special focus on multi-sensor fusion.
The goal of future work is to realize a truly opportunistic sensor framework which can dynamically add and select the sensors that provide the best data for the current application.
ACKNOWLEDGEMENTS
This work was supported by the projects Attentive Machines (FFG, Contract No. 849976) and Attend2IT (FFG, Contract No. 856393). Special thanks to team members Christian Thomay, Sabrina Amrouche, Michael Matscheko, Igor Pernek and Peter Fritz for their valuable contributions.
REFERENCES
Amrouche, S., Gollan, B., Ferscha, A., and Heftberger, J. (2018). Activity segmentation and identification based on eye gaze features. PErvasive Technologies Related to Assistive Environments (PETRA). Accepted for publication in June 2018.
Asteriadis, S., Tzouveli, P., Karpouzis, K., and Kollias, S. (2009). Estimation of behavioral user state based on eye gaze and head pose - application in an e-learning environment. Multimedia Tools and Applications, 41(3):469–493.
Baloch, Z., Shaikh, F. K., and Unar, M. A. (2018). A context-aware data fusion approach for health-IoT. International Journal of Information Technology, 10(3):241–245.
Behrmann, E. and Rauwald, C. (2016). Mercedes boots robots from the production line. Accessed: 2017-02-01.
Burns, A., Greene, B. R., McGrath, M. J., O'Shea, T. J., Kuris, B., Ayer, S. M., Stroiescu, F., and Cionca, V. (2010). Shimmer - a wireless sensor platform for noninvasive biomedical research. IEEE Sensors Journal, 10(9):1527–1534.
Camilli, M., Nacchia, R., Terenzi, M., and Di Nocera, F. (2008). ASTEF: A simple tool for examining fixations. Behavior Research Methods, 40(2):373–382.
Chen, M., Ma, Y., Li, Y., Wu, D., Zhang, Y., and Youn, C.-H. (2017). Wearable 2.0: Enabling human-cloud integration in next generation healthcare systems. IEEE Communications Magazine, 55(1):54–61.
Critchley, H. D., Elliott, R., Mathias, C. J., and Dolan, R. J. (2000). Neural activity relating to generation and representation of galvanic skin conductance responses: a functional magnetic resonance imaging study. Journal of Neuroscience, 20(8):3033–3040.
Empatica (2016). Comparison ProComp vs. Empatica E3 skin conductance signal.
Fedor, S. and Picard, R. W. (2014). Ambulatory EDA: Comparisons of bilateral forearm and calf locations.
Frith, C. D. and Allen, H. A. (1983). The skin conductance orienting response as an index of attention. Biological Psychology, 17(1):27–39.
Gabay, S., Pertzov, Y., and Henik, A. (2011). Orienting of attention, pupil size, and the norepinephrine system. Attention, Perception, & Psychophysics, 73(1):123–129.
Gao, X., Sun, Y., You, D., Xiao, Z., and Chen, X. (2016). Multi-sensor information fusion for monitoring disk laser welding. The International Journal of Advanced Manufacturing Technology, 85(5-8):1167–1175.
Gollan, B. and Ferscha, A. (2016). Modeling pupil dilation as online input for estimation of cognitive load in non-laboratory attention-aware systems. In COGNITIVE 2016 - The Eighth International Conference on Advanced Cognitive Technologies and Applications.
Gradl, S., Kugler, P., Lohmüller, C., and Eskofier, B. (2012). Real-time ECG monitoring and arrhythmia detection using Android-based mobile devices. In Engineering in Medicine and Biology Society (EMBC), 2012
Annual International Conference of the IEEE, pages 2452–2455. IEEE.
Graham, F. K. (1992). Attention: The heartbeat, the blink, and the brain. Attention and Information Processing in Infants and Adults: Perspectives from Human and Animal Research, 8:3–29.
Gravina, R., Alinia, P., Ghasemzadeh, H., and Fortino, G. (2017). Multi-sensor fusion in body sensor networks: State-of-the-art and research challenges. Information Fusion, 35:68–80.
Haslgrübler, M., Fritz, P., Gollan, B., and Ferscha, A. (2017). Getting through: modality selection in a multi-sensor-actuator industrial IoT environment. In Proceedings of the Seventh International Conference on the Internet of Things, page 21. ACM.
Jovic, S., Anicic, O., and Jovanovic, M. (2017). Adaptive neuro-fuzzy fusion of multi-sensor data for monitoring of CNC machining. Sensor Review, 37(1):78–81.
Kahneman, D. (1973). Attention and Effort, volume 1063. Prentice-Hall, Englewood Cliffs, NJ.
Kassner, M., Patera, W., and Bulling, A. (2014). Pupil: an open source platform for pervasive eye tracking and mobile gaze-based interaction. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pages 1151–1160. ACM.
Kramer, A. F. (1991). Physiological metrics of mental workload: A review of recent progress. Multiple-Task Performance, pages 279–328.
Kröger, M., Sauer-Greff, W., Urbansky, R., Lorang, M., and Siegrist, M. (2016). Performance evaluation on contour extraction using Hough transform and RANSAC for multi-sensor data fusion applications in industrial food inspection. In Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), 2016, pages 234–237. IEEE.
Lacey, J. I. (1967). Somatic response patterning and stress: Some revisions of activation theory. Psychological Stress: Issues in Research, pages 14–37.
Leykin, A. and Hammoud, R. (2008). Real-time estimation of human attention field in LWIR and color surveillance videos. In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW'08. IEEE Computer Society Conference on, pages 1–6. IEEE.
Li, X., Li, D., Wan, J., Vasilakos, A. V., Lai, C.-F., and Wang, S. (2017). A review of industrial wireless networks in the context of Industry 4.0. Wireless Networks, 23(1):23–41.
Marabelli, M., Hansen, S., Newell, S., and Frigerio, C. (2017). The light and dark side of the black box: Sensor-based technology in the automotive industry. CAIS, 40:16.
Otto, M. M., Agethen, P., Geiselhart, F., Rietzler, M., Gaisbauer, F., and Rukzio, E. (2016). Presenting a holistic framework for scalable, marker-less motion capturing: Skeletal tracking performance analysis, sensor fusion algorithms and usage in automotive industry. Journal of Virtual Reality and Broadcasting, 13(3).
Poh, M.-Z., Loddenkemper, T., Reinsberger, C., Swenson, N. C., Goyal, S., Sabtala, M. C., Madsen, J. R., and Picard, R. W. (2012). Convulsive seizure detection using a wrist-worn electrodermal activity and accelerometry biosensor. Epilepsia, 53(5).
Potter, L. E., Araullo, J., and Carter, L. (2013). The Leap Motion controller: a view on sign language. In Proceedings of the 25th Australian Computer-Human Interaction Conference: Augmentation, Application, Innovation, Collaboration, pages 175–178. ACM.
Shcherbina, A., Mattsson, C. M., Waggott, D., Salisbury, H., Christle, J. W., Hastie, T., Wheeler, M. T., and Ashley, E. A. (2017). Accuracy in wrist-worn, sensor-based measurements of heart rate and energy expenditure in a diverse cohort. Journal of Personalized Medicine, 7(2):3.
Smith, K. C., Ba, S. O., Odobez, J.-M., and Gatica-Perez, D. (2006). Tracking attention for multiple people: Wandering visual focus of attention estimation. Technical report, IDIAP.
Srbinovska, M., Gavrovski, C., Dimcev, V., Krkoleva, A., and Borozan, V. (2015). Environmental parameters monitoring in precision agriculture using wireless sensor networks. Journal of Cleaner Production, 88:297–307.
Suriya-Prakash, M., John-Preetham, G., and Sharma, R. (2015). Is heart rate variability related to cognitive performance in visuospatial working memory? PeerJ PrePrints.
Thatcher, R. W. and John, E. R. (1977). Functional Neuroscience: I. Foundations of Cognitive Processes. Lawrence Erlbaum.
van Dooren, M., de Vries, J. J. G. G.-J., and Janssen, J. H. (2012). Emotional sweating across the body: comparing 16 different skin conductance measurement locations. Physiology & Behavior, 106(2):298–304.
Weichert, F., Bachmann, D., Rudak, B., and Fisseler, D. (2013). Analysis of the accuracy and robustness of the Leap Motion controller. Sensors, 13(5):6380–6393.