Simulation Framework to Train Intelligent Agents towards an Assisted
Driving Power Wheelchair for People with Disability
Giovanni Falzone, Gianluca Giuffrida, Silvia Panicacci, Massimiliano Donati and Luca Fanucci
Dept. of Information Engineering, University of Pisa, Via G. Caruso 16, 56122, Pisa, Italy
luca.fanucci@unipi.it
ORCID: Gianluca Giuffrida https://orcid.org/0000-0003-3306-5698; Silvia Panicacci https://orcid.org/0000-0003-2628-4382; Massimiliano Donati https://orcid.org/0000-0002-6063-7180; Luca Fanucci https://orcid.org/0000-0001-5426-4974
Keywords:
Multi-Agent System, Simulation Framework, Reinforcement Learning, Assistive Technology, Power
Wheelchair, Assisted Driving.
Abstract:
Several million people with disabilities use power wheelchairs for outdoor mobility on both sidewalks and cycling paths. Those with upper limb motor impairments, in particular, have difficulty reacting quickly to obstacles along the way, creating dangerous situations such as wheelchair crashes or rollovers. A possible solution is to equip the power wheelchair with a neural network-based assisted driving system, able to detect obstacles and either avoid them or warn the user. A virtual environment is therefore required to simulate the system and test different neural network architectures before mounting the best performing one directly on board. In this work, we present a simulation framework to train multiple intelligent agents in parallel by means of reinforcement learning algorithms. The agent shall follow the user's will and identify obstacles along the path, taking control of the power wheelchair when the user makes a dangerous driving choice. The developed framework, adapted from an existing autonomous driving simulator, has been used to train and test multiple intelligent agents simultaneously, thanks to a customised synchronisation and memory management mechanism that reduces the overall training time. Preliminary results highlight the suitability of the adapted framework for multiple-agent development in the assisted driving scenario.
1 INTRODUCTION
In recent years, increasing interest has been directed to the world of assistive technology, in order to help people with disabilities in their everyday life. Power wheelchairs, voice recognition programs, screen readers, prosthetics and robots are only some examples of assistive devices (US Department of Health and Human Services, 2018; Giuffrida et al., 2019).
Physical disabilities refer to impairments of parts of the body that result in limitations in mobility (GPII DeveloperSpace, 2020). Approximately 13% of U.S. adults aged 18 and over reported having a physical disability in 2013 (Courtney-Long et al., 2015), with a similar prevalence in other industrialised countries
and over the years.
Power wheelchairs are surely the most significant example of assistive devices for mobility-impaired people. They are wheelchairs propelled by an electric motor rather than by manual power, usually driven by the user with a joystick, and therefore provide a greater degree of freedom. However, they rely heavily on the users' skills, which may not be sufficient to react quickly to dangerous situations, such as the presence of static and dynamic obstacles, roadblocks and other impediments.
Some research teams have tried to overcome collision and obstacle avoidance problems for power wheelchairs by exploiting autonomous driving tools and different sensors (e.g. self-localisation algorithms, on-board lidar, stereoscopic cameras and spherical cameras) (Leaman and La, 2017; Nguyen et al., 2013a; Nguyen et al., 2013b).
The idea of this project is to realise a smart wheelchair with assisted driving: it should allow users to drive with confidence in their daily life, through both static and dynamic obstacles, and, at
the same time, let the user control the wheelchair whenever this does not end up in a dangerous situation. The user is thus able to drive on their own, but a request for intervention can be raised to the autonomous driving module, elevating the autonomous driving system to a supervisor role. Since a power wheelchair's speed is generally limited to 10 km/h, the sensors to be mounted fall into the short- and medium-range category. Moreover, due to its high cost and power consumption, lidar technology (Rasshofer and Gresser, 2005) is unfeasible for a smart wheelchair. The selected sensors are therefore high-resolution RGB and depth cameras (Yin and Shi, 2018), together with accelerometers, gyroscopes and tilt sensors.
Obstacles can be detected using semantic segmentation of RGB images (Dai et al., 2016). An intelligent agent based on Reinforcement Learning (RL) can then detect harmful situations and control the vehicle (Caltagirone et al., 2017; Lillicrap et al., 2015), taking as input the segmentation and depth images and the other sensor measurements. However, RL algorithms should be trained and tested in virtual environments and only then, once the agent has learned, moved into the real world. Existing simulators for wheelchairs are usually designed to train people in driving a power wheelchair (Pinheiro et al., 2016; Faria et al., 2014; Pithon et al., 2009), not to train intelligent agents, and so they focus on the user interface more than on a scalable architecture (Schöner, 2018).
In this paper, we present a simulation framework for assisted driving on power wheelchairs to train multiple synchronised intelligent agents in parallel. It allows testing different neural networks (NNs) while reducing training time. At the end of the training phase, the best performing agent will be mounted on-board the power wheelchair.
After this introduction, Section 2 describes the software framework, highlighting the chosen simulator and its adaptation for parallel training of multiple agents and for the power wheelchair use case. Reinforcement learning algorithms are discussed in Section 3, while Section 4 presents the preliminary results. Finally, the conclusions are drawn in Section 5.
2 SOFTWARE FRAMEWORK
In order to realise an assisted driving power wheelchair, a decisional NN trained by an RL algorithm has been chosen to implement obstacle avoidance in dangerous situations. Hence, before mounting it on a prototype, an interactive simulator has been selected and improved for:
- training multiple intelligent agents;
- testing multiple intelligent agents;
- supporting the dynamics of a power wheelchair;
- generating dedicated sensors;
- generating multiple and dynamic environments;
- synchronising the simulated environment with the sensor dynamics and the power wheelchairs.
The entire system exploits a client-server architecture with synchronous remote procedure calls (RPC) from each Client to the Server Simulator. It is organised into different levels of abstraction in order to decouple the operations of interacting with the simulator from those of managing the NNs. Figure 1 reports an overview of the entire software architecture, separating the three main components: the Server Simulator, the Manager and the Agent. The latter two compose the Client entity, which directly communicates with the Server Simulator.
The system is designed to be scalable and modular, allowing multiple Agents with different NN implementations to run at the same time against a single Server Simulator.
2.1 Server Simulator
The Server Simulator module should implement a roadside environment in several cities, some of them to be used for training and others for testing. It should also provide a variety of vehicles and sensors and the possibility of changing the position of a sensor with respect to the vehicle itself. Moreover, all the sensors should be configurable with respect to their range of action, field of view, Signal-to-Noise Ratio (SNR), etc. Given these constraints, we selected the autonomous driving simulator CARLA.
CARLA (Dosovitskiy et al., 2017) is an open-source autonomous driving simulator used in research to develop and evaluate autonomous agents for automotive systems. It is implemented as a layer on top of Unreal Engine 4 (UE4) (Epic Games, 2019), which exploits NVIDIA PhysX to simulate the physics of the environment with particular attention to the vehicle characteristics, such as mass, damping factor, friction of the wheels, moment of inertia, etc.
The simulated maps focus on the urban environment: there are many different maps with different urban objects whose aim is to simulate a varied, realistic city scenario. Within the simulator, it is possible to configure different actors, represented by pedestrians (e.g. children and adults, women and men), and dynamic objects, mainly represented by cars, trucks, motorbikes and bikes. The dynamic objects follow the road rules, stopping at traffic lights or waiting for people to cross the street.

Figure 1: System architecture.
The sensors available within the simulation cover the most used ones in the autonomous driving world: RGB camera, depth camera, radar, lidar, GPS and Inertial Measurement Unit (IMU). Moreover, the simulator provides the semantic segmentation of the RGB camera and a collision sensor that detects all the hits suffered by the vehicle.
The Server Simulator exposes the RPC APIs, becoming a stand-alone component running on top of a Docker container (Merkel, 2014): it gives the Manager control of the simulation environment, monitoring its parameters and the actors in the simulated world. In this way, the Manager can decouple the Server Simulator from the autonomous Agent, which implements the decision component.
However, CARLA is confined to the simulation of automotive vehicles with the standard interface composed of steering, throttle and brake. We adapted CARLA to simulate a different type of vehicle by modifying the control of the car, so that it acts as a power wheelchair. We simulated the power wheelchair with a BMW Isetta, since it is small and has a shape similar to a bounding box. The controllers needed to set the accelerations and velocities of the centre of mass of the car are in charge of the Manager.
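As a rough illustration of this adaptation, the sketch below connects to a CARLA server, switches it to synchronous mode and spawns a small vehicle used as the wheelchair surrogate. The host, port, step length, blueprint identifier and control values are assumptions for illustration, not the exact configuration of the framework (blueprint names and the angular-velocity setter also vary across CARLA versions).

```python
import carla

# Connect to the CARLA Server Simulator (host and port are assumptions).
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Synchronous mode: the world advances only when a tick is requested.
settings = world.get_settings()
settings.synchronous_mode = True
settings.fixed_delta_seconds = 0.05  # assumed 50 ms simulation step
world.apply_settings(settings)

# Spawn a small car as the power wheelchair surrogate (BMW Isetta).
blueprint = world.get_blueprint_library().filter("vehicle.bmw.isetta")[0]
spawn_point = world.get_map().get_spawn_points()[0]
wheelchair = world.spawn_actor(blueprint, spawn_point)

# Linear motion goes through throttle/brake (handled by the Manager's PI
# controllers, Section 2.2); the angular velocity is set directly.
wheelchair.apply_control(carla.VehicleControl(throttle=0.2, steer=0.0, brake=0.0))
wheelchair.set_angular_velocity(carla.Vector3D(x=0.0, y=0.0, z=10.0))  # deg/s

world.tick()  # request one simulation step
```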
2.2 Client Overview
The Client represents a simulated power wheelchair, which interacts with the Server Simulator and controls the environment and the vehicle through the Manager, together with the Agent, which implements the decisional NN. Multiple Clients can run during the same simulation (on a single server), allowing the parallelisation of the Agents' training.
More in detail, the Manager exploits the simulator RPC APIs to control a car so that it moves with the dynamics of a power wheelchair, which uses a joystick as user interface. In fact, the joystick directly controls the torque of the two DC motors, and its position maps to the desired linear and angular velocity. The linear velocity is first limited and adjusted with two Proportional-Integral (PI) controllers acting on the throttle and the brake, and finally set in the simulator. In this way, we can avoid peaks of acceleration, which could lead to simulation inconsistencies. The angular velocity, instead, is directly set in the simulator through RPC.
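To make the mapping concrete, the following sketch limits the joystick-commanded linear velocity through a Proportional-Integral controller acting on throttle and brake, while the lateral axis maps to a yaw rate. The gains, the 30 deg/s maximum yaw rate and the time step are placeholder assumptions, not the tuned values of the framework.

```python
class PIController:
    """Simple Proportional-Integral controller (illustrative gains)."""

    def __init__(self, kp, ki, out_min, out_max):
        self.kp, self.ki = kp, ki
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0

    def step(self, error, dt):
        self.integral += error * dt
        out = self.kp * error + self.ki * self.integral
        return max(self.out_min, min(self.out_max, out))


MAX_SPEED = 10.0 / 3.6  # 10 km/h expressed in m/s
throttle_pi = PIController(kp=0.5, ki=0.1, out_min=0.0, out_max=1.0)
brake_pi = PIController(kp=0.5, ki=0.1, out_min=0.0, out_max=1.0)


def joystick_to_control(joy_x, joy_y, current_speed, dt=0.05):
    """Map the joystick position to throttle/brake and a yaw rate."""
    target_speed = max(0.0, joy_y) * MAX_SPEED       # forward axis, limited to 10 km/h
    error = target_speed - current_speed
    throttle = throttle_pi.step(error, dt) if error > 0 else 0.0
    brake = brake_pi.step(-error, dt) if error < 0 else 0.0
    yaw_rate = joy_x * 30.0                          # assumed maximum yaw rate in deg/s
    return throttle, brake, yaw_rate
```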
The Agent decides how to move the joystick as a consequence of the observed state, which is provided by the Manager. The observed state is defined as the position, speed and direction of the simulated power wheelchair, the input joystick, which expresses the user's will, and the environment configuration (i.e. the sensor measurements) inside the Server Simulator at a given moment of the simulation.
The sensors mounted on the simulated wheelchair are RGB and depth cameras and a 3-axis accelerometer, gyroscope and tilt sensor.
The NN inputs are then the semantic segmentation derived from the RGB image, the depth image, the acceleration on the x, y and z axes, the roll and pitch angles and the input joystick position. The yaw angle and the gyroscope values are available, but we decided not to use them as inputs for the NN.
The Agent decision (i.e. the output of the decisional NN) is a discrete joystick position chosen from all possible user's joystick outputs (red dots of Figure 2). For our purpose, the NN has to learn to follow the will of the user whenever it represents a safe option. Instead, when the Agent predicts an imminent collision or a harmful situation, the user's choice must be bypassed and the Agent has to act as a supervisor, avoiding the dangerous state while trying to follow the user's will as much as possible.
Figure 2: Continuous user and discretised Agent joysticks.

Therefore, the Agent has to minimise the error between the chosen action and the user's will. This error is the Euclidean distance between the input joystick and the output joystick, and it represents the first Key Performance Indicator (KPI), measuring how well the agent learns to follow the user's will. The maximum acceptable value is 0.1768, which occurs when the user's joystick lies in the centre of a square with side 0.25 (the step between two red dots in Figure 2), according to the following equation:
\[
\text{max\_distance} = \sqrt{\left(\frac{0.25}{2}\right)^2 + \left(\frac{0.25}{2}\right)^2} = 0.1768 \tag{1}
\]
The ability of the Agent to avoid an obstacle or, in general, a harmful state is measured by counting the number of steps in which the Agent is alive. Each time the Agent dies, a new episode starts. The Agent dies either when it crashes into an obstacle or when it reaches the maximum number of steps (10,000). For this reason, the second KPI is the length of each episode, which is equal to 10,000 in the best case. The two KPIs must be evaluated jointly, since the Agent has to follow the user's will except when doing so would end up in a crash.
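As a small worked example of the two KPIs, the snippet below builds the discretised action grid of Figure 2 (assuming a 0.25 step over a [-1, 1] joystick range) and computes the joystick error of Equation 1; the function names are illustrative only.

```python
import numpy as np

STEP = 0.25                                   # step between two red dots in Figure 2
AXIS = np.arange(-1.0, 1.0 + STEP, STEP)      # assumed joystick range [-1, 1]
ACTIONS = np.array([(x, y) for x in AXIS for y in AXIS])  # discrete Agent outputs

MAX_DISTANCE = np.hypot(STEP / 2, STEP / 2)   # 0.1768, as in Equation 1


def nearest_action(user_joystick):
    """Discrete action closest to the user's continuous joystick position."""
    distances = np.linalg.norm(ACTIONS - np.asarray(user_joystick), axis=1)
    return ACTIONS[np.argmin(distances)]


def joystick_error(user_joystick, agent_joystick):
    """First KPI: Euclidean distance between user and Agent joysticks;
    the second KPI is simply the number of steps the Agent stays alive."""
    return float(np.linalg.norm(np.asarray(user_joystick) - np.asarray(agent_joystick)))
```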
2.3 Agent Supervisor
The Agent Supervisor module is inside the Manager and implements the interface used by the Agent. It receives all the events and data coming from the sensors in the Server Simulator and hides all the details of the simulation environment from the Agent, which receives only the observed state. It also computes the rewards and the punishments that the Agent collects for performing a certain action and being in a certain state.

Table 1: Penalty ranges.
Sensor                     Penalty window (absolute values)
Joysticks' distance        [0.1768, 2]
Accelerometer (x, y, z)    [[20, 40], [20, 40], [20, 40]] m/s²
Tilt sensor (roll, pitch)  [[5, 20], [20, 30]] °
Considering the task of calculating the score of
the Agent, the Supervisor gives a positive base reward
(+1) when the distance between the output joystick,
chosen by the Agent, and the input joystick is less
than the maximum accepted distance value (Equation
1). The Supervisor punishes the Agent every time a sensed measurement falls in a range of non-acceptable values, shown in Table 1. All the absolute values between 0 and the minimum value of the penalty range are accepted and do not generate any reward or penalty, except for the joystick, for which the base reward is received. If instead the value of a sensor falls in its penalty range, this value is normalised in the window [0, 1] (intensity) and the punishment is given by multiplying the intensity by the sensor penalty (-2). Finally, when the absolute value of a sensor is higher than the maximum predefined value, the Agent receives the crash penalty (-100). The Supervisor gives the same penalty to the Agent when a collision happens. The entire mechanism of rewards and penalties is reported in Table 2.
For each step of execution, the total score is computed by summing the reward with any penalties.

Table 2: Supervisor rewards and penalties.
Static rewards and penalties
Type           Value   Description
Base Reward    +1      for a "zero" joysticks' distance
Crash Penalty  -100    collision, rollover, high accelerations
Dynamic penalties
Type                 Value   Description
Joysticks' distance  -2      multiplied by the event intensity
Accelerations        -2      multiplied by the event intensity
Angles               -2      multiplied by the event intensity
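A possible sketch of this scoring logic, assuming the penalty windows of Table 1 and the values of Table 2, is shown below; the dictionary keys and the helper name are illustrative.

```python
BASE_REWARD = 1.0
CRASH_PENALTY = -100.0
DYNAMIC_PENALTY = -2.0
MAX_JOY_DISTANCE = 0.1768

# Penalty windows from Table 1, expressed as (lower, upper) absolute bounds.
PENALTY_WINDOWS = {
    "joy_distance": (0.1768, 2.0),
    "accel_x": (20.0, 40.0), "accel_y": (20.0, 40.0), "accel_z": (20.0, 40.0),
    "roll": (5.0, 20.0), "pitch": (20.0, 30.0),
}


def supervisor_score(measurements, collided):
    """Sum the base reward with any dynamic or crash penalties for one step."""
    score = 0.0
    if measurements["joy_distance"] < MAX_JOY_DISTANCE:
        score += BASE_REWARD                      # the Agent followed the user's will
    if collided:
        return score + CRASH_PENALTY              # collision detected by the simulator
    for name, (low, high) in PENALTY_WINDOWS.items():
        value = abs(measurements[name])
        if value > high:
            return score + CRASH_PENALTY          # rollover / extreme acceleration
        if value >= low:
            intensity = (value - low) / (high - low)   # normalised in [0, 1]
            score += intensity * DYNAMIC_PENALTY
    return score
```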
2.4 Agent Memory Management
Since each Agent has to store the states during the episode, a replay memory mechanism is required.
For NNs which exploit the time sequentiality inside a fixed-size batch input, the samples are stored from the newest to the oldest in a fixed-size First In First Out (FIFO) memory; they are grouped maintaining sequentiality inside the groups and, before each training step, the groups are uniformly shuffled (Figure 3a).
On the contrary, if the networks require the time sequence as an additional input dimension, the standard FIFO memory is traversed grouping the samples in a new dimension (Figure 3b).

Figure 3: Memory management: (a) grouped sequential memory; (b) samples in another dimension.
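A minimal sketch of the grouped sequential memory of Figure 3a is given below; the capacity, group size and method names are illustrative assumptions, and the variant of Figure 3b would instead stack each group along an extra input dimension.

```python
import random
from collections import deque


class GroupedReplayMemory:
    """Fixed-size FIFO memory whose samples are grouped in time order;
    whole groups are shuffled before each training step (Figure 3a)."""

    def __init__(self, capacity=50_000, group_size=8):
        self.samples = deque(maxlen=capacity)   # newest samples evict the oldest
        self.group_size = group_size

    def push(self, state, action, reward, next_state, done):
        self.samples.append((state, action, reward, next_state, done))

    def sample_groups(self, n_groups):
        data = list(self.samples)
        # Split the FIFO into consecutive groups, keeping sequentiality inside each.
        groups = [data[i:i + self.group_size]
                  for i in range(0, len(data) - self.group_size + 1, self.group_size)]
        random.shuffle(groups)                  # uniform shuffle of whole groups
        return groups[:n_groups]
```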
2.5 Clients Synchronisation
Since the Server Simulator is unique in the entire system and the Clients can be multiple, a synchronisation mechanism is required between each Client and the Server and among the Clients.
One Client, the first that starts the simulation, acts as master controlling the simulation, while the others act as slaves. The simulation runs at the speed of the Clients, but it is executed on the Server Simulator, which sends a signal when the simulation step has finished and streams the sensor data.
Of course, the Clients proceed at different speeds, also depending on the NN implemented in each Agent. For this reason, before allowing the server to run the simulation step, the master blocks on a barrier, waiting for all the slaves, so that slower Clients do not miss any steps. Once the barrier is
released, the master sends the simulation request to the Server (tick request) through the synchronous RPC. At the same time, the slaves block on a mutex associated with a condition (tick wait), waiting for the end-of-simulation signal, sent by the Server when it has finished simulating the n-th snapshot of the world. Then, all the Clients are released and block again on a mutex with a condition (sensors wait), waiting to receive the sensor data from the Server, so that too-fast Clients do not miss those data after the end-of-simulation signal.
Figure 4: Synchronisation mechanism among clients and server.

Thanks to this synchronisation mechanism, shown in Figure 4, all the Clients execute each step at the same time, at the speed of the slowest one, even if the NNs have different inference times.
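The handshake of Figure 4 can be sketched with standard threading primitives as below; class and method names are illustrative, and in the actual framework the tick is a synchronous RPC to the CARLA Server rather than a local call.

```python
import threading


class ClientSynchroniser:
    """Master/slave synchronisation: the master ticks the Server only after
    every Client has reached the barrier; the slaves then wait for the
    end-of-simulation signal before reading their sensor data (Figure 4)."""

    def __init__(self, n_clients, tick_server):
        self.barrier = threading.Barrier(n_clients)
        self.step_done = threading.Condition()
        self.tick_server = tick_server   # e.g. a callable wrapping world.tick()
        self.frame = -1                  # id of the last simulated snapshot

    def step(self, is_master):
        with self.step_done:
            last_frame = self.frame      # snapshot simulated in the previous step
        self.barrier.wait()              # no Client is allowed to fall behind
        if is_master:
            with self.step_done:
                self.frame = self.tick_server()   # tick request (synchronous RPC)
                self.step_done.notify_all()       # end-of-simulation signal
        else:
            with self.step_done:
                # tick wait: block until the master publishes the new snapshot
                self.step_done.wait_for(lambda: self.frame != last_frame)
        # sensors wait: each Client then blocks on its own sensor queue
        # (omitted here) so that fast Clients do not miss the streamed data.
```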
3 REINFORCEMENT LEARNING
Reinforcement learning (Kaelbling et al., 1996) is a machine learning paradigm which allows an autonomous system to build knowledge of the environment in a game-like situation, exploiting trial and error to come up with a solution to the given problem. The designer sets the reward policy, but the model does not take hints or suggestions on how to solve the game.
The environment is usually described as a Markov decision process in which the probability of taking an action in a particular state is given by the expected reward (Figure 5).

Figure 5: Reinforcement Learning architecture.
In this work, we have used Deep Reinforcement
Learning (DRL) (Mnih et al., 2013; Lample and
Chaplot, 2016; Zhang and Du, 2019; Balaji et al.,
2019) to implement multiple Agents able to follow
the user’s will and avoid obstacles. The best
performing Agent will be mounted on-board the
power wheelchair, through a hardware accelerator.
The idea behind DRL is that the NN is trained using a reward function, which is computed every time the NN picks an action for the input state, with the aim of generalising the environment. For each output of the NN, a reward is given by the Agent Supervisor. Hence, after thousands or millions of training steps, the NN learns which is the best action for the given state. The aim of the Agent during the training phase is to maximise the rewards received while minimising the punishments.
In this problem, part of the input state (the input joystick) indicates the correct action to take in most of the possible states, so we know what the Agent should learn. We can therefore force the Agent to learn this behaviour in a supervised way, speeding up the training of the NN (Action Driven Learning). After the Agent has learnt how to choose actions for most environment states, we continue with ε-greedy RL using a low ε value (Agent Driven Learning). The ε value defines the probability of choosing a random action from the action space instead of the action chosen by the agent, combining exploration and exploitation (Tokic, 2010).
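For reference, the ε-greedy selection used in the Agent Driven Learning phase reduces to a few lines; the ε value and the Q-value vector are placeholders.

```python
import random

import numpy as np


def epsilon_greedy_action(q_values, epsilon=0.05):
    """Choose a random action with probability epsilon (exploration),
    otherwise the action with the highest predicted value (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))
```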
Therefore, exploiting Deep Q-Learning (Mnih et al., 2013) or Double Deep Q-Learning (Van Hasselt et al., 2015), which mitigates the overestimation of the action values, we have trained multiple autonomous Agents to move in an environment, respecting the policy defined through the rewards and punishments.
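As a reminder of the difference between the two algorithms, the sketch below computes the bootstrap targets for one batch, assuming two Keras-style models q_online and q_target; it illustrates standard (Double) Deep Q-Learning rather than code taken from the framework.

```python
import numpy as np


def q_learning_targets(q_online, q_target, rewards, next_states, dones,
                       gamma=0.99, double=True):
    """DQN evaluates max_a Q_target(s', a); Double DQN selects the action with
    the online network and evaluates it with the target network."""
    q_next_target = q_target.predict(next_states)            # (batch, n_actions)
    if double:
        best_actions = np.argmax(q_online.predict(next_states), axis=1)
        q_next = q_next_target[np.arange(len(rewards)), best_actions]
    else:
        q_next = np.max(q_next_target, axis=1)
    return rewards + gamma * (1.0 - dones) * q_next
```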
3.1 Neural Network Architecture
With the framework described in Section 2, it is possible to train different NNs in parallel, varying the type of layers, their size and their order.
The Agent with the best performance should be mounted on an embedded target platform; thus, we have to find a compromise on the number of parameters, the layer types and the memory footprint to maintain a low impact on the hardware accelerator. Moreover, the inference time of the NN has to be as short as possible. In fact, it introduces a fixed delay in the joystick control loop. If this delay is longer than the users' reaction time, they may feel out of control of the vehicle. Therefore, we have empirically decided that the maximum acceptable delay introduced by the agent for our target users is 100 ms.
In Table 3, the inputs of each NN are reported. IMU refers to the acceleration on the x, y and z axes and the roll and pitch angles. All input values have been normalised to ease learning by the NN.
Table 3: Neural network inputs.
Name                         Size       Type
Depth Image                  320x240x1  Float32
Semantic Segmentation Image  320x240x3  Float32
IMU                          5          Float32
User Joystick                2          Float32
Figure 6: Neural network components overview.
In Figure 6, the most promising NN architecture is reported. It is composed of three main components (a code sketch is given after the list):
- the CNN block: a convolutional block which receives the image obtained by combining the semantic segmentation with the depth image;
- the Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997): it maintains the temporal evolution of the image features through its internal state;
- the Dense layer: it represents the voter of the NN, combining the user's joystick and the other sensor measurements with the output of the LSTM.
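Under the input sizes of Table 3, a possible Keras sketch of this topology is shown below; the number of filters, the LSTM width, the sequence length and the 81-way discrete output (a 9x9 joystick grid with step 0.25) are assumptions for illustration, not the exact architecture of Figure 6.

```python
import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN = 4        # assumed number of consecutive frames fed to the LSTM
N_ACTIONS = 81     # assumed 9x9 grid of discrete joystick positions (step 0.25)

# Depth (1 channel) and semantic segmentation (3 channels) combined into a
# single 4-channel image, as described for the CNN block.
images = layers.Input(shape=(SEQ_LEN, 240, 320, 4), name="depth_and_segmentation")
imu = layers.Input(shape=(5,), name="imu")                 # ax, ay, az, roll, pitch
user_joystick = layers.Input(shape=(2,), name="user_joystick")

# CNN block applied to every frame of the sequence.
cnn = tf.keras.Sequential([
    layers.Conv2D(16, 5, strides=2, activation="relu"),
    layers.Conv2D(32, 3, strides=2, activation="relu"),
    layers.Conv2D(64, 3, strides=2, activation="relu"),
    layers.GlobalAveragePooling2D(),
])
features = layers.TimeDistributed(cnn)(images)

# LSTM keeps the temporal evolution of the image features in its internal state.
temporal = layers.LSTM(128)(features)

# Dense "voter" combining the LSTM output with the IMU and the user's joystick.
merged = layers.Concatenate()([temporal, imu, user_joystick])
hidden = layers.Dense(128, activation="relu")(merged)
q_values = layers.Dense(N_ACTIONS, name="discrete_joystick_q_values")(hidden)

model = tf.keras.Model(inputs=[images, imu, user_joystick], outputs=q_values)
```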
Some other NNs with different architectures and layers have been trained and tested using this framework. For example, an NN with a Convolutional LSTM (Shi et al., 2015) replacing the CNN block has been implemented and tested. However, it has been discarded due to its high inference time, which reaches up to 1 s.
4 RESULTS
The software framework described above has been
used to train multiple autonomous Agents able to
move in a city following the user’s will and avoiding
obstacles. The two training phases of each Agent are
described in Section 3.
The KPI used to evaluate the Action Driven Learning phase, where the Agent is forced to follow the user's joystick, is the Euclidean distance between the two joysticks. Figure 7a reports the results of the first phase of training for three different NNs. The maximum acceptable error is reached within a small number of training steps for all Agents. However, the two NNs represented with blue and yellow lines do not generalise the policy well over the entire first phase. Thus, the Agent represented with the red line, which implements the NN of Figure 6, was selected for the second phase.
The goal of the Agent Driven Learning phase,
where the Agent controls the vehicle moving in
the simulated world trying to avoid collisions with
objects, is to maximise the length of the episode while
following the user’s joystick as much as possible.
Figure 7: Results of the training: (a) joystick difference - Action Driven Learning phase; (b) episode length - Agent Driven Learning phase.
Figure 7b reports preliminary results of the second phase, showing the length of the episodes during the training of the NN selected from the first phase (Figure 6). The figure underlines how complex, unstable and slow the Agent Driven Learning phase is.
The simulation framework ran on a server with the following specifications:
- AMD EPYC 7301 16-core processor;
- 256 GB of RAM;
- NVIDIA Tesla T4 with 16 GB of dedicated RAM.
During the preliminary Agent development, the simulation framework showed the capability to train as many networks together as supported by the platform resources. In our case, we tested up to 6 Clients with the same Server Simulator. All the Clients ran in the same dedicated NVIDIA GPU Docker container, while the CARLA Server Simulator ran in a different GPU-enabled dedicated container.
The synchronisation mechanism allows training networks with different inference times, maintaining the coherence between two consecutive simulation steps for all the Agents. Static and dynamic objects have been introduced inside the simulated world, to represent a scenario as similar as possible to a real one. Thanks to the fault recovery mechanism, the simulator has been able to train multiple agents for more than 600 hours.
5 CONCLUSIONS
People with physical disabilities account for about 13% of the U.S. population. Some of them are constrained to use power wheelchairs, and they usually have problems with outdoor mobility, on sidewalks or cycling paths, because of the presence of multiple obstacles and their slow reaction times. In order to overcome dangerous situations, such as crashes or wheelchair rollovers, the assistive technology sector seeks to develop intelligent power wheelchairs that can simplify the life of these people and increase their autonomy. A simulator in which to train and test multiple neural network architectures is needed before implementing artificial intelligence directly on a real power wheelchair. However, while several companies and research teams are investing in frameworks for autonomous driving, few studies have been carried out in the field of assisted driving for power wheelchairs.
In this paper, we presented a simulation framework to train multiple intelligent Agents for assisted driving. The entire developed framework is scalable and modular, since it is totally decoupled from the client architectures. Each client represents a power wheelchair moving in the simulated world and implements a neural network based on reinforcement learning, with the aim of following the user's will while avoiding harmful situations, such as collisions and crashes. All the clients are synchronised with a mechanism of barriers, mutexes and conditions, so that coherence in the simulated world is always maintained. This simulation framework, which comes from the autonomous driving world, allows us to reduce the training time, thanks to parallelism, and to find the neural network with the best performance for assisted driving. In the simulation, the vehicle is equipped with customised sensors and is controlled so that it acts as a power wheelchair (instead of a car). Moreover, multiple static and dynamic obstacles are injected into the simulated world, making it as realistic as possible.
Preliminary results highlight the capability of the adapted framework to train many networks together for the assisted driving domain. Once the best implementation has been selected in simulation, the Agent can be migrated to a real embedded platform mounted on a power wheelchair.
REFERENCES
Balaji, B., Mallya, S., Genc, S., Gupta, S., Dirac,
L., Khare, V., Roy, G., Sun, T., Tao, Y.,
Townsend, B., Calleja, E., and Muralidhara, S. (2019).
Deepracer: Educational autonomous racing platform
for experimentation with sim2real reinforcement
learning.
Caltagirone, L., Bellone, M., Svensson, L., and
Wahde, M. (2017). Lidar-based driving path
generation using fully convolutional neural networks.
IEEE International Conference on Intelligent
Transportation Systems 2017.
Courtney-Long, E., Carroll, D., Zhang, Q., Stevens, A., Griffin-Blake, S., Armour, B., and Campbell, V. (2015). Prevalence of disability and disability type among adults — United States, 2013. MMWR. Morbidity and Mortality Weekly Report, 64:777–783.
Dai, J., He, K., and Sun, J. (2016). Instance-aware semantic
segmentation via multi-task network cascades. In The
IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A.,
and Koltun, V. (2017). CARLA: An open urban
driving simulator. In Proceedings of the 1st Annual
Conference on Robot Learning, pages 1–16.
Epic Games (2019). Unreal engine. https://www.unrealengine.com. [Online]; accessed August 2020.
Faria, B. M., Reis, L. P., and Lau, N. (2014). A survey on intelligent wheelchair prototypes and simulators. In Rocha, Á., Correia, A. M., Tan, F. B., and Stroetmann, K. A., editors, New Perspectives in Information Systems and Technologies, Volume 1, pages 545–557, Cham. Springer International Publishing.
Giuffrida, G., Meoni, G., and Fanucci, L. (2019). A
yolov2 convolutional neural network-based human–
machine interface for the control of assistive robotic
manipulators. Applied Sciences, 9(11):2243.
GPII DeveloperSpace (2020). What is physical disability? https://ds.gpii.net/content/what-physical-disability. [Online]; accessed August 2020.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural computation, 9(8):1735–1780.
Kaelbling, L. P., Littman, M. L., and Moore, A. W.
(1996). Reinforcement learning: A survey. Journal
of artificial intelligence research, 4:237–285.
Lample, G. and Chaplot, D. S. (2016). Playing fps
games with deep reinforcement learning. ArXiv,
abs/1609.05521.
Leaman, J. and La, H. M. (2017). A comprehensive
review of smart wheelchairs: Past, present, and
future. IEEE Transactions on Human-Machine
Systems, 47(4):486–499.
Lillicrap, T., Hunt, J., Pritzel, A., Heess, N., Erez, T., Tassa,
Y., Silver, D., and Wierstra, D. (2015). Continuous
control with deep reinforcement learning. CoRR.
Merkel, D. (2014). Docker: lightweight linux containers
for consistent development and deployment. Linux
journal, 2014(239):2.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M.
(2013). Playing atari with deep reinforcement
learning.
Nguyen, A. V., Nguyen, L. B., Su, S., and Nguyen, H. T.
(2013a). The advancement of an obstacle avoidance
bayesian neural network for an intelligent wheelchair.
In 2013 35th Annual International Conference of the
IEEE Engineering in Medicine and Biology Society
(EMBC), pages 3642–3645.
Nguyen, J. S., Su, S. W., and Nguyen, H. T. (2013b).
Experimental study on a smart wheelchair system
using a combination of stereoscopic and spherical
vision. In 2013 35th Annual International Conference
of the IEEE Engineering in Medicine and Biology
Society (EMBC), pages 4597–4600.
Pinheiro, O. R., Alves, L. R. G., Romero, M. F. M.,
and de Souza, J. R. (2016). Wheelchair simulator
game for training people with severe disabilities. In
2016 1st International Conference on Technology and
Innovation in Sports, Health and Wellbeing (TISHW),
pages 1–8.
Pithon, T., Weiss, T., Richir, S., and Klinger, E. (2009).
Wheelchair simulators: A review. Technology and
Disability, 21:1–10.
Rasshofer, R. H. and Gresser, K. (2005). Automotive radar
and lidar systems for next generation driver assistance
functions. Advances in Radio Science, 3.
Schöner, H.-P. (2018). Simulation in development and testing of autonomous vehicles. In Bargende, M., Reuss, H.-C., and Wiedemann, J., editors, 18. Internationales Stuttgarter Symposium, pages 1083–1095, Wiesbaden. Springer Fachmedien Wiesbaden.
Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W. K., and Woo, W.-c. (2015). Convolutional LSTM network: A machine learning approach for precipitation nowcasting.
Tokic, M. (2010). Adaptive ε-greedy exploration in
reinforcement learning based on value differences.
In Dillmann, R., Beyerer, J., Hanebeck, U. D.,
and Schultz, T., editors, KI 2010: Advances in
Artificial Intelligence, pages 203–210. Springer Berlin
Heidelberg.
US Department of Health and Human Services (2018). What are some types of assistive devices and how are they used? https://www.nichd.nih.gov/health/topics/rehabtech/conditioninfo/device. [Online]; accessed August 2020.
Van Hasselt, H., Guez, A., and Silver, D. (2015). Deep
reinforcement learning with double q-learning.
Yin, Z. and Shi, J. (2018). Geonet: Unsupervised learning
of dense depth, optical flow and camera pose. In
Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 1983–1992.
Zhang, Q. and Du, T. (2019). Self-driving scale car trained
by deep reinforcement learning.