Authors:
Gabriel Leuenberger and Marco A. Wiering
Affiliation:
University of Groningen, Netherlands
Keyword(s):
Reinforcement Learning, Continuous Actions, Multi-Layer Perceptrons, Computer Games, Actor-Critic Methods.
Related Ontology Subjects/Areas/Topics:
Agents; Artificial Intelligence; Autonomous Systems; Biomedical Engineering; Biomedical Signal Processing; Computational Intelligence; Evolutionary Computing; Health Engineering and Technology Applications; Human-Computer Interaction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Methodologies and Methods; Neural Networks; Neurocomputing; Neurotechnology, Electronics and Informatics; Pattern Recognition; Physiological Computing Systems; Sensor Networks; Signal Processing; Soft Computing; Symbolic Systems; Theory and Methods
Abstract:
Reinforcement learning agents with artificial neural networks have previously been shown to acquire human-level dexterity in discrete video game environments, where only the current state of the game and a reward are given at each time step. Continuous environments, in which the states, observations, and actions are all continuous, pose a harder problem, and they are the focus of this paper. The Continuous Actor-Critic Learning Automaton (CACLA) algorithm is applied to a 2D aerial combat simulation environment with continuous state and action spaces. Both the Actor and the Critic employ multilayer perceptrons. For our game environment we show that: 1) Exploration of CACLA's action space improves strongly when Gaussian noise is replaced by an Ornstein-Uhlenbeck process. 2) A novel Monte Carlo variant of CACLA, introduced here, turns out to be inferior to the original CACLA. 3) The insights gained from this variant lead to a modified version of CACLA that relies on a third multilayer perceptron to estimate the absolute error of the Critic, which is used to correct the learning rule of the Actor. This Corrected CACLA outperforms the original CACLA algorithm.