Smart Broker Agent Learning How to Reach Appropriate Goal by Making Appropriate Compromises

Dilyana Budakova 1,a, Veselka Petrova-Dimitrova 1 and Lyudmil Dakovski 2
1 Technical University of Sofia, Plovdiv Branch, Plovdiv, Bulgaria
2 European Polytechnic University, Pernik, Bulgaria
a https://orcid.org/0000-0001-8933-9999
Keywords: Intelligent System, Reinforcement Learning, Intelligent Virtual Agents, Smart Broker Learning Agent.
Abstract: In this paper a new Smart Broker Learning Agent (SBLA) has been proposed, which learns to find the most acceptable solution to a given problem, according to the individual requirements and emotions of a particular user. For this purpose, a new agent structure has been proposed and a reinforcement learning algorithm has been used. When the scenarios and criteria under consideration are complex, and when mixed emotions arise, it may be necessary to compromise on certain criteria in order to achieve the goal. Knowledge of the preferences and emotions of the particular user is then needed. In these cases, the SBLA does not allow compromises that are unacceptable to this user. The structure and the operation of the agent have been considered. The knowledge that the SBLA must have and the process of its formation have been described. The scenarios for solving a specific task and the conducted experiments have been presented. Some contributions arising from the use of the proposed agent architecture have been discussed, such as the agent's ability to explain its decisions, to offer the most appropriate solution for each specific user, to avoid unacceptable compromises, to show empathy, and to achieve greater approval of the offered solutions.
1 INTRODUCTION
In many tasks, the requirements for choosing a goal and finding a way to achieve it are too complex and often contradictory. Sometimes they are strictly individual and personalized and correspond to the understandings and habits of the particular user whose problem is being solved. Negotiating, modeling empathy, gift giving and smart shopping, for example, require an understanding of consumer needs, views and preferences (Dehghani et al., 2012; Johnson et al., 2019; Paiva et al., 2017; Budakova and Dakovski, 2019).
Reinforcement learning algorithms are useful for solving such problems (Sutton and Barto, 2014). Yet they can be improved further in many ways (Gosavi, 2009; Torrado et al., 2018). Imitation learning, for example, is one way to optimize them (Argall, 2009; Amor et al., 2013; Takahashi et al., 2017).
Multi-objective problems with conflicting interests are considered in (Moffaert, 2016; Moffaert and Nowé, 2014; Natarajan and Tadepalli, 2005). In this case, multi-objective reinforcement learning algorithms can provide one or more Pareto-optimal balances of the original objectives. Single-policy techniques can be employed to guide the search toward a particular compromise solution when the decision maker's preferences are known a priori. When the preference is unclear before the optimization process starts, it might be appropriate to provide the decision maker with a set of Pareto-optimal compromise solutions, each representing a different balance of the objectives (Moffaert, 2016; Cho et al., 2017). A more advanced idea is the simultaneous learning of a set of compromise solutions. Multi-objective modeling and performance optimization are described in (Cho et al., 2017).
When a goal cannot be achieved according to the set requirements, compromises have to be made (Gunantara, 2018; Vachhani et al., 2015). One solution is for the agent to reach the goal by making as few compromises as possible with the required
criteria (Budakova et al., 2019). This solution, however, may recommend compromises that are unacceptable to a particular user. Users are reluctant to take actions that are unacceptable to them and reject the way to reach the goal proposed by the system.
The SBLA, proposed in this paper, chooses ways
to reach the goal by making only acceptable
compromises. To achieve this, knowledge of the
individual understandings and emotional attitudes of
each individual user about the possible ways to reach
the goal is needed. Knowledge of public attitudes and
understandings of these possibilities is also needed.
The SBLA can then determine whether or not an action is acceptable to a user. For this purpose, a new agent structure has been proposed and a reinforcement learning algorithm has been used.
The rest of the paper is structured as follows: the SBLA structure is explained in Section 2; the experimental setting is given in Section 3; the conducted experiments are presented in Sections 4 and 5; and conclusions are drawn in Section 6.
Figure 1: SBLA structure.
2 SMART BROKER LEARNING
AGENT STRUCTURE
A new SBLA has been proposed, which learns to find the most acceptable solution to a given problem, according to the individual requirements and emotions of a particular user.
To this end, the agent is trained to offer the most appropriate goal and the best way to achieve it. For this purpose, a new agent structure has been proposed (Figure 1), which includes: a memory block (criteria-based model, model of rewards, model of the environment); a block of knowledge (of the possible solutions, of the individual requirements and emotions of a user, as well as of the possible scenarios); a block for marking appropriate actions/states; a training block containing a reinforcement learning algorithm; an explanation block; and a solution visualization block.
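Purely as an illustration of how these blocks might be organised in code, the following Python sketch lays them out as data structures and method stubs. All class, field and method names are assumptions introduced here for clarity; they are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical skeleton of the SBLA blocks described above.
# All names are illustrative assumptions, not the authors' implementation.

@dataclass
class MemoryBlock:
    criteria_model: Dict[Tuple[int, int], float] = field(default_factory=dict)  # Broker Matrix values per edge
    reward_model: Dict[Tuple[int, int], float] = field(default_factory=dict)    # rewards per edge
    environment_model: Dict[int, List[int]] = field(default_factory=dict)       # state graph (adjacency lists)

@dataclass
class KnowledgeBlock:
    possible_solutions: List[List[int]] = field(default_factory=list)   # candidate goals / candidate paths
    user_requirements: Dict[str, str] = field(default_factory=dict)     # individual requirements and emotions
    scenarios: List[str] = field(default_factory=list)                  # possible scenarios

@dataclass
class SmartBrokerLearningAgent:
    memory: MemoryBlock
    knowledge: KnowledgeBlock
    acceptable_states: set                      # marking block: states/actions judged acceptable for this user

    def train(self) -> None:
        """Training block: would run the reinforcement learning algorithm."""
        raise NotImplementedError

    def explain(self, path: List[int]) -> str:
        """Explanation block: would justify each chosen transition."""
        raise NotImplementedError

    def visualize(self, path: List[int]) -> None:
        """Solution visualization block: would render the chosen path."""
        raise NotImplementedError
```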
When the scenarios and criteria under
consideration are complex, and when mixed emotions
arise, it may be necessary to compromise on certain
criteria in order to achieve the goal. Then knowledge
of the preferences and emotions of the particular user
is needed.
In order to make the reinforcement learning agent find the appropriate path to a suitable goal while meeting complex criteria, a criteria-based model, represented as an additional agent memory matrix, is introduced. This model shows how the user perceives and evaluates the potential goals and the options for their achievement. The criteria-based model is similar to the reward model of the Q-learning algorithm. For the sake of convenience it will further be called the Broker Matrix. The criteria-based model maintains a specific value for each existing edge in the graph. It is a measure value for each edge and node, i.e., an estimate of the choice to move from one state to another using a given edge. When the algorithm runs, the transition from one state to another is sought by selecting only edges and states with a specific estimate. If such edges or states are missing, only those with acceptable measure values are selected.
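The following Python sketch illustrates this idea: the Broker Matrix is stored alongside the reward matrix, and transitions are filtered by a minimum acceptability level, falling back to the most acceptable existing edges when none reach that level. The matrix names, the numeric acceptability scale and the example edge are assumptions introduced here, not values from the paper.

```python
import numpy as np

# Illustrative sketch, not the authors' implementation.
# States are graph nodes; R holds rewards for existing edges, B (the "Broker
# Matrix") holds the user's acceptability estimate for the same edges.
N_STATES = 24
NO_EDGE = -np.inf

R = np.full((N_STATES, N_STATES), NO_EDGE)   # reward model (environment graph)
B = np.full((N_STATES, N_STATES), NO_EDGE)   # criteria-based model (Broker Matrix)

# Assumed acceptability scale, loosely following Tables 2 and 3:
# 3 = joy/enthusiasm, 2 = security, 1 = acceptable but uncomfortable,
# 0 = poorly acceptable, -1 = unacceptable.
# Example edge (values are purely illustrative):
R[0, 2] = 0.0
B[0, 2] = 2

def allowed_actions(state: int, min_level: float) -> list[int]:
    """Return successor states reachable from `state` whose Broker Matrix
    value is at least `min_level`; if none qualify, fall back to the best
    (most acceptable) existing transitions, mirroring the text above."""
    successors = [s for s in range(N_STATES) if R[state, s] != NO_EDGE]
    acceptable = [s for s in successors if B[state, s] >= min_level]
    if acceptable:
        return acceptable
    best = max((B[state, s] for s in successors), default=NO_EDGE)
    return [s for s in successors if B[state, s] == best]
```

A call such as allowed_actions(0, min_level=2) would then return only those successors of the initial state that this user perceives as secure or better.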
On the one hand, the Pareto front can provide a set
of optimal compromise solutions. On the other hand,
the proposed SBLA and reinforcement learning
algorithm can provide a way of achieving the goal by
means of the most acceptable compromises.
3 EXPERIMENTAL SETTING
In the considered example, the goal is the purchase of a small property with a built-up area of 20-30 square meters in a big industrial city, where the user is about to start working. The property can be a residential one or an office with the possibility to be used as a hotel room, or
a place where one can spend a night occasionally. The
user prefers new construction. However, the
possibility of buying a well-preserved old property
with a larger area at an equivalent price is also under
consideration. The focus is on small property types because the user has limited funds. He/she has to save, but saving takes years of patience. He/she can sell a property he/she already possesses, but a sold property is a loss for him/her. He/she can take a loan, but taking loans is risky. Therefore, according to the Pareto front, a small property is the balanced compromise option, suitable for this particular person.
Figure 2 presents a graph, which shows the possible states (the nodes in the graph) for solving the problem of buying the most appropriate residential property in the way most suitable for a particular user. The existing sequences of these states are presented by means of the oriented edges in the graph.
Figure 2: Oriented graph, which presents the states in
solving the problem of buying a property and their
sequence. The colours show how the user perceives them
emotionally.
Table 1 gives a description of these states and the trade-offs required in the process of their selection. The colors show the emotions that the given states, and the actions undertaken to reach them, provoke in the user. The correspondence between the colors and the emotions they reveal is given in Tables 2 and 3.
The SBLA will suggest ways to reach each of the
three most appropriate targets from the Pareto front.
They are marked by the following nodes: node 21 - an old but preserved property with a living area of 35 square meters; node 22 - a small property suitable for an office and a hotel room, with a built-up area of 20 sq. m.; and node 23 - a small property in a new building, with a built-up area of 30 sq. m.
The initial state is indicated by node 0 and yellow
color. It starts the process of considering the problem.
The user moves to a large industrial city to take a job
position there and has no property to live in. As this
is a dream job for him/her, this node is marked as a
state in which the emotion is joy and enthusiasm.
Consequently, it is unacceptable for him/her to give up the offered job. From here, things get trickier. It is possible that the user has one (and only one) residential property - node 1; he/she may have no property at all - node 2; and it is also possible that he/she possesses other properties in other places - node 3. As can be seen from Figure 2, the state graph for solving this problem (albeit simplified) allows many different courses of action. Actions and
situations, evoking joy and enthusiasm in the user, are
marked in yellow; the non-risky ones are marked in
green; the extremely unacceptable are red; the more
acceptable ones are orange; and the blue color marks
the actions and situations which are not very
comfortable, not risk-free and not very desirable, but
still hopeful.
Thus, even at first glance, Figure 2 shows that
from the initial node 0 to any of the three targets,
defined as acceptable and represented by nodes 21, 22
and 23, there is no path in the graph that includes only
states and actions, evoking enjoyment and excitement
in the user; nor is there a path that includes only risk-
free states and actions, and so on. In other words,
whichever path is chosen, compromises and choices
will have to be made.
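This observation can be checked mechanically with a breadth-first search restricted to states of the permitted colours. The sketch below is illustrative only: it uses a small assumed fragment of the Figure 2 graph and colouring, since the full labelling is given only in the figure.

```python
from collections import deque

# Illustrative only: a small assumed stand-in for the Figure 2 graph and colouring.
# edges: directed edges of the state graph; colour: emotion label per node.
edges = {0: [1, 2, 3], 2: [4, 5], 4: [10], 10: [15], 15: [23]}   # assumed fragment
colour = {0: "yellow", 1: "green", 2: "red", 3: "yellow",
          4: "blue", 5: "green", 10: "blue", 15: "orange", 23: "yellow"}

def path_exists(start: int, goals: set[int], allowed_colours: set[str]) -> bool:
    """BFS over the state graph visiting only nodes whose colour is allowed."""
    if colour.get(start) not in allowed_colours:
        return False
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node in goals:
            return True
        for nxt in edges.get(node, []):
            if nxt not in seen and colour.get(nxt) in allowed_colours:
                seen.add(nxt)
                queue.append(nxt)
    return False

# No purely "yellow" (joy) path exists in this fragment, so further emotion
# levels, i.e. compromise colours, must be admitted into the search:
print(path_exists(0, {21, 22, 23}, {"yellow"}))            # False for this fragment
print(path_exists(0, {21, 22, 23}, {"yellow", "green"}))   # depends on the full graph
```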
For example, the user will have to decide whether
to sell the properties he/she already possesses and buy
the desired property or not to sell but instead repair
and improve them. In the second case he/she will
have to rent a room/house for several years and at the
same time to save money until he collects part or all
of the required sum. He/she has to decide whether to
take a mortgage loan or not and for what part of the
property price. All these decisions will change the
buyer’s life both in the short and in the long run. They
all have their advantages and disadvantages. The
purpose of a SBLA is to understand the user's way of
thinking and to offer solutions regarding the ways to
realize the most appropriate option.
What sequence of actions should the user follow
in order to feel happiest on the way to achieving the
goal?
What sequence of actions should he/she follow in order to feel most secure on his/her way to the goal of having a home in the big industrial city in which he/she works?
Table 1: Description of the states presented in the graph in Figure 2 and the compromises they require.

N: Description; Compromises/Advantages.
0: The user works in a big city. To move somewhere else is an unacceptable compromise for him/her.
1: The user owns a residential property. This makes him/her feel secure.
2: The user does not own a residential property. This is risky. Not having a property is an unacceptable compromise for him/her.
3: The user owns more than one residential property. This gives him/her great safety.
4: The user rents a property and saves to buy a residential property of his/her own. It takes years of patience, but it is risk-free.
5: The user commutes to his/her workplace every day or on a schedule and saves for the purchase of a residential property. It takes years of patience, but it is not risky.
6: The user renovates and improves the properties he/she possesses. The period for raising funds for the purchase of a new property is extended. Safety is provided. Acceptable compromise.
7: The user gets a mortgage credit amounting to 60% of the price of the new property, but he/she has no other savings. A consumer credit is required for the remaining amount of money needed. There is a great risk for all his/her property. Living with two loans would mean great restrictions. A difficult compromise to accept.
8: The user sells his/her only property. Loss of property. Risk of running out of property. Unacceptable compromise.
9: Sells one of his/her properties. Loss of property. Difficult compromise to accept.
10: 20% of the price collected. Enough to get a mortgage. Acceptable compromise. Brings safety.
11: 40% of the property price available - enough to get a mortgage. Acceptable compromise. Safety.
12: He/she sells his/her only property, but only after he/she has collected 20% of the necessary funds. Loss of property. Unacceptable compromise.
13: Sells one of his/her residential properties, but after he/she has got 40% of the necessary funds ready. Loss of property. Acceptable compromise.
14: 50% of the necessary funds available after the sale. Loss of property and risk of funds shortage. Acceptable compromise.
15: Takes a mortgage credit of 80% of the value of the new property and has the remaining funds available. Acceptable risk. Acceptable compromise.
16: He/she takes a mortgage on 60% of the new property and has the remaining funds. Acceptable risk. Acceptable compromise.
17: Takes a consumer credit to cover the mortgage up to 100%. Risk for all property. Must live in limitations and deprivation. Difficult compromise to accept.
18: Takes a 30% mortgage credit and has the remaining funds available. This is risk-free and no compromise is required.
19: Takes a 10% mortgage credit and has the remaining funds available. This is a great level of security. No compromises required.
20: Takes a mortgage credit of 50% of the price of the property and has the remaining funds available. There is some risk. Acceptable compromise.
21: Buys an old but larger residential property. Though the property is old, the compromise is acceptable.
22: Buys a very small office in order to use it both as a hotel room and as an office. It is not a residential property and the expenses for taxes, electricity and water are higher. Safety. Acceptable compromise.
23: Buys a new very small residential property in the city where he/she works. No compromise needed. This is the dream home.
Table 2: Meaning of the colors of the nodes and edges in the graph, given in Figure 2.

Colour: Description.
Green: The state leads to security.
Dark red: The state requires a highly unacceptable compromise.
Red: The state requires an unacceptable compromise.
Orange: The state requires a poorly acceptable compromise.
Yellow: Achieving this state is highly desirable.
Blue: An acceptable state in which there is no risk, but a poorly acceptable compromise is required.
Table 3: Description of the emotion represented by the colors of the nodes and edges in the graph, given in Figure 2.

Colour: Emotion.
Green: Security.
Red: Panic, anxiety, dissatisfaction.
Yellow: Joy and enthusiasm.
Blue: Calm and hope.
Is there a sequence of actions that makes the user feel excited and happy all the way to the goal? It turns out that such a sequence does not exist and compromises are required. So what are the most acceptable compromises? Are there actions that guarantee greater security, even if they evoke less elan and enthusiasm in the user? It is precisely this type of action that can be considered the most acceptable compromise. Also, are there actions that require more time, are less safe and cause some inconvenience, but are still acceptable? The aim is to avoid the unacceptable actions. It can be seen from Figure 2 that selling the properties he/she owns is
an unacceptable action for the person under
consideration.
4 FIRST EXPERIMENT
A buyer who does not own any property is considered. After starting a secure job in a large industrial city, he/she wants to buy a place to live. The system offers the fastest way to achieve this goal, namely, to take a mortgage loan from the bank for up to 80% of the price and to cover the remaining 20% with a consumer loan (Figure 3).
A dotted black line shows the sequence of states
until a solution is reached. The system offers this
solution if the modification of the reinforcement
learning algorithm is not used. In this case, there will
be a great risk over the years until the consumer loan
is repaid. After that moment only the mortgage will
remain. The amount of loan installments will be
drastically reduced and the user could feel calmer and
lead a normal life.
When the criteria-based model, presented by the Broker Matrix, is used, it is established that the consumer considers taking such loans to be highly risky (orange states 7 and 17). Taking a mortgage
loan of up to 80% of the sum is relatively promising
for him/her. However, it is not acceptable to take a
consumer loan in parallel in order to fully cover the
price of the property. This action makes the user
anxious.
Figure 3: A dotted black line shows a sequence of states for
buying a property by a person, who does not have any
residential properties. The system offers this sequence only
if the newly introduced criteria-based model, presented by
the Broker Matrix, is not used.
Therefore, a condition is set: to offer the buyer only actions that are acceptable to him/her, i.e., actions perceived by him/her as reliable and secure, and/or which he/she would undertake with joy and readiness, or with mixed feelings between joy and hope, but without panic and stress. This leads to the option shown in Figure 4. According to it, the buyer could live for several years in a rented apartment and save money until he/she has accumulated at least 20% of the price of the property he/she wishes to buy, having in mind that he/she will then be able to buy the property only with a mortgage loan. This option turns out to be acceptable for the buyer. Depending on the years he/she could spend saving and the size of his/her salary, he/she will have to decide which of the proposed housing options to buy. Only appropriate compromises were made and a suitable property was chosen.
Figure 4: A dotted black line shows a sequence of states for
buying a property by a person, who does not have any
residential properties. It is proposed by the system when
using the newly introduced criteria-based model, presented
by the Broker Matrix.
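A minimal sketch of how such a condition could be imposed on a tabular Q-learning procedure is given below. It is an illustration of the idea rather than the authors' exact algorithm: the acceptability threshold, the matrix encoding and all names are assumptions, and the reward and Broker matrices would have to be filled in from the state graph of Figure 2 and the user survey before training is meaningful.

```python
import numpy as np
import random

# Illustrative sketch (not the authors' code): tabular Q-learning in which the
# action set at every state is masked by the Broker Matrix, so that transitions
# the user finds unacceptable are never proposed.
N, NO_EDGE = 24, -np.inf
R = np.full((N, N), NO_EDGE)   # reward model: existing edges of the state graph
B = np.full((N, N), NO_EDGE)   # Broker Matrix: acceptability per edge (assumed scale, < 0 means unacceptable)
GOALS = {21, 22, 23}
# R and B must be filled in from Figure 2 and the user's survey before training.

def acceptable(state):
    """Successors reachable from `state` that the user finds acceptable (B >= 0)."""
    return [s for s in range(N) if R[state, s] != NO_EDGE and B[state, s] >= 0]

def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2):
    Q = np.zeros((N, N))
    for _ in range(episodes):
        state = 0
        while state not in GOALS:
            actions = acceptable(state)
            if not actions:                       # dead end: no acceptable move left
                break
            a = random.choice(actions) if random.random() < eps else \
                max(actions, key=lambda s: Q[state, s])
            target = R[state, a] + gamma * max(
                (Q[a, s] for s in acceptable(a)), default=0.0)
            Q[state, a] += alpha * (target - Q[state, a])
            state = a
    return Q
```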
5 SECOND EXPERIMENT
This time a buyer who starts his/her dream job in a large industrial city but owns properties in another, smaller provincial town is considered. He/she wants to buy a property near his/her workplace. A survey conducted with the potential buyer reveals that he/she prefers security and does not like to take risks. He/she is reluctant to sell real estate, and this thought strains and repulses him/her. On the contrary, he/she loves to travel and would like to regularly invest small sums to maintain and improve the properties he/she owns. Although travelling takes time, he/she would gladly travel for several years. He/she wants to speed up the deal as much as possible and therefore prefers to buy a home as soon as he/she collects 20% of the sum. It is known that the bank could give him/her a mortgage of up to 80% of the value of the property.
The system is looking for a way in which the user
can buy the most suitable home in the best possible
way. A sequence of actions should be proposed that
will allow him/her to feel as happy and confident as
possible.
5.1 Result of a Survey Conducted with
the Buyer, Aiming at Clarifying
His/Her Way of Thinking
The results of a survey with a specific buyer, concerning his/her opinion and the emotions he/she feels about the different ways of buying a property, are presented in Figure 2. Here are the more important considerations of the buyer.
A small newly built office - smaller than the area of the homes under construction - is a compromise option: it is a cheaper, smaller property, but sufficient for both residential needs and business purposes. The minimalist lifestyle is acceptable to him/her. The required amount of money will be collected in a shorter time. The risk is lower. This is the most secure solution and is therefore marked in green.
A residential property that is not newly built and is in need of renovation allows a few more square meters for the same price as the new but smaller home. This is not a desirable option for the user in question. However, it is marked in orange because it can still be considered an affordable compromise.
On the one hand, raising more funds requires more time. On the other hand, if a larger percentage of the price of the purchased property is available, the user will feel more secure. That is why the saved 20% of the price of the property is marked in blue, as not very secure but time-saving. An available saved sum of up to 40% of the price of the property is marked in green as a secure enough state. The available 50% of the price of the property, coming from the sale of another property, is also marked in green as an amount that provides security.
Taking a mortgage loan of 50%-60% is considered a risk-free step towards the goal. A mortgage loan in the range of 10%-30%, if the remaining funds are available, gives not only security but also joy and enthusiasm to the buyer. That is why it is marked in yellow.
The maintenance and improvement of the
properties the buyer owns brings joy, satisfaction and
security to him/her, on the one hand. However, these
actions require investment and allocation of funds.
This, in turn, extends the period for collecting savings
to buy the dream home. Giving up the care of the owned properties causes panic, indignation and anger in the buyer and would be unacceptable. The maintenance
and improvement of the properties he/she owns are
marked in yellow - the color of joy and satisfaction.
Commuting to work and back is tiring and a waste
of time, but it is acceptable for the user and gives
him/her security and comfort. Therefore, it is
considered a preferred action and is marked in green
- the color of security.
Figure 5: A sequence of the most secure possible conditions
for buying a property by a person, who has properties in a
location other than where he wants to buy a property. The
innovative criteria-based model presented by the Broker
Matrix is used.
Renting is acceptable for the buyer if the rent is not high. This means that the living conditions will be limited to the most basic ones and that his/her life will be minimalistic for years, but full of hope.
Therefore, this action is marked in blue - the color of
hope.
5.2 Solutions Proposed by the System
Figure 5 shows the sequence of states for buying a
residential property by a person, who already has
another property in a settlement other than the place
where he/she wants to buy one. This sequence is
offered by the system when using the newly
introduced criteria-based model presented by the
Broker Matrix. The goal is to choose a course of
action that is as secure as possible for the buyer. It can
be seen that the proposed path covers nodes 5, 6, 11,
16 and 22, which are secure. Nodes 0 and 3 are yellow, i.e., they make the person happy. This means that the buyer cannot give up his/her job and cannot sell the properties he/she possesses. The system does not suggest these unacceptable options to him/her.
Figure 6 suggests another option that would be
acceptable to this user. This is the sequence of actions
that would make our user as happy as possible at
every step, but in which the risk is greater. It can be
seen that in this case not all actions are in yellow, i.e.,
the user will have to make compromises though
acceptable ones. They are colored blue and are related to security. On the one hand, the user saves time and speeds up the purchase of the property by collecting only 20% of its price. On the other hand, he/she focuses on buying a more expensive new residential property, instead of a small and cheaper office. However, the risk in this case is higher. He/she takes a bigger mortgage and will pay it off over a longer period.
Figure 6: A sequence of states that will make the user as
happy and enthusiastic as possible when buying a property
though he/she will have to compromise on security. The
user owns properties in a location other than the one he/she
wants to buy the new property in. The innovative criteria-
based model presented by the Broker Matrix is used.
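The two recommendations of Figures 5 and 6 can be viewed as the same search run under different preference orderings over the emotion labels. The sketch below illustrates this; the weights, the colour sequences and the scoring rule are assumptions introduced here for illustration, not values taken from the paper.

```python
# Illustrative sketch: scoring candidate acceptable paths under two different
# preference orderings over emotion colours (all values are assumed).
SECURITY_FIRST = {"green": 3, "yellow": 2, "blue": 1, "orange": 0}
JOY_FIRST      = {"yellow": 3, "green": 2, "blue": 1, "orange": 0}

def score(path_colours, weights):
    """Average preference weight of the states along a candidate path."""
    return sum(weights[c] for c in path_colours) / len(path_colours)

def recommend(candidates, weights):
    """Pick the acceptable candidate whose colours best match the user's preference."""
    return max(candidates, key=lambda colours: score(colours, weights))

# Hypothetical colour sequences of two acceptable paths (compare Figures 5 and 6):
secure_path = ["yellow", "yellow", "green", "green", "green", "green"]
joyful_path = ["yellow", "yellow", "blue", "yellow", "yellow", "yellow"]

print(recommend([secure_path, joyful_path], SECURITY_FIRST) is secure_path)  # True
print(recommend([secure_path, joyful_path], JOY_FIRST) is joyful_path)       # True
```

With a security-first ordering the predominantly green sequence wins, while a joy-first ordering favours the predominantly yellow sequence, mirroring the switch between the two figures.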
6 CONCLUSIONS
This paper proposes a new SBLA that includes: a memory block (criteria-based model, model of rewards, model of the environment); a block of knowledge (of the possible solutions, of the individual requirements and emotions of a user, as well as of the possible scenarios); a block for marking appropriate actions/states; a training block containing a reinforcement learning algorithm; an explanation block; and a solution visualization block. The aim is to empower
the learning agent to propose an appropriate way of
reaching a suitable goal. The use of the criteria-based
model represented as an additional agent memory
matrix is important. This model shows how the user
perceives and evaluates the potential goals and the
possibilities for their realization. This means that
knowledge of the user's habits and understandings is
required.
The agents can make a compromise by not
following a given criterion. The criteria are arranged
by their level of emotional acceptability for the user.
That is why the agent can choose the most acceptable compromises. The learning agent can solve problems
by not allowing unacceptable compromises to be
made. On the one hand, the Pareto front can provide
a set of optimal compromise solutions. On the other
hand, the proposed SBLA and reinforcement learning
algorithm can provide a way of achieving the goal by
means of the most acceptable compromises.
The introduced criteria-based model, represented by the Broker Matrix, is not a probabilistic one. It reflects the user's opinion on the considered problem. This is useful when solving problems that are not common in a user's life and for which there are no statistics on user actions. An example of such a problem is the purchase of a residential property: it is possible that the user buys a property for the first and last time in his/her life.
Also, the development and use of criteria-based models makes it possible to avoid the use of penalties in the work of the reinforcement learning algorithm. Instead, the choice of actions can be explained.
emotional, motivational and other models are built,
then the learning agent will be able to give
explanations for each action from a different point of
view.
The proposed SBLA is also suitable for negotiating and modeling empathy. These activities require an understanding of consumer needs, views and preferences as well (Dehghani et al., 2012; Johnson et al., 2019; Paiva et al., 2017; Maslow, 1998).
ACKNOWLEDGEMENTS
The authors gratefully acknowledge the financial
support provided within the Technical University of
Sofia, Research and Development Sector, Project for
PhD student helping N202PD0007-19 “Intelligent
Cognitive Agent behaviour modelling and
researching”.
REFERENCES
Dehghani, M., Gratch, J., Carnevale, P. J., 2012. Interpersonal Effects of Emotions in Morally-charged Negotiations. Proceedings of the Annual Meeting of the Cognitive Science Society, Volume 34, 1476-1481.
Johnson, E., Roediger, S., Lucas, G., Gratch, J., 2019.
Assessing Common Errors Students Make When
Negotiating. 19th ACM International Conference on
Intelligent Virtual Agents (IVA’19). ACM, Paris,
France, 30-37, DOI: http://doi.org/10.1145/
3308532.3329470.
Paiva, A., Leite I., Boukricha, H., Wachsmuth, I., 2017.
Empathy in Virtual Agents and Robots: A Survey. ACM
Trans. Interact. Intell. Syst. 7, 3, Article 11 (September
2017), 40 pages. https://doi.org/10.1145/2912150.
Budakova, D., Dakovski, L., 2019. Smart shopping system.
8th International scientific conference (TechSys’19).
Plovdiv, Bulgaria, 16-18 May 2019.
doi:10.1088/issn.1757- 899X; ISSN: 1757-899X;
ISSN: 1757-8981.
Sutton, R. S., Barto, A. G., 2014. Reinforcement Learning:
An Introduction. MIT Press, Cambridge, London,
England, [Online]. Available:
http://incompleteideas.net/book/ebook/the-book.html.
Gosavi, A., 2009. Reinforcement Learning: A Tutorial
Survey and Recent Advances. INFORMS Journal on
Computing. Vol. 21 No.2, pp. 178-192, 2009.
Torrado, R. R., Bontrager, P., Togelius, J., Liu, J., Perez-Liebana, D., 2018. Deep Reinforcement Learning for General Video Game AI. IEEE Conference on Computational Intelligence and Games (CIG), doi: 10.1109/CIG.2018.8490422.
Argall, B. D., 2009. Learning Mobile Robot Motion Control
from Demonstration and Corrective Feedback. Thesis.
Robotics Institute Carnegie Mellon University
Pittsburgh, PA 15213, 172.
Amor, H. B., Vogt D., Ewerton M., Berger, E., Jung, B.,
Peters, J., 2013. Learning Responsive Robot Behavior
by Imitation. IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS 2013). IEEE,
Japan, 3257-3264.
Takahashi, K., Kim, K., Ogata, T., Sugano, S., 2017. Tool-
body assimilation model considering grasping motion
through deep learning. Robotics and Autonomous
Systems. Elsevier, Volume 91, 115–127.
Moffaert, K. V., 2016. Multi-Criteria Reinforcement
Learning for Sequential Decision Making Problems,
Dissertation for the degree of Doctor of Science:
Computer Science, Brussels University Press, ISBN
978 90 5718 094 1.
Moffaert, K. V., Nowé, A., 2014. Multi-objective
reinforcement learning using sets of pareto dominating
policies. Journal of Machine Learning Research,
15:3483–3512.
Natarajan, S., Tadepalli, P., 2005. Dynamic Preferences in Multi-Criteria Reinforcement Learning. 22nd International Conference on Machine Learning. Bonn, Germany.
Gunantara, N., 2018. A review of multi-objective
optimization: Methods and its applications. Cogent
Engineering, 5(1), 1502242.
https://doi.org/10.1080/23311916.2018.1502242
Cho, J., Wang, Y., Chen, I., Chan, K. S., Swami A., 2017,
"A Survey on Modeling and Optimizing Multi-
Objective Systems," in IEEE Communications Surveys
& Tutorials, vol. 19, no. 3, pp. 1867-1901, third quarter
2017, doi: 10.1109/COMST.2017.2698366.
Vachhani, V. L., Dabhi V. K., Prajapati, H. B., 2015.
"Survey of multi objective evolutionary algorithms,"
International Conference on Circuits, Power and
Computing Technologies [ICCPCT-2015], Nagercoil,
2015, pp. 1-9, doi: 10.1109/ICCPCT.2015.7159422.
Budakova, D., Dakovski L., Petrova-Dimitrova, V., 2019.
Smart Shopping Cart Learning Agents Development.
19th IFAC-PapersOnLine, Conference on
International Stability, Technology and Culture,
(TECIS 2019). Volume 52, Issue 25, 26-28 September,
64-69, Sozopol, Bulgaria, Elsevier, ISSN 2405-8963, https://doi.org/10.1016/j.ifacol.2019.12.447.
Budakova, D., Dakovski, L., Petrova-Dimitrova, V., 2019.
Smart Shopping Cart Learning Agents. International
journal on Advances in internet technology, IARIA,
issn: 1942-2652, Vol. 12, nr 3&4. 109 – 121.
Maslow, A. H., 1998. Motivation and Personality,
Addison-Wesley Education Publishers, 2nd Edition,
Paperback, 400 pages, ISBN: 0060442417 (ISBN13:
9780060442415).