Smart Broker Agent Learning How to Reach Appropriate Goal by Making Appropriate Compromises

Dilyana Budakova, Veselka Petrova-Dimitrova and Lyudmil Dakovski

Technical University of Sofia, Plovdiv Branch, Plovdiv, Bulgaria

European Polytechnic University, Pernik, Bulgaria

Keywords: Intelligent System, Reinforcement Learning, Intelligent Virtual Agents, Smart Broker Learning Agent.

Abstract: This paper proposes a new Smart Broker Learning Agent (SBLA), which is trained to find the most acceptable solution to a given problem according to the individual requirements and emotions of a particular user. For this purpose, a new agent structure is proposed and a reinforcement learning algorithm is used. When the scenarios and criteria under consideration are complex, and when mixed emotions arise, it may be necessary to compromise on certain criteria in order to achieve the goal. Knowledge of the preferences and emotions of the particular user is then needed, and the SBLA does not allow compromises that are unacceptable to this user. The structure of the agent and the way it acts are considered. The knowledge that the SBLA must have and the process of its formation are described. The scenarios for solving a specific task and the conducted experiments are presented. Some contributions arising from the proposed agent architecture are discussed: the agent can explain its decisions, offer the most appropriate solution for each specific user, avoid unacceptable compromises and show empathy, and the offered solutions meet with greater approval.

1 INTRODUCTION

In many tasks, the requirements for choosing a goal

and finding a way to achieve it are too complex and

often contradictory. Sometimes they are strictly

individual and personalized and correspond to the

understandings and habits of the particular user,

whose problem is being solved. Negotiating, modeling empathy, gift giving and smart shopping, for example, also require an understanding of consumer needs, views and preferences (Gehghani et al., 2012, Johnson et al., 2019, Paiva et al., 2017, Budakova and Dakovski, 2019).

Reinforcement learning algorithms are useful for solving such problems (Sutton and Barto, 2014), and they can be improved further in many ways (Gosavi, 2009, Torrado et al., 2018). Imitation learning, for example, is one way of optimizing them (Argall, 2009, Amor et al., 2013, Takahashi, 2017). In (Moffaert, 2016, Moffaert and

Nowé, 2014, Natarajan and Tadepalli, 2005) multiple


objective problems with conflicts of interest are

considered. In this case multi-objective reinforcement

learning algorithms can provide one or more Pareto

optimal balances of the original objectives. The

single-policy techniques can be employed to guide

the search toward a particular compromise solution,

when the decision maker’s preferences are known a

priori. When the preferences are unclear before the optimization process starts, it may be more appropriate to provide the decision maker with a set of Pareto optimal compromise solutions, each representing a different balance of objectives (Moffaert, 2016, Cho et al., 2017). A more advanced idea is to learn a set of compromise solutions simultaneously. Multi-objective modeling and performance optimization are described in (Cho et al., 2017).
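The notion of a Pareto optimal set of compromise solutions can be illustrated with a minimal sketch. It assumes that every candidate solution is scored by a tuple of objective values that are all to be maximised; the function names and the sample scores are hypothetical and are not taken from the cited works.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives are maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical scores (objective 1, objective 2) of five candidate solutions.
candidates = [(3, 1), (2, 2), (1, 3), (1, 1), (2, 1)]
front = pareto_front(candidates)
print(front)  # [(3, 1), (2, 2), (1, 3)]
```

Each point left in `front` is a different balance of the two objectives; choosing among them still requires knowledge of the decision maker's preferences.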

When a goal cannot be achieved according to the set requirements, compromises have to be made (Gunantara, 2018, Vachhani et al., 2015). One solution is for the agent to reach the goal by making as few compromises as possible with the required criteria (Budakova et al., 2019). This solution may recommend compromises that are unacceptable to a user. Users are reluctant to take actions that are unacceptable to them and reject the way the system proposes to reach the goal.

Budakova, D., Petrova-Dimitrova, V. and Dakovski, L.
Smart Broker Agent Learning How to Reach Appropriate Goal by Making Appropriate Compromises.
DOI: 10.5220/0010186901810188
In Proceedings of the 13th International Conference on Agents and Artificial Intelligence (ICAART 2021) - Volume 1, pages 181-188
ISBN: 978-989-758-484-8

The SBLA proposed in this paper chooses ways to reach the goal that require only acceptable compromises. To achieve this, it needs knowledge of each user's individual views and emotional attitudes towards the possible ways of reaching the goal, as well as knowledge of public attitudes towards these possibilities. The SBLA can then decide whether or not an action is acceptable to a user. For this purpose, a new agent structure is proposed and a reinforcement learning algorithm is used.

The rest of the paper is structured as follows: the SBLA structure is explained in Section 2; the experimental setting is given in Section 3; the conducted experiments are presented in Sections 4 and 5; and conclusions are drawn in Section 6.

Figure 1: SBLA structure.

2 SMART BROKER LEARNING AGENT STRUCTURE

A new SBLA has been proposed, which trains to find

the most acceptable solution to a given problem,

according to the individual requirements and

emotions of a particular user.

To this end, the agent is trained to offer the most appropriate goal and the best way to achieve it. The proposed agent structure (Figure 1) includes a memory block (criteria-based model, model of rewards, model of the environment); a block of knowledge (of the possible solutions, of the individual requirements and emotions of a user, and of the possible scenarios); a block for marking appropriate actions/states; a training block containing a reinforcement learning algorithm; an explanation block; and a solution visualization block.

When the scenarios and criteria under

consideration are complex, and when mixed emotions

arise, it may be necessary to compromise on certain

criteria in order to achieve the goal. Then knowledge

of the preferences and emotions of the particular user

is needed.

To make the reinforcement learning agent find an appropriate path to a suitable goal while meeting complex criteria, a criteria-based model, represented as an additional agent memory matrix, is introduced. This model shows how the user perceives and evaluates the potential goals and the options for achieving them. The criteria-based model is similar to the reward model of the Q-learning algorithm. For convenience, it is referred to below as the Broker Matrix. The criteria-based model maintains a specific value for each existing edge in the graph. This value is a measure for each edge and node, i.e., an estimate of the choice to move from one state to another along a given edge. While the algorithm runs, the transition from one state to another is sought by selecting only edges and states with a specific estimate. If no such edges or states exist, only those with acceptable measure values are selected.
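The selection rule described above can be sketched as a small Q-learning loop in which the Broker Matrix filters the edges the agent may take. This is only an illustration under stated assumptions: the five-state graph, the three estimate levels (`PREFERRED`, `ACCEPTABLE`, `UNACCEPTABLE`), the reward values, the discount factor and all function names are hypothetical, and the update rule is a common deterministic form of Q-learning rather than the exact variant used in the experiments.

```python
import random

# Hypothetical 5-state toy graph; it is NOT the graph of Figure 2.
# REWARD[s][t]: None = no edge, 0 = ordinary edge, 100 = edge entering the goal.
GOAL = 4
REWARD = [
    [None, 0,    0,    None, None],
    [None, None, None, None, 100],
    [None, None, None, 0,    None],
    [None, None, None, None, 100],
    [None, None, None, None, None],
]

# BROKER[s][t]: assumed user estimate of taking edge s -> t.
PREFERRED, ACCEPTABLE, UNACCEPTABLE = 2, 1, 0
BROKER = [
    [None, UNACCEPTABLE, ACCEPTABLE, None, None],
    [None, None, None, None, PREFERRED],
    [None, None, None, PREFERRED, None],
    [None, None, None, None, ACCEPTABLE],
    [None, None, None, None, None],
]


def allowed_moves(state):
    """Edges with the preferred estimate; if none exist, edges with an acceptable one."""
    edges = [t for t, r in enumerate(REWARD[state]) if r is not None]
    for tier in (PREFERRED, ACCEPTABLE):
        chosen = [t for t in edges if BROKER[state][t] == tier]
        if chosen:
            return chosen
    return []  # only unacceptable edges (or none) remain: the agent does not move


def train(episodes=200, gamma=0.8, seed=0):
    """Q-learning restricted to the moves that the Broker Matrix allows."""
    rng = random.Random(seed)
    n = len(REWARD)
    q = [[0.0] * n for _ in range(n)]
    for _ in range(episodes):
        state = rng.randrange(n)
        while state != GOAL:
            moves = allowed_moves(state)
            if not moves:
                break
            nxt = rng.choice(moves)
            future = max((q[nxt][t] for t in allowed_moves(nxt)), default=0.0)
            q[state][nxt] = REWARD[state][nxt] + gamma * future
            state = nxt
    return q


def greedy_path(q, start):
    """Follow the highest-valued allowed move from `start` until the goal."""
    path, state = [start], start
    while state != GOAL:
        moves = allowed_moves(state)
        if not moves:
            break
        state = max(moves, key=lambda t: q[state][t])
        path.append(state)
    return path


print(greedy_path(train(), 0))  # [0, 2, 3, 4]
```

In this toy graph the shortest route to the goal is 0, 1, 4, but its first edge is labelled unacceptable, so the agent takes the longer route through states 2 and 3 instead.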

On the one hand, the Pareto front can provide a set

of optimal compromise solutions. On the other hand,

the proposed SBLA and reinforcement learning

algorithm can provide a way of achieving the goal by

means of the most acceptable compromises.

3 EXPERIMENTAL SETTING

In the considered example the goal is the purchase of

a small property of 20-30 square meters built area in

a big industrial city, where the user is about to start

working. The property can be a residential one or an

office with a possibility to be used as a hotel room, or


a place where one can spend a night occasionally. The

user prefers new construction. However, the

possibility of buying a well-preserved old property

with a larger area at an equivalent price is also under

consideration. The focus is on small property types

because the user does not have funds. He/she has to

save, but saving takes years of patience. He/she can

sell a property he/she already possesses, but a sold

property is a loss for him/her. He/she can take a loan,

but taking loans is risky. Therefore, according to the Pareto front, a small property is the balanced compromise option suitable for this particular person.

Figure 2 presents a graph showing the possible states (the nodes of the graph) for solving the problem of buying the most appropriate residential property in the way most suitable for a particular user. The possible sequences of these states are represented by the directed edges of the graph.

Figure 2: Oriented graph, which presents the states in

solving the problem of buying a property and their

sequence. The colours show how the user perceives them

emotionally.

Table 1 gives a description of these states and the

trade-offs required in the process of their selection.

The colors show the emotions that the given states, and the actions taken to reach them, provoke in the user. The correspondence between the colors and the emotions they reveal is given in Tables 2 and 3.

The SBLA will suggest ways to reach each of the

three most appropriate targets from the Pareto front.

They are marked by the following nodes: node 21 –

an old but preserved property with a living area of 35

square meters; node 22 - a small property suitable for

an office and a hotel room with a built-up area of 20

sq. m. and node 23 - a small property in a new

building with a built-up area of 30 sq. m.

The initial state is indicated by node 0 and yellow

color. It starts the process of considering the problem.

The user moves to a large industrial city to take a job

position there and has no property to live in. As this

is a dream job for him/her, this node is marked as a

state in which the emotion is joy and enthusiasm.

Consequently, it is unacceptable for him/her to give

up the offered job. From here, things get trickier. It is

possible that the user has another (and only)

residential property - node 1; he/she may have no

property at all - node 2; and it is also possible that

he/she possesses other properties in other places -

node 3. As can be seen from Figure 2, the state graph for solving this problem (albeit simplified) allows many different modes of action. Actions and situations evoking joy and enthusiasm in the user are marked in yellow; the non-risky ones are marked in green; the extremely unacceptable ones are red; the more acceptable ones are orange; and blue marks the actions and situations that are not very comfortable, not risk-free and not very desirable, but still hopeful.

Thus, even at first glance, Figure 2 shows that

from the initial node 0 to any of the three targets,

defined as acceptable and represented by nodes 21, 22

and 23, there is no path in the graph that includes only

states and actions, evoking enjoyment and excitement

in the user; nor is there a path that includes only risk-

free states and actions, and so on. In other words,

whichever path is chosen, compromises and choices

will have to be made.
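The observation that no single-colour path reaches a target can be checked mechanically. The following sketch runs a breadth-first search that may only visit states whose colour belongs to an allowed set. The four-node graph and its colours are hypothetical stand-ins rather than a transcription of Figure 2, and for brevity only nodes (not edges) carry a colour.

```python
from collections import deque

# Hypothetical miniature graph; edges and colours are illustrative assumptions.
EDGES = {0: [1, 2], 1: [3], 2: [3], 3: []}
COLOUR = {0: "yellow", 1: "green", 2: "orange", 3: "blue"}


def reachable(start, goals, allowed):
    """Breadth-first search that may only visit nodes whose colour is in `allowed`."""
    if COLOUR[start] not in allowed:
        return False
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node in goals:
            return True
        for nxt in EDGES[node]:
            if nxt not in seen and COLOUR[nxt] in allowed:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(reachable(0, {3}, {"yellow"}))                   # False
print(reachable(0, {3}, {"yellow", "green", "blue"}))  # True
```

With a single colour the target is unreachable, while a mixed set of colours admits a path, which mirrors the need to combine states of different emotional colouring.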

For example, the user will have to decide whether

to sell the properties he/she already possesses and buy

the desired property or not to sell but instead repair

and improve them. In the second case he/she will have to rent a room/house for several years and at the same time save money until he/she collects part or all of the required sum. He/she has to decide whether to

take a mortgage loan or not and for what part of the

property price. All these decisions will change the

buyer’s life both in the short and in the long run. They

all have their advantages and disadvantages. The purpose of an SBLA is to understand the user's way of thinking and to offer solutions regarding the ways to realize the most appropriate option.

What sequence of actions should the user follow

in order to feel happiest on the way to achieving the

goal?

What sequence of actions should he/she follow in order to feel most secure on the way to the goal of having a home of his/her own in the big industrial city in which he/she works?


Table 1: Description of the states presented in the graph in Figure 2 and the compromises they require.

N   Description; Compromises/Advantages
0   The user works in a big city. Moving somewhere else is an unacceptable compromise for him/her.
1   The user owns a residential property. This makes him/her feel secure.
2   The user does not own a residential property. This is risky. Not having a property is an unacceptable compromise for him/her.
3   The user owns more than one residential property. This gives him/her great safety.
4   The user rents a property and saves to buy a residential property of his/her own. It takes years of patience, but it is risk-free.
5   The user commutes to his/her workplace every day or on a schedule and saves for the purchase of a residential property. It takes years of patience, but it is not risky.
6   The user renovates and improves the properties he/she possesses. The period for raising funds for the purchase of a new property is extended. Safety is provided. Acceptable compromise.
7   The user gets a mortgage credit amounting to 60% of the price of the new property, but he/she has no other savings. A consumer credit is required for the remaining amount. There is a great risk for all his/her property. Living with two loans would mean great restrictions. A compromise that is difficult to accept.
8   The user sells his/her only property. Loss of property. Risk of running out of property. Unacceptable compromise.
9   The user sells one of his/her properties. Loss of property. Difficult compromise to accept.
10  20% of the price collected. Enough to get a mortgage. Acceptable compromise. Brings safety.
11  40% of the property price available - enough to get a mortgage. Acceptable compromise. Safety.
12  The user sells his/her only property, but only after he/she has collected 20% of the necessary funds. Loss of property. Unacceptable compromise.
13  The user sells one of his/her residential properties, but only after he/she has 40% of the necessary funds ready. Loss of property. Acceptable compromise.
14  50% of the necessary funds available after the sale. Loss of property and risk of funds shortage. Acceptable compromise.
15  The user takes a mortgage credit of 80% of the value of the new property and has the remaining funds available. Acceptable risk. Acceptable compromise.
16  The user takes a mortgage on 60% of the new property and has the remaining funds. Acceptable risk. Acceptable compromise.
17  The user takes a consumer credit to cover the mortgage up to 100%. Risk for all property. Must live with limitations and deprivation. Difficult compromise to accept.
18  The user takes a 30% mortgage credit and has the remaining funds available. This is risk-free and no compromise is required.
19  The user takes a 10% mortgage credit and has the remaining funds available. This is a great level of security. No compromises required.
20  The user takes a mortgage credit of 50% of the price of the property and has the remaining funds available. There is some risk. Acceptable compromise.
21  The user buys an old but larger residential property. Though the property is old, the compromise is acceptable.
22  The user buys a very small office in order to use it both as a hotel room and as an office. It is not a residential property and the expenses for taxes, electricity and water are higher. Safety. Acceptable compromise.
23  The user buys a new, very small residential property in the city where he/she works. No compromise needed. This is the dream home.

Table 2: Meaning of the colors of the nodes and edges in the graph given in Figure 2.

Colour    Description
Green     The state leads to security.
Dark red  The state requires a highly unacceptable compromise.
Red       The state requires an unacceptable compromise.
Orange    The state requires a poorly acceptable compromise.
Yellow    Achieving this state is highly desirable.
Blue      An acceptable state in which there is no risk, but a poorly acceptable compromise is required.

Table 3: Description of the emotion represented by the colors of the nodes and edges in the graph given in Figure 2.

Colour    Emotion
Green     Security.
          Panic, anxiety, dissatisfaction.
Yellow    Joy and enthusiasm.
Blue      Calm and hope.

Is there a sequence of actions that makes the user feel excited and happy all the way to the goal? It turns out that no such sequence exists and compromises are required. So what are the most acceptable compromises? Are there actions that guarantee greater security, even if they evoke less elan and enthusiasm in the user? It is precisely actions of this type that can be considered the most acceptable compromises. Also, are there actions that require more time, are less safe or cause some inconvenience, and are still acceptable? The aim is to avoid the unacceptable actions. It can be seen from

Figure 2 that the sale of the properties he/she owns is


an unacceptable action for the person under

consideration.

4 FIRST EXPERIMENT

A buyer who does not own any property is

considered. After starting a secure job in a large

industrial city, he/she wants to buy a place to live. The

system offers the fastest way to achieve this goal,

namely, to take a mortgage loan from the bank up to

80% and to cover the remaining 20% with a consumer

loan (Figure 3).

A dotted black line shows the sequence of states

until a solution is reached. The system offers this

solution if the modification of the reinforcement

learning algorithm is not used. In this case, there will

be a great risk over the years until the consumer loan

is repaid. After that moment only the mortgage will

remain. The amount of loan installments will be

drastically reduced and the user could feel calmer and

lead a normal life.

When the criteria-based model, represented by the Broker Matrix, is used, it is established that the user considers taking such loans to be highly risky (orange states 7 and 17). Taking a mortgage

loan of up to 80% of the sum is relatively promising

for him/her. However, it is not acceptable to take a

consumer loan in parallel in order to fully cover the

price of the property. This action makes the user

anxious.

Figure 3: A dotted black line shows a sequence of states for

buying a property by a person, who does not have any

residential properties. The system offers this sequence only

if the newly introduced criteria-based model, presented by

the Broker Matrix, is not used.

A condition is set therefore - to offer the buyer

only actions, acceptable to him/her, i.e., actions,

perceived by him/her as reliable and secure and/or

which he/she would undertake with joy and readiness

or with mixed feelings between joy and hope, but

without panic and stress. This leads to the option

shown in Figure 4. According to it, the buyer could

live for several years in a rented apartment and save

money until the accumulation of at least 20% of the

price of the property he/she wishes to buy, having in

mind that he/she will then be able to buy the property

only against a mortgage loan. This option turns out to

be acceptable for the buyer. Depending on the years

he/she could spend on fundraising and the size of

his/her salary, he/she will have to decide which of the

proposed housing options to buy. Only appropriate

compromises were made and a suitable property was

chosen.

Figure 4: A dotted black line shows a sequence of states for

buying a property by a person, who does not have any

residential properties. It is proposed by the system when

using the newly introduced criteria-based model, presented

by the Broker Matrix.

5 SECOND EXPERIMENT

This time a buyer who starts his/her dream job in a large industrial city but owns properties in another, smaller provincial town is considered. He/she wants to buy a property near his/her workplace. A survey conducted with the potential buyer reveals that he/she prefers security and does not like to take risks. He/she is reluctant to sell real estate; the thought strains and repulses him/her. On the other hand, he/she would like to invest small sums regularly to maintain and improve the properties he/she owns. He/she loves travelling and, although it takes time, would gladly travel for several years. He/she wants to speed up the deal as much as possible and therefore prefers to buy a home as soon as he/she has collected 20% of the sum. It is known that the bank could give him/her a mortgage of up to 80% of the value of the property.


The system is looking for a way in which the user

can buy the most suitable home in the best possible

way. A sequence of actions should be proposed that

will allow him/her to feel as happy and confident as

possible.

5.1 Result of a Survey Conducted with the Buyer, Aiming at Clarifying His/Her Way of Thinking

The results of a survey of a specific buyer's opinions, and of the emotions he/she feels about the different ways of buying a property, are presented in Figure 2. Here are the buyer's more important considerations.

A small newly built office - smaller than the area

of the homes under construction - is a compromise

option, as it is a cheaper property, smaller, but

sufficient for both residential needs and business

solution. The minimalist lifestyle is acceptable to

him/her. The required amount of money will be

collected in a shorter time. The risk is lower. This is

the most secure solution and is therefore marked in

green.

A residential property that is not newly-built and

is in need of renovation allows a few more squares for

the same price as the new but smaller home. This is

an unacceptable compromise for the user in question.

However, it is marked in orange because it can still

be considered as an affordable compromise.

On the one hand, raising more funds requires more time. On the other hand, the larger the share of the property price that is available, the more secure the user will feel. That is why having saved 20% of the price of the property is marked in blue, as not very secure but time-saving. An available saved sum of up to 40% of the price of the property is marked in green as a sufficiently secure state.

The available 50% of the price of the property

coming from the sale of another property is also

marked in green as an amount that provides security.

Taking a mortgage loan of 50% - 60% is

considered a risk-free step to the goal. Mortgage loan

in the range of 10% - 30%, if the remaining funds are

available, gives not only security but also joy and

enthusiasm to the buyer. That is why it is marked in

yellow.

The maintenance and improvement of the

properties the buyer owns brings joy, satisfaction and

security to him/her, on the one hand. However, these

actions require investment and allocation of funds.

This, in turn, extends the period for collecting savings

to buy the dream home. Leaving the care of the owned

property causes panic, indignation and anger in the

buyer and would be unacceptable. The maintenance

and improvement of the properties he/she owns are

marked in yellow - the color of joy and satisfaction.

Commuting to work and back is tiring and a waste

of time, but it is acceptable for the user and gives

him/her security and comfort. Therefore, it is

considered a preferred action and is marked in green

- the color of security.

Figure 5: A sequence of the most secure possible conditions

for buying a property by a person, who has properties in a

location other than where he wants to buy a property. The

innovative criteria-based model presented by the Broker

Matrix is used.

Renting is acceptable for the buyer if the rent is

not high. This means that the conditions of living will

be only limited to the most basic ones and that his/her

life will be minimalistic for years, but full of hope.

Therefore, this action is marked in blue - the color of

hope.

5.2 Solutions, Proposed by the System

Figure 5 shows the sequence of states for buying a

residential property by a person, who already has

another property in a settlement other than the place

where he/she wants to buy one. This sequence is

offered by the system when using the newly

introduced criteria-based model presented by the

Broker Matrix. The goal is to choose a course of

action that is as secure as possible for the buyer. It can

be seen that the proposed path covers nodes 5, 6, 11,

16 and 22, which are secure. Nodes 0 and 3 are yellow, i.e., they make the person happy. This means that the buyer cannot give up his/her job and cannot sell the properties he/she possesses. The system does not suggest these unacceptable options to him/her.

Figure 6 suggests another option that would be

acceptable to this user. This is the sequence of actions

that would make our user as happy as possible at

every step, but in which the risk is greater. It can be

seen that in this case not all actions are in yellow, i.e.,

the user will have to make compromises though


acceptable ones. They are colored blue and are related to security. The user, on the one hand, saves time and speeds up the purchase of the property by collecting only 20% of its price. On the other hand, he/she focuses on buying a more expensive new residential property instead of a small and cheaper office. However, the risk in this case is higher: he/she takes a bigger mortgage and will pay it off over a longer time.

Figure 6: A sequence of states that will make the user as

happy and enthusiastic as possible when buying a property

though he/she will have to compromise on security. The

user owns properties in a location other than the one he/she

wants to buy the new property in. The innovative criteria-

based model presented by the Broker Matrix is used.

6 CONCLUSIONS

This paper proposes a new SBLA that includes a memory block (criteria-based model, model of rewards, model of the environment); a block of knowledge (of the possible solutions, of the individual requirements and emotions of a user, and of the possible scenarios); a block for marking appropriate actions/states; a training block containing a reinforcement learning algorithm; an explanation block; and a solution visualization block. The aim is to empower

the learning agent to propose an appropriate way of

reaching a suitable goal. The use of the criteria-based

model represented as an additional agent memory

matrix is important. This model shows how the user

perceives and evaluates the potential goals and the

possibilities for their realization. This means that

knowledge of the user's habits and understandings is

required.

The agent can make a compromise by not following a given criterion. The criteria are arranged by their level of emotional acceptability for the user. That is why the agent can choose the most acceptable compromises. The learning agent can solve problems without allowing unacceptable compromises to be made. On the one hand, the Pareto front can provide

a set of optimal compromise solutions. On the other

hand, the proposed SBLA and reinforcement learning

algorithm can provide a way of achieving the goal by

means of the most acceptable compromises.

The introduced criteria-based model, represented by the Broker Matrix, is not a probabilistic one. It reflects the user's opinion on the considered problem. This is useful when solving problems that are not common in a user's life and for which there are no statistics on user actions. An example of such a problem is the purchase of a residential property: it is possible that the user buys a property for the first and last time in his/her life.

Also, the development and use of criteria-based models makes it possible to avoid the use of penalties in the reinforcement learning algorithm. Instead, the choice of actions can be explained. If emotional, motivational and other models are built, then the learning agent will be able to give explanations for each action from a different point of view.
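How an explanation might be derived from the colour labels can be sketched as follows. The colour meanings are paraphrased from Table 2, and the three states carry the colours that Section 5.1 assigns to them; the dictionary layout, the function name and the sentence template are hypothetical and do not describe the actual explanation block.

```python
# Colour meanings paraphrased from Table 2.
MEANING = {
    "green": "leads to security",
    "dark red": "requires a highly unacceptable compromise",
    "red": "requires an unacceptable compromise",
    "orange": "requires a poorly acceptable compromise",
    "yellow": "is highly desirable",
    "blue": "is acceptable and risk-free, but requires a poorly acceptable compromise",
}

# A few states from Table 1 with the colours given in Section 5.1.
STATES = {
    10: ("20% of the price collected", "blue"),
    11: ("40% of the property price available", "green"),
    22: ("buys a very small office usable as a hotel room", "green"),
}


def explain(node):
    """Build a one-sentence explanation of how the user perceives a state."""
    description, colour = STATES[node]
    return f"State {node} ({description}) is marked {colour}: it {MEANING[colour]}."


print(explain(11))
# State 11 (40% of the property price available) is marked green: it leads to security.
```

Further models (emotional, motivational) could contribute their own dictionaries of meanings, so that the same state is explained from several points of view.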

The proposed SBLA is also suitable for

negotiating and modeling empathy. These activities

require an understanding of consumer needs,

understandings and preferences as well (Gehghani et

al., 2012; Johnson et al., 2019; Paiva et al., 2017;

Maslow, 1998).

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the financial

support provided within the Technical University of

Sofia, Research and Development Sector, Project for

PhD student helping N202PD0007-19 “Intelligent

Cognitive Agent behaviour modelling and

researching”.

REFERENCES

Gehghani, M., Gratch, J., Carnevale, P. J., 2012.

Interpersonal Effects of Emotions in Morally-charged

Negotiations. Proceedings of the Annual Meeting of the

Cognitive Science Society, Volume 34, 1476-1481.

Johnson, E., Roediger, S., Lucas, G., Gratch, J., 2019. Assessing Common Errors Students Make When Negotiating. 19th ACM International Conference on Intelligent Virtual Agents (IVA’19). ACM, Paris, France, 30-37, DOI: http://doi.org/10.1145/3308532.3329470.

Paiva, A., Leite I., Boukricha, H., Wachsmuth, I., 2017.

Empathy in Virtual Agents and Robots: A Survey. ACM


Trans. Interact. Intell. Syst. 7, 3, Article 11 (September

2017), 40 pages. https://doi.org/10.1145/2912150.

Budakova, D., Dakovski, L., 2019. Smart shopping system. 8th International scientific conference (TechSys’19). Plovdiv, Bulgaria, 16-18 May 2019. doi:10.1088/issn.1757-899X; ISSN: 1757-899X; ISSN: 1757-8981.

Sutton, R. S., Barto, A. G., 2014. Reinforcement Learning:

An Introduction. MIT Press, Cambridge, London,

England, [Online]. Available:

http://incompleteideas.net/book/ebook/the-book.html.

Gosavi, A., 2009. Reinforcement Learning: A Tutorial

Survey and Recent Advances. INFORMS Journal on

Computing. Vol. 21 No.2, pp. 178-192, 2009.

Torrado, R. R., Bontrager, Ph., Togelius, J., Liu, J. and Perez-Liebana, D., 2018. Deep Reinforcement Learning for General Video Game AI. IEEE Conference on Computational Intelligence and Games. CIG, 10.1109/CIG.2018.8490422.

Argall, B. D., 2009. Learning Mobile Robot Motion Control

from Demonstration and Corrective Feedback. Thesis.

Robotics Institute Carnegie Mellon University

Pittsburgh, PA 15213, 172.

Amor, H. B., Vogt D., Ewerton M., Berger, E., Jung, B.,

Peters, J., 2013. Learning Responsive Robot Behavior

by Imitation. IEEE/RSJ International Conference on

Intelligent Robots and Systems (IROS 2013). IEEE,

Japan, 3257-3264.

Takahashi, K., Kim, K., Ogata, T., Sugano, S., 2017. Tool-

body assimilation model considering grasping motion

through deep learning. Robotics and Autonomous

Systems. Elsevier, Volume 91, 115–127.

Moffaert, K. V., 2016. Multi-Criteria Reinforcement

Learning for Sequential Decision Making Problems,

Dissertation for the degree of Doctor of Science:

Computer Science, Brussels University Press, ISBN

978 90 5718 094 1.

Moffaert, K. V., Nowé, A., 2014. Multi-objective

reinforcement learning using sets of pareto dominating

policies. Journal of Machine Learning Research,

15:3483–3512.

Natarajan, S., Tadepalli, P., 2005. Dynamic Preferences in Multi-Criteria Reinforcement Learning. 22nd International Conference on Machine Learning. Bonn, Germany.

Gunantara, N., 2018. A review of multi-objective

optimization: Methods and its applications. Cogent

Engineering, 5(1), 1502242.

https://doi.org/10.1080/23311916.2018.1502242

Cho, J., Wang, Y., Chen, I., Chan, K. S., Swami A., 2017,

"A Survey on Modeling and Optimizing Multi-

Objective Systems," in IEEE Communications Surveys

& Tutorials, vol. 19, no. 3, pp. 1867-1901, third quarter

2017, doi: 10.1109/COMST.2017.2698366.

Vachhani, V. L., Dabhi V. K., Prajapati, H. B., 2015.

"Survey of multi objective evolutionary algorithms,"

International Conference on Circuits, Power and

Computing Technologies [ICCPCT-2015], Nagercoil,

2015, pp. 1-9, doi: 10.1109/ICCPCT.2015.7159422.

Budakova, D., Dakovski L., Petrova-Dimitrova, V., 2019.

Smart Shopping Cart Learning Agents Development.

19th IFAC-PapersOnLine, Conference on

International Stability, Technology and Culture,

(TECIS 2019). Volume 52, Issue 25, 26-28 September, 64-69, Sozopol, Bulgaria, Elsevier ISSN 2405-8963, https://doi.org/10.1016/j.ifacol.2019.12.447

Budakova, D., Dakovski, L., Petrova-Dimitrova, V., 2019.

Smart Shopping Cart Learning Agents. International

journal on Advances in internet technology, IARIA,

issn: 1942-2652, Vol. 12, nr 3&4. 109 – 121.

Maslow, A. H., 1998. Motivation and Personality,

Addison-Wesley Education Publishers, 2nd Edition,

Paperback, 400 pages, ISBN: 0060442417 (ISBN13:

9780060442415).
