Requisite Social Influence in Self-Regulated Systems
Asimina Mertzani (https://orcid.org/0000-0002-6084-9212) and Jeremy Pitt (https://orcid.org/0000-0003-4312-8904)
Electrical and Electronic Engineering Dept., Imperial College London, London, U.K.
Keywords:
Cybernetics, Self-Regulated Systems, Requisite Influence, Reinforcement Learning, Social Psychology.
Abstract:
This paper specifies, implements and experiments with a new psychologically-inspired 4voices algorithm
to be used by the units of a self-regulated system, whereby each unit learns to identify which of several
“voices” to pay attention to, depending on a collective desired outcome (e.g., establishing the ground truth, a
community truth, or their own “truth”). In addition, a regulator uses a standard Q-learning algorithm to pay
attention to the regulated units and respond accordingly. The algorithm is applied to a problem of continuous
policy-based monitoring and control, and simulation experiments determine which initial conditions produce
systemic stability and what kind of “truth” is expressed by the regulated units. We conclude that this synthesis
of Q-learning in the regulator and 4voices in the regulated system establishes requisite social influence. This
maintains quasi-stability (i.e. periodic stability) and points the way towards ethical regulators.
1 INTRODUCTION
A self-regulated system in cybernetics comprises a
designated agent (or agency) acting as regulator by
operating on some control variables, and a regu-
lated system which applies changes in those variables
(Ashby, 2020). Moreover, the units of the regulated system, as communicative agents themselves, can report the effects of applying the changes amongst themselves, being inter-connected by a social network.
From this perspective, establishing the pathways
to requisite influence (Ashby, 2020) that would en-
sure timely feedback, responsive control and overall
systemic stability, has two requirements: firstly, ex-
pression by the regulator to the regulated system and
attention of the regulated system to the regulator (and
vice versa), and secondly, social influence within the
social network of the regulated system, with respect
to some overall goal (Nowak et al., 2019).
In this paper, we examine these requirements in
the context of a self-regulated system addressing a
problem of continuous policy-based monitoring and
control, where the regulator specifies a policy and the
regulated system applies the policy, and experiences
its effect. The regulator uses a standard Q-learning
algorithm to learn to pay attention to the regulated
units and responds (expresses a new policy) accord-
ingly. However, we also specify and implement a
new psychologically-inspired 4voices algorithm for
the units of the regulated system, whereby in deter-
mining their expression to the regulator, each unit of
the regulated system learns to identify which of sev-
eral “voices” to pay attention to, depending on a de-
sired collective outcome.
The psychological motivation is threefold. First,
we start from Nowak’s Regulatory Theory of Social
Influence (Nowak et al., 2019) which proposes that,
for reasons of coherence and cognitive efficiency, in
social networks targets seek sources by whom to be
influenced, as well as sources seeking targets. Sec-
ondly, these targets actually have a polyphony of in-
ner voices (Fernyhough, 2017), which, thirdly, are
more or less activated by selective auditory attention
(e.g. the cocktail party effect (Driver, 2001)).
To test the effectiveness of the new 4voices algo-
rithm with respect to systemic stability, a set of sim-
ulation experiments are designed and executed, with
the regulator using a Q-learning algorithm and the
units of the regulated system using the 4voices al-
gorithm. Essentially this uses a set of coefficients to
learn how to distinguish between the inner voices de-
pending on the desired outcome (i.e., whether it is to
express the community truth, the ground truth or their
inner truth, cf. (Deutsch and Gerard, 1955)). We con-
clude that this synthesis of Q-learning in the regulator
and 4voices in the regulated system establishes req-
uisite social influence. This maintains stability and
points the way towards ethical regulators.
2 PROBLEM SPECIFICATION
In this section, we identify the pathways to requisite
influence in a self-regulated system as the expression
of the regulator to the regulated units, and their atten-
tion to the regulator; and the expression of the reg-
ulated units to the regulator, and its attention to their
feedback; and finally social influence in the social net-
work of the regulated units. This is the ‘generic prob-
lem’, and we apply it to a specific problem of con-
tinuous monitoring and control. Note this is not nec-
essarily an ‘optimal’ solution to the specific problem:
instead we are establishing a test-bed for stability and
requisite social influence in self-regulated systems.
2.1 The Generic Problem
We refine Ashby’s (Ashby, 2020) exposition of a self-
regulated system by adding the 4voices model of at-
tention and expression to the regulated units, path-
ways to requisite influence, individual learning and
collective learning, as illustrated in Figure 1.
Specifically, following Ashby’s description of a
self-regulated system, the regulated units need to form
a reliable expression to make the regulator aware of
their state, and the regulator needs to find the appro-
priate way to attract the attention of the units to com-
municate the effects of the change of policy. This then becomes an iterative process: the regulator selects a policy, communicates it to the regulated units and influences them; in turn, the regulated units attend to that change, distinguish between the voices of social influence, and form an expression that makes the regulator aware of the actual effect of the change.
Accordingly, the aim of this paper is to design
a model in which the group (the regulated system)
learns where to pay attention and how to express it-
self and the regulator learns to attend to that expres-
sion, change policy, and communicate this change.
[Figure 1 diagram, showing the Regulator and the Regulated System connected by pathways of Requisite Influence, and within the regulated system: Attention (Individual) to the Inner Voice, Expert Voice, Foreground Noise and Background Noise; Expression (Individual) under Social Influence; Attention (Collective); Expression (Collective); and Requisite Variety.]
Figure 1: The Self-Regulated System.
2.2 The Specific Problem
The effectiveness of our 4voices model is tested in a
specific scenario of continuous monitoring and con-
trol: job scheduling in cloud computing. In more
detail, the system operates in epochs: in each epoch t, each agent i delegates a job to the cloud and receives feedback on whether the job will be processed in the next epoch. This specific problem was selected because it satisfies the necessary requirements of being dynamic and non-deterministic: it comprises heterogeneous networked units; the preferable policy is a matter of context, opinion and self-adaptation; and it is relatively easy to define metrics that describe its operation.
Specifically, the self-regulated system comprises N agents connected over a social network which participate in a cloud system. In every epoch t, each agent i delegates a job with size $j_i \in \{1, \dots, J\}$ and urgency $u_i \in \{1, \dots, U\}$ to the cloud, where J is the maximum job size and U is the maximum urgency. Based on its policy, the regulator determines the number of jobs m to be processed in the next epoch. Then, the regulator orders all jobs by urgency (descending) and size (ascending), selects the first m jobs for processing and informs the agents.

Next, the regulator processes those jobs in the described order, which results in a total cost C equal to the cost of each job of size $j_i$ aggregated with a fixed cost F. C is equally distributed among the m agents, resulting in the individual cost $c_i$; the average cost $\bar{C}$ is equal to the total cost divided by the number of jobs m. Also, each job is processed with a delay $q_i$. The average delay Q equals the sum of the delays divided by m, $Q = \frac{\sum_{k \in \{1,\dots,m\}} q_k}{m}$. The maximum delay $Q_M$ equals the sum of the delays of all N jobs (assuming that they were all accepted and processed according to the described order), divided by N, $Q_M = \frac{\sum_{k \in \{1,\dots,N\}} q_k}{N}$.
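As an illustration of the scheduling and the cost/delay bookkeeping, the following Python sketch implements one epoch of the specific problem. The job representation, the helper names and, in particular, the delay model (waiting time accumulated from the sizes of the jobs served earlier) are our own assumptions; the text above does not prescribe them.

from dataclasses import dataclass

@dataclass
class Job:
    agent: int     # index i of the submitting agent
    size: int      # j_i in {1, ..., J}
    urgency: int   # u_i in {1, ..., U}

def schedule(jobs, m, fixed_cost=500.0):
    """One epoch of job scheduling: order by urgency (descending) and
    size (ascending), accept the first m jobs, and compute per-agent
    cost c_i, delay q_i, the average delay Q and the maximum delay Q_M."""
    order = sorted(jobs, key=lambda j: (-j.urgency, j.size))
    accepted, rejected = order[:m], order[m:]

    # Total cost C: aggregate of the accepted job costs (assumed here to
    # equal their sizes) plus the fixed cost F, shared equally among the
    # m accepted agents.
    total_cost = sum(j.size for j in accepted) + fixed_cost
    cost = {j.agent: total_cost / m for j in accepted}

    # Delay q_i (assumed model): cumulative size of jobs served earlier.
    delay, elapsed = {}, 0.0
    for j in accepted:
        delay[j.agent] = elapsed
        elapsed += j.size
    avg_delay = sum(delay.values()) / m                     # Q

    # Q_M: average delay if all N jobs had been accepted in this order.
    elapsed_all, total_all = 0.0, 0.0
    for j in order:
        total_all += elapsed_all
        elapsed_all += j.size
    max_delay = total_all / len(jobs)                       # Q_M

    return accepted, rejected, cost, delay, avg_delay, max_delay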
3 THE 4VOICES ALGORITHM
This section provides the formal specification of the
4voices, the process of selection between those voices
as well as the learning of the regulated units and the
regulator. This section concludes with a specifica-
tion of the formal 4voices algorithm, which comprises
three components: processes for the specific prob-
lem, processes of attendance and selection between
the voices, and processes of learning.
3.1 Voices of the Regulated System
Drawing inspiration from psychological insights sug-
gesting that people facing decision-making problems
often activate an internal monologue, a distinct type of auditory thinking called “inner speech” or “inner voice” (Fernyhough, 2017), we include this polyphony of inner speech in our formal model. Following the identification in ergonomics of intimate, personal, social and public distances (Osborne and Heath, 1979), we define four voices, corresponding to the individual’s ‘own voice’ (Fernyhough, 2017), based
on direct personal experience; the ‘expert voice’, em-
anating from education or ‘trusted’ expertise (Horne
et al., 2016); the ‘foreground noise’, from the indi-
vidual’s social network (Nowak et al., 2019); and the
‘background noise’, coming from the community or
culture in which the individual is embedded (Deutsch
and Gerard, 1955).
Applying the 4voices algorithm to the problem of
job scheduling, depending on the delay, the cost, the
urgency and whether the job is accepted to be pro-
cessed or not, each agent i forms an opinion which is
reflected in its own voice $n^o_i$. As such, if the job of agent i is accepted, then the absolute value of the own voice $n^o_i$ reflects the quality of service divided by the cost, while the actual value is negated, to reflect the intuition that a higher noise corresponds to higher dissatisfaction while a lower noise reflects satisfaction. It is given by Equation 1:

$$n^o_i = -\frac{Q_M - q_i}{c_i} \quad (1)$$

If the job of the agent is not accepted, then the own voice $n^o_i$ remains the same as in the previous epoch.
Being part of a group, the agents are also exposed
to the voices of other agents. Thus, apart from their
own voice $n^o_i$, they audit the noise generated by the group $n^b_i$ (background noise), the noise from their social network $n^f_i$ (foreground noise) and the noise generated by the experts $n^e_i$ (expert voice).
This way, the expert voice $n^e_i$ which agent i might audit reflects the actual state of the system, and is equal to the average quality of service with respect to the average individual cost, aggregated with the urgency factor $U_f$. $U_f$ represents the trade-off between trying to partially satisfy everyone (accepting all the jobs regardless of urgency, resulting in a high average delay) and fully satisfying a few (prioritising the few whose jobs are urgent, resulting in a small average delay). Specifically, the urgency factor $U_f$ equals the sum of the urgencies of the m accepted jobs minus the sum of the urgencies of the $N - m$ rejected jobs, divided by the sum of the urgencies of all N jobs:

$$U_f = \frac{\sum_{k \in \{1,\dots,m\}} u_k - \sum_{k \in \{m+1,\dots,N\}} u_k}{\sum_{k \in \{1,\dots,N\}} u_k}$$

Overall, the expert voice is given by Equation 2, where the first term corresponds to the ‘objective’ quality of service while the second corresponds to the urgency multiplied by a normalising constant $u_c$:

$$n^e_i = \frac{Q_M - Q}{\bar{C}} + (U_f \cdot u_c) \quad (2)$$
Note that the term “expert” refers to the agents ini-
tialised to have an overview of the system, expressing
not just an opinion based on their personal experience
(as the other agents do), but an objective opinion re-
flecting the actual state of the regulated system.
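As an illustration, Equations 1 and 2 could be computed as follows. This is a minimal Python sketch: the function names and default values are ours, and the sign conventions follow the reconstruction above.

def own_voice(q_i, c_i, max_delay, previous=0.0, accepted=True):
    """Equation 1: negated quality of service (Q_M - q_i) over the
    individual cost c_i; carried over unchanged if the job was rejected."""
    if not accepted:
        return previous
    return -(max_delay - q_i) / c_i

def urgency_factor(accepted_urgencies, rejected_urgencies):
    """U_f: (sum of accepted urgencies - sum of rejected urgencies)
    divided by the sum of the urgencies of all N jobs."""
    total = sum(accepted_urgencies) + sum(rejected_urgencies)
    return (sum(accepted_urgencies) - sum(rejected_urgencies)) / total

def expert_voice(avg_delay, avg_cost, max_delay, u_f, u_c=10.0):
    """Equation 2: 'objective' quality of service plus the weighted
    urgency factor."""
    return (max_delay - avg_delay) / avg_cost + u_f * u_c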
Furthermore, the interconnected units comprising the regulated system might be influenced by the opinion (i.e. the noise) of their peers, or they might seek the opinion of others if they are not confident about their own opinion (or, in this case, voice), as depicted in Nowak’s RTSI theory (Nowak et al., 2019). To reflect the influence from the social network, we include the foreground noise in the voices of our model. Specifically, every agent i is connected with at least one other agent and can be influenced by them. The social network SN is a Small-World Scale-Free (Klemm and Eguíluz, 2002) network resembling real-life networks, combining the properties of both ‘scale free’ (Barabási et al., 1999) and ‘small world’ (Watts and Strogatz, 1998) networks, namely a degree distribution following a power law, a high clustering coefficient, and a small average shortest path. The network comprises N nodes (agents), initialised to have $m_{SN}$ fully connected nodes, and µ is the probability of adding an edge to one of the fully connected nodes.
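A sketch of the network construction, following the one-sentence summary above ($m_{SN}$ fully connected seed nodes, and probability µ of attaching to one of them) rather than the full algorithm of (Klemm and Eguíluz, 2002); the use of networkx and the parameter defaults are assumptions.

import random
import networkx as nx

def build_social_network(n=100, m_sn=10, mu=0.5, seed=None):
    """Seed with m_SN fully connected nodes; each new node attaches to
    one of the seed nodes with probability mu, otherwise to a random
    existing node.  A simplification of the Klemm-Eguiluz model."""
    rng = random.Random(seed)
    g = nx.complete_graph(m_sn)
    for v in range(m_sn, n):
        if rng.random() < mu:
            target = rng.randrange(m_sn)        # one of the seed nodes
        else:
            target = rng.choice(list(g.nodes))  # any existing node
        g.add_edge(v, target)
    return g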
Following the formal specification of social influence in (Mertzani et al., 2022), each agent i has a credence $cr_{i,j}$ towards each agent j in its social network $SN_i$. Consequently, each agent i orders its neighbours in descending order of credence. The foreground noise corresponds to the own voice produced by an agent j, where j is selected according to the following process: agent i selects the first agent from the list $CR_i$ and, with probability $p - \epsilon_{net}$, uses that agent’s own voice to produce the foreground noise; otherwise, agent i selects a random agent. This stochasticity in the selection is used to prevent an agent from always being influenced by the same agent.

Accordingly, the foreground noise attended by agent i is the own voice of the agent j from its social network and is given by Equation 3:
$$n^f_i = \begin{cases} n^o_j \mid j = \arg\max_{j \in SN_i}(cr_{i,j}), & \text{w.p. } (p - \epsilon_{net}) \\ n^o_k \mid k \leftarrow \mathrm{random}(SN_i), & \text{w.p. } \epsilon_{net} \end{cases} \quad (3)$$
The background noise corresponds to the noise produced by the community and is thus defined as the average of the own voices of a random sample of the population of size $\nu = \frac{N}{5}$, as presented in Equation 4:

$$n^b_i = \frac{\sum_{k \in \{1,\dots,\nu\}} n^o_k}{\nu} \quad (4)$$
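A minimal sketch of Equations 3 and 4 in Python, assuming that credence is stored as a nested dictionary credence[i][j] and that p = 1, so the greedy branch is taken with probability 1 − ε_net.

import random

def foreground_noise(i, credence, own_voices, eps_net=0.3):
    """Equation 3: attend to the most credible neighbour with
    probability (p - eps_net), otherwise to a random neighbour."""
    neighbours = list(credence[i])               # the social network SN_i
    if random.random() > eps_net:
        j = max(neighbours, key=lambda n: credence[i][n])
    else:
        j = random.choice(neighbours)
    return own_voices[j]

def background_noise(own_voices, n_agents):
    """Equation 4: average own voice of a random sample of nu = N/5 agents."""
    nu = max(1, n_agents // 5)
    sample = random.sample(list(own_voices.values()), nu)
    return sum(sample) / nu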
3.2 Voice Selection for Individual &
Collective Expression
Each agent i has to select between the four voices and form its individual expression $e_i$. This decision is made upon comparison of the values of attention to the corresponding voices. Specifically, i has an attention to each of the four voices, with $att^o_i$ being the attention to the own voice, $att^e_i$ the attention to the expert voice, and $att^f_i$ and $att^b_i$ the attention to the foreground and background noise respectively. Therefore, in every epoch, the agent compares the attentions and selects the greatest. For instance, if $\max(att^o_i, att^e_i, att^f_i, att^b_i)$ is $att^o_i$, then i selects the own voice and forms its individual expression ($e_i = n^o_i$).
Additionally, to encourage exploration and avoid biasing the agents towards the voice selected during the first epochs, we introduce a stochastic parameter (following the approach used for the computation of the foreground noise). Specifically, with probability $p - \epsilon_{voice}$, the voice with the greatest attention is selected, while with probability $\epsilon_{voice}$ the voice is selected randomly. Note that the four attentions are initialised to be equal, so that each agent is neutral at the beginning, and are updated throughout the epochs based on the experience of each agent.
Having access to its own voice, the foreground noise, the background noise and the expert voice, each agent i can select between those four voices and form its individual expression $e_i$. The individual expressions of the agents are aggregated to form the collective expression V, as presented in Equation 5:

$$V = \sum_{i \in \{1,\dots,N\}} e_i \quad (5)$$
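A minimal sketch of the selection step and of Equation 5, again assuming p = 1 and a dictionary of attention values keyed by voice name.

import random

VOICES = ("own", "expert", "foreground", "background")

def select_voice(attention, eps_voice=0.3):
    """Select the voice with the greatest attention with probability
    (p - eps_voice); otherwise select a voice at random."""
    if random.random() > eps_voice:
        return max(VOICES, key=lambda v: attention[v])
    return random.choice(VOICES)

def individual_expression(voices, attention, eps_voice=0.3):
    """e_i: the value of the voice the agent chose to attend to."""
    return voices[select_voice(attention, eps_voice)]

def collective_expression(expressions):
    """Equation 5: the sum of the individual expressions."""
    return sum(expressions)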
3.3 Learning to Attend to Voices
To update the attention, we use the reinforcement coefficient c, which determines the amount of positive or negative update occurring in every epoch. Thus, depending on the value of the experimental parameter update, the agent either learns to pay attention to the voice that deviates least from the experts (update = ‘Exp’), or learns to pay attention to the voice that resembles the average expression of the group (update = ‘Col’), or learns to pay attention to the voice that seems to have the best immediate effect (update = ‘Ind’), resulting in a lower own voice (a more satisfied agent) in the next epoch.
Specifically, in the first case, after attending to all the voices, the agent reinforces its attention to the voice that deviates least from the expert voice, by increasing the attention to this voice by c, and penalises the attention to the voice that deviates the most, by decreasing it by c. If an expert can be accessed by that agent, the voice whose attention is increased is simply the expert voice; if an expert cannot be accessed, then the attention to one of the other voices is increased. In the second case (update = ‘Col’), the attention to the voice that deviates least from the average expression of the agents (corresponding to $\frac{V}{N}$) is increased and the one that deviates the most is decreased. Finally, when update = ‘Ind’, the update of the attention is based on comparing the value of each voice in epoch t with the own voice of i in epoch t + 1. So, the agent computes the values of $(n^o_i)_t - (n^o_i)_{t+1}$, $(n^f_i)_t - (n^o_i)_{t+1}$, $(n^b_i)_t - (n^o_i)_{t+1}$, and $(n^e_i)_t - (n^o_i)_{t+1}$, and the attention to the voice with the lowest value is increased by c, while the one with the greatest value is decreased.
The process of updating the agents’ attention con-
stitutes a simple form of learning, where agents are
reinforcing their attention to the voices based on the
value of the information that they offer in evaluat-
ing the actions of the regulator. This process en-
ables agents to learn to which voice attention should
be paid, depending on what they consider more reliable for forming their expression, given their goal as defined by the parameter update. Therefore, if the
agents consider that the regulator should be aware of
what the experts think, then they will reinforce atten-
tion to the expert voice in order to form their individ-
ual expression based on that voice. This reflects the
confirmation bias observed in human societies, since
often individuals tend to listen to the opinions that are
aligned with their beliefs, rather than the opinions that
are more appropriate.
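The three update modes could be sketched as follows; measuring deviation by absolute difference for ‘Exp’ and ‘Col’ is our reading of the description above, and the data layout is an assumption.

def update_attention(attention, voices, mode, c=0.01,
                     expert_available=True, avg_expression=None,
                     own_voice_next=None):
    """Reinforce by +c the attention to one voice and penalise by -c the
    attention to another, depending on the update mode (Section 3.3)."""
    if mode == "Exp":
        # deviation of each voice from the expert voice
        score = {v: abs(voices[v] - voices["expert"]) for v in voices}
        if not expert_available:
            score.pop("expert")      # reinforce one of the other voices
    elif mode == "Col":
        # deviation from the average expression V/N of the group
        score = {v: abs(voices[v] - avg_expression) for v in voices}
    else:  # "Ind": each voice at epoch t versus the own voice at t+1
        score = {v: voices[v] - own_voice_next for v in voices}

    attention[min(score, key=score.get)] += c
    attention[max(score, key=score.get)] -= c
    return attention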
3.4 Learning to Attend to Expressions
Moving from individual to regulator’s learning, here
we aim to design a regulator that has the required ra-
tional complexity to react effectively to the expres-
sions of the regulated system and produce systemic
stability. Therefore, instead of using reinforcement
coefficients, we use Reinforcement Learning (RL).
In RL, such a learning process can be modelled as a Markov Decision Process (MDP). The MDP is defined as (S, A, P, R, γ), where A is a set of actions, S is a set of states, P is the state transition probability function, R is the reward function and γ is the discount factor. Thus, s is a state from the set S, a is an action selected from the set A, $P(s_{t+1} \mid s_t, a)$ defines the probability of a transition from state $s_t$ to state $s_{t+1}$ when an agent executes action a, and $r \in R$ is the immediate reward received when an agent performs an action.
A policy π maps states to probability distributions over actions and is denoted by $\pi : S \rightarrow p(A = a \mid S)$. An agent’s goal is to learn a policy π that maximises its expected return $\sum_{t=0}^{\infty} \gamma^t r_t$. To learn that policy, one can learn an action-value function $Q : S \times A \rightarrow \mathbb{R}$, where Q(s, a) represents the value of action a in state s. (Watkins, 1989) introduced the Q-learning algorithm as a way of learning the optimal state-action value Q based on the Bellman equation. According to the Q-learning algorithm, the function Q is updated at each time step by:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)$$
where α is the step-size, $(s_t, a_t, r_t, s_{t+1})$ is the transition from state $s_t$ to $s_{t+1}$ with reward $r_t$, and γ is the discount factor.
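For reference, the update rule above in its simplest tabular form (the dictionary-of-dictionaries Q-table is an implementation choice, not something specified in the text).

from collections import defaultdict

def make_q_table():
    # Q[s][a] defaults to 0.0 for unseen state-action pairs
    return defaultdict(lambda: defaultdict(float))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values(), default=0.0)
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])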
To enable large-scale RL, deep neural networks (DNNs) are used (Li, 2017). (Mnih et al., 2015) introduced Deep Q-Learning Networks (DQN), which use DNNs to approximate the optimal state-action function Q in Q-learning. At each time step, the DQN uses stochastic gradient descent to minimise the loss between the learnt network θ and the target network θ′, described by:

$$\frac{1}{2}\left(r_t + \gamma \max_{a'} Q_{\theta'}(s_{t+1}, a') - Q_{\theta}(s_t, a_t)\right)^2 \quad (6)$$
Here, we use a variant of the Q-learning algorithm (Watkins and Dayan, 1992) with a Deep Q-Learning Network (DQN) (Mnih et al., 2013), which combines Double Q-Learning (van Hasselt et al., 2015), to prevent overestimation of action values, with Dueling Networks (Wang et al., 2016), to generalise learning across different actions without changing the underlying RL algorithm. We use the implementation of DQN from (Raffin et al., 2021), and the model is trained to minimise the loss in Equation 6.
The action space of the regulator has dimension N × 1 and comprises a vector of all possible $m_t \in \{1, \dots, N\}$; the state space has dimension N × 3 and comprises three vectors describing the job size $j_i$ of each agent in this epoch, the urgency $u_i$ of this job and the signal $s_i$ corresponding to whether the job of the agent was accepted or rejected. The reward equals the negative of the collective expression, i.e. $R = -V$, to reward low collective expression (satisfaction) positively and high collective expression (dissatisfaction) negatively. We use a Deep Q-Learning Network (DQN) to increase the regulator’s complexity compared to the regulated system and enable it to respond effectively to the expressions of the agents. Finally, the training process lasts 10000 epochs divided into episodes of 2000 epochs, while learning starts after the 1000th epoch. The exploration constant ε is initialised to 1 and, during training, is progressively reduced to 0.1.
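The regulator’s learning loop could be wired up along the following lines, using the DQN implementation of (Raffin et al., 2021). This is a sketch under several assumptions: the environment class, the stub standing in for the regulated system, and any hyper-parameter not quoted in the text are ours, and it uses stable-baselines3’s standard DQN rather than the dueling double variant described above.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import DQN

class RegulatorEnv(gym.Env):
    """Minimal regulator environment: the state is the flattened (N x 3)
    observation of job sizes, urgencies and accept/reject signals, the
    action is m (the number of jobs to accept), and the reward is -V.
    The regulated system is stubbed out by _collective_expression; in the
    full model it would run one epoch of Algorithm 1."""

    def __init__(self, n_agents=100, j_max=10, u_max=10, episode_len=2000):
        super().__init__()
        self.n, self.j_max, self.u_max = n_agents, j_max, u_max
        self.episode_len = episode_len
        self.observation_space = spaces.Box(
            0.0, float(max(j_max, u_max)), shape=(3 * n_agents,),
            dtype=np.float32)
        self.action_space = spaces.Discrete(n_agents)   # m in {1, ..., N}

    def _observe(self):
        sizes = np.random.randint(1, self.j_max + 1, self.n)
        urgencies = np.random.randint(1, self.u_max + 1, self.n)
        return np.concatenate([sizes, urgencies, self.signals]).astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.signals = 0, np.zeros(self.n)
        return self._observe(), {}

    def step(self, action):
        m = int(action) + 1
        self.signals = np.array([1.0] * m + [0.0] * (self.n - m))
        reward = -self._collective_expression(m)       # R = -V
        self.t += 1
        truncated = self.t >= self.episode_len
        return self._observe(), reward, False, truncated, {}

    def _collective_expression(self, m):
        # placeholder for the aggregation of the agents' expressions
        return float(self.n - m)

env = RegulatorEnv()
model = DQN("MlpPolicy", env, exploration_initial_eps=1.0,
            exploration_final_eps=0.1, learning_starts=1000, verbose=0)
model.learn(total_timesteps=10_000)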
3.5 The 4voices Algorithm
Following the description above, the algorithm describing our 4voices model is presented in Algorithm 1. The parts in green refer to the specific problem, i.e. job scheduling in the cloud, the parts in blue refer to the 4voices, and the parts in purple refer to learning.
while t < T do
    Regulator Specifies Policy;
    Agents Generate Jobs;
    Job Scheduling According to Regulator's Policy;
    for each agent i, 1 ≤ i ≤ N do
        Attend Own Voice (Equation 1);
        Attend Expert Voice (Equation 2);
        Attend Foreground Noise (Equation 3);
        Attend Background Noise (Equation 4);
        Voice Selection for Expression (Section 3.2);
    end
    Collective Expression (Equation 5);
    Regulator: Learns to Attend to Expression (Section 3.4);
    for each agent i, 1 ≤ i ≤ N do
        Learn to Attend to Voices (Section 3.3);
    end
    increment t;
end
Algorithm 1: The 4voices model.
4 EXPERIMENTAL RESULTS
In this short paper, we describe only two experi-
ments: whether the 4voices establishes the pathways
for requisite influence, i.e. attention/expression both
ways between regulator and regulated units, and un-
der which initial conditions stability is produced.
We set $\epsilon_{net} = \epsilon_{voice} = 0.3$, c = 0.01, $u_c = 10$, J = U = 10, F = 500, N = 100, µ = 0.5 and $m_{SN} = \frac{N}{10}$.
The experiments were repeated for many runs and the
graphs present the outcome after averaging over 10
runs. However, the second experiment presents the
results of a single representative run to highlight the
emerging behaviour.
Figure 2: The mean value and the standard deviation of the individual expressions for Random and RL regulator after averaging over 10 runs (columns: Ind, Exp, Col; rows: Random, RL).
Figure 3: Expression of the Regulated Units when the Regulator is Random and when RL for Different updates of Attention (Any, Ind, Exp, Col) after averaging over 10 runs.
4.1 Pathways for Requisite Influence
This experiment focuses on the observation of
whether the 4voices establishes the pathways for req-
uisite influence. Figure 2 illustrates the mean value and the standard deviation of the individual expressions of the agents (y-axis) over the epochs (x-axis), for the different types of update of the agents' attention, shown in different columns, and for a random and an RL regulator, presented in different rows. Complementing the individual perspective of the agents presented in Figure 2, Figure 3 provides the collective perspective, and specifically shows the policy selected by the regulator (i.e. the number of agents to be included in the next calculation) on the y-axis at each epoch on the x-axis. The different colours of the dots represent the volume of noise, where blue corresponds to comparatively low volume and red to comparatively high volume.
From Figure 2, we notice that when the regula-
tor selects a policy randomly, the agents’ noise differs
from epoch to epoch, reflecting the agents’ reaction
to the policy selected. In contrast, when the regulator
decides the policy based on a reinforcement learning
algorithm, the mean value of the agents’ noise is de-
creased for ‘Exp’ and ‘Col’ types of update of atten-
tion. This shows that the agents attend to the expression
of the regulator and effectively adjust their individual
and collective expressions, depending on whether the
policy selected is desired or not.
The same observation (i.e. the pathways for requi-
site influence from the regulator to the regulated units
are established) can be seen in Figure 3 that illustrates
the collective expression with respect to the selected
policy by the regulator. Here, we notice that when
the policy includes multiple agents (more than 40) the
agents' noise becomes lower, while when the regula-
tor includes only a few agents the group reflects the
dissatisfaction by increasing the volume of noise.
Additionally, when the agents’ update of attention
is based on ‘Exp’ or ‘Col’, their expression reflects
their satisfaction and dissatisfaction and establishes
the pathways for requisite influence from the regu-
lated units to the regulator. This can be observed in
the third and fourth columns of Figure 3, where the
regulator identifies the range of policies that is desired
by the agents and, after some iterations, selects a policy only from the 50-80 agent interval. The combi-
Figure 4: The mean value and the standard deviation of the individual expressions for Random and RL regulator (columns: Ind, Exp, Col), and the policy selection and the corresponding collective expression of an RL regulator, for a single run.
nation of the observations made above shows that in
any case the pathways for requisite influence from the
regulator to the regulated units are established, while
the pathways from the regulated units to the regulator
are established only for ‘Exp’ and ‘Col’ types of up-
date. This is important since it constitutes a way to
enable a collective (comprising the regulated system)
to have influence on the learning algorithm of the reg-
ulator, by feeding it with data that shapes a policy which helps them achieve their goal (i.e. maximisation of collective satisfaction), similarly to (Hardt et al., 2023).
4.2 Initial Conditions & Stability
Moreover, looking at the graphs resulting from a single run in Figure 4, we notice that there is a trade-off between diversity and congruence. Specifically, moving from the own voice and the foreground noise (selected when the update is ‘Ind’) to the expert voice and background noise (selected when the update of attention is equal to ‘Exp’ or ‘Col’), the standard deviation of the individual expressions is decreased (low diversity). Moreover, the results in the second column of the third row of Figure 4 show that the experts' noise remains almost the same if the selected policy remains in the 40-80 agent interval, whereas in the third column we notice that there is a limited interval of satisfying policies in the ‘Col’ case, which relates to the network properties and to the agents that most affect the others' expressions.
In this experiment we aimed to observe which
initial conditions result in systemic stability, which
is crucial when addressing issues of sustainability
in socio-technical systems. As such, combining the
findings of both experiments, we conclude that sys-
temic stability is maintained when the regulator se-
lects a policy using RL and the agents update their
attention based on the divergence from the experts’
voice or collective decision (update is ‘Exp’ or ‘Col’).
This is because, referring back to the third and fourth
columns of Figure 3, we notice that the regulator, af-
ter enough iterations, identifies the appropriate range
of policies which results in comparatively low noise.
This produces systemic stability (reflected by the per-
sistent blue coloured volume of noise).
5 SUMMARY & CONCLUSIONS
This paper has essentially addressed two related prob-
lems in self-regulated systems. Firstly, ensuring that
pathways for requisite influence exist, in the form of
awareness (of the regulator) of the expression (of the
regulated units), in expression (by the regulator) and
attention (by the regulated units) to that expression,
and in the form of social influence in the social net-
work of the regulated units. Secondly, to identify un-
der which initial conditions the self-regulated system
can maintain stability, and so establish what could be
called requisite social influence.
The solution proposed in this paper has been to
synthesise ideas from social influence and machine
learning to address the dual problems of requisite in-
fluence and stability in continuous monitoring and
control of a dynamic and non-deterministic system.
Specifically, the contributions of this paper are:
• based on ideas from opinion formation, dynamic social psychology and psychoacoustics, to introduce the 4voices model for regulated units, which identifies an own voice, expert voice, foreground noise and background noise as possible sources of social influence;
• to specify the 4voices algorithm, which combines computation of a signal value for each of the voices, with reinforcement of attention based on the experience from past interactions with the voices for the regulated units, and Deep Q-learning for the regulator to learn the effect of its actions; and
• experimental results which show that for control processes in dynamic and non-deterministic systems, the 4voices algorithm establishes both the required relational complexity and the pathways for requisite influence, so that systemic stability is maintained.
Beyond this, according to Ashby (Ashby, 2020),
an ethical regulator is required not only to reach re-
liable decisions from potentially unreliable evidence,
but also to evaluate the consequences of its decisions,
which raises issues of transparency and accountabil-
ity in the regulator. Correspondingly, though, an ethi-
cally regulated unit should try to provide information
to the best of its knowledge and belief. The future
challenge lies in ensuring ethical behaviour with re-
spect to values, especially in systems with multiple
stakeholders with different priorities and preferences
with respect to those values.
REFERENCES
Ashby, M. (2020). Ethical regulators and super-ethical sys-
tems. Systems, 8(5):3:1–3:36.
Barabási, A.-L., Albert, R., and Jeong, H. (1999). Mean-field theory for scale-free random networks. Physica A: Statistical Mechanics and its Applications, 272(1):173–187.
Deutsch, M. and Gerard, H. (1955). A study of norma-
tive and informational social influences upon individ-
ual judgment. The Journal of Abnormal and Social
Psychology, 51(3):629–636.
Driver, J. (2001). A selective review of selective attention
research from the past century. British Journal of Psy-
chology, 92(1):53–78.
Fernyhough, C. (2017). The Voices Within. Wellcome Col-
lection.
Hardt, M., Mazumdar, E., Mendler-Dünner, C., and Zrnic, T. (2023). Algorithmic collective action in machine learning. ICML'23. JMLR.org.
Horne, B., Nevo, D., Freitas, J., Ji, H., and Adali, S. (2016).
Expertise in social networks: How do experts differ
from other users? In Proceedings Tenth International
AAAI Conference on Web and Social Media, pages
583–586.
Klemm, K. and Eguíluz, V. M. (2002). Growing scale-free networks with small-world behavior. Phys. Rev. E, 65:057102.
Li, Y. (2017). Deep reinforcement learning: An overview.
arXiv preprint arXiv:1701.07274.
Mertzani, A., Pitt, J., Nowak, A., and Michalak, T. (2022).
Expertise, social influence, and knowledge aggrega-
tion in distributed information processing. Artificial
Life, 29(1):37–65.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M. A.
(2013). Playing atari with deep reinforcement learn-
ing. CoRR, abs/1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness,
J., Bellemare, M. G., Graves, A., Riedmiller, M., Fid-
jeland, A. K., Ostrovski, G., et al. (2015). Human-
level control through deep reinforcement learning.
Nature, 518(7540):529–533.
Nowak, A., Vallacher, R., Rychwalska, A., Roszczynska-
Kurasinska, M., Ziembowicz, K., Biesaga, M., and
Kacprzyk, M. (2019). Target in control: Social in-
fluence as distributed information processing. Cham,
CH: Springer.
Osborne, D. and Heath, T. (1979). The role of social space requirements in ergonomics. Applied Ergonomics, 10(2):99–103.
Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus,
M., and Dormann, N. (2021). Stable-baselines3: Reli-
able reinforcement learning implementations. Journal
of Machine Learning Research, 22(268):1–8.
van Hasselt, H., Guez, A., and Silver, D. (2015). Deep re-
inforcement learning with double q-learning.
Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanc-
tot, M., and de Freitas, N. (2016). Dueling network
architectures for deep reinforcement learning.
Watkins, C. J. and Dayan, P. (1992). Q-learning. Machine
learning, 8:279–292.
Watkins, C. J. C. H. (1989). Learning from delayed rewards.
Dissertation, King’s College, Cambridge.
Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics
of ‘small-world’ networks. Nature, 393(6684):440–
442.