and in the form of social influence in the social net-
work of the regulated units. Secondly, to identify un-
der which initial conditions the self-regulated system
can maintain stability, and so establish what could be
called requisite social influence.
The solution proposed in this paper has been to
synthesise ideas from social influence and machine
learning to address the dual problems of requisite in-
fluence and stability in continuous monitoring and
control of a dynamic and non-deterministic system.
Specifically, the contributions of this paper are:
• based on ideas from opinion formation, dynamic
social psychology and psychoacoustics, to intro-
duce the 4voices model for regulated units, which
identifies an own voice, expert voice, foreground
noise and background noise as possible sources of
social influence;
• to specify the 4voices algorithm (see the illustrative sketch after this list), which combines, for the regulated units, the computation of a signal value for each voice with reinforcement of attention based on experience from past interactions with those voices, and, for the regulator, Deep Q-learning to learn the effect of its actions; and
• experimental results which show that, for control processes in dynamic and non-deterministic systems, the 4voices algorithm establishes both the required relational complexity and the pathways for requisite influence, so that systemic stability is maintained.
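To make the second contribution more concrete, the following minimal Python sketch illustrates one possible reading of the regulated-unit side of the 4voices algorithm: each unit combines signal values from its own voice, an expert voice, foreground noise and background noise, weighted by attention that is reinforced according to the outcome of past interactions. All identifiers (FourVoicesUnit, VOICES, the learning rate and the exact update rule) are illustrative assumptions rather than the paper's implementation, and the regulator's Deep Q-learning side is omitted; it could be supplied by a standard DQN implementation such as Stable-Baselines3 (Raffin et al., 2021).

    # Illustrative only: names and the update rule are assumptions, not the paper's code.
    import numpy as np

    VOICES = ("own", "expert", "foreground_noise", "background_noise")

    class FourVoicesUnit:
        """A regulated unit attending to four possible sources of social influence."""

        def __init__(self, lr=0.1):
            self.lr = lr                                 # reinforcement rate for attention
            self.attention = {v: 0.25 for v in VOICES}   # start with uniform attention

        def combine(self, signals):
            # Mix the voices' signal values, weighted by the current attention.
            return sum(self.attention[v] * signals[v] for v in VOICES)

        def reinforce(self, signals, outcome):
            # Strengthen attention to voices whose signal matched the observed outcome
            # of the interaction, weaken it for the others, then renormalise.
            for v in VOICES:
                error = abs(signals[v] - outcome)        # signals and outcome in [0, 1]
                self.attention[v] = max(1e-3, self.attention[v] + self.lr * (1.0 - 2.0 * error))
            total = sum(self.attention.values())
            for v in VOICES:
                self.attention[v] /= total

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        unit = FourVoicesUnit()
        for _ in range(50):
            true_state = rng.random()                    # toy quantity the unit must track
            signals = {
                "own": float(np.clip(true_state + rng.normal(0, 0.2), 0, 1)),
                "expert": float(np.clip(true_state + rng.normal(0, 0.05), 0, 1)),
                "foreground_noise": rng.random(),
                "background_noise": rng.random(),
            }
            estimate = unit.combine(signals)             # the unit's current estimate
            unit.reinforce(signals, true_state)
        print({v: round(w, 3) for v, w in unit.attention.items()})

In this toy run, attention concentrates on the expert voice because its signal tracks the true state most closely, which is the qualitative behaviour the contribution describes.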
Beyond this, according to Ashby (2020),
an ethical regulator is required not only to reach re-
liable decisions from potentially unreliable evidence,
but also to evaluate the consequences of its decisions,
which raises issues of transparency and accountabil-
ity in the regulator. Correspondingly, an ethically regulated unit should strive to provide information to the best of its knowledge and belief. The future
challenge lies in ensuring ethical behaviour with re-
spect to values, especially in systems with multiple
stakeholders with different priorities and preferences
with respect to those values.
REFERENCES
Ashby, M. (2020). Ethical regulators and super-ethical sys-
tems. Systems, 8(5):3:1–3:36.
Barabási, A.-L., Albert, R., and Jeong, H. (1999). Mean-field theory for scale-free random networks. Physica A: Statistical Mechanics and its Applications, 272(1):173–187.
Deutsch, M. and Gerard, H. (1955). A study of norma-
tive and informational social influences upon individ-
ual judgment. The Journal of Abnormal and Social
Psychology, 51(3):629–636.
Driver, J. (2001). A selective review of selective attention
research from the past century. British Journal of Psy-
chology, 92(1):53–78.
Fernyhough, C. (2017). The Voices Within. Wellcome Col-
lection.
Hardt, M., Mazumdar, E., Mendler-Dünner, C., and Zrnic, T. (2023). Algorithmic collective action in machine learning. ICML'23. JMLR.org.
Horne, B., Nevo, D., Freitas, J., Ji, H., and Adali, S. (2016).
Expertise in social networks: How do experts differ
from other users? In Proceedings Tenth International
AAAI Conference on Web and Social Media, pages
583–586.
Klemm, K. and Eguíluz, V. M. (2002). Growing scale-free networks with small-world behavior. Phys. Rev. E, 65:057102.
Li, Y. (2017). Deep reinforcement learning: An overview.
arXiv preprint arXiv:1701.07274.
Mertzani, A., Pitt, J., Nowak, A., and Michalak, T. (2022).
Expertise, social influence, and knowledge aggrega-
tion in distributed information processing. Artificial
Life, 29(1):37–65.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M. A.
(2013). Playing atari with deep reinforcement learn-
ing. CoRR, abs/1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness,
J., Bellemare, M. G., Graves, A., Riedmiller, M., Fid-
jeland, A. K., Ostrovski, G., et al. (2015). Human-
level control through deep reinforcement learning.
Nature, 518(7540):529–533.
Nowak, A., Vallacher, R., Rychwalska, A., Roszczynska-
Kurasinska, M., Ziembowicz, K., Biesaga, M., and
Kacprzyk, M. (2019). Target in control: Social in-
fluence as distributed information processing. Cham,
CH: Springer.
Osborne, D. and Heath, T. (1979). The role of social space requirements in ergonomics. Applied Ergonomics, 10(2):99–103.
Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus,
M., and Dormann, N. (2021). Stable-baselines3: Reli-
able reinforcement learning implementations. Journal
of Machine Learning Research, 22(268):1–8.
van Hasselt, H., Guez, A., and Silver, D. (2015). Deep reinforcement learning with double Q-learning. CoRR, abs/1509.06461.
Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., and de Freitas, N. (2016). Dueling network architectures for deep reinforcement learning. ICML'16. JMLR.org.
Watkins, C. J. and Dayan, P. (1992). Q-learning. Machine
learning, 8:279–292.
Watkins, C. J. C. H. (1989). Learning from delayed rewards.
Dissertation, King’s College, Cambridge.
Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics
of ‘small-world’ networks. Nature, 393(6684):440–
442.