
ideal case, in practice worse due to non-linearities),
but combinatorially in the outer loop. However, several other formulations exist in the literature, including heuristic methods, that keep the problem tractable for 20 or more controllable units.
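As an illustration of the kind of heuristic that keeps the combinatorial outer loop tractable, the sketch below runs a random-restart local search over the binary on/off configuration of the units, with the continuous inner optimization abstracted as a cost callback. This is a generic sketch, not the formulation used in this work: the function names and the toy quadratic cost are our own assumptions.

```python
import random


def heuristic_outer_loop(n_units, inner_cost, n_restarts=10, n_steps=200, seed=0):
    """Random-restart local search over binary on/off unit configurations.

    `inner_cost` stands in for the continuous inner optimization
    (e.g. an NLP solve) and returns the cost of a given configuration.
    Evaluates roughly n_restarts * n_steps configurations instead of
    the 2**n_units required by exhaustive enumeration.
    """
    rng = random.Random(seed)
    best_cfg, best_cost = None, float("inf")
    for _ in range(n_restarts):
        cfg = [rng.randint(0, 1) for _ in range(n_units)]
        cost = inner_cost(cfg)
        for _ in range(n_steps):
            i = rng.randrange(n_units)      # propose flipping one unit on/off
            cand = cfg.copy()
            cand[i] ^= 1
            cand_cost = inner_cost(cand)
            if cand_cost < cost:            # greedy: keep only improvements
                cfg, cost = cand, cand_cost
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


# Toy inner cost: penalize deviating from a demand of 12 active units,
# plus a small running cost per active unit.
demand = 12
cost_fn = lambda cfg: (sum(cfg) - demand) ** 2 + 0.1 * sum(cfg)
cfg, cost = heuristic_outer_loop(20, cost_fn)
```

For 20 units this evaluates on the order of 2,000 configurations instead of the roughly one million an exhaustive outer loop would require, which is why such heuristics remain usable at this scale.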
BROADER IMPACT STATEMENT
Our contributions make more industrial machines controllable in an autonomous way. We mainly envision positive impacts on society, such as reduced energy consumption for the same manufacturing quality, higher manufacturing quality (and therefore less waste), and general economic benefits. The machines that would benefit from our contribution are usually not directly controlled by people, so we do not expect our contribution to displace jobs through automation. We do, however, acknowledge that any improvement in automation can also benefit sensitive uses, such as the production of military equipment.
ACKNOWLEDGMENTS
This research was conducted within the framework of the Flanders AI Research Program (https://www.flandersairesearch.be/en), which is financed by EWI (Economie Wetenschap & Innovatie), and Flanders Make (https://www.flandersmake.be/en), the strategic research centre for the manufacturing industry, which owns the Demand Driven Use Case. The authors would like to thank everyone who contributed input to this publication.
REFERENCES
Andersson, J. A., Gillis, J., Horn, G., Rawlings, J. B., and Diehl, M. (2019). CasADi: a software framework for nonlinear optimization and optimal control. Mathematical Programming Computation, 11(1):1–36.
Asadi, K., Parikh, N., Parr, R. E., Konidaris, G. D., and
Littman, M. L. (2021). Deep radial-basis value func-
tions for continuous control. In Proceedings of the
AAAI Conference on Artificial Intelligence, volume 35,
pages 6696–6704.
Bakker, B. (2001). Reinforcement learning with long short-
term memory. Advances in neural information process-
ing systems, 14.
Biegler, L. T. and Zavala, V. M. (2009). Large-scale nonlinear programming using IPOPT: An integrating framework for enterprise-wide dynamic optimization. Computers & Chemical Engineering, 33(3):575–582.
Borase, R. P., Maghade, D., Sondkar, S., and Pawar, S. (2021). A review of PID control, tuning methods and applications. International Journal of Dynamics and Control, 9:818–827.
Camacho, E. F. and Alba, C. B. (2013). Model predictive control. Springer Science & Business Media.
Chen, R., Lan, F., and Wang, J. (2024). Intelligent pressure
switching control method for air compressor group
control based on multi-agent reinforcement learning.
Journal of Intelligent & Fuzzy Systems, 46(1):2109–
2122.
De Somer, O., Soares, A., Vanthournout, K., Spiessens, F.,
Kuijpers, T., and Vossen, K. (2017). Using reinforce-
ment learning for demand response of domestic hot
water buffers: A real-life demonstration. In 2017 IEEE
PES Innovative Smart Grid Technologies Conference
Europe (ISGT-Europe), pages 1–7. IEEE.
Frank, R. J., Davey, N., and Hunt, S. P. (2001). Time series
prediction and neural networks. Journal of intelligent
and robotic systems, 31:91–103.
Fujimoto, S., Hoof, H., and Meger, D. (2018). Addressing
function approximation error in actor-critic methods.
In International conference on machine learning, pages
1587–1596. PMLR.
Glover, K. (2021). H-infinity control. In Encyclopedia of
systems and control, pages 896–902. Springer.
Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S.,
Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., et al.
(2018). Soft actor-critic algorithms and applications.
arXiv preprint arXiv:1812.05905.
Hessel, M., Modayil, J., Van Hasselt, H., Schaul, T., Ostro-
vski, G., Dabney, W., Horgan, D., Piot, B., Azar, M.,
and Silver, D. (2018). Rainbow: Combining improve-
ments in deep reinforcement learning. In Proceedings
of the AAAI conference on artificial intelligence, vol-
ume 32.
IEEE (2024). IEEE PES power grid benchmarks.
Lewis, F. L., Vrabie, D., and Syrmos, V. L. (2012). Optimal
control. John Wiley & Sons.
López-Ibáñez, M., Prasad, T. D., and Paechter, B. (2008).
Ant colony optimization for optimal control of pumps
in water distribution networks. Journal of water re-
sources planning and management, 134(4):337–346.
Lourenço, H. R., Martin, O. C., and Stützle, T. (2019). Iter-
ated local search: Framework and applications. Hand-
book of metaheuristics, pages 129–168.
Luo, J., Paduraru, C., Voicu, O., Chervonyi, Y., Munns,
S., Li, J., Qian, C., Dutta, P., Davis, J. Q., Wu, N.,
et al. (2022). Controlling commercial cooling sys-
tems using reinforcement learning. arXiv preprint
arXiv:2211.07357.
Mehra, R. and Davis, R. (1972). A generalized gradient
method for optimal control problems with inequality
constraints and singular arcs. IEEE Transactions on
Automatic Control, 17(1):69–79.
Meinhold, R. J. and Singpurwalla, N. D. (1983). Understanding the Kalman filter. The American Statistician, 37(2):123–127.
Michalewicz, Z., Janikow, C. Z., and Krawczyk, J. B. (1992).
A modified genetic algorithm for optimal control problems.
Reinforcement Learning for Model-Free Control of a Cooling Network with Uncertain Future Demands