Assured Reinforcement Learning for Safety-critical Applications

George Mason, Radu Calinescu, Daniel Kudenko, Alec Banks

Abstract

Reinforcement learning (RL) is a machine learning technique in which an autonomous agent uses the rewards received from its interactions with an initially unknown Markov decision process (MDP) to converge to an optimal policy, i.e., a mapping from MDP states to the actions that maximise the obtained rewards. Although used successfully in applications ranging from gaming to robotics, standard RL is not applicable to problems where the policies learned by the agent must satisfy strict constraints associated with the safety, reliability, performance and other critical aspects of the problem. Our project addresses this significant limitation of standard RL by integrating it with probabilistic model checking, thus extending the applicability of the technique to mission-critical and safety-critical systems.
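To make the RL setting described above concrete, the following is a minimal sketch of tabular Q-learning (Watkins and Dayan, 1992) on a tiny MDP. The three-state MDP, its transition probabilities, rewards and all hyperparameters are invented purely for illustration and are not taken from the paper; the verified-safety aspect of the project is not modelled here.

```python
import random

# Hypothetical 3-state MDP (states 0-2, actions 0-1), invented for
# illustration. P[s][a] -> list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 1, 0.0)], 1: [(0.8, 2, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 2, 2.0)], 1: [(1.0, 0, 0.0)]},
    2: {0: [(1.0, 0, 0.0)], 1: [(1.0, 0, 0.0)]},
}

def step(s, a, rng):
    """Sample one transition; the agent treats the MDP as a black box."""
    r, acc = rng.random(), 0.0
    for p, s2, rew in P[s][a]:
        acc += p
        if r <= acc:
            return s2, rew
    return P[s][a][-1][1], P[s][a][-1][2]

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Learn a greedy policy from sampled rewards alone."""
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in (0, 1)} for s in P}
    for _ in range(episodes):
        s = 0
        for _ in range(20):  # bounded episode length
            # epsilon-greedy exploration
            a = rng.choice((0, 1)) if rng.random() < eps else max(Q[s], key=Q[s].get)
            s2, rew = step(s, a, rng)
            # Q-learning update (Watkins and Dayan, 1992)
            Q[s][a] += alpha * (rew + gamma * max(Q[s2].values()) - Q[s][a])
            s = s2
    return {s: max(Q[s], key=Q[s].get) for s in P}  # greedy policy
```

In the project's setting, such a learned policy would additionally have to satisfy formally verified constraints (e.g., PCTL properties checked with a probabilistic model checker such as PRISM) rather than only maximising reward.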

References

  1. Arcuri, A. and Briand, L. (2011). A practical guide for using statistical tests to assess randomized algorithms in software engineering. In 33rd Intl. Conf. Software Engineering, pages 1-10.
  2. Argall, B. D., Chernova, S., Veloso, M., et al. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5):469-483.
  3. Boger, J., Hoey, J., Poupart, P., et al. (2006). A planning system based on Markov decision processes to guide people with dementia through activities of daily living. IEEE Transactions on Information Technology in Biomedicine, 10(2):323-333.
  4. Calinescu, R., Johnson, K., and Rafiq, Y. (2013). Developing self-verifying service-based systems. In 28th IEEE/ACM Intl. Conf. Automated Software Engineering, pages 734-737.
  5. Calinescu, R., Kikuchi, S., and Johnson, K. (2012). Compositional reverification of probabilistic safety properties for large-scale complex IT systems. In Large-Scale Complex IT Systems. Development, Operation and Management, pages 303-329. Springer Berlin Heidelberg.
  6. Dearden, R., Friedman, N., and Russell, S. (1998). Bayesian Q-learning. In 15th National Conference on Artificial Intelligence, pages 761-768.
  7. Efthymiadis, K. and Kudenko, D. (2015). Knowledge revision for reinforcement learning with abstract MDPs. In 14th Intl. Conf. Autonomous Agents and Multiagent Systems, pages 763-770.
  8. García, J. and Fernández, F. (2012). Safe exploration of state and action spaces in reinforcement learning. Journal of Artificial Intelligence Research, 45(1):515-564.
  9. García, J. and Fernández, F. (2015). A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16(1):1437-1480.
  10. Gerasimou, S., Calinescu, R., and Banks, A. (2014). Efficient runtime quantitative verification using caching, lookahead, and nearly-optimal reconfiguration. In 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, pages 115-124.
  11. Gerasimou, S., Tamburrelli, G., and Calinescu, R. (2015). Search-based synthesis of probabilistic models for quality-of-service software engineering. In 30th IEEE/ACM Intl. Conf. Automated Software Engineering, pages 319-330.
  12. Hansson, H. and Jonsson, B. (1994). A logic for reasoning about time and reliability. Formal Aspects of Computing, 6(5):512-535.
  13. Heger, M. (1994). Consideration of risk in reinforcement learning. In 11th Intl. Conf. Machine Learning, pages 105-111.
  14. Kober, J., Bagnell, J. A., and Peters, J. (2013). Reinforcement learning in robotics: A survey. International Journal of Robotics Research, 32(11):1238-1274.
  15. Kwiatkowska, M., Norman, G., and Parker, D. (2007). Stochastic model checking. In 7th Intl. Conf. Formal Methods for Performance Evaluation, volume 4486, pages 220-270.
  16. Kwiatkowska, M., Norman, G., and Parker, D. (2011). PRISM 4.0: Verification of probabilistic real-time systems. In 23rd Intl. Conf. Computer Aided Verification, volume 6806, pages 585-591.
  17. Lange, D. S., Verbancsics, P., Gutzwiller, R. S., et al. (2012). Command and control of teams of autonomous systems. In Large-Scale Complex IT Systems. Development, Operation and Management, pages 81-93. Springer Berlin Heidelberg.
  18. Li, L., Walsh, T. J., and Littman, M. L. (2006). Towards a unified theory of state abstraction for MDPs. In 9th International Symposium on Artificial Intelligence and Mathematics, pages 531-539.
  19. Marthi, B. (2007). Automatic shaping and decomposition of reward functions. In 24th Intl. Conf. Machine Learning, pages 601-608.
  20. Mason, G., Calinescu, R., Kudenko, D., and Banks, A. (2016). Combining reinforcement learning and quantitative verification for agent policy assurance. In 6th Intl. Workshop on Combinations of Intelligent Methods and Applications, pages 45-52.
  21. Mason, G., Calinescu, R., Kudenko, D., and Banks, A. (2017). Assured reinforcement learning with formally verified abstract policies. In 9th International Conference on Agents and Artificial Intelligence. To appear.
  22. Mihatsch, O. and Neuneier, R. (2002). Risk-sensitive reinforcement learning. Machine Learning, 49(2):267-290.
  23. Moldovan, T. M. and Abbeel, P. (2012). Safe exploration in Markov decision processes. In 29th Intl. Conf. Machine Learning, pages 1711-1718.
  24. Perkins, T. J. and Barto, A. G. (2003). Lyapunov design for safe reinforcement learning. Journal of Machine Learning Research, 3(1):803-832.
  25. Sutton, R. S., Precup, D., and Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181-211.
  26. Szita, I. (2012). Reinforcement learning in games. In Reinforcement Learning: State-of-the-art, pages 539-577. Springer-Verlag Berlin Heidelberg.
  27. Watkins, C. J. C. H. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3):279-292.
  28. Wiering, M. and Otterlo, M. (2012). Reinforcement learning and Markov decision processes. In Reinforcement Learning: State-of-the-art, pages 3-42. Springer-Verlag Berlin Heidelberg.


Paper Citation


in Harvard Style

Mason G., Calinescu R., Kudenko D. and Banks A. (2017). Assured Reinforcement Learning for Safety-critical Applications. In Doctoral Consortium - DCAART, (ICAART 2017), pages 9-16


in Bibtex Style

@conference{dcaart17,
author={George Mason and Radu Calinescu and Daniel Kudenko and Alec Banks},
title={Assured Reinforcement Learning for Safety-critical Applications},
booktitle={Doctoral Consortium - DCAART, (ICAART 2017)},
year={2017},
pages={9-16},
publisher={SciTePress},
organization={INSTICC},
doi={},
isbn={},
}


in EndNote Style

TY - CONF
JO - Doctoral Consortium - DCAART, (ICAART 2017)
TI - Assured Reinforcement Learning for Safety-critical Applications
SN -
AU - Mason G.
AU - Calinescu R.
AU - Kudenko D.
AU - Banks A.
PY - 2017
SP - 9
EP - 16
DO -