Adversarial Reinforcement Learning in a Cyber Security Simulation

Richard Elderman; Leon J. J. Pater; Albert S. Thie; Madalina M. Drugan; Marco M. Wiering

doi:10.5220/0006197105590566

2017

Abstract

This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision-making problem played by two agents, the attacker and the defender. Each agent uses a reinforcement learning technique, such as neural networks, Monte Carlo learning, or Q-learning, and these techniques are pitted against each other to examine their effectiveness against learning opponents. The results show that Monte Carlo learning with the Softmax exploration strategy is the most effective technique for the defender role and also for learning attacking strategies.
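
The abstract does not give the exact update rules, so the following is only a minimal, generic sketch of tabular every-visit Monte Carlo control with Softmax (Boltzmann) exploration in its common textbook form, not the authors' implementation. The names `softmax_action` and `MonteCarloAgent`, and the `temperature` and `gamma` parameters, are hypothetical; how attacker and defender states and actions are encoded is left abstract.

```python
import math
import random
from collections import defaultdict


def softmax_action(q_values, temperature, rng=random):
    """Sample an action index with probability proportional to exp(Q / temperature)."""
    # Subtracting the maximum improves numerical stability without changing the probabilities.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    r = rng.random() * sum(weights)
    cumulative = 0.0
    for action, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return action
    return len(weights) - 1  # guard against floating-point rounding


class MonteCarloAgent:
    """Hypothetical tabular every-visit Monte Carlo agent with Softmax exploration."""

    def __init__(self, n_actions, temperature=1.0, gamma=1.0, seed=None):
        self.temperature = temperature
        self.gamma = gamma  # discount factor (assumed; not stated in the abstract)
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.counts = defaultdict(lambda: [0] * n_actions)
        self.rng = random.Random(seed)
        self.episode = []  # (state, action, reward) triples for the current game

    def act(self, state):
        return softmax_action(self.q[state], self.temperature, self.rng)

    def record(self, state, action, reward):
        self.episode.append((state, action, reward))

    def end_episode(self):
        """Move each visited Q(s, a) towards its observed discounted return."""
        g = 0.0
        for state, action, reward in reversed(self.episode):
            g = reward + self.gamma * g
            self.counts[state][action] += 1
            n = self.counts[state][action]
            # Incremental sample average: Q <- Q + (G - Q) / n
            self.q[state][action] += (g - self.q[state][action]) / n
        self.episode = []
```

Lowering `temperature` makes action selection greedier with respect to the current Q-values, while raising it moves selection towards uniform random exploration.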

Paper Citation

in Harvard Style

Elderman R., Pater L. J. J., Thie A. S., Drugan M. M. and Wiering M. M. (2017). Adversarial Reinforcement Learning in a Cyber Security Simulation. In Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-220-2, pages 559-566. DOI: 10.5220/0006197105590566

in Bibtex Style

@conference{icaart17,
author={Richard Elderman and Leon J. J. Pater and Albert S. Thie and Madalina M. Drugan and Marco M. Wiering},
title={Adversarial Reinforcement Learning in a Cyber Security Simulation},
booktitle={Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2017},
pages={559-566},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006197105590566},
isbn={978-989-758-220-2},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Adversarial Reinforcement Learning in a Cyber Security Simulation
SN - 978-989-758-220-2
AU - Elderman R.
AU - Pater L. J. J.
AU - Thie A. S.
AU - Drugan M. M.
AU - Wiering M. M.
PY - 2017
SP - 559
EP - 566
DO - 10.5220/0006197105590566