Authors:
A. J. Westley
and
Gavin Rens
Affiliation:
Computer Science Division, Stellenbosch University, Stellenbosch, South Africa
Keyword(s):
Safe Reinforcement Learning, Constrained Markov Decision Processes, Safe Policy Space, Violation Measure.
Abstract:
As reinforcement learning (RL) expands into safety-critical domains, ensuring agent adherence to safety constraints becomes crucial. This paper introduces a two-phase approach to safe RL, Violation-Guided Identification of Safety (ViGIS), which first identifies a safe policy space and then performs standard RL within this space. We present two variants: ViGIS-P, which precalculates the safe policy space given a known transition function, and ViGIS-L, which learns the safe policy space through exploration. We evaluate ViGIS in three environments: a multi-constraint taxi world, a deterministic bank robber game, and a continuous cart-pole problem. Results show that both variants significantly reduce constraint violations compared to standard and β-pessimistic Q-learning, sometimes at the cost of a lower average reward. ViGIS-L consistently outperforms ViGIS-P in the taxi world, especially as the number of constraints increases. In the bank robber environment, both achieve perfect safety. A Deep Q-Network (DQN) implementation of ViGIS-L in the cart-pole domain reduces violations compared to a standard DQN. This research contributes to safe RL by providing a flexible framework for incorporating safety constraints into the RL process. The two-phase approach allows for a clear separation between safety considerations and task optimization, potentially easing application in various safety-critical domains.
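To make the two-phase structure described in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it assumes a tabular environment exposing `reset()`/`step()`, a hypothetical violation measure `violation(s, a)`, and a threshold, and shows Phase 1 (identifying a safe action set per state, in the spirit of ViGIS-P with a known model) followed by Phase 2 (standard Q-learning restricted to that safe space).

```python
# Hypothetical sketch of the two-phase ViGIS idea; names and interfaces are assumptions.
import random
from collections import defaultdict

def identify_safe_space(states, actions, violation, threshold=0.0):
    """Phase 1: keep only (state, action) pairs whose violation measure
    does not exceed the threshold (ViGIS-P-style, model assumed known)."""
    safe = {}
    for s in states:
        safe[s] = [a for a in actions if violation(s, a) <= threshold]
        if not safe[s]:  # fall back to the least-violating action so the agent can always act
            safe[s] = [min(actions, key=lambda a: violation(s, a))]
    return safe

def q_learning_in_safe_space(env, safe, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Phase 2: epsilon-greedy Q-learning where actions are drawn only
    from each state's precomputed safe action set."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = safe[s]
            a = random.choice(acts) if random.random() < eps \
                else max(acts, key=lambda a2: Q[(s, a2)])
            s2, r, done = env.step(a)
            best_next = 0.0 if done else max(Q[(s2, a2)] for a2 in safe[s2])
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```

A ViGIS-L-style variant would instead estimate the violation measure from exploration rather than computing it from a known transition function; the separation into safety identification and task optimization is the same.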