6 CONCLUSION AND FUTURE WORK
An intrinsically motivated agent not only needs to set interesting goals and be able to reach them, but should also decide whether to continue exploring from the reached goal ('post-exploration'). In this work, we systematically investigated the benefit of post-exploration in the general IMGEP framework across different RL settings and tasks. Experiments in several MiniGrid and Mujoco environments show that post-exploration is not only beneficial for navigation tasks in the tabular setting, but also scales up to more complex control tasks that involve neural networks. According to our results, agents with post-exploration gradually push the boundary of their known region outwards, which allows them to reach a greater diversity of goals. Moreover, 'post-exploration' is a very general idea that can easily be plugged into any IMGEP method, as illustrated in the sketch below; we therefore encourage researchers to pay attention to this idea and to consider using it where possible.
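To make this concrete, the following Python sketch shows how a random post-exploration phase can be appended to a generic goal-conditioned rollout. It is an illustrative sketch only, assuming a Gym-style environment and a goal-conditioned policy; the names run_episode, goal_policy, sample_goal and post_explore_steps are our own placeholders and not the exact implementation used in this paper.

def run_episode(env, goal_policy, sample_goal,
                max_steps=200, post_explore_steps=20):
    """One IMGEP-style episode: first 'go' to a sampled goal, then post-explore.

    Sketch only: assumes a Gym-style environment `env`, a goal-conditioned
    policy `goal_policy(obs, goal)` and an intrinsic goal sampler
    `sample_goal()`; all names are illustrative placeholders.
    """
    goal = sample_goal()                      # set an interesting goal
    obs = env.reset()
    trajectory = []

    # 'Go' phase: try to reach the sampled goal.
    for _ in range(max_steps):
        action = goal_policy(obs, goal)
        obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, goal))
        if done:                              # goal reached or episode ended
            break

    # 'Post-exploration' phase: continue from the reached state with random
    # actions, which gradually pushes the known region outwards.
    for _ in range(post_explore_steps):
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, goal))
        if done:
            break

    return trajectory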
The current paper studied post-exploration in a simplified IMGEP framework in order to better understand its basic properties. In the future, it would be interesting to plug post-exploration directly into other existing IMGEP methods to demonstrate its benefits there as well. Moreover, our current implementation uses random post-exploration, which already turned out to work reasonably well. An interesting direction for future work is therefore to post-explore in a smarter way, for example with macro actions or options. In tasks that require controlling a more complex agent, such as an ant or a humanoid robot, random actions will rarely keep the agent standing, let alone lead it into new areas. Another promising direction is to investigate 'adaptive' post-exploration. Intuitively, post-exploration is most likely to be useful when it starts from a state that is sufficiently novel or important, and the agent should post-explore longer the less familiar the reached area is. In short, the agent should adaptively decide when and for how long to post-explore, based on its own knowledge boundary or on properties of the reached goal; a simple heuristic along these lines is sketched below. Altogether, post-exploration seems a promising direction for future RL exploration research.
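As a final illustration, the snippet below sketches one possible count-based heuristic for such adaptive post-exploration: the agent post-explores longer from rarely visited states and skips post-exploration from well-known ones. The visit-count novelty proxy, the 1/sqrt(n) scaling and all names are hypothetical choices for illustration, not part of the method evaluated in this paper.

from collections import defaultdict

visit_counts = defaultdict(int)  # tabular visit counts as a simple novelty proxy

def post_exploration_budget(state, base_steps=20, max_steps=100,
                            novelty_threshold=0.1):
    """Hypothetical heuristic: decide whether and for how long to post-explore.

    Sketch only: `state` is assumed to be hashable (e.g. a discretised
    observation); the count-based novelty and the scaling are illustrative.
    """
    visit_counts[state] += 1
    novelty = 1.0 / visit_counts[state] ** 0.5   # close to 1 for novel states

    if novelty < novelty_threshold:              # well-known state: skip
        return 0
    # Novel states receive a larger post-exploration budget.
    return min(max_steps, int(base_steps * (1.0 + 4.0 * novelty)))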