Offline Model-Learning by Learning Legal Moves in Othello through Synthetic Negative Reinforcement
DOI: https://doi.org/10.47611/jsrhs.v12i4.5638
Keywords: Offline Reinforcement Learning, Model-Based Reinforcement Learning, Negative Reinforcement, Artificial Intelligence, Othello
Abstract
A major constraint on reinforcement learning is sample efficiency: agents tend to require an excessively large number of training episodes, and these episodes can be expensive, especially when conducted online. Model-Based Reinforcement Learning (MBRL) and offline learning are two methods that mitigate this problem: MBRL has been shown to improve sample efficiency, while offline learning can reduce the number of online episodes needed by substituting less expensive offline ones. Using the two methods together, however, is more challenging, raising issues such as training on incomplete data distributions. We explore these challenges by testing different combinations of offline and online model-learning on the task of learning the legal moves of the board game Othello. In doing so, we encounter the additional challenge of purely positive reinforcement: offline episodes provide information only about legal moves. To address this problem, we propose a method, synthetic negative reinforcement, which uses pre-existing agent knowledge to compensate for the lack of information about illegal moves. Our results demonstrate the efficacy of offline learning with synthetic negative reinforcement on robust distributions of offline data, with agents achieving greater than 97% accuracy in predicting the legality of moves. We also demonstrate the clear obstacle that skewed distributions pose to offline model-learning.
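The abstract only sketches the idea of synthetic negative reinforcement, so the following is one plausible reading, not the authors' implementation. The function names (`synthetic_negatives`, `training_pairs`) and the specific piece of "pre-existing agent knowledge" used here (in Othello, a move onto an occupied square can never be legal) are illustrative assumptions: offline game records supply only positive examples of legal moves, and known-illegal moves are synthesized as negative labels without any extra environment interaction.

```python
import random

BOARD_SIZE = 8  # standard Othello board


def synthetic_negatives(board, observed_legal, k=4):
    """Sample moves the agent already knows to be illegal.

    Assumed prior knowledge: a move onto an occupied square is
    always illegal, so such squares can be labelled as negative
    examples even though the offline data never marks illegal moves.
    """
    occupied = [(r, c) for r in range(BOARD_SIZE)
                for c in range(BOARD_SIZE) if board[r][c] != '.']
    candidates = [m for m in occupied if m not in observed_legal]
    return random.sample(candidates, min(k, len(candidates)))


def training_pairs(board, observed_legal):
    """Combine offline positives (label 1) with synthetic negatives (label 0)."""
    positives = [(m, 1) for m in observed_legal]
    negatives = [(m, 0) for m in synthetic_negatives(board, observed_legal)]
    return positives + negatives
```

Under this reading, a legality-prediction model is then trained on both label classes, rather than seeing only the legal moves recorded in offline episodes.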
Copyright (c) 2023 Johnny Liu; Ganesh Mani
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute and display this article.