Reducing Time Steps Needed of Multi-Agent Reinforcement Methods in a Dynamic Navigation Environment
DOI: https://doi.org/10.47611/jsrhs.v12i4.5683
Keywords: AI
Abstract
In a simplified version of the delivery variant of the multi-agent pathfinding problem, we apply reinforcement learning in place of the traditional algorithmic heuristics commonly used for this NP-hard problem. The paper proposes two approaches, one fully decentralized and one combining decentralized data with a centralized model, to reduce the large number of training time steps that multi-agent reinforcement learning typically requires, especially in an environment with cumulative rewards. Neither approach requires concatenating local data across agents, so both avoid the computational burden of the increased data complexity such concatenation brings. The proposed approaches were tested in an environment where multiple agents cooperate to clean as many non-regenerating unclean tiles as possible in an obstacle-filled grid within a fixed amount of time. In a final test on a randomized dynamic environment with varying grid sizes, one of our approaches proved very promising in both applicability and scalability. The algorithm used most throughout this paper is proximal policy optimization (PPO), with brief mentions of deep Q-networks (DQN). Training was tracked and recorded for different implementations of these algorithms, including Stable-Baselines3 and a self-implementation in TensorFlow.
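The paper does not include source code, so the following is only a minimal single-agent sketch of the tile-cleaning grid world described above, written against the Gymnasium API so it can be trained with Stable-Baselines3's PPO. Every name here (GridCleanEnv, grid_size, n_dirty, max_steps) is a hypothetical assumption, and obstacles and the multi-agent loop are omitted for brevity.

# Hypothetical sketch of the tile-cleaning grid world from the abstract.
# Single-agent simplification; names and reward values are assumptions.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class GridCleanEnv(gym.Env):
    """One agent moves on a grid and cleans non-regenerating dirty tiles."""

    def __init__(self, grid_size=8, n_dirty=10, max_steps=100):
        super().__init__()
        self.grid_size, self.n_dirty, self.max_steps = grid_size, n_dirty, max_steps
        self.action_space = spaces.Discrete(4)  # up, down, left, right
        # Observation: flattened tile states plus the agent's normalized position.
        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=(grid_size * grid_size + 2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.steps = 0
        self.grid = np.zeros((self.grid_size, self.grid_size), dtype=np.float32)
        dirty = self.np_random.choice(self.grid_size ** 2, self.n_dirty, replace=False)
        self.grid.flat[dirty] = 1.0  # 1.0 marks an unclean tile
        self.pos = np.array([0, 0])
        return self._obs(), {}

    def _obs(self):
        norm_pos = self.pos / (self.grid_size - 1)
        return np.concatenate([self.grid.ravel(), norm_pos]).astype(np.float32)

    def step(self, action):
        moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
        self.pos = np.clip(self.pos + moves[int(action)], 0, self.grid_size - 1)
        reward = 0.0
        if self.grid[tuple(self.pos)] == 1.0:
            self.grid[tuple(self.pos)] = 0.0  # tile never regenerates
            reward = 1.0                      # reward accumulates per cleaned tile
        self.steps += 1
        terminated = not self.grid.any()           # all tiles cleaned
        truncated = self.steps >= self.max_steps   # time budget exhausted
        return self._obs(), reward, terminated, truncated, {}

model = PPO("MlpPolicy", GridCleanEnv(), verbose=0)
model.learn(total_timesteps=50_000)

In the multi-agent setting the abstract describes, the decentralized-data, centralized-model variant could be approximated by routing each agent's own local observation through one shared policy network, which is why no concatenation of per-agent observations is needed.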
Copyright (c) 2023 Ziduo Yi, Fateme Golivand, Brian Wescott
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute and display this article.