LLMs and Spatial Reasoning: Assessing Roadblocks and Providing Pathways to Improvement
DOI: https://doi.org/10.47611/jsrhs.v13i2.6812
Keywords: Artificial Intelligence, Spatial Reasoning, Computer Science
Abstract
In this paper, we show how relative location prompting can improve how Large Language Models (LLMs) such as ChatGPT perform on Spatial Reasoning (SR) tasks. LLMs likely struggle with SR tasks because they are designed for language-based tasks, whereas SR tasks are more visual (Lee, 2023). Building on the demonstrated success of Self-Ask (Press et al., 2023) and Chain-of-Thought prompting (Wei et al., 2022), we hypothesized that a similar prompting technique would increase the success rate of an LLM agent on our SR task. Taking these two factors into account, our solution was to simplify the multi-step, interaction-based SR task by prompting the AI agent with its location relative to the target after each step taken. We set up a 2D 5x5 grid world environment and tested the LLM agent under three conditions: a baseline environment, an otherwise identical environment that adds relative location prompting, and a random-action environment, then compared the success rates of the three. We collected and analysed 300 trials in total (100 trials in each of the three environments) and concluded that relative location prompting does improve the success rate of LLMs on SR tasks. This suggests that converting SR tasks into text and breaking large tasks into smaller steps helps AI solve SR problems more effectively. Future studies should investigate other types of SR tasks, such as folding scenarios, and compare different prompting methods to determine which works best.
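As an illustration of the setup described above, the sketch below shows one way the 5x5 grid world and the per-step relative location prompt could be wired together. The coordinate scheme, the 25-step budget, the exact prompt wording, and the query_llm placeholder (a random policy standing in for a real GPT-3.5 call) are assumptions for illustration only, not the authors' implementation.

```python
import random

GRID_SIZE = 5          # the paper uses a 2D 5x5 grid world
MAX_STEPS = 25         # assumed step budget; not specified in the abstract
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}

def relative_location(agent, target):
    """Describe the target's position relative to the agent in plain text."""
    dx, dy = target[0] - agent[0], target[1] - agent[1]
    parts = []
    if dy:
        parts.append(f"{abs(dy)} cell(s) {'north' if dy > 0 else 'south'}")
    if dx:
        parts.append(f"{abs(dx)} cell(s) {'east' if dx > 0 else 'west'}")
    return " and ".join(parts) if parts else "at your current cell"

def build_prompt(agent, target, use_relative_prompting):
    """Build the per-step prompt; the relative-location sentence is the
    intervention being tested (the wording here is assumed)."""
    prompt = (f"You are on a {GRID_SIZE}x{GRID_SIZE} grid at {agent}. "
              f"Reach the target. Reply with one of: north, south, east, west.")
    if use_relative_prompting:
        prompt += f" The target is {relative_location(agent, target)}."
    return prompt

def query_llm(prompt):
    """Placeholder for the actual LLM call (e.g., GPT-3.5 via the OpenAI API).
    Returning a random move here also mimics the random baseline condition."""
    return random.choice(list(MOVES))

def run_trial(use_relative_prompting):
    """One trial: the agent starts at (0, 0) and must reach a random target."""
    agent = (0, 0)
    target = (random.randrange(GRID_SIZE), random.randrange(GRID_SIZE))
    for _ in range(MAX_STEPS):
        if agent == target:
            return True                      # success: target reached
        move = query_llm(build_prompt(agent, target, use_relative_prompting))
        dx, dy = MOVES.get(move, (0, 0))     # ignore malformed replies
        agent = (min(max(agent[0] + dx, 0), GRID_SIZE - 1),
                 min(max(agent[1] + dy, 0), GRID_SIZE - 1))
    return agent == target

# 100 trials per condition, mirroring the paper's 300-trial design
successes = sum(run_trial(use_relative_prompting=True) for _ in range(100))
print(f"success rate: {successes}/100")
```

Running run_trial with use_relative_prompting toggled on and off, plus the random stand-in for query_llm, would mirror the three conditions compared in the study.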
References
Dilek, E., & Dener, M. (2023). Computer Vision Applications in Intelligent Transportation Systems: A Survey. Sensors (Basel, Switzerland), 23(6), 2938. https://doi.org/10.3390/s23062938
Kikot, S. (2023). Spatial Intelligence of a Self-driving Car and Rule-Based Decision Making. arXiv. https://doi.org/10.48550/arXiv.2308.01085
Kim, H., Koh, Y., Baek, J., & Kang, J. (2021). Exploring the spatial reasoning ability of neural models in human IQ tests. Neural Networks. https://doi.org/10.1016/j.neunet.2021.02.018
Lee, A. (2023, January 26). What Are Large Language Models and Why Are They Important? NVIDIA Blog. https://blogs.nvidia.com/blog/what-are-large-language-models-used-for/
OpenAI. (2024). GPT-3.5 [Large language model]. https://chat.openai.com/chat
Press, O., Zhang, M., Min, S., Schmidt, L., Smith, N. A., & Lewis, M. (2023, May 22). Measuring and Narrowing the Compositionality Gap in Language Models. arXiv. https://doi.org/10.48550/arXiv.2210.03350
reCAPTCHA: Easy on Humans, Hard on Bots. (n.d.). Google. Retrieved February 28, 2024, from https://www.google.com/recaptcha/intro/?zbcode=inc5000
Shridhar, M., Manuelli, L., & Fox, D. (2022, November 11). Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation. arXiv. https://doi.org/10.48550/arXiv.2209.05451
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E. H., Le, Q. V., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv. https://doi.org/10.48550/arXiv.2201.11903
Copyright (c) 2024 William Peng; Sam Powers
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.