LLMs and Spatial Reasoning: Assessing Roadblocks and Providing Pathways to Improvement

Authors

  • William Peng, Lumiere Education
  • Sam Powers, Lumiere Research Scholar Program

DOI:

https://doi.org/10.47611/jsrhs.v13i2.6812

Keywords:

Artificial Intelligence, Spatial Reasoning, Computer Science

Abstract

In this paper, we show how relative location prompting can improve how Large Language Models (LLMs) such as ChatGPT perform on Spatial Reasoning (SR) tasks. LLMs likely struggle with SR tasks because they are designed for language-based tasks, whereas SR tasks are more visual (Lee, 2023). Drawing on the demonstrated success of Self-Ask (Press et al., 2023) and chain-of-thought prompting (Wei et al., 2022), we hypothesized that similar prompting techniques would increase an LLM agent's success rate on our SR task. Taking these two factors into account, our solution was to simplify a multi-step, interaction-based SR task by prompting the AI agent with its location relative to the target after each step it takes. We set up a 2D 5x5 grid world environment to test the LLM agent against, along with a second environment that adds relative location prompting and a third, random baseline environment, and compared the success rates of the three. We collected and analysed 300 trials in total (100 per environment) and concluded that relative location prompting does improve the success rate of LLMs on SR tasks. This suggests that by converting SR tasks into text and breaking large tasks into smaller ones, AI can solve SR problems more effectively. Future studies should investigate other types of SR tasks, such as folding scenarios, and compare different prompting methods to determine which works best.
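
For illustration only, the sketch below shows one way relative location prompting could be wired into a 5x5 grid world of the kind described above. It is a minimal sketch, not the authors' code: the coordinate convention, the prompt wording, and the query_llm stub (which here simply returns a random move in place of a real LLM call) are all assumptions for exposition.

# Illustrative sketch only: a 5x5 grid world in which, after every move, the agent
# is told where the target lies relative to its current position.
# The prompt wording and the query_llm stub are hypothetical, not the authors' setup.
import random

GRID_SIZE = 5
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def relative_location(agent, target):
    """Describe the target's position relative to the agent, e.g. '2 right, 1 up'."""
    dx, dy = target[0] - agent[0], target[1] - agent[1]
    parts = []
    if dx:
        parts.append(f"{abs(dx)} {'right' if dx > 0 else 'left'}")
    if dy:
        parts.append(f"{abs(dy)} {'up' if dy > 0 else 'down'}")
    return ", ".join(parts) if parts else "at the target"

def query_llm(prompt):
    """Placeholder for an LLM call (e.g. GPT-3.5); here it just picks a random move."""
    return random.choice(list(MOVES))

def run_episode(max_steps=20):
    """Run one trial; return True if the agent reaches the target within max_steps."""
    agent = (0, 0)
    target = (random.randrange(GRID_SIZE), random.randrange(GRID_SIZE))
    for _ in range(max_steps):
        if agent == target:
            return True
        # Relative location prompting: restate the agent-to-target offset each step.
        prompt = (
            f"You are on a {GRID_SIZE}x{GRID_SIZE} grid. "
            f"The target is {relative_location(agent, target)} from you. "
            "Reply with one move: up, down, left, or right."
        )
        dx, dy = MOVES[query_llm(prompt)]
        agent = (
            min(max(agent[0] + dx, 0), GRID_SIZE - 1),
            min(max(agent[1] + dy, 0), GRID_SIZE - 1),
        )
    return agent == target

if __name__ == "__main__":
    successes = sum(run_episode() for _ in range(100))
    print(f"Success rate over 100 trials: {successes}%")

With query_llm replaced by a real model call, the same loop could be run with and without the relative-location sentence in the prompt to compare success rates, which mirrors the comparison described in the abstract.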

References or Bibliography

Dilek, E., & Dener, M. (2023). Computer Vision Applications in Intelligent Transportation Systems: A Survey. Sensors (Basel, Switzerland), 23(6), 2938. https://doi.org/10.3390/s23062938

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E. H., Le, Q. V., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv. https://doi.org/10.48550/arXiv.2201.11903

Kikot, S. (2023). Spatial Intelligence of a Self-driving Car and Rule-Based Decision Making. arXiv. https://doi.org/10.48550/arXiv.2308.01085

Kim, H., Koh, Y., Baek, J., & Kang, J. (2021). Exploring the spatial reasoning ability of neural models in human IQ tests. Neural Networks. https://doi.org/10.1016/j.neunet.2021.02.018

Lee, A. (2023, January 26). What Are Large Language Models and Why Are They Important? NVIDIA Blog. https://blogs.nvidia.com/blog/what-are-large-language-models-used-for/

OpenAI. (2024). GPT-3.5 [Large language model]. OpenAI. https://chat.openai.com/chat

Press, O., Zhang, M., Min, S., Schmidt, L., Smith, N. A., & Lewis, M. (2023, May 22). Measuring and Narrowing the Compositionality Gap in Language Models. arXiv. https://doi.org/10.48550/arXiv.2210.03350

reCAPTCHA: Easy on Humans, Hard on Bots. (n.d.). Google. Retrieved February 28, 2024, from https://www.google.com/recaptcha/intro/?zbcode=inc5000

Shridhar, M., Manuelli, L., & Fox, D. (2022, November 11). Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation. arXiv. https://doi.org/10.48550/arXiv.2209.05451

Published

05-31-2024

How to Cite

Peng, W., & Powers, S. (2024). LLMs and Spatial Reasoning: Assessing Roadblocks and Providing Pathways to Improvement. Journal of Student Research, 13(2). https://doi.org/10.47611/jsrhs.v13i2.6812

Issue

Vol. 13 No. 2 (2024)

Section

HS Research Projects