lunw1024/CS159-project

Abstract

This project investigates the intrinsic zero-shot spatial reasoning capabilities of large language models (LLMs), inspired by the human ability to construct mental maps from textual descriptions alone. Although LLMs are widely applied to tasks such as conversation, image generation, and code assistance, their potential for spatial reasoning remains comparatively unexplored. Our work introduces a novel prompting method that decomposes complex spatial reasoning into smaller, manageable computations and outperforms existing approaches such as chain-of-thought and visualization-of-thought prompting.
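
As a rough illustration of the decomposition idea (not the repository's actual code), the sketch below poses one movement instruction at a time as its own small prompt and carries the model's intermediate answer forward as the state for the next step. The `ask_llm` function is a hypothetical placeholder, stubbed out here so the example runs offline; the route text is invented for illustration.

```python
"""Minimal sketch of decomposed prompting for a text-described navigation task."""

STEPS = [
    "You start at position (0, 0), facing north.",
    "Walk forward two blocks.",
    "Turn right and walk one block.",
    "Turn left and walk three blocks.",
]

QUESTION = "Where are you now, as (x, y) grid coordinates?"


def ask_llm(prompt: str) -> str:
    """Placeholder for a chat-model call; a real run would send `prompt` to an LLM."""
    print("--- prompt sent to model ---")
    print(prompt)
    return "(answer from model)"


def navigate(steps: list[str], question: str) -> str:
    """Feed the route one instruction at a time, so each call is one small spatial update."""
    state = steps[0]  # initial position and orientation, stated in text
    for instruction in steps[1:]:
        prompt = (
            f"Current state: {state}\n"
            f"Instruction: {instruction}\n"
            f"{question} Answer with the new state only."
        )
        state = ask_llm(prompt)  # the model's answer becomes the next state
    return state


if __name__ == "__main__":
    print("Final state:", navigate(STEPS, QUESTION))
```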

We focus on a natural-language navigation task in which the model navigates a spatial environment described purely in text. This setup is distinctive because it assesses the LLM's understanding of space based solely on its text pre-training, without relying on visual or other multi-modal data. Our experiments not only demonstrate the effectiveness of internal representations of spatial information but also document the failure modes we encountered, providing insight into both the limitations of LLMs on spatial reasoning tasks and potential improvements.
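
To make the evaluation setup concrete, here is a hypothetical sketch of how a text-only route can be scored: a tiny simulator executes the same instructions from the origin, and the model's final coordinate answer is compared against the result. The move encoding and the example route are assumptions for illustration, not the project's actual data format.

```python
"""Sketch of a ground-truth checker for the text-only navigation task."""

# Heading vectors on a grid: north, east, south, west (turning right cycles forward).
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def simulate(moves: list[tuple[str, int]]) -> tuple[int, int]:
    """Apply ('forward', n), ('right', 0), ('left', 0) moves from the origin, facing north."""
    x, y, h = 0, 0, 0
    for kind, n in moves:
        if kind == "forward":
            dx, dy = HEADINGS[h]
            x, y = x + n * dx, y + n * dy
        elif kind == "right":
            h = (h + 1) % 4
        elif kind == "left":
            h = (h - 1) % 4
    return x, y


if __name__ == "__main__":
    # Same route as the prompting sketch above: forward 2, right, forward 1, left, forward 3.
    gold = simulate([("forward", 2), ("right", 0), ("forward", 1),
                     ("left", 0), ("forward", 3)])
    model_answer = (1, 5)  # hypothetical parsed model output
    print("ground truth:", gold, "model correct:", model_answer == gold)
```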

Our findings suggest that, by enhancing internal-representation methods and refining prompting strategies, LLMs can achieve more accurate and reliable spatial reasoning, pushing the boundaries of what is achievable with purely text-based models.
