Spatial Reasoning via Modality Switching Between Language and Symbolic Representations

This repository contains the code for our work on adaptive modality selection for spatial reasoning. The framework evaluates when a model should answer directly using language-based reasoning and when it should switch to a symbolic grid-based representation.

Repository Layout

.
├── common/
│   ├── llm.py               # vLLM and OpenAI-compatible clients
│   ├── io_utils.py          # input/output utilities
│   ├── relations.py         # relation extraction utilities
│   ├── metrics.py           # evaluation and switching-policy analysis
│   └── switching/           # switching metrics and routing logic
│       ├── config.py        # switching configuration
│       ├── complexity.py    # complexity estimation
│       ├── trust.py         # trustworthiness estimation
│       ├── shortcircuit.py  # short-circuit routing for efficiency
│       └── thresholds.py    # threshold selection
│
├── stepgame/
│   ├── switching/           # StepGame switching experiments
│   ├── grid_experiments/    # relation extraction and grid construction
│   └── shared/
│
├── spartun/
│   ├── switching/           # SpaRTUN switching experiments
│   └── grid_experiments/    # grid construction, relation extraction, and QA runners
│
├── resq/
│   └── grid_experiments/    # ReSQ grid-based reasoning pipeline
│
├── requirements.txt
└── README.md

Each dataset folder contains its own README.md with dataset-specific commands, flags, and experiment details.

Datasets

The datasets are not redistributed in this repository. Please download them from the original releases and provide the corresponding paths to the runners using --data, --input, or the environment variables described below.

Dataset	Source
StepGame	`correct_clean` split from https://github.com/Fangjun-Li/SpatialLM-StepGame
SpaRTUN & ReSQ	https://github.com/HLR/SpaRTUN

Installation

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configuration

The experiments use the following environment variables when applicable:

Variable	Description
`OPENAI_API_KEY`	API key for OpenAI model calls
`VLLM_BASE_URL`	Local OpenAI-compatible vLLM endpoint
`VLLM_MODEL`	Served vLLM model identifier
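
As an illustration of how these variables might be consumed, here is a minimal sketch that collects them into a single settings dictionary. It is not the code in common/llm.py: the function name `load_llm_config`, the dictionary keys, and the fallback base URL are assumptions made for this example (the fallback reflects the port that vLLM's OpenAI-compatible server uses by default).

```python
import os


def load_llm_config(env=None):
    """Collect LLM settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # Only needed for calls to OpenAI-hosted models.
        "openai_api_key": env.get("OPENAI_API_KEY"),
        # Assumed fallback: a vLLM server started locally on its default port.
        "vllm_base_url": env.get("VLLM_BASE_URL", "http://localhost:8000/v1"),
        # Must match the model name the vLLM server was launched with.
        "vllm_model": env.get("VLLM_MODEL"),
    }


if __name__ == "__main__":
    config = load_llm_config()
    if not config["vllm_model"]:
        print("VLLM_MODEL is not set; local vLLM runs will need it.")
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.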

Running the Experiments

StepGame

First, tune the switching thresholds on the validation split. This writes thresholds.json to the directory given by --out-dir.

python -m stepgame.switching.run_switching \
    --model qwen8b \
    --split val \
    --data /path/to/stepgame_reports.jsonl \
    --out-dir runs/qwen8b
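
To make the tuning step concrete, the sketch below picks a single complexity cutoff that maximizes validation accuracy and writes it to thresholds.json. It is an illustration under stated assumptions, not the logic in common/switching/thresholds.py: the record keys (`complexity`, `text_correct`, `grid_correct`), the single-threshold rule, and the JSON layout are all hypothetical.

```python
import json
from pathlib import Path


def tune_threshold(records):
    """Pick the complexity cutoff that maximizes validation accuracy.

    Each record is a dict with hypothetical keys:
      "complexity":   float score for the example
      "text_correct": bool, whether the language-mode answer was right
      "grid_correct": bool, whether the grid-mode answer was right
    Examples with complexity >= threshold are routed to the grid.
    """
    if not records:
        raise ValueError("need at least one validation record")
    candidates = sorted({r["complexity"] for r in records})
    candidates.append(float("inf"))  # the "never switch" option
    best_threshold, best_accuracy = None, -1.0
    for t in candidates:
        correct = sum(
            r["grid_correct"] if r["complexity"] >= t else r["text_correct"]
            for r in records
        )
        accuracy = correct / len(records)
        if accuracy > best_accuracy:  # ties keep the lowest cutoff
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


def save_thresholds(out_dir, threshold):
    """Write the chosen cutoff to <out_dir>/thresholds.json."""
    path = Path(out_dir) / "thresholds.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"complexity": threshold}))
    return path
```

Sweeping only the observed complexity values is sufficient here, because accuracy can change only when the cutoff crosses one of them.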

Then evaluate on the test split using the validation thresholds. The test run uses the short-circuit cascade for efficiency.

python -m stepgame.switching.run_switching \
    --model qwen8b \
    --split test \
    --data /path/to/stepgame_reports.jsonl \
    --out-dir runs/qwen8b
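
The idea behind a short-circuit cascade can be sketched as follows: a cheap signal decides the clear cases, and a more expensive signal is computed only for the ambiguous ones. This is a generic illustration, not the routing in common/switching/shortcircuit.py. The function `route`, its parameters, and the pairing of a cheap complexity score with a costlier trust score are assumptions for the example.

```python
def route(example, cheap_score, costly_score, low, high, cutoff):
    """Return (mode, used_costly) where mode is "text" or "grid".

    cheap_score and costly_score are callables mapping an example to a float.
    A cheap score below `low` or above `high` decides immediately; only the
    band in between pays for the costly estimate.
    """
    score = cheap_score(example)
    if score < low:
        return "text", False  # clearly simple: answer in language
    if score > high:
        return "grid", False  # clearly complex: switch to the grid
    # Ambiguous case: switch only if the language answer looks untrustworthy.
    trust = costly_score(example)
    return ("grid" if trust < cutoff else "text"), True


if __name__ == "__main__":
    example = {"n_hops": 5, "trust": 0.2}  # hypothetical precomputed signals
    mode, used_costly = route(
        example,
        cheap_score=lambda ex: ex["n_hops"],
        costly_score=lambda ex: ex["trust"],
        low=3, high=7, cutoff=0.5,
    )
    print(mode, used_costly)
```

The efficiency gain depends on how many examples fall outside the ambiguous band, since only those skip the costly call.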

Grid experiments build the grid and run the reasoning modes (relation extraction → grid → text/relations/grid answers):

VLLM_MODEL=<served-model> PYTHONPATH=stepgame/shared \
    python stepgame/grid_experiments/run_phase1.py samples.jsonl out.jsonl
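
For intuition about the relation-to-grid step, the sketch below places entities on integer coordinates from extracted (head, relation, tail) triples and reads an answer off the resulting grid. It is a simplified stand-in, not the pipeline in stepgame/grid_experiments: the relation names, the unit-length offsets, and the functions `build_grid` and `answer` are assumptions for illustration.

```python
from collections import deque

# Assumed offsets (dx, dy) of the head entity relative to the tail entity.
OFFSETS = {
    "left": (-1, 0), "right": (1, 0), "above": (0, 1), "below": (0, -1),
    "upper-left": (-1, 1), "upper-right": (1, 1),
    "lower-left": (-1, -1), "lower-right": (1, -1),
}


def build_grid(relations):
    """Assign coordinates to entities from (head, relation, tail) triples.

    Only entities connected to the first entity seen are placed.
    """
    neighbors = {}
    for head, rel, tail in relations:
        dx, dy = OFFSETS[rel]
        neighbors.setdefault(tail, []).append((head, dx, dy))
        neighbors.setdefault(head, []).append((tail, -dx, -dy))
    if not neighbors:
        return {}
    start = next(iter(neighbors))
    coords = {start: (0, 0)}
    queue = deque([start])
    while queue:  # breadth-first propagation of positions
        current = queue.popleft()
        x, y = coords[current]
        for other, dx, dy in neighbors[current]:
            if other not in coords:
                coords[other] = (x + dx, y + dy)
                queue.append(other)
    return coords


def answer(coords, a, b):
    """Describe where entity a lies relative to entity b."""
    (ax, ay), (bx, by) = coords[a], coords[b]
    direction = ((ax > bx) - (ax < bx), (ay > by) - (ay < by))
    for name, offset in OFFSETS.items():
        if offset == direction:
            return name
    return "overlap"


if __name__ == "__main__":
    grid = build_grid([("A", "left", "B"), ("B", "above", "C")])
    print(answer(grid, "A", "C"))
```

Once the entities have coordinates, a multi-hop question reduces to a coordinate comparison, which is the motivation for switching to a symbolic representation on harder examples.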

SpaRTUN

SpaRTUN follows the same validation-then-test protocol.

python -m spartun.switching.runner \
    --input /path/to/spartun_switch_input.json \
    --split all \
    --out-dir runs/spartun

Grid experiments first extract relations from the stories, then answer with the pruned grid:

# 1) extract relations
python spartun/grid_experiments/relation_extraction/extract_relations_pipeline.py \
    --input stories.json --output relations.json --model <served-model>

# 2) grid QA
python spartun/grid_experiments/grid_qa_runners/run_pruned_grid.py \
    --model <served-model> --input relations.json --output preds.jsonl

ReSQ

ReSQ runs the grid-based reasoning pipeline directly (no threshold tuning). Set the served model and run:

VLLM_MODEL=<served-model> python resq/grid_experiments/resq_md.py

Grids are built with GPT by default; set RESQ_GRID_BACKEND=vllm to build them on the served model instead.

Citation

If you use this code, please cite our paper. Citation information will be added after the arXiv version is available.
