HLR/Spatial-Modality-Switching

Spatial Reasoning via Modality Switching Between Language and Symbolic Representations

This repository contains the code for our work on adaptive modality selection for spatial reasoning. The framework evaluates when a model should answer directly using language-based reasoning and when it should switch to a symbolic grid-based representation.

Repository Layout

.
├── common/
│   ├── llm.py               # vLLM and OpenAI-compatible clients
│   ├── io_utils.py          # input/output utilities
│   ├── relations.py         # relation extraction utilities
│   ├── metrics.py           # evaluation and switching-policy analysis
│   └── switching/           # switching metrics and routing logic
│       ├── config.py        # switching configuration
│       ├── complexity.py    # complexity estimation
│       ├── trust.py         # trustworthiness estimation
│       ├── shortcircuit.py  # short-circuit routing for efficiency
│       └── thresholds.py    # threshold selection
│
├── stepgame/
│   ├── switching/           # StepGame switching experiments
│   ├── grid_experiments/    # relation extraction and grid construction
│   └── shared/             
│
├── spartun/
│   ├── switching/           # SpaRTUN switching experiments
│   └── grid_experiments/    # grid construction, relation extraction, and QA runners
│
├── resq/
│   └── grid_experiments/    # ReSQ grid-based reasoning pipeline
│
├── requirements.txt
└── README.md

Each dataset folder contains its own README.md with dataset-specific commands, flags, and experiment details.
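The switching modules in the layout above (complexity, trust, thresholds) suggest a routing rule that compares per-example scores against tuned cut-offs. The following is only a minimal sketch of that idea, not the code in common/switching/: the names `Thresholds` and `choose_modality`, the score ranges, and the default values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical cut-offs; in the real pipeline they would be tuned on validation data.
    max_complexity: float = 0.5
    min_trust: float = 0.7


def choose_modality(complexity: float, trust: float, th: Thresholds) -> str:
    """Return "language" when a direct answer looks safe, otherwise "grid"."""
    if complexity <= th.max_complexity and trust >= th.min_trust:
        return "language"
    return "grid"


if __name__ == "__main__":
    th = Thresholds()
    print(choose_modality(0.2, 0.9, th))  # simple and trusted
    print(choose_modality(0.8, 0.9, th))  # too complex for a direct answer
```

Keeping the rule as a pure function of two scores makes it easy to re-evaluate different thresholds on cached scores without calling the model again.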

Datasets

The datasets are not redistributed in this repository. Download them from the original releases and pass their paths to the runners with --data or --input, or through the environment variables described below.

| Dataset | Source |
| --- | --- |
| StepGame | correct_clean split from https://github.com/Fangjun-Li/SpatialLM-StepGame |
| SpaRTUN & ReSQ | https://github.com/HLR/SpaRTUN |

Installation

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configuration

The experiments use the following environment variables when applicable:

| Variable | Description |
| --- | --- |
| OPENAI_API_KEY | API key for OpenAI model calls |
| VLLM_BASE_URL | Local OpenAI-compatible vLLM endpoint |
| VLLM_MODEL | Served vLLM model identifier |
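As an illustration of how these variables might be consumed, the sketch below gathers them into one object using only the standard library. `LLMConfig` and `load_llm_config` are hypothetical names, and the fallback URL is an assumption based on the port vLLM's OpenAI-compatible server uses by default; common/llm.py may handle this differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    openai_api_key: Optional[str]
    vllm_base_url: str
    vllm_model: Optional[str]


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Collect the three documented variables into one object."""
    return LLMConfig(
        openai_api_key=env.get("OPENAI_API_KEY"),
        # Assumed fallback: a vLLM server started locally on its default port.
        vllm_base_url=env.get("VLLM_BASE_URL", "http://localhost:8000/v1"),
        vllm_model=env.get("VLLM_MODEL"),
    )


if __name__ == "__main__":
    cfg = load_llm_config()
    print("vLLM endpoint:", cfg.vllm_base_url)
    print("OpenAI key set:", cfg.openai_api_key is not None)
```

Passing the environment as a parameter lets a test supply a plain dictionary instead of modifying the process environment.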

Running the Experiments

StepGame

First tune the switching thresholds on the validation split. This saves thresholds.json in the specified output directory.

python -m stepgame.switching.run_switching \
    --model qwen8b \
    --split val \
    --data /path/to/stepgame_reports.jsonl \
    --out-dir runs/qwen8b
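Conceptually, the tuning step above searches for thresholds that maximize validation accuracy and then writes them to thresholds.json. The sketch below shows one simple way this could be done with an exhaustive grid search. The record fields (`complexity`, `trust`, `language_correct`, `grid_correct`), the candidate grid, and the JSON keys are assumptions for illustration, not the schema used by run_switching.

```python
import json
from itertools import product
from pathlib import Path


def accuracy_at(records, max_complexity, min_trust):
    """Accuracy obtained if every record is routed by the two thresholds."""
    hits = 0
    for r in records:
        use_language = r["complexity"] <= max_complexity and r["trust"] >= min_trust
        hits += r["language_correct"] if use_language else r["grid_correct"]
    return hits / len(records)


def tune_thresholds(records, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try every threshold pair on the grid and keep the most accurate one."""
    best = max(product(grid, grid), key=lambda p: accuracy_at(records, *p))
    return {
        "max_complexity": best[0],
        "min_trust": best[1],
        "val_accuracy": accuracy_at(records, *best),
    }


def save_thresholds(thresholds, out_dir):
    """Write the chosen thresholds to <out_dir>/thresholds.json."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "thresholds.json"
    path.write_text(json.dumps(thresholds, indent=2))
    return path
```

A coarse grid keeps the search cheap; a finer grid or a different objective (for example, accuracy penalized by the number of grid calls) would slot into the same structure.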

Then evaluate on the test split using the validation thresholds. The test run uses the short-circuit cascade for efficiency.

python -m stepgame.switching.run_switching \
    --model qwen8b \
    --split test \
    --data /path/to/stepgame_reports.jsonl \
    --out-dir runs/qwen8b
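A short-circuit cascade saves work by computing an expensive signal only when a cheaper one has not already settled the routing decision. The sketch below illustrates the pattern under stated assumptions: it treats complexity as the cheap signal and trust as the costly one, and `route_with_short_circuit` is a hypothetical helper, not the interface of common/switching/shortcircuit.py.

```python
from typing import Callable, Tuple


def route_with_short_circuit(
    cheap_score: Callable[[], float],
    costly_score: Callable[[], float],
    max_complexity: float,
    min_trust: float,
) -> Tuple[str, bool]:
    """Return (modality, costly_was_called).

    cheap_score stands in for a complexity estimate; costly_score stands in
    for a trust estimate that would need an extra model call.
    """
    if cheap_score() > max_complexity:
        # The cheap signal already forces the grid route, so skip the costly one.
        return "grid", False
    trust = costly_score()
    return ("language" if trust >= min_trust else "grid"), True


if __name__ == "__main__":
    modality, called = route_with_short_circuit(lambda: 0.9, lambda: 0.95, 0.5, 0.7)
    print(modality, "costly signal computed:", called)
```

Passing the scores as callables rather than values is what makes the saving possible: the second function is never invoked on examples that the first one resolves.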

Grid experiments build the grid and run the reasoning modes (relation extraction → grid → text/relations/grid answers):

VLLM_MODEL=<served-model> PYTHONPATH=stepgame/shared \
    python stepgame/grid_experiments/run_phase1.py samples.jsonl out.jsonl
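To make the relation extraction → grid → answer flow concrete, here is a small sketch that places entities on integer coordinates from extracted triples and answers a query by comparing positions. The triple format, the relation labels, and the unit-step offsets are assumptions for illustration; the representation used by run_phase1.py may differ.

```python
from collections import deque

# Unit offsets (dx, dy) meaning "head is <relation> of tail"; labels are illustrative.
OFFSETS = {
    "left": (-1, 0), "right": (1, 0), "above": (0, 1), "below": (0, -1),
    "upper-left": (-1, 1), "upper-right": (1, 1),
    "lower-left": (-1, -1), "lower-right": (1, -1),
}


def build_grid(relations):
    """Assign coordinates to every entity connected to the first triple's tail."""
    neighbours = {}
    for head, rel, tail in relations:
        dx, dy = OFFSETS[rel]
        neighbours.setdefault(tail, []).append((head, dx, dy))
        neighbours.setdefault(head, []).append((tail, -dx, -dy))
    start = relations[0][2]
    coords = {start: (0, 0)}
    queue = deque([start])
    while queue:  # breadth-first propagation of positions
        node = queue.popleft()
        x, y = coords[node]
        for other, dx, dy in neighbours[node]:
            if other not in coords:
                coords[other] = (x + dx, y + dy)
                queue.append(other)
    return coords


def answer(coords, a, b):
    """Name the direction of a relative to b from the sign of the coordinate gap."""
    dx = coords[a][0] - coords[b][0]
    dy = coords[a][1] - coords[b][1]
    step = ((dx > 0) - (dx < 0), (dy > 0) - (dy < 0))
    for name, offset in OFFSETS.items():
        if offset == step:
            return name
    return "overlap"


if __name__ == "__main__":
    grid = build_grid([("A", "left", "B"), ("B", "above", "C")])
    print(grid)
    print("A relative to C:", answer(grid, "A", "C"))
```

Once the entities have coordinates, multi-hop questions reduce to a coordinate comparison, which is the appeal of the symbolic route for long relation chains.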

SpaRTUN

SpaRTUN follows the same validation-then-test protocol.

python -m spartun.switching.runner \
    --input /path/to/spartun_switch_input.json \
    --split all \
    --out-dir runs/spartun

Grid experiments first extract relations from the stories, then answer with the pruned grid:

# 1) extract relations
python spartun/grid_experiments/relation_extraction/extract_relations_pipeline.py \
    --input stories.json --output relations.json --model <served-model>

# 2) grid QA
python spartun/grid_experiments/grid_qa_runners/run_pruned_grid.py \
    --model <served-model> --input relations.json --output preds.jsonl
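The README does not say how the grid is pruned. One plausible reading is that relations unrelated to the question are dropped before the grid is built; the sketch below implements that reading by keeping only triples connected to the question entities. `prune_relations` and the triple format are hypothetical, and run_pruned_grid.py may prune on a different criterion.

```python
def prune_relations(relations, question_entities):
    """Keep (head, relation, tail) triples connected to the question entities."""
    keep = set(question_entities)
    changed = True
    while changed:
        changed = False
        for head, _, tail in relations:
            # Grow the kept set whenever a triple links a kept entity to a new one.
            if (head in keep) != (tail in keep):
                keep.update((head, tail))
                changed = True
    return [t for t in relations if t[0] in keep and t[2] in keep]


if __name__ == "__main__":
    story = [
        ("box", "left", "circle"),
        ("circle", "above", "square"),
        ("lamp", "near", "chair"),  # unrelated to the question below
    ]
    print(prune_relations(story, {"box", "square"}))
```

Dropping disconnected facts shortens the prompt and shrinks the grid without changing the answer to questions about the remaining entities.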

ReSQ

ReSQ runs the grid-based reasoning pipeline directly (no threshold tuning). Set the served model and run:

VLLM_MODEL=<served-model> python resq/grid_experiments/resq_md.py

Grids are built with GPT by default; set RESQ_GRID_BACKEND=vllm to build them on the served model instead.

Citation

If you use this code, please cite our paper. Citation information will be added after the arXiv version is available.

About

Adaptive modality switching for textual spatial reasoning using text, relational facts, coordinates, and grid-based representations.
