Add local Apptainer SWE-bench image builds #745

Draft
neubig wants to merge 1 commit into main from add-swebench-apptainer-build

neubig (Member) commented Jun 9, 2026

Summary

  • add a local Apptainer sandbox builder for SWE-bench agent-server images
  • make swebench-infer --workspace apptainer prefer the registry image when present, then fall back to building a local sandbox from the official SWE-bench image and the checked-out SDK submodule
  • document the Docker prebuild path and the Dockerless local Apptainer build path
  • add focused tests for registry reuse, local sandbox fallback, and unsupported build targets
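The registry-first, local-fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in `benchmarks/swebench/apptainer_build.py`: the function name `resolve_apptainer_image`, the `ResolvedImage` type, and the injected `registry_has` and `build_sandbox` callables are all hypothetical, and the order of the checks (registry lookup before target validation) is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Assumption: only the source-minimal target can be built locally,
# as stated in the PR description.
SUPPORTED_TARGETS = frozenset({"source-minimal"})


@dataclass(frozen=True)
class ResolvedImage:
    source: str    # "registry" or "local-sandbox"
    location: str  # registry reference or sandbox directory path


def resolve_apptainer_image(
    registry_image: str,
    base_image: str,
    sandbox_dir: Path,
    target: str,
    registry_has: Callable[[str], bool],
    build_sandbox: Callable[[str, Path], None],
) -> ResolvedImage:
    """Hypothetical helper: prefer the registry image, else use a local sandbox."""
    # 1. Reuse the prebuilt agent-server image when the registry has it.
    if registry_has(registry_image):
        return ResolvedImage("registry", registry_image)

    # 2. Local builds are limited to the supported targets.
    if target not in SUPPORTED_TARGETS:
        raise ValueError(f"unsupported local Apptainer build target: {target!r}")

    # 3. Build the sandbox once from the base image, then reuse it on later runs.
    if not sandbox_dir.exists():
        build_sandbox(base_image, sandbox_dir)
    return ResolvedImage("local-sandbox", str(sandbox_dir))
```

Passing the registry probe and the builder in as callables keeps the decision logic testable without network access or an Apptainer install, which is one plausible way to cover the three cases the focused tests name.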

Why

Some SWE-bench agent-server tags are not available in the registry. Docker mode can build them locally, but Apptainer mode previously failed immediately when the registry tag was missing. This gives HPC users a Dockerless path that mirrors the Docker source-minimal agent-server layer closely enough to run the SDK agent server inside an Apptainer sandbox.

The local builder currently supports the source-minimal target only. That matches the default SWE-bench inference target and keeps the change scoped.

Verification

  • python -m py_compile benchmarks/swebench/apptainer_build.py benchmarks/swebench/run_infer.py tests/test_swebench_apptainer_build.py
  • uv run ruff check benchmarks/swebench/apptainer_build.py benchmarks/swebench/run_infer.py tests/test_swebench_apptainer_build.py
  • uv run ruff format --check benchmarks/swebench/apptainer_build.py benchmarks/swebench/run_infer.py tests/test_swebench_apptainer_build.py
  • /home/gneubig/work/openhands-benchmarks-venv/bin/python -m pytest -q tests/test_swebench_apptainer_build.py
  • Built a real local Apptainer sandbox for astropy__astropy-12907 from docker.io/swebench/sweb.eval.x86_64.astropy_1776_astropy-12907:latest; the corresponding ghcr.io/openhands/eval-agent-server:...astropy_1776_astropy-12907-source-minimal tag was unavailable, so this exercises the fallback path.
  • Started the resulting sandbox with ApptainerWorkspace; /health returned 200, /testbed existed, python --version inside /testbed returned Python 3.11.5, and /agent-server/.venv/bin/python -c 'import openhands.agent_server' succeeded.
  • Ran a one-instance CLI smoke with a local OpenAI-compatible fake endpoint: python -m benchmarks.swebench.run_infer ... --workspace apptainer --select selected_astropy_12907_apptainer_build_smoke.txt --max-iterations 1 --num-workers 1 --note apptainer_build_fake_smoke. It exited 0 and wrote one output row. The instance log shows the missing registry image, reuse of the local Apptainer sandbox, and the message "Apptainer workspace is ready".
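The /health check in the verification steps can be reproduced with a small polling loop. The sketch below is an illustration only: `wait_for_health` and `http_status` are hypothetical helpers, not part of the SDK or of `ApptainerWorkspace`, and the base URL and port in the usage note are assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx responses are raised as HTTPError; report their code.
        return exc.code


def wait_for_health(
    base_url: str,
    attempts: int = 30,
    delay: float = 1.0,
    get_status: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll <base_url>/health until it returns 200 or attempts run out."""
    url = base_url.rstrip("/") + "/health"
    for _ in range(attempts):
        try:
            if get_status(url) == 200:
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            pass
        sleep(delay)
    return False
```

For example, `wait_for_health("http://127.0.0.1:8000")` would poll a server assumed to listen on port 8000; the actual address depends on how the workspace exposes the agent server.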

Note: the checkpoint-backed vLLM smoke could not be re-run in this shell because the job has ulimit -m=16384000 KB and the vLLM server was OOM-killed before serving. The smoke above verifies this PR's Apptainer build/use path independently of model serving.

Issue

Closes #746

This PR description update was created by an AI agent (Codex) on behalf of Graham Neubig.

