Add local Apptainer SWE-bench image builds #745
Draft
neubig wants to merge 1 commit into
Summary
- For `swebench-infer --workspace apptainer`, prefer the registry image when present, then fall back to building a local sandbox from the official SWE-bench image and the checked-out SDK submodule.
Why
Some SWE-bench agent-server tags are not available in the registry. Docker mode can build them locally, but Apptainer mode previously failed immediately when the registry tag was missing. This gives HPC users a Dockerless path that mirrors the Docker source-minimal agent-server layer closely enough to run the SDK agent server inside an Apptainer sandbox.
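The registry-first fallback described in the summary can be sketched roughly as follows. This is an illustration only, not the code in this PR: `ImageChoice`, `choose_agent_server_image`, and the two injected callables are hypothetical names, and the image references in the demo are placeholders. The restriction to the `source-minimal` target reflects the scope stated in this description.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ImageChoice:
    """Hypothetical result type: where the agent-server image came from."""

    source: str  # "registry" or "local-build"
    reference: str  # registry tag, or path to the locally built sandbox


def choose_agent_server_image(
    registry_tag: str,
    base_image: str,
    registry_has_tag: Callable[[str], bool],
    build_local_sandbox: Callable[[str], str],
    target: str = "source-minimal",
) -> ImageChoice:
    """Prefer the registry image; otherwise build a local sandbox.

    Both callables are assumptions made for this sketch: one reports
    whether a tag exists in the registry, the other builds a sandbox from
    the base SWE-bench image and returns its path.
    """
    if registry_has_tag(registry_tag):
        return ImageChoice("registry", registry_tag)
    if target != "source-minimal":
        # Only the source-minimal target is in scope for local builds.
        raise ValueError(
            f"local build supports only 'source-minimal', got {target!r}"
        )
    return ImageChoice("local-build", build_local_sandbox(base_image))


# Demo with stubbed callables and placeholder image names: the registry
# tag is reported missing, so the fallback path is taken.
choice = choose_agent_server_image(
    registry_tag="example-registry/agent-server:demo-source-minimal",
    base_image="example-registry/swebench-base:demo",
    registry_has_tag=lambda tag: False,
    build_local_sandbox=lambda image: "/tmp/sandboxes/demo",
)
print(choice)
```

Passing the registry check and the builder in as callables keeps the decision logic testable without Docker, Apptainer, or network access, which is the same property the unit tests in this PR rely on for the fallback path.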
The local builder currently supports the `source-minimal` target only. That matches the default SWE-bench inference target and keeps the change scoped.
Verification
- `python -m py_compile benchmarks/swebench/apptainer_build.py benchmarks/swebench/run_infer.py tests/test_swebench_apptainer_build.py`
- `uv run ruff check benchmarks/swebench/apptainer_build.py benchmarks/swebench/run_infer.py tests/test_swebench_apptainer_build.py`
- `uv run ruff format --check benchmarks/swebench/apptainer_build.py benchmarks/swebench/run_infer.py tests/test_swebench_apptainer_build.py`
- `/home/gneubig/work/openhands-benchmarks-venv/bin/python -m pytest -q tests/test_swebench_apptainer_build.py`
- Local build: `astropy__astropy-12907` from `docker.io/swebench/sweb.eval.x86_64.astropy_1776_astropy-12907:latest`; the corresponding `ghcr.io/openhands/eval-agent-server:...astropy_1776_astropy-12907-source-minimal` tag was unavailable, so this exercises the fallback path.
- `ApptainerWorkspace`: `/health` returned 200, `/testbed` existed, `python --version` inside `/testbed` returned `Python 3.11.5`, and `/agent-server/.venv/bin/python -c 'import openhands.agent_server'` succeeded.
- Smoke run: `python -m benchmarks.swebench.run_infer ... --workspace apptainer --select selected_astropy_12907_apptainer_build_smoke.txt --max-iterations 1 --num-workers 1 --note apptainer_build_fake_smoke`. It exited 0 and wrote one output row. The instance log shows the missing registry image, reuse of the local Apptainer sandbox, and `Apptainer workspace is ready`.
Note: the checkpoint-backed vLLM smoke could not be re-run in this shell because the job has `ulimit -m` = 16384000 KB and the vLLM server was OOM-killed before serving. The smoke above verifies this PR's Apptainer build/use path independently of model serving.
Issue
Closes #746
This PR description update was created by an AI agent (Codex) on behalf of Graham Neubig.