VIBEPASS: Can Vibe Coders Really Pass the Vibe Check?

A benchmark for evaluating LLMs on corner-case generation, code judgment, and debugging. The dataset was generated using GPT, Gemini, and Claude, and should not be used to develop competing models.

📄 Paper: https://arxiv.org/abs/2603.15921

🤗 Dataset: https://huggingface.co/datasets/Salesforce/vibepass

Quick Start

# Install dependencies
pip install -r requirements.txt

# Set environment variables
cp .env.example .env
# Edit .env with your API keys

# Run evaluation
python src/eval.py \
  --input data/benchmark.jsonl \
  --output outputs/results.jsonl \
  --model sonnet4.5 \
  --task corner_case

Models

OpenAI: gpt-5-* (add _low, _medium, _high, _minimal for effort)
Anthropic: opus4.6, sonnet4.6, opus4.5, sonnet4.5, haiku4.5 (add _thinking)
Gemini: gemini-2.0-flash-exp, gemini-1.5-pro
Together AI: Various open-source models
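
The suffix convention in the list above can be sketched as a small helper. This is an illustration only: `with_suffix` and `ALLOWED_SUFFIXES` are hypothetical names that are not part of this repository, and the sketch assumes a suffix is appended directly to the base model name, which the list suggests but does not spell out.

```python
# Hypothetical helper (not part of the repository): composes a --model value
# from a base name and an optional suffix, ASSUMING suffixes are appended
# directly to the base name.
ALLOWED_SUFFIXES = {
    "openai": {"_low", "_medium", "_high", "_minimal"},  # effort levels
    "anthropic": {"_thinking"},
}


def with_suffix(provider: str, base: str, suffix: str = "") -> str:
    """Return base + suffix after checking the suffix is listed for the provider."""
    if suffix and suffix not in ALLOWED_SUFFIXES.get(provider, set()):
        raise ValueError(f"suffix {suffix!r} is not listed for provider {provider!r}")
    return base + suffix


print(with_suffix("anthropic", "sonnet4.5", "_thinking"))  # sonnet4.5_thinking
```

Check `src/llm_generator.py` for the names the evaluator actually accepts before relying on a composed value.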

Tasks

corner_case: Generate test cases that expose bugs in implementations
judge: Evaluate whether a solution is correct or buggy
debug: Fix buggy implementations

Arguments

--input FILE           # Input JSONL
--output FILE          # Output JSONL
--model MODEL          # Model name
--task TASK            # corner_case, judge, or debug
--lcb_data PATH        # LCB data (default: curation/data/lcb/test*.jsonl)
--timeout SECONDS      # Default: 60
--num_process_generate N  # Default: 16
--num_process_evaluate N  # Default: 4
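
To run all three tasks with one model, the flags above can be assembled from a short driver script. The following is a minimal sketch, not part of the repository: `build_command` and `run_all` are hypothetical names, and the per-task output paths under `outputs/` are an assumption made for illustration. It uses only flags listed above and defaults to a dry run that prints the commands.

```python
import subprocess
import sys

TASKS = ("corner_case", "judge", "debug")


def build_command(model: str, task: str, input_path: str, output_path: str,
                  timeout: int = 60) -> list[str]:
    """Assemble an eval.py invocation from the documented flags."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    return [
        sys.executable, "src/eval.py",
        "--input", input_path,
        "--output", output_path,
        "--model", model,
        "--task", task,
        "--timeout", str(timeout),
    ]


def run_all(model: str, input_path: str = "data/benchmark.jsonl",
            dry_run: bool = True) -> None:
    """Run (or, by default, only print) one evaluation per task."""
    for task in TASKS:
        # Assumed naming scheme: one output file per task.
        cmd = build_command(model, task, input_path, f"outputs/{task}.jsonl")
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


run_all("sonnet4.5")  # prints the three commands without launching them
```

Pass `dry_run=False` to launch the runs; `check=True` stops at the first task that exits with a non-zero status.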

Environment Variables

OPENAI_API_KEY=...
OPENAI_BASE_URL=...              # Optional
X_API_KEY=...                    # Optional gateway key
TOGETHER_API_KEY=...
GOOGLE_CLOUD_PROJECT=...
GOOGLE_CLOUD_LOCATION=global
SANDBOX_HOST=localhost
SANDBOX_PORT=8080
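
Before a long run, it can help to check which of these variables are set. The sketch below is an assumption-laden convenience, not repository code: `missing_vars` is a hypothetical name, and it treats every variable not marked optional above as expected, even though a given run may only need the keys for the provider it uses. It reads the process environment only and does not parse `.env` itself.

```python
import os

# Names copied from the list above; the two marked optional are kept separate.
DOCUMENTED = [
    "OPENAI_API_KEY",
    "TOGETHER_API_KEY",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
    "SANDBOX_HOST",
    "SANDBOX_PORT",
]
OPTIONAL = ["OPENAI_BASE_URL", "X_API_KEY"]


def missing_vars(env=None) -> list[str]:
    """Return documented, non-optional variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in DOCUMENTED if not env.get(name)]


if __name__ == "__main__":
    for name in missing_vars():
        print(f"not set: {name}")
```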

Input Format

{
  "coca_id": "id",
  "question_id": "platform_id",
  "question_content": "Problem...",
  "platform": "leetcode",
  "buggy_model_solution": "def solution(): ...",
  "test_checker": "def is_valid_test(): ...",
  "starter_code": "class Solution: ..."
}
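
A quick structural check of an input file might look like the sketch below. It assumes every record carries all seven fields shown above, which the example suggests but this README does not state; `parse_jsonl` and `missing_fields` are hypothetical names, not part of `src/`.

```python
import json

EXPECTED_FIELDS = (
    "coca_id",
    "question_id",
    "question_content",
    "platform",
    "buggy_model_solution",
    "test_checker",
    "starter_code",
)


def missing_fields(record: dict) -> list[str]:
    """Return the expected field names that are absent from a record."""
    return [field for field in EXPECTED_FIELDS if field not in record]


def parse_jsonl(lines) -> list[dict]:
    """Parse an iterable of JSONL lines, raising on the first incomplete record."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = missing_fields(record)
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        records.append(record)
    return records
```

An open file object can be passed directly, for example `with open("data/benchmark.jsonl", encoding="utf-8") as handle: records = parse_jsonl(handle)`.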

Sandbox

The sandbox is expected to accept POST requests at http://localhost:8080/run_code with a JSON body such as:

{"code": "print('hello')", "language": "python", "run_timeout": 10}

Returns:

{"status": "Success", "run_result": {"status": "Finished", "stdout": "hello\n"}}
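
The request and response shapes above can be exercised with a small standard-library client. This is a sketch under stated assumptions rather than the evaluator's own code: `build_request`, `extract_stdout`, and `run_code` are hypothetical names, the `Content-Type` header is assumed, and any status other than the `Success`/`Finished` pair shown above is treated as a failure because other statuses are not documented here.

```python
import json
import urllib.request


def build_request(code: str, host: str = "localhost", port: int = 8080,
                  run_timeout: int = 10) -> urllib.request.Request:
    """Build the POST request with the JSON body shown above."""
    payload = {"code": code, "language": "python", "run_timeout": run_timeout}
    return urllib.request.Request(
        f"http://{host}:{port}/run_code",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def extract_stdout(response: dict) -> str:
    """Return stdout from a response, raising if it does not match the success shape."""
    if response.get("status") != "Success":
        raise RuntimeError(f"sandbox request failed: {response!r}")
    run_result = response.get("run_result", {})
    if run_result.get("status") != "Finished":
        raise RuntimeError(f"code did not finish: {run_result!r}")
    return run_result.get("stdout", "")


def run_code(code: str, run_timeout: int = 10, **kwargs) -> str:
    """Send code to the sandbox and return its stdout."""
    request = build_request(code, run_timeout=run_timeout, **kwargs)
    # Allow the HTTP call slightly longer than the sandbox's own time limit.
    with urllib.request.urlopen(request, timeout=run_timeout + 5) as reply:
        return extract_stdout(json.load(reply))
```

With a sandbox listening on the default host and port, `run_code("print('hello')")` would be expected to return the `stdout` value from the response shown above.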

Structure

.
├── .env.example          # Configuration template
├── .gitignore            # Git ignore patterns
├── LICENSE               # MIT License
├── README.md             # This file
├── requirements.txt      # Dependencies
└── src/
    ├── eval.py           # Main evaluation script
    ├── llm_generator.py  # LLM providers
    ├── utils.py          # Utilities
    └── prompts/          # Prompt templates
        ├── corner_case.py
        ├── judge.py
        ├── debug.py
        └── codegen.py

Citation

@misc{bansal2026vibepassvibecodersreally,
  title={VIBEPASS: Can Vibe Coders Really Pass the Vibe Check?},
  author={Srijan Bansal and Jiao Fangkai and Yilun Zhou and Austin Xu and Shafiq Joty and Semih Yavuz},
  year={2026},
  eprint={2603.15921},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2603.15921}
}
