feat(ci): add RunnerContext and RegressionError for experiment GH action by wochinge · Pull Request #1635 · langfuse/langfuse-python

wochinge · 2026-04-20T15:13:50Z

Summary

Adds the SDK-side primitives consumed by the upcoming langfuse/experiment-action GitHub Action (LFE-9241).

- RunnerContext wraps Langfuse.run_experiment with action-injected defaults (data, dataset_version, name, run_name, metadata). Users can override any default at the call site; metadata is merged, with user-supplied keys winning on collision. Supports both LocalExperimentItem and DatasetItem.
- RegressionError lets a user's experiment function signal a CI gate failure. It exposes optional structured metric / value / threshold fields so the action can render a targeted callout in the PR comment; otherwise it falls back to a free-form message or a default string.

Both live in a dedicated langfuse/ci.py module so the CI surface stays isolated from the general experiment API.
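The override and merge rules described above can be sketched as follows. This is a minimal illustration, not the code in langfuse/ci.py: the helper names `merge_metadata` and `resolve` are hypothetical, and the exact error wording is an assumption.

```python
from typing import Any, Dict, Optional


def merge_metadata(
    context_metadata: Optional[Dict[str, Any]],
    user_metadata: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Merge action-injected metadata with call-site metadata (hypothetical helper)."""
    # Both sides absent: keep None rather than inventing an empty dict.
    if context_metadata is None and user_metadata is None:
        return None
    # Later keys overwrite earlier ones, so user-supplied keys win on collision.
    return {**(context_metadata or {}), **(user_metadata or {})}


def resolve(default: Any, override: Any, field: str) -> Any:
    """Prefer the call-site value, else the context default (hypothetical helper)."""
    value = override if override is not None else default
    if value is None:
        # Assumed wording; the PR only states that a ValueError is raised.
        raise ValueError(f"{field} must be set on the context or at the call site")
    return value
```

For example, `merge_metadata({"a": 1, "b": 2}, {"b": 3})` keeps `a` from the context and takes `b` from the caller, while `resolve(None, None, "name")` raises.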

Usage (from the proposal)

from langfuse import RegressionError, RunnerContext
from langfuse.experiment import ExperimentResult

def experiment(context: RunnerContext) -> ExperimentResult:
    result = context.run_experiment(
        task=my_task,
        evaluators=[accuracy_evaluator],
        run_evaluators=[avg_accuracy],
    )
    if result.run_score_values["avg_accuracy"] < 0.9:
        raise RegressionError(
            result=result,
            metric="avg_accuracy",
            value=result.run_score_values["avg_accuracy"],
            threshold=0.9,
        )
    return result
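
The message fallback that RegressionError is described as having (user message first, then a structured message, then a default) could look roughly like the sketch below. `SketchRegressionError` is a stand-in written for illustration only; the `message` parameter name and all message strings are assumptions, not the SDK's actual text.

```python
from typing import Any, Optional


class SketchRegressionError(Exception):
    """Illustrative stand-in; not the SDK's RegressionError."""

    def __init__(
        self,
        *,
        result: Any,
        message: Optional[str] = None,
        metric: Optional[str] = None,
        value: Optional[float] = None,
        threshold: Optional[float] = None,
    ) -> None:
        # Keep the structured fields so a caller (e.g. a CI action) can read them.
        self.result = result
        self.metric = metric
        self.value = value
        self.threshold = threshold
        if message is not None:
            # An explicit user message takes precedence over everything else.
            text = message
        elif metric is not None and value is not None and threshold is not None:
            # Structured message built from the optional fields (assumed format).
            text = f"Regression on {metric}: {value} (threshold {threshold})"
        else:
            # Default string when nothing else was supplied (assumed wording).
            text = "Experiment regression detected"
        super().__init__(text)
```

Keeping the fields as attributes rather than only formatting them into the message is what would let the action render a targeted callout without parsing strings.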

Test plan

15 new unit tests in tests/unit/test_ci.py covering:
- Context defaults flowing through to Langfuse.run_experiment
- Call-site overrides winning over context defaults
- Metadata merge (user keys win on collision; both-None stays None)
- LocalExperimentItem pass-through (both default and override paths)
- ValueError when name / data are missing on both sides
- RegressionError attributes, default message, structured message, user-message precedence
- Signature-drift guard using inspect.signature that fails loudly if Langfuse.run_experiment grows a param not threaded through RunnerContext.run_experiment
Import smoke check: python -c "from langfuse import RunnerContext, RegressionError" resolves without error.
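
A signature-drift guard of the kind listed above can be built on `inspect.signature`. The sketch below uses two stand-in classes, `FakeClient` and `FakeContext`, in place of Langfuse and RunnerContext; their parameter lists are invented for the example and the real test in tests/unit/test_ci.py may differ.

```python
import inspect


class FakeClient:
    """Stand-in for the wrapped client (hypothetical)."""

    def run_experiment(self, *, name, data, task, evaluators=None, metadata=None):
        ...


class FakeContext:
    """Stand-in for the wrapper that injects defaults (hypothetical)."""

    def run_experiment(self, *, task, name=None, data=None, evaluators=None, metadata=None):
        ...


def missing_params(wrapped, wrapper):
    """Return parameter names accepted by `wrapped` but not by `wrapper`."""
    wrapped_names = set(inspect.signature(wrapped).parameters) - {"self"}
    wrapper_names = set(inspect.signature(wrapper).parameters) - {"self"}
    return sorted(wrapped_names - wrapper_names)


def test_no_signature_drift():
    # Fails loudly, naming the parameters the wrapper does not thread through.
    missing = missing_params(FakeClient.run_experiment, FakeContext.run_experiment)
    assert not missing, f"Not threaded through the wrapper: {missing}"
```

Comparing parameter names rather than full signatures lets the wrapper make required parameters optional (as the context defaults require) while still catching newly added ones.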

Out of scope

- The langfuse/experiment-action GitHub Action itself
- PR-comment rendering
- Multi-metric regression reporting (deferred; can land later as a non-breaking regressions: List[Regression] addition)

🤖 Generated with Claude Code

Adds the SDK-side primitives consumed by the upcoming `langfuse/experiment-action` GitHub Action (LFE-9241):

- `RunnerContext` wraps `Langfuse.run_experiment` with action-injected defaults (data, dataset_version, name, run_name, metadata). Users can override any default on the call site; metadata is merged with user-supplied keys winning on collision.
- `RegressionError` lets users signal a CI gate failure and optionally pass structured `metric`/`value`/`threshold` fields so the action can render a callout in the PR comment.

Both live in a dedicated `langfuse/ci.py` module so the CI surface stays isolated from the general experiment API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

wochinge wants to merge 1 commit into main from worktree-runner-context
