feat(ci): add RunnerContext and RegressionError for experiment GH action#1635

Draft
wochinge wants to merge 1 commit into main from worktree-runner-context

@wochinge
Contributor

Summary

Adds the SDK-side primitives consumed by the upcoming langfuse/experiment-action GitHub Action (LFE-9241).

  • RunnerContext wraps Langfuse.run_experiment with action-injected defaults (data, dataset_version, name, run_name, metadata). Users can override any default on the call site; metadata is merged with user-supplied keys winning on collision. Supports both LocalExperimentItem and DatasetItem.
  • RegressionError lets a user's experiment function signal a CI gate failure. It exposes optional structured metric / value / threshold fields so the action can render a targeted callout in the PR comment. A user-supplied free-form message takes precedence; otherwise the message is built from the structured fields, and if those are absent it falls back to a default string.

Both live in a dedicated langfuse/ci.py module so the CI surface stays isolated from the general experiment API.
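To make the default-resolution rules concrete, here is a minimal sketch of the merge semantics described above. It is not the code in langfuse/ci.py: `RunnerContextSketch`, its field types, and the injected `run_fn` (standing in for `Langfuse.run_experiment`) are hypothetical, chosen so the example runs without the SDK installed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


def _pick(override: Any, default: Any) -> Any:
    # Call-site values win; None means "not supplied", so fall back to the default.
    return override if override is not None else default


@dataclass
class RunnerContextSketch:
    """Hypothetical stand-in for langfuse.ci.RunnerContext (not the real class)."""

    run_fn: Callable[..., Any]  # plays the role of Langfuse.run_experiment
    name: Optional[str] = None
    run_name: Optional[str] = None
    data: Optional[List[Any]] = None
    dataset_version: Any = None  # real type is an assumption
    metadata: Optional[Dict[str, Any]] = None

    def run_experiment(
        self,
        *,
        name: Optional[str] = None,
        run_name: Optional[str] = None,
        data: Optional[List[Any]] = None,
        dataset_version: Any = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        resolved_name = _pick(name, self.name)
        resolved_data = _pick(data, self.data)
        # name and data must come from one side or the other.
        if resolved_name is None:
            raise ValueError("name must be set on the context or at the call site")
        if resolved_data is None:
            raise ValueError("data must be set on the context or at the call site")

        # Context metadata first, so user keys override on collision.
        # If neither side supplied metadata, keep it as None.
        if self.metadata is None and metadata is None:
            merged_metadata = None
        else:
            merged_metadata = {**(self.metadata or {}), **(metadata or {})}

        # Everything else (task, evaluators, ...) is forwarded unchanged.
        return self.run_fn(
            name=resolved_name,
            run_name=_pick(run_name, self.run_name),
            data=resolved_data,
            dataset_version=_pick(dataset_version, self.dataset_version),
            metadata=merged_metadata,
            **kwargs,
        )


# Usage with a fake run function that just echoes its kwargs.
ctx = RunnerContextSketch(
    run_fn=lambda **kwargs: kwargs,
    name="pr-1635",
    data=[{"input": "hello"}],
    metadata={"commit": "abc123", "source": "ci"},
)
out = ctx.run_experiment(task=lambda item: item, metadata={"source": "local"})
print(out["metadata"])  # {'commit': 'abc123', 'source': 'local'}
```

Treating `None` as "not supplied" keeps the override rule uniform across fields, at the cost of not letting a caller explicitly clear a default by passing `None`.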

Usage (from the proposal)

```python
from langfuse import RegressionError, RunnerContext
from langfuse.experiment import ExperimentResult

# my_task, accuracy_evaluator and avg_accuracy are user-defined.
def experiment(context: RunnerContext) -> ExperimentResult:
    result = context.run_experiment(
        task=my_task,
        evaluators=[accuracy_evaluator],
        run_evaluators=[avg_accuracy],
    )
    avg = result.run_score_values["avg_accuracy"]
    if avg < 0.9:
        # Structured fields let the action render a targeted PR-comment callout.
        raise RegressionError(
            result=result,
            metric="avg_accuracy",
            value=avg,
            threshold=0.9,
        )
    return result
```
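On the action side, the gate could be as small as catching the error and turning it into a non-zero exit code. The sketch below assumes that contract. `RegressionErrorSketch`, `run_gate`, and the structured message format are hypothetical illustrations of the message precedence described in the summary (user message, then structured fields, then a default string), not the SDK's implementation.

```python
import sys
from typing import Any, Callable, Optional


class RegressionErrorSketch(Exception):
    """Hypothetical stand-in for langfuse.RegressionError (not the real class)."""

    DEFAULT_MESSAGE = "Experiment regression detected"

    def __init__(
        self,
        *,
        result: Any = None,
        metric: Optional[str] = None,
        value: Optional[float] = None,
        threshold: Optional[float] = None,
        message: Optional[str] = None,
    ) -> None:
        self.result = result
        self.metric = metric
        self.value = value
        self.threshold = threshold
        # Precedence: user message > structured message > default string.
        if message is not None:
            text = message
        elif metric is not None and value is not None and threshold is not None:
            text = f"Regression in {metric}: {value} (threshold: {threshold})"
        else:
            text = self.DEFAULT_MESSAGE
        super().__init__(text)


def run_gate(experiment: Callable[[Any], Any], context: Any) -> int:
    """Hypothetical action-side wrapper: map a RegressionError to exit code 1."""
    try:
        experiment(context)
    except RegressionErrorSketch as err:
        # A real action could read err.metric / err.value / err.threshold
        # to render a targeted callout instead of just printing.
        print(f"Regression: {err}", file=sys.stderr)
        return 1
    return 0


def failing_experiment(context: Any) -> Any:
    raise RegressionErrorSketch(metric="avg_accuracy", value=0.85, threshold=0.9)


exit_code = run_gate(failing_experiment, context=None)
print(exit_code)  # 1
```

Other exception types are deliberately not caught here, so a crashing experiment surfaces as an error rather than being reported as a regression.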

Test plan

  • 15 new unit tests in tests/unit/test_ci.py covering:
    • Context defaults flowing through to Langfuse.run_experiment
    • Call-site overrides winning over context defaults
    • Metadata merge (user keys win on collision; both-None stays None)
    • LocalExperimentItem pass-through (both default and override paths)
    • ValueError when name / data are missing on both sides
    • RegressionError attributes, default message, structured message, user-message precedence
    • Signature-drift guard using inspect.signature that fails loudly if Langfuse.run_experiment grows a param not threaded through RunnerContext.run_experiment
  • python -c "from langfuse import RunnerContext, RegressionError" resolves
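The signature-drift guard can be sketched as a set difference over `inspect.signature(...).parameters`. `Client` and `Context` below are hypothetical stand-ins for `Langfuse` and `RunnerContext` with made-up parameter lists; the actual test in tests/unit/test_ci.py may be structured differently.

```python
import inspect
from typing import Any, Callable, FrozenSet, Set


def missing_params(
    source: Callable[..., Any],
    wrapper: Callable[..., Any],
    ignore: FrozenSet[str] = frozenset({"self"}),
) -> Set[str]:
    """Return parameter names of `source` that `wrapper` does not declare."""
    src = set(inspect.signature(source).parameters) - ignore
    wrapped = set(inspect.signature(wrapper).parameters) - ignore
    return src - wrapped


# Hypothetical stand-ins for Langfuse.run_experiment and RunnerContext.run_experiment.
class Client:
    def run_experiment(self, *, name, data, task, evaluators=None, metadata=None): ...


class Context:
    def run_experiment(self, *, task, name=None, data=None, evaluators=None, metadata=None): ...


def test_runner_context_tracks_client_signature() -> None:
    missing = missing_params(Client.run_experiment, Context.run_experiment)
    # Fails loudly, naming every parameter the wrapper forgot to thread through.
    assert not missing, f"RunnerContext.run_experiment is missing: {sorted(missing)}"
```

Comparing names rather than full signatures keeps the guard focused on the stated failure mode (a new parameter on the client) without breaking when the wrapper intentionally makes `name` and `data` optional.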

Out of scope

  • The langfuse/experiment-action GitHub Action itself
  • PR-comment rendering
  • Multi-metric regression reporting (deferred; can land later, non-breaking, as regressions: List[Regression])
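One way the deferred multi-metric field could stay non-breaking is to add `regressions` as an optional keyword and fold the existing single-metric fields into it. This is a hypothetical sketch only: `Regression` and `MultiRegressionError` are illustrative names, not part of this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Regression:
    """Hypothetical per-metric record; not part of this PR."""

    metric: str
    value: float
    threshold: float


class MultiRegressionError(Exception):
    """Hypothetical future shape: existing single-metric kwargs keep working,
    and `regressions` is a new keyword with a default, so old call sites are unaffected."""

    def __init__(
        self,
        *,
        metric: Optional[str] = None,
        value: Optional[float] = None,
        threshold: Optional[float] = None,
        regressions: Optional[List[Regression]] = None,
        message: Optional[str] = None,
    ) -> None:
        self.metric = metric
        self.value = value
        self.threshold = threshold
        collected = list(regressions or [])
        # Fold the legacy single-metric fields into the list so consumers
        # only need to read one attribute.
        if metric is not None and value is not None and threshold is not None:
            collected.insert(0, Regression(metric, value, threshold))
        self.regressions = collected
        super().__init__(message or f"{len(collected)} metric(s) regressed")
```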

🤖 Generated with Claude Code

Adds the SDK-side primitives consumed by the upcoming
`langfuse/experiment-action` GitHub Action (LFE-9241):

- `RunnerContext` wraps `Langfuse.run_experiment` with action-injected
  defaults (data, dataset_version, name, run_name, metadata). Users can
  override any default on the call site; metadata is merged with
  user-supplied keys winning on collision.
- `RegressionError` lets users signal a CI gate failure and optionally
  pass structured `metric`/`value`/`threshold` fields so the action can
  render a callout in the PR comment.

Both live in a dedicated `langfuse/ci.py` module so the CI surface stays
isolated from the general experiment API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>