diff --git a/src/docs.json b/src/docs.json
index 7ff2a54abb..0155319352 100644
--- a/src/docs.json
+++ b/src/docs.json
@@ -1654,7 +1654,7 @@
 "langsmith/sandbox-permissions",
 "langsmith/sandbox-cli",
 "langsmith/sandbox-sdk",
- "langsmith/sandbox-harbor"
+ "langsmith/harbor-integrations"
 ]
 },
 {
@@ -2252,6 +2252,10 @@
 }
 },
 "redirects": [
+ {
+ "source": "/langsmith/sandbox-harbor",
+ "destination": "/langsmith/harbor-integrations#sandboxes"
+ },
 {
 "source": "/build-overview",
 "destination": "/oss/python/build-overview"
diff --git a/src/langsmith/harbor-integrations.mdx b/src/langsmith/harbor-integrations.mdx
index 173d41be73..10136f3e3b 100644
--- a/src/langsmith/harbor-integrations.mdx
+++ b/src/langsmith/harbor-integrations.mdx
@@ -281,7 +281,6 @@ harbor run -d "" \
 
 ## See also
 
-- [Run evaluations with Harbor](/langsmith/sandbox-harbor)
 - [Deep Agents documentation](/oss/deepagents/overview)
 - [Datasets & Experiments](/langsmith/manage-datasets)
 - [Analyze an experiment](/langsmith/analyze-an-experiment)
diff --git a/src/langsmith/sandbox-harbor.mdx b/src/langsmith/sandbox-harbor.mdx
deleted file mode 100644
index ab827d9df3..0000000000
--- a/src/langsmith/sandbox-harbor.mdx
+++ /dev/null
@@ -1,85 +0,0 @@
----
-title: Run evaluations with Harbor
-sidebarTitle: Harbor
-description: Run Harbor evaluations and rollouts on LangSmith sandboxes with the harbor[langsmith] extra.
----
-
-[Harbor](https://harborframework.com/docs) is a framework for evaluating and optimizing agents and language models in sandboxed environments, from the creators of [Terminal-Bench](https://www.tbench.ai). Harbor runs each trial in an isolated container, so you can parallelize evaluations and rollouts across many environments at once.
-
-The `langsmith` Harbor environment runs those trials on LangSmith sandboxes. Select it with `-e langsmith` to execute Harbor jobs on LangSmith infrastructure, alongside providers such as AgentCore, Daytona, E2B, and Modal.
-
-## Prerequisites
-
-- A [LangSmith account](https://smith.langchain.com) and an API key.
-- Python with `pip`.
-
-## Install
-
-Install Harbor with the `langsmith` extra:
-
-```bash
-pip install "harbor[langsmith]"
-```
-
-## Authenticate
-
-Harbor authenticates with your LangSmith credentials. Set an API key:
-
-```bash
-export LANGSMITH_API_KEY=""
-```
-
-`LANGCHAIN_API_KEY` works as well. Alternatively, select a [LangSmith SDK profile](/langsmith/profile-configuration) instead of exporting a key:
-
-```bash
-export LANGSMITH_PROFILE=prod
-```
-
-## Run an evaluation
-
-Run a Harbor job and select the LangSmith environment with `-e langsmith`:
-
-```bash
-harbor run -d "" \
-  -m "" \
-  -a "" \
-  -e langsmith \
-  -n ""
-```
-
-Harbor creates one LangSmith sandbox per trial, runs the agent and verifier inside it, then tears the sandbox down when the trial finishes.
-
-## Configure the sandbox environment
-
-The LangSmith environment boots each sandbox from a filesystem snapshot. Provide one of the following in your Harbor task:
-
-- **Prebuilt image**: set `[environment].docker_image` in `task.toml`. Harbor reuses or creates a snapshot from that image.
-- **Existing snapshot**: pass `environment.kwargs.snapshot_name` to boot from a [snapshot](/langsmith/sandbox-snapshots) you already created.
-- **Dockerfile**: include an `environment/Dockerfile`. Harbor builds a snapshot from it with the [build-from-Dockerfile flow](/langsmith/sandbox-snapshots#build-a-snapshot-from-a-dockerfile), using the task `environment/` directory as the build context.
-
-Tune the sandbox lifecycle with environment kwargs, passed on the command line with `--ek`:
-
-```bash
-harbor run -d "" \
-  -m "" \
-  -a "" \
-  -e langsmith \
-  -n "" \
-  --ek idle_ttl_seconds=0 \
-  --ek delete_after_stop_seconds=7200
-```
-
-- `idle_ttl_seconds`: stops an idle sandbox after this many seconds. Set `0` to disable the idle timeout.
-- `delete_after_stop_seconds`: deletes a stopped sandbox after this many seconds.
-
-## Run Deep Agents on LangSmith
-
-[Deep Agents](/oss/deepagents/overview) run on LangSmith sandboxes through Harbor's built-in `langgraph` agent and the `-e langsmith` environment. The full setup: staging the Deep Agents packages, the LangGraph project (`langgraph.json` with the `deepagent` graph), and the exact `harbor run` command, is maintained in the Deep Agents eval guide:
-
-→ [Running Deep Agents on Harbor / Terminal Bench 2.0](https://github.com/langchain-ai/deepagents/blob/main/libs/evals/CONTRIBUTING.md#harbor--terminal-bench-20)
-
-The [LangSmith-specific options](#configure-the-sandbox-environment) (authentication, `-e langsmith`, and the `--ek` sandbox-lifecycle kwargs) apply to those runs as well.
-
-## Multi-container tasks
-
-The LangSmith environment supports multi-container tasks. Include an `environment/docker-compose.yaml` file in your task definition to run several containers per trial. See the [Harbor sandbox documentation](https://harborframework.com/docs/run-jobs/cloud-sandboxes) for details.
diff --git a/src/langsmith/sandboxes.mdx b/src/langsmith/sandboxes.mdx
index d756dfa3fb..a86fb2c73b 100644
--- a/src/langsmith/sandboxes.mdx
+++ b/src/langsmith/sandboxes.mdx
@@ -115,7 +115,7 @@ To wire sandboxes into agent code, see the Open Source docs:
 Create and manage sandboxes programmatically with the Python or TypeScript SDK.
-
+
 Run Harbor evaluations and rollouts on LangSmith sandboxes.