45 changes: 31 additions & 14 deletions README.md
@@ -93,11 +93,11 @@ The following are available inside a `flow(...) { ... }`:

| Tool | Methods | Purpose |
|---|---|---|
| `claude` | `autonomous.run(prompt, session?)`, `resultAs[O].{autonomous,interactive}.run(input, session?)`, `newSession`, `haiku`/`sonnet`/`opus`/`fable`, `withConfig`, `withSystemPrompt`, `withName`, `withReadOnly`, `withNetworkOnly`, `withNetworkTools`, `withSelfManagedGit` | Claude Code coding/reviewing agent. Bare `claude` is **Opus with the 1M-token context window** (the long-lived implementer needs it; reviewers share it); use `claude.sonnet` / `claude.haiku` for cheap one-shot calls (reviewer picker, lint, PR summariser), or `claude.fable` for the most capable tier on the hardest one-shots. `interactive` mode lives only on `resultAs[O]`. |
| `codex` | `autonomous.run(prompt, session?)`, `resultAs[O].{autonomous,interactive}.run(input, session?)`, `newSession`, `mini`, `withConfig`, `withSystemPrompt`, `withName`, `withReadOnly`, `withNetworkOnly`, `withSelfManagedGit` | OpenAI Codex coding/reviewing agent. |
| `opencode` | `autonomous.run(prompt, session?)`, `resultAs[O].{autonomous,interactive}.run(input, session?)`, `newSession`, `anthropicOpus`/`anthropicSonnet`/`anthropicHaiku`, `openaiGpt5`/`openaiGpt5Codex`/`openaiGpt5Mini`, `withModel(providerModel)` / `withModel(provider, modelId)`, `withConfig`, `withSystemPrompt`, `withName`, `withReadOnly`, `withNetworkOnly`, `withSelfManagedGit` | [OpenCode](https://opencode.ai) coding/reviewing agent, driven over HTTP+SSE against a headless `opencode serve` (started lazily, shared for the run). Spans providers, so models are provider-qualified: use an accessor (`opencode.openaiGpt5Mini`) or `opencode.withModel("openai/gpt-4o-mini")` / `opencode.withModel("ollama", "llama3.1")`. Inherits the user's configured `opencode` providers/auth. |
| `pi` | `autonomous.run(prompt, session?)`, `resultAs[O].{autonomous,interactive}.run(input, session?)`, `newSession`, `withConfig`, `withSystemPrompt`, `withName`, `withReadOnly`, `withNetworkOnly`, `withSelfManagedGit` | [Pi](https://pi.dev/) coding agent backend, driven through `pi --mode rpc`. Pi handles provider/model selection through its own CLI configuration; pin a model with `pi.withConfig(LlmConfig(model = Some(Model("provider/model"))))`. Interactive calls can ask clarifying questions via Orca's `ask_user` bridge. |
| `gemini` | `autonomous.run(prompt, session?)`, `resultAs[O].{autonomous,interactive}.run(input, session?)`, `newSession`, `flash`, `withConfig`, `withSystemPrompt`, `withName`, `withReadOnly`, `withNetworkOnly`, `withSelfManagedGit` | Google Gemini CLI coding/reviewing agent, driven via `gemini --output-format stream-json`. Bare `gemini` pins **Gemini 2.5 Pro**; use `gemini.flash` for cheaper one-shot calls. Structured output is prompt-enforced (Gemini has no schema flag); `withReadOnly` maps to `--approval-mode plan`. See [ADR 0015](adr/0015-gemini-stream-json-driver.md). |
| `git` | `createBranch`, `checkout`, `checkoutOrCreate`, `ensureClean`, `commit`, `push`, `currentBranch`, `diff`, `log`, `addWorktree`, `removeWorktree`, `listWorktrees` | Git operations against the working tree. Recoverable failures (`BranchAlreadyExists`, `BranchNotFound`, `NothingToCommit`, `PushRejected`, `WorktreeAddFailed`, `WorktreeNotFound`) surface as `Either`; `.orThrow` converts a `Left` back to an exception when the case is unexpected. |
| `gh` | `createPr`, `updatePr`, `readIssue`, `readIssueComments`, `readPrComments`, `writeComment(pr, body)` / `writeComment(issue, body)`, `buildStatus`, `waitForBuild` | GitHub PR + CI integration via the `gh` CLI. `createPr` returns `Either[PrCreateFailed, …]` (covers `PrAlreadyExists` / `NoCommitsToPr`); `updatePr` replaces a PR's title + body (refresh a tentative description once the fix lands); `waitForBuild` returns `Either[BuildWaitFailed, …]`. |
| `fs` | `read`, `write`, `list` | Working-tree file I/O. `read` returns `Option[String]` so a missing file is a branch point, not an exception. |
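
The `Either` convention in the `git` row can be sketched in isolation. `GitError`, `fakeCommit`, `commitIfNeeded`, `mustCommit`, and the `orThrow` extension below are hypothetical stand-ins, not Orca's definitions; they only illustrate branching on an expected `Left` and escalating an unexpected one.

```scala
// Hypothetical stand-ins for illustration; Orca's real error types may differ.
enum GitError:
  case BranchAlreadyExists(name: String)
  case NothingToCommit

// Hypothetical helper: escalate an unexpected Left to an exception.
extension [E, A](result: Either[E, A])
  def orThrow: A = result match
    case Right(value) => value
    case Left(err)    => throw new RuntimeException(s"unexpected failure: $err")

// Stand-in for a `git.commit`-style call that can fail recoverably.
def fakeCommit(hasChanges: Boolean): Either[GitError, String] =
  if hasChanges then Right("abc123") else Left(GitError.NothingToCommit)

// An expected failure is a branch point, not an exception.
def commitIfNeeded(hasChanges: Boolean): String =
  fakeCommit(hasChanges) match
    case Right(sha)                     => s"committed $sha"
    case Left(GitError.NothingToCommit) => "nothing to commit, skipping"
    case Left(other)                    => throw new RuntimeException(other.toString)

// When a Left would be a bug in the flow, convert it straight to an exception.
def mustCommit(hasChanges: Boolean): String = fakeCommit(hasChanges).orThrow
```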
@@ -141,16 +141,33 @@ flow(OrcaArgs(args)):
## Coding agent tools

> [!WARNING]
> **Coding agent tool usage is auto-approved by default** (`tools =
> ToolSet.Full`, `autoApprove = AutoApprove.All`): write-capable turns let the
> agent edit files and run shell commands without prompting. Constrain this in
> code, or isolate the whole run in a sandbox.

Two independent axes constrain an agent. **Capability** (`LlmConfig.tools: ToolSet`)
controls which tools exist at all:

```scala
// ReadOnly — reads only, no shell, no edits (reviewers, plan review, brief).
val reviewer = claude.withReadOnly

// NetworkOnly — reads plus read-only network (web + GitHub), for planners that
// must read an issue/PR. Hard no-edit on claude (command-scoped `--allowedTools`,
// configurable via `claude.withNetworkTools(...)`), gemini (scoped `web_fetch`),
// and opencode (web); on pi/codex network needs a writable shell, so there the
// no-edit guarantee is prompt-only. See ADR 0016.
val planner = claude.withNetworkOnly

// Full (the default) — write-capable.
```
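
A minimal, self-contained model of the capability axis, assuming stand-in definitions that reuse this README's names (`ToolSet`, `AutoApprove`, `LlmConfig`). The real `LlmConfig` has further fields (e.g. `model`, `systemPrompt`), and the config-level `withReadOnly` / `withNetworkOnly` / `effectiveAutoApprove` helpers here are hypothetical analogues, not Orca's API. The point it shows: changing capability never touches the prompting setting, and `autoApprove` only takes effect on `Full`.

```scala
// Stand-ins mirroring the README's names; not Orca's actual source.
enum ToolSet:
  case ReadOnly, NetworkOnly, Full

enum AutoApprove:
  case All
  case Only(tools: Set[String])

final case class LlmConfig(
    tools: ToolSet = ToolSet.Full,             // capability axis (default: write-capable)
    autoApprove: AutoApprove = AutoApprove.All // prompting axis (default: approve everything)
):
  // Hypothetical helpers: change capability only, never the prompting axis.
  def withReadOnly: LlmConfig = copy(tools = ToolSet.ReadOnly)
  def withNetworkOnly: LlmConfig = copy(tools = ToolSet.NetworkOnly)

  // `autoApprove` is consulted only on the Full tier.
  def effectiveAutoApprove: Option[AutoApprove] =
    if tools == ToolSet.Full then Some(autoApprove) else None
```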

**Prompting** (`autoApprove`) controls which of the available tools are approved
without a y/n prompt. It only matters for interactive turns and is consulted
only on `Full`:

```scala
// Restrict auto-approval to a named tool set (honoured by claude/codex).
val limited = claude.withConfig(
LlmConfig(autoApprove = AutoApprove.Only(Set("Read", "Edit", "Grep")))
```

@@ -182,16 +199,16 @@ orthogonal — every combination is valid. Mode is picked at the call site
(`Plan.autonomous.*` vs `Plan.interactive.*`), mirroring how `LlmTool` itself
splits `autonomous` / `interactive`:

| Operation | Result | `autonomous` (read-only + network, no human) | `interactive` (agent can `ask_user`) |
|---|---|---|---|
| `from(userPrompt, llm, instructions?)` | `Plan` | plan in one agentic turn | drive the planner conversationally |
| `assessThenPlan(userPrompt, llm, instructions?)` | `Verdict[Plan]` | assess, then `Proceed(plan)` or `Rejection(kind, body)` | same, but can ask the reporter to clarify instead of rejecting |
| `triage(report, llm, instructions?)` | `Triage` | classify a bug report (not-a-bug / untestable / testable) | same, with clarifying questions |

Every cell returns `Sessioned[B, <result>]` — the result paired with the agent
session that produced it. Continue that session into implementation
(`llm.autonomous.run(task, sessioned.sessionId)` — the planning turn's session
is still resumable with write access), or `.value` it and mint a fresh
session via `llm.newSession`. Destructure positionally when you want both:
`val Sessioned(session, plan) = Plan.autonomous.from(...)`.
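
A sketch of the `Sessioned` shape, assuming a two-field case class `(sessionId, value)` with `B` as a phantom backend type. `SessionId`, `ClaudeCode`, and `Plan` here are simplified stand-ins rather than Orca's types.

```scala
// Stand-ins for illustration; Orca's real types may differ.
final case class SessionId(value: String)
sealed trait ClaudeCode // assumed phantom marker for the backend that owns a session

// Assumed shape: the result paired with the session that produced it.
final case class Sessioned[B, A](sessionId: SessionId, value: A)

final case class Plan(steps: List[String])

// Hypothetical planner output.
val planned: Sessioned[ClaudeCode, Plan] =
  Sessioned(SessionId("sess-1"), Plan(List("read the issue", "write a failing test")))

// Keep both: destructure positionally (case classes support pattern matching).
val Sessioned(session, plan) = planned

// Keep only the result; a fresh session would be minted later.
val planOnly: Plan = planned.value
```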

73 changes: 73 additions & 0 deletions adr/0016-toolset-capability-axis-and-planner-network.md
@@ -0,0 +1,73 @@
# 0016. `ToolSet` capability axis and planner read-only network access

Status: Accepted · Date: 2026-06-11
Related: [ADR 0003](0003-pluggable-llm-backends.md) (backend surface), [ADR 0011](0011-reviewer-roster.md) (reviewers run read-only)

## Context

Planning turns run autonomously (stdin closed, no `ask_user` MCP) and
read-only. On every backend the read-only mode also blocks the network the
planner needs to read an issue/PR it was pointed at: claude's
`--permission-mode plan` prompts for `WebFetch`/`gh` (and an autonomous turn
can't answer), codex's `--sandbox read-only` blocks all network, pi's read-only
`--tools` has no web tool, gemini's `--approval-mode plan` gates web/shell.

Capability was previously encoded as a boolean `LlmConfig.readOnly` layered over
the `AutoApprove` enum, with the two combined ad hoc in each backend's args
mapping. This had two problems: (1) the boolean couldn't express "read-only
**plus** network"; (2)
`withReadOnly` is the shared hard no-edit gate for *seven* turn kinds (two
planners, plan-review, brief, triage, code reviewers, reviewer-selection /
lint-summary), and `Reviewers.scala` relies on it so reviewers can't edit
mid-review — so network must not be tied to "read-only" in general.

## Decision

Replace `readOnly: Boolean` with a capability enum on `LlmConfig`:

```scala
enum ToolSet: case ReadOnly, NetworkOnly, Full
```

`ToolSet` is the **capability axis** (which tools exist); `AutoApprove` stays
the orthogonal **prompting axis** (which available tools auto-approve, only
meaningful interactively and consulted only on `Full`). Only the two autonomous
planner entry points (`Plan.autonomousResult` → `from`/`assessThenPlan`/`triage`)
select `NetworkOnly`; reviewers, `reviewed`/`briefed`, selection and lint keep
`ReadOnly`, hard everywhere.

### Per-backend `NetworkOnly` mapping

| Backend | `NetworkOnly` | No-edit guarantee | Network |
| --- | --- | --- | --- |
| claude | `plan` + `--allowedTools <networkTools>` | **hard** (command-scoped allowlist; plan mode blocks general bash + edits) | web + scoped `gh` |
| pi | `--tools …,bash` | **prompt-only** (bash permits writes) | shell (`gh`/`curl`) |
| codex | `--full-auto` + `-c sandbox_workspace_write.network_access=true` | **prompt-only** (workspace-write permits writes) | shell + web |
| gemini | `--approval-mode plan --allowed-tools web_fetch` | hard | web |
| opencode | write tools disabled (= `ReadOnly`) | hard | web only, server-dependent |

pi and codex have no read-only-with-network mode, so granting network forces a
writable surface; there the no-edit guarantee rests on the planner prompts
(`planning.md` / `assess-then-plan.md` / `triage.md` all forbid edits), not the
sandbox. **Verified** on the gemini CLI: plain `plan` mode blocks `web_fetch`,
but `plan` + `--allowed-tools web_fetch` runs it (returns content), so gemini
keeps its hard no-edit guarantee *and* gets web reads (no shell `gh`).
`--allowed-tools` is deprecated (gemini 1.0 → Policy Engine); migrate then.
opencode keeps `bash` off (no writable-shell network); its web tool isn't in the
disabled set, so web may work (server-dependent, unverified).
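
The table can also be read as data. This sketch uses a hypothetical `NoEdit` enum and a plain map keyed by backend name (neither exists in Orca) to pick out the backends where `NetworkOnly` leaves only a prompt-level no-edit guarantee.

```scala
// Hypothetical encoding of the NetworkOnly table above; not part of Orca.
enum NoEdit:
  case Hard, PromptOnly

val networkOnlyGuarantee: Map[String, NoEdit] = Map(
  "claude"   -> NoEdit.Hard,       // plan mode + command-scoped --allowedTools
  "pi"       -> NoEdit.PromptOnly, // network needs bash, which permits writes
  "codex"    -> NoEdit.PromptOnly, // workspace-write sandbox permits writes
  "gemini"   -> NoEdit.Hard,       // plan mode + --allowed-tools web_fetch
  "opencode" -> NoEdit.Hard        // write tools disabled, as for ReadOnly
)

// Backends where only the planner prompts stand between the agent and an edit.
val promptOnlyBackends: List[String] =
  networkOnlyGuarantee.collect { case (name, NoEdit.PromptOnly) => name }.toList.sorted
```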

### Claude allowlist placement

The claude network allowlist (`--allowedTools` strings like `Bash(gh api:*)`) is
claude-specific, so it lives on `ClaudeBackend` (default
`DefaultNetworkTools`), not the shared `LlmConfig`. It is configurable per flow
via `claude.withNetworkTools(...)`. The default includes `Bash(gh api:*)` for
broad GitHub reads — note `gh api -X POST` can mutate GitHub (not local files);
flows wanting a tighter set override it.
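
A tighter allowlist might replace `Bash(gh api:*)` with read-only `gh` subcommands. The list below is an assumption written in Claude Code's permission-rule syntax, not Orca's `DefaultNetworkTools`, and how `claude.withNetworkTools(...)` accepts it (a `Seq` or varargs) isn't shown here. `networkOnlyArgs` is a hypothetical helper mirroring the `NetworkOnly` branch of `ClaudeArgs`.

```scala
// Hypothetical tighter allowlist: read-only `gh` subcommands instead of `gh api`,
// which can also mutate GitHub via `-X POST`.
val tightNetworkTools: Seq[String] = Seq(
  "WebFetch",
  "WebSearch",
  "Bash(gh issue view:*)",
  "Bash(gh pr view:*)",
  "Bash(gh pr diff:*)"
)

// Mirrors the NetworkOnly branch of ClaudeArgs: plan mode plus one
// comma-joined --allowedTools value; an empty list leaves plain plan mode.
def networkOnlyArgs(networkTools: Seq[String]): Seq[String] =
  if networkTools.isEmpty then Seq("--permission-mode", "plan")
  else Seq("--permission-mode", "plan", "--allowedTools", networkTools.mkString(","))
```

Dropping `gh api` gives up arbitrary REST reads in exchange for an allowlist whose commands only read from GitHub.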

## Consequences

- Claude planners get scoped read-only network with the hard no-edit guarantee
  intact; pi/codex planners get network with a prompt-only guarantee; gemini
  planners get web reads (no shell `gh`) with the hard guarantee intact;
  opencode planners keep the hard guarantee, with web reads server-dependent
  (unverified), and otherwise rely on pre-fetching.
- `withReadOnly` semantics are unchanged for the non-planner turn kinds.
- `AutoApprove.Only` remains unused by flows (latent); not removed here.
66 changes: 44 additions & 22 deletions claude/src/main/scala/orca/tools/claude/ClaudeArgs.scala
@@ -1,7 +1,7 @@
package orca.tools.claude

import orca.backend.{CliArgs, Dispatch}
import orca.llm.{AutoApprove, BackendTag, LlmConfig, SessionId, ToolSet}

/** Maps LlmConfig fields to Claude Code CLI flags. `systemPrompt` is consumed
* by the backend (written to a file whose path is passed in via
@@ -27,7 +27,8 @@ private[claude] object ClaudeArgs:
systemPromptFile: Option[os.Path],
dispatch: Dispatch[BackendTag.ClaudeCode.type],
jsonSchema: Option[String] = None,
mcpConfig: Option[os.Path] = None,
networkTools: Seq[String] = Seq.empty
): Seq[String] =
Seq(
"claude",
@@ -42,7 +43,7 @@ private[claude] object ClaudeArgs:
CliArgs.modelArgs(config) ++
systemPromptFileArgs(systemPromptFile) ++
sessionArgs(dispatch) ++
autoApproveArgs(config, networkTools) ++
jsonSchemaArgs(jsonSchema) ++
mcpConfigArgs(mcpConfig)

@@ -71,23 +72,44 @@ private[claude] object ClaudeArgs:
private def mcpConfigArgs(file: Option[os.Path]): Seq[String] =
file.toSeq.flatMap(f => Seq("--mcp-config", f.toString))

/** Maps [[LlmConfig.tools]] to claude's permission flags. Both read-only
* tiers use `--permission-mode plan`, which makes Edit/Write/Bash
* unavailable (not just non-auto-approved) — turning the planner's advisory
* "don't edit" prompt into a hard guarantee. `Full` follows
* [[LlmConfig.autoApprove]].
*
* `NetworkOnly` additionally pre-approves `networkTools` via
* `--allowedTools`, layering read-only network access (web + scoped `gh`)
* onto plan mode so an autonomous planner can fetch issues/PRs without a
* permission prompt it can't answer. The list is command-scoped, so plan
* mode still hard-blocks general bash and every edit. An empty list leaves
* plain plan mode.
*/
private def autoApproveArgs(
    config: LlmConfig,
    networkTools: Seq[String]
): Seq[String] =
  config.tools match
    case ToolSet.ReadOnly => Seq("--permission-mode", "plan")
    case ToolSet.NetworkOnly if networkTools.isEmpty =>
      Seq("--permission-mode", "plan")
    case ToolSet.NetworkOnly =>
      Seq(
        "--permission-mode",
        "plan",
        "--allowedTools",
        networkTools.mkString(",")
      )
    case ToolSet.Full =>
      config.autoApprove match
        case AutoApprove.All =>
          Seq("--permission-mode", "bypassPermissions")
        case AutoApprove.Only(tools) if tools.isEmpty =>
          Seq("--permission-mode", "acceptEdits")
        case AutoApprove.Only(tools) =>
          Seq(
            "--permission-mode",
            "acceptEdits",
            "--allowedTools",
            tools.toSeq.sorted.mkString(",")
          )