38 changes: 17 additions & 21 deletions README.md
@@ -30,27 +30,24 @@ Save this as `implement.sc` and run it with your task:
import orca.{*, given}

flow(OrcaArgs(args)):
// Plan persists to `.orca/plan-<hash>.md` so a re-run with the same
// prompt resumes from the first incomplete task. Plan.autonomous.from
// runs the planner as a single agentic turn (use Plan.interactive.from
// to let the planner ask clarifying questions). It returns the plan
// paired with the planner's session; `.value` keeps just the plan (the
// implementer below opens its own session). `recoverOrCreate` ensures the
// branch is checked out and the file is on disk before we start.
// Plan persists to `.orca/plan-<hash>.md` so a re-run with the same prompt
// resumes from the first incomplete task. Plan.autonomous.from runs the
// planner as one agentic turn (Plan.interactive.from lets it ask clarifying
// questions); `.value` keeps just the plan, dropping the planner's session.
// `recoverOrCreate` checks out the branch and writes the file before we start.
val planFile = Plan.defaultPath(userPrompt)
val plan = stage("Acquire plan"):
Plan.recoverOrCreate(planFile, "orca: starting work"):
Plan.autonomous.from(userPrompt, claude).value

// Stable session reused across every task so the implementer retains
// cross-task context. The planner's session isn't carried forward — it
// runs read-only and would inherit the restriction on resume.
// Stable session reused across tasks so the implementer retains context.
// The planner's isn't carried forward — it's read-only and would stay so
// on resume.
val session = claude.newSession

// Per task: implement, then review & fix. `implementTaskLoop` ticks
// the plan's checkbox + commits per task and removes the plan file at
// the end. The single commit captures the original implementation, the
// auto-formatted result, and any follow-up fixes the reviewers triggered.
// Per task: implement, then review & fix. `implementTaskLoop` ticks the
// checkbox, commits per task, and removes the plan file at the end. The one
// commit captures the implementation, formatting, and any reviewer fixes.
Plan.implementTaskLoop(planFile, plan): task =>
stage(s"Implement task: ${task.title}"):
stage("Implementation"):
@@ -59,16 +56,15 @@ flow(OrcaArgs(args)):
coder = claude,
sessionId = session,
reviewers = allReviewers(claude),
// Cheap model picks which reviewers run per task — sees each
// one's description plus the changed files. Swap for
// ReviewerSelector.allEveryRound to run every reviewer.
// Cheap model picks the per-task reviewers from their descriptions and
// the changed files. Swap for ReviewerSelector.allEveryRound to run all.
reviewerSelection = ReviewerSelector.llmDriven(claude.haiku),
task = task.title.value,
// Runs after the implementation and after each review fix, so the
// committed code is always formatted and reviewers skip style nits.
// Runs after every edit so commits stay formatted and reviewers skip
// style nits.
formatCommand = Some("sbt scalafmtAll"),
// A compile is a cheap sanity gate; correctness is the reviewers'
// and CI's job, so don't run the heavier full test suite here.
// Cheap sanity gate; correctness is the reviewers' and CI's job, so
// skip the heavier test suite.
lintCommand = Some("sbt Test/compile"),
lintLlm = Some(claude.haiku)
)
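
The comments in this hunk describe a per-task loop: implement, format, then review and fix, with the formatter re-run after every edit. A minimal sketch of that control flow follows, assuming nothing about orca's internals: `implementWithReview`, its parameters, and the `maxRounds` cap are hypothetical names introduced only for illustration.

```scala
// Hypothetical sketch of an implement -> format -> review-and-fix loop.
// None of these names come from orca; the steps are passed in as functions.
def implementWithReview(
    implement: () => Unit,
    format: () => Unit,
    review: () => List[String], // outstanding review comments, empty when clean
    fix: List[String] => Unit,
    maxRounds: Int = 3          // assumed cap so the loop always terminates
): Int =
  implement()
  format() // format the initial implementation before reviewers see it
  var rounds = 0
  var comments = review()
  while comments.nonEmpty && rounds < maxRounds do
    fix(comments)
    format() // re-format after each fix so the final tree is always formatted
    rounds += 1
    comments = review()
  rounds // number of fix rounds that were needed
```

Because the formatter runs once after the implementation and once after each fix, whatever is finally committed has been formatted, which is why reviewers can skip style nits.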
38 changes: 17 additions & 21 deletions examples/epic.sc
@@ -5,19 +5,18 @@
*
* Two layers stack here:
*
* - **On-disk epic.** `.orca/plan-<hash>.md` holds the task list; on a
* fresh run the agent generates it, on a resume the existing file is
* recovered (pending edits stashed, branch re-attached) and execution
* restarts from the first incomplete task. Each task's `Status: [x]`
* checkbox is committed back to the plan file as the task lands, so a
* crash mid-flow loses no progress.
* - **Cross-agent review.** Claude implements; codex reviews. The
* implementing agent is its own worst critic — running reviewers on a
* separate model widens coverage without much extra cost. Fixes go back
* to the same Claude session. Both CLIs need to be logged in.
* - **On-disk epic.** `.orca/plan-<hash>.md` holds the task list — generated
* on a fresh run, recovered on a resume (pending edits stashed, branch
* re-attached) to restart from the first incomplete task. Each task's
* `Status: [x]` is committed as the task lands, so a crash loses no
* progress.
* - **Cross-agent review.** Claude implements; codex reviews — the
* implementer is its own worst critic, so a separate model widens coverage
* cheaply. Fixes go back to the same Claude session. Both CLIs must be
* logged in.
*
* At the end of a successful run the plan file is removed, then the
* documentation step updates the project README based on what changed.
* On success the plan file is removed, then a docs step updates the README
* based on what changed.
*
* Run it from a git repository, with `claude` and `codex` logged in:
*
@@ -41,15 +40,13 @@ flow(OrcaArgs(args)):
// below mints a fresh one.
Plan.autonomous.from(userPrompt, claude.opus).value

// Stable coder session reused across every task (and the docs pass at the
// end) so the agent retains cross-task context. Fresh session (not the
// planner's, which ran read-only). The runtime owns git commits — the agent
// is told not to commit by the default system prompt, so a stray `git
// commit` can't empty the working tree before `implementTaskLoop` commits.
// Stable coder session reused across every task (and the docs pass) so the
// agent retains context. Fresh — not the planner's (read-only). The runtime
// owns git: the default system prompt tells the agent not to commit, so a
// stray `git commit` can't empty the tree before `implementTaskLoop` does.
val session = claude.newSession

// Reviewers on codex (not claude — the implementer is its own worst critic);
// fixes go back to the same Claude session that implemented the task.
// Reviewers on codex; fixes go back to the Claude session that implemented.
val reviewers: List[LlmTool[?]] = allReviewers(codex)

Plan.implementTaskLoop(planFile, plan): task =>
@@ -63,8 +60,7 @@ flow(OrcaArgs(args)):
reviewers = reviewers,
reviewerSelection = ReviewerSelector.llmDriven(claude.haiku),
task = task.title.value,
// Format after every edit (the implementation and each review fix);
// Spotless is wired into the seed pom.
// Format after every edit; Spotless is wired into the seed pom.
formatCommand = Some("mvn -q spotless:apply")
)

21 changes: 10 additions & 11 deletions examples/implement-enhanced.sc
@@ -6,24 +6,23 @@
*
* Same backbone as `implement.sc` (autonomous planning → persistent
* `.orca/plan-<hash>.md` → per-task implement + review-and-fix loop), with two
* extra steps chained onto planning — both running on the planner's read-only
* session, so they cost no extra codebase exploration:
* steps chained onto planning — both on the planner's read-only session, so
* they cost no extra exploration:
*
* 1. **`.reviewed(claude)`** — the planner critiques its own draft and
* returns an improved plan (missing/duplicated tasks, ordering, vague
* descriptions, steps that don't fit the code).
* 1. **`.briefed(claude)`** — the planner writes a one-off codebase brief
* (modules, file paths, key APIs, conventions) and attaches it, producing
* a `PlanWithBrief`. `plan.taskPrompt(task)` prepends the brief to every
* task, so the coding agents — which start cold — don't re-discover what
* the planner already learned. The brief excludes the task prompts.
* (modules, paths, key APIs, conventions), producing a `PlanWithBrief`.
* `plan.taskPrompt(task)` prepends it to every task so the cold-starting
* coding agents don't re-discover what the planner already learned.
*
* The brief rides in the single plan file (a trailing `## Brief` section), so
* `recoverOrCreate` / `implementTaskLoop` persist it on a fresh run, reuse it
* on resume, and remove it with the file when the plan completes — no sidecar.
* The brief rides in the plan file (a trailing `## Brief` section), so
* `recoverOrCreate` / `implementTaskLoop` persist, reuse, and remove it with
* the file — no sidecar.
*
* Swap the order to `.briefed(claude).reviewed(claude)` to also review the
* brief; both are well-typed.
* Swap to `.briefed(claude).reviewed(claude)` to also review the brief; both
* are well-typed.
*
* ```bash
* scala-cli run implement-enhanced.sc -- "Add a multiply function to the calculator crate"
30 changes: 14 additions & 16 deletions examples/implement-interactive.sc
@@ -3,22 +3,20 @@

/** Interactive planning + coding flow (persistent).
*
* Same shape as `implement.sc` but the planner opens a conversation the user
* can drive: if the prompt is underspecified, the agent calls the `ask_user`
* tool to clarify before producing the plan. The resulting plan is persisted
* to `.orca/plan-<hash>.md` so a re-run resumes from the first incomplete
* task.
* Same shape as `implement.sc`, but the planner can drive a conversation: on an
* underspecified prompt it calls the `ask_user` tool to clarify before
* producing the plan. The plan persists to `.orca/plan-<hash>.md` so a re-run
* resumes from the first incomplete task.
*
* `examples/runnable/02-interactive/create-test-project.sh` seeds the calculator
* crate into a temp directory and copies this script alongside it; run
* from the seeded directory the seeder prints:
* crate into a temp dir and copies this script alongside it; from there:
*
* ```bash
* scala-cli run implement-interactive.sc -- "Add a new arithmetic operation to the calculator crate. Ask the user which."
* ```
*
* The trailing "Ask the user which." pushes the planner to call `ask_user`
* rather than guessing which operation to add.
* rather than guessing.
*
* Requires `claude` logged in and `cargo` on PATH.
*/
@@ -32,12 +30,12 @@ flow(OrcaArgs(args)):
// (the planner can call `ask_user` to clarify) and branch.
val plan = stage("Acquire plan"):
Plan.recoverOrCreate(planFile):
// `.value` drops the planner's session - the implementer below mints a
// fresh one (ask_user was only needed for planning).
// `.value` drops the planner's session; the implementer mints its own
// (ask_user was only needed for planning).
Plan.interactive.from(userPrompt, claude).value

// Stable autonomous session reused across every task — ask_user was only
// needed for planning. Implementer and fixer share it.
// Stable autonomous session shared by implementer and fixer (ask_user was
// only needed for planning).
val session = claude.newSession

Plan.implementTaskLoop(planFile, plan): task =>
@@ -53,11 +51,11 @@ flow(OrcaArgs(args)):
// `ReviewerSelector.allEveryRound` to run every reviewer.
reviewerSelection = ReviewerSelector.llmDriven(claude.haiku),
task = task.title.value,
// Format after every edit (the implementation and each review fix), so
// the committed code stays formatted and reviewers skip style nits.
// Format after every edit so commits stay formatted and reviewers
// skip style nits.
formatCommand = Some("cargo fmt"),
// A compile is a cheap sanity gate for the reviewers; correctness is
// the reviewers' and CI's job, so don't run the (much heavier) tests.
// Cheap sanity gate; correctness is the reviewers' and CI's job, so
// skip the heavier tests.
lintCommand = Some("cargo check --tests"),
lintLlm = Some(claude.haiku)
)
39 changes: 18 additions & 21 deletions examples/implement.sc
@@ -3,26 +3,24 @@

/** Persistent planning + coding flow (autonomous planning).
*
* Mirrors the README example. The agent breaks the user's prompt into a list
* of tasks; the plan is persisted to `.orca/plan-<hash>.md` so a re-run with
* the same prompt resumes from the first incomplete task. Each task is
* implemented in sequence on a single epic branch with a review-and-fix loop,
* the plan's checkbox is ticked, and the work plus the tick are committed
* together. When every task is done the plan file is removed and the removal
* is committed.
* Mirrors the README example. The agent breaks the prompt into tasks, persisted
* to `.orca/plan-<hash>.md` so a re-run with the same prompt resumes from the
* first incomplete task. Each task is implemented on a single epic branch with
* a review-and-fix loop; the work and the ticked checkbox are committed
* together. When every task is done the plan file is removed and that removal
* committed.
*
* `examples/runnable/01-simple/create-test-project.sh` seeds the calculator crate
* into a temp directory and copies this script alongside it; run from
* the seeded directory the seeder prints:
* `examples/runnable/01-simple/create-test-project.sh` seeds the calculator
* crate into a temp dir and copies this script alongside it; from there:
*
* ```bash
* scala-cli run implement.sc -- "Add a multiply function to the calculator crate"
* ```
*
* Requires `claude` logged in and `cargo` on PATH.
*
* For the variant where the planner can ask the user clarifying questions
* (open-ended prompts, underspecified asks), see `implement-interactive.sc`.
* For the variant where the planner can ask clarifying questions, see
* `implement-interactive.sc`.
*/

import orca.{*, given}
@@ -33,13 +31,12 @@ flow(OrcaArgs(args)):
// Resume `.orca/plan-<hash>.md` if it exists; otherwise plan + branch.
val plan = stage("Acquire plan"):
Plan.recoverOrCreate(planFile):
// `.value` drops the planner's read-only session - the implementer
// below mints a fresh one.
// `.value` drops the planner's read-only session; the implementer
// mints its own.
Plan.autonomous.from(userPrompt, claude).value

// Stable session reused across every task — implementer and fixer share
// it so review comments land against the same context that produced the
// code. Fresh session (not the planner's, which was in plan mode).
// Stable session shared by implementer and fixer, so reviews land against
// the code's own context. Fresh — not the planner's (plan mode).
val session = claude.newSession

Plan.implementTaskLoop(planFile, plan): task =>
@@ -55,11 +52,11 @@ flow(OrcaArgs(args)):
// `ReviewerSelector.allEveryRound` to run every reviewer.
reviewerSelection = ReviewerSelector.llmDriven(claude.haiku),
task = task.title.value,
// Format after every edit (the implementation and each review fix), so
// the committed code stays formatted and reviewers skip style nits.
// Format after every edit so commits stay formatted and reviewers
// skip style nits.
formatCommand = Some("cargo fmt"),
// A compile is a cheap sanity gate for the reviewers; correctness is
// the reviewers' and CI's job, so don't run the (much heavier) tests.
// Cheap sanity gate; correctness is the reviewers' and CI's job, so
// skip the heavier tests.
lintCommand = Some("cargo check --tests"),
lintLlm = Some(claude.haiku)
)
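
The doc comments above say the plan persists to `.orca/plan-<hash>.md` and that a re-run with the same prompt resumes from the first incomplete task. A rough sketch of how such a scheme could work is shown below. It is not orca's implementation: the SHA-256 prefix, the `- [x]` / `- [ ]` line format, and the names `Task`, `planPath`, `parseTasks`, and `firstIncomplete` are all assumptions made for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical model of one plan entry; orca's real types are not shown in this diff.
final case class Task(title: String, done: Boolean)

// Derive a stable path from the prompt, so the same prompt maps to the same file.
// The hash function and prefix length are assumptions.
def planPath(prompt: String): String =
  val digest = MessageDigest
    .getInstance("SHA-256")
    .digest(prompt.getBytes(StandardCharsets.UTF_8))
  val hash = digest.take(4).map(b => f"${b & 0xff}%02x").mkString
  s".orca/plan-$hash.md"

// Assumed line format: "- [x] title" for done, "- [ ] title" for pending.
// Lines that match neither pattern are ignored.
def parseTasks(markdown: String): List[Task] =
  markdown.linesIterator.toList.collect {
    case s"- [x] $title" => Task(title, done = true)
    case s"- [ ] $title" => Task(title, done = false)
  }

// Resuming means picking up at the first task whose checkbox is still empty.
def firstIncomplete(tasks: List[Task]): Option[Task] =
  tasks.find(task => !task.done)
```

Keying the file name on the prompt is what lets a second run find the earlier plan without extra state, and committing each ticked checkbox with the task's work means the file on disk always reflects what has landed.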