Skip to content

doc: add design spec for azd ai agent routine commands#8200

Open
huimiu wants to merge 7 commits into
mainfrom
huimiu/hui-ai-routine-design-spec
Open

doc: add design spec for azd ai agent routine commands#8200
huimiu wants to merge 7 commits into
mainfrom
huimiu/hui-ai-routine-design-spec

Conversation

@huimiu
Copy link
Copy Markdown
Member

@huimiu huimiu commented May 15, 2026

Summary

Adds the design spec for the new routine command subtree (cli/azd/docs/design/ai-routine-design-spec.md). A routine pairs one trigger (when) with one action (what) on a Foundry project, for example "every weekday at 8 AM UTC, invoke the daily report agent", without standing up Logic Apps, Functions, or cron infra.

Related issues:

Key points

  • Placement: lives in the existing azure.ai.agents extension. Today the commands surface as azd ai agent routine …; they become azd ai routine … after the planned extension split (registration only). No azure.yaml or registry.json impact, no new persistent state.
  • v1 surface: create, update, show, list, delete, enable, disable, dispatch, and run list. run show and run delete are deferred until the data plane lands them.
  • Endpoint resolution: reuses the standard 5 level cascade and works standalone outside an azd project, matching connection, toolbox, and skill. The Routines=V1Preview opt in header is sent on every call.
  • Create vs. update: the data plane is a single idempotent PUT; the CLI splits it for usability. create fails on existing (or upserts with --force). update is GET then PUT preserving untouched fields, with type switching of --trigger / --action rejected client side.
  • enable / disable / dispatch: dedicated idempotent verbs hide the wire format. dispatch is sync by default and streams the agent response; --async switches to the async route.
  • Wire mapping: kebab case CLI aliases (recurring, agent-response, agent-invoke) read naturally and absorb upstream API renames via a single mapping table.
  • Telemetry and tests: one event per command (no PII), plus unit coverage for flag to wire mapping, update merge semantics, and dispatch route selection, with an E2E smoke covering the create through delete loop.

@huimiu huimiu marked this pull request as ready for review May 15, 2026 07:02
Copilot AI review requested due to automatic review settings May 15, 2026 07:02
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 15, 2026

📋 Prioritization Note

Thanks for the contribution! The linked issue isn't in the current milestone yet.
Review may take a bit longer — reach out to @rajeshkamal5050 or @kristenwomack if you'd like to discuss prioritization.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a design specification for the proposed azd ai agent routine command subtree in the azure.ai.agents extension, covering command behavior, endpoint resolution, wire mappings, telemetry, and tests.

Changes:

  • Adds the routine command design spec and v1 command surface.
  • Documents endpoint resolution, API routes, output shapes, telemetry, and test plan.
  • Adds a cspell override for terminology in the new design doc.

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated 8 comments.

File Description
cli/azd/docs/design/ai-routine-design-spec.md New design spec for routine commands.
cli/azd/.vscode/cspell.yaml Adds a file-specific spelling override for the new doc.

Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md
Comment thread cli/azd/docs/design/ai-routine-design-spec.md
Comment thread cli/azd/docs/design/ai-routine-design-spec.md
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md
Comment thread cli/azd/docs/design/ai-routine-design-spec.md
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Copy link
Copy Markdown
Contributor

@lindazqli lindazqli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed against TypeSpec PR #43186 (adding routines, now merged into feature/foundry-release). Six issues found — four are blockers before shipping v1:

  1. [Line 56] TypeSpec PR reference is stale (#42779 → should be #43186).
  2. [Line 165] enable/disable should call the dedicated :enable / :disable routes that already exist in the TypeSpec, not GET-then-PUT.
  3. [Line 173] No sync :dispatch route exists in TypeSpec — only POST :dispatch_async. The "sync by default" contract has no API backing.
  4. [Line 249] --agent-id / agent_id should be --agent-name / agent_name to match the TypeSpec field name.
  5. [Line 201] --orderby has no orderBy query param in ListRoutineRunsParameters.
  6. [Line 239] GitHub trigger: --owner serializes to owner but TypeSpec field is assignee; --event-action has no TypeSpec counterpart.

Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
Copy link
Copy Markdown
Contributor

@lindazqli lindazqli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: the previous review was submitted as REQUEST_CHANGES by mistake — please treat all inline comments as informational notes for discussion, not as blockers. The comments remain valid observations to align the spec with TypeSpec PR #43186.

Copy link
Copy Markdown
Member

@jongio jongio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec follows the extension's existing patterns well for endpoint resolution, flag naming, and prompt/no-prompt behavior. The TypeSpec alignment feedback from existing reviews is the main thing to address before implementation.

One gap: error behavior isn't defined for any of the 9 commands. See inline comment.

Comment thread cli/azd/docs/design/ai-routine-design-spec.md Outdated
@huimiu
Copy link
Copy Markdown
Member Author

huimiu commented May 18, 2026

Thanks everyone for the careful review! Here's what changed to address all the feedback:

TypeSpec alignment (Linda's feedback against PR #43186)

  • Updated all stale #42779 refs to #43186.
  • enable / disable now call the dedicated :enable / :disable routes directly — no more GET-then-PUT (avoids the TOCTOU race you flagged).
  • dispatch (both default and --async) now hits :dispatch_async. There's no sync route in TypeSpec, so the flag just controls client-side waiting/streaming.
  • Renamed --agent-id--agent-name everywhere (spec body, command summary, telemetry) to match the agent_name TypeSpec field.
  • Dropped --orderby from v1 — no orderBy query param exists yet.
  • GitHub trigger: --owner--assignee, removed --event-action.
  • Added a heads-up note in §5.1 for the upcoming generic + strong-typed triggers.
  • Clarified --input is plain text (not parsed as JSON), and added an inline pointer that --trigger is an enum, not free-form.
  • Noted that the run show API now exists in #43186 — keeping registration deferred for v1 but called out in §4.8.

Error behavior + --file (John and Jon's feedback)

  • @jongio: added a new §4.9 Error Behavior section with a scenario → type/code/suggestion table covering 409 (create without --force), 404s (show/delete/dispatch/enable/disable on missing routine), the update GET-succeeds-then-PUT-404 race, type-switch, auth, endpoint resolution, and 5xx. Reuses the existing connection/toolbox typed-error pattern. Test plan updated with negative cases.
  • @therealjohn: pulled --file back into v1 scope. New §4.1.1 covers the file shape, mutual exclusion with --trigger, flag-level override semantics, and points to Support creating routines from a file #8187 for the schema details. update --file is supported too. Command summary, telemetry (hasFile), and test plan updated to match.

huimiu added a commit that referenced this pull request May 19, 2026
The design spec is tracked separately in PR #8200; this PR focuses on the implementation only.
Copy link
Copy Markdown
Contributor

@wbreza wbreza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — Design Spec Quality Pass

Thanks for the thorough iteration on this spec @huimiu — the prior round of feedback (TypeSpec alignment, error section, --file re-scoping) is well incorporated. The findings below are net-new spec-quality items intended to reduce implementation ambiguity. The PR is approvable as-is; these are suggestions to tighten the spec before it becomes the implementation source of truth.

Note on review state: Posting as COMMENT rather than REQUEST_CHANGES since @therealjohn has already approved and many items are clarifications, not blockers.


🔴 Critical — recommend addressing before merge or in a fast follow-up

  1. github-issue enum out of sync across three locations.

    • §1 summary (L68) and §4.1 enum (L105) advertise --trigger <recurring|timer|github-issue> as v1.
    • §5.1 trigger table marks github-issue as "Deferred — pending workspace connection model."
    • §9 reference summary correctly omits it.
    • Fix: Remove github-issue from §1 and §4.1 enum; align with §9.
  2. §4.9 error table lacks HTTP status codes.
    The Type/Code columns are documented but implementers can't map service responses without explicit status codes. Notably, CodeRoutineAlreadyExists (line ~276) doesn't state 409.
    Fix: Add an HTTP Status column (400/404/409/429/503 as applicable).

  3. §9 command summary misrepresents conditional requirements.
    Lines 508-514 show [--cron "0 8 * * *"] and [--at "..."] both as optional, but §5.1 makes each required per-trigger-type. The summary isn't copy-pasteable.
    Fix: Split into separate recurring/timer examples, or add inline comments noting "--cron for recurring, --at for timer."

  4. Orphan ### Output shapes for state-changing verbs heading (between §4.8 and §4.9).
    Unnumbered; §8 references it informally as [§4 table].
    Fix: Number it (e.g., §4.8.1) and update the §8 anchor.

  5. dispatch streaming/failure contract under-specified (§4.6).

    • Streaming protocol unspecified (SSE? chunked? line-buffered?).
    • JSON output shape ambiguous when streaming aggregates.
    • Polling timeout, max attempts, and reconnection semantics undefined.
    • Mid-stream failure: is partial output flushed? what exit code?
    • The leading GET /routines/{name} implementation note doesn't specify what happens if that GET fails.
      Fix: Add a dedicated subsection (or expand §4.6) covering the streaming contract and failure modes; reference an existing streaming command (e.g., agent invoke) if one exists.
  6. Broken/awkward anchor links across the doc.
    Several internal links break across line wraps or include double-dashes from pipe-to-anchor conversion:

    • L105-108: (#51-trigger\nr-flags…) — wrap artifact
    • L264-265: TypeSpec PR URL split with #\n#diff-…
    • L200-201, 207, 241, 288, 348, 391: double-dash anchor IDs from | chars in headings
      Fix: Rewrap URLs onto single lines; verify all (#…) anchors resolve against actual heading IDs.
  7. Exit codes not specified for any verb.
    Per AGENTS.md convention, errors should produce specific exit codes. Scripting consumers can't reliably distinguish validation errors from service errors.
    Fix: Add a §4.9.1 (or table column) listing exit codes per scenario.

  8. --file path resolution & validation under-specified.

    • Resolved against cwd or project root?
    • Path traversal/symlink/absolute-path handling?
    • Max file size?
    • §8 lacks negative tests for malformed YAML/JSON.
      Fix: Document path resolution rules in §4.1.1; add negative test cases in §8.

🟡 Important

  1. --cron dialect unspecified. Quartz vs Unix 5-field? Validated client-side or server-side? Error message format for invalid expressions? (§5.1)

  2. --at ISO 8601 edge cases. Past time → error or schedule next occurrence? "2026-04-24T15:00:00" without Z treated as local or UTC? DST transitions? (§5.1)

  3. Concurrency races undefined. What happens for: update during active dispatch? delete while routine is enabled? Concurrent enable/disable? At minimum, document expected server-side guarantees. (§4.2, §4.4, §4.5, §4.6)

  4. 429/503 retry semantics + network timeouts not covered in §4.9. Does the CLI retry with backoff, or surface immediately?

  5. 403 RBAC vs 401 auth conflated under CodeAuthFailed/CodeNotLoggedIn (L283). Different remediation paths — clarify mapping.

  6. create --force with conflicting immutable fields (different --trigger or --action) — replace silently, or fail? (§4.2)

  7. Telemetry "no PII" not defined (§6). Are agent names, conversation IDs, --assignee usernames, repository names captured? Enumerate explicitly.

  8. run list output formats (table/JSON) not specified (§4.7). Mirror routine list shape?

  9. Test plan gaps (§8): no explicit coverage for interactive prompts (AZD_FORCE_TTY=false), wire-mapping table per trigger×action, dispatch streaming/backpressure, all 9 telemetry events, --file negative cases, timer trigger in E2E smoke.

  10. --file + per-trigger override flags (§4.1.1 L134-142). Spec says --file mutually exclusive with --trigger but cooperates with --cron/--at — yet those are per-trigger-type. Clarify with valid/invalid examples.


🟢 Nice to have

  1. --async help text should explicitly say "returns immediately; use routine run list for status" — non-obvious otherwise. (§4.6)

  2. run subgroup contains only list in v1. Empty-group pattern — consider routine list-runs for v1 and defer the full run subgroup. (§1, §4.7-4.8)

  3. --input guidance. Recommend documenting a size cap and a "don't pass secrets via --input" note. (§4.6)

  4. TOCTOU undefined on first use (L191). Add "(Time-of-Check-Time-of-Use)".

  5. Endpoint cascade phrasing — described as "5-level" but step 5 is the structured error. Relabel as "4-level cascade with structured-error fallback." (§3 L77-82)

  6. Missing newline in §1 (L4): infra.Commands renders as one word.

  7. §9 reference summary inconsistency (L454): [--enabled=false] shows a literal value while other flags use placeholders.


Reviewed against TypeSpec PR #43186 references in the spec and existing patterns in cli/azd/extensions/azure.ai.agents/ and sibling design docs.

@JeffreyCA JeffreyCA added the ext-agents azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes} extensions label May 20, 2026
trangevi pushed a commit that referenced this pull request May 26, 2026
* feat(routines): implement azd ai routine commands

Add the full v1 routine command subtree to the azure.ai.routines
extension as specified in the design spec (PR #8200).

Commands implemented:
- routine create, update, show, list, delete
- routine enable, disable (dedicated idempotent action routes)
- routine dispatch (calls dispatch_async, --async flag for client-side wait)
- routine run list (auto-paging, --top, --filter)

New packages:
- internal/exterrors/ -- structured error codes and helpers
- internal/pkg/routines/ -- data-plane HTTP client and models
- internal/cmd/endpoint.go -- 5-level project endpoint resolver

Wire format: trigger/action as Record with 'default' key.
All calls include x-ms-foundry-features-opt-in: Routines=V1Preview header.

Also adds the design spec at cli/azd/docs/design/ai-routine-design-spec.md.

* test(routines): add unit tests for endpoint, create, manifest, and models

- Add readAzdProjectSourcesFunc seam to endpoint.go for daemon isolation
- endpoint_test.go: isFoundryHost, validateProjectEndpoint, full cascade tests
- routine_create_test.go: buildTrigger and buildAction table tests
- routine_manifest_test.go: readRoutineManifest (JSON/YAML), mergeRoutineFromFile, applyUpdateFlags, getTrigger/getAction
- models_test.go: TriggerCLIToWire and ActionCLIToWire completeness
- Add yaml struct tags to models.go for YAML manifest support

* test(routines): align test patterns with azure.ai.agents extension

- Extract stubAzdProjectSources() helper (mirrors stubAzdHostedSources in agents)
- isolateFromAzdDaemon now also clears AZD_SERVER env var
- Add t.Parallel() to all pure-function tests (isFoundryHost, validateProjectEndpoint,
  buildTrigger, buildAction, mergeRoutineFromFile, applyUpdateFlags, getTrigger/getAction,
  TriggerCLIToWire/ActionCLIToWire map checks)

* fix(routines): address CI lint and spell-check failures

- cspell.yaml: add exterrors, sess, routineName, azdProjectSources to word list
- endpoint.go: remove unused projectEndpointPathPrefix constant
- routine_create.go: wrap long buildAction() call (line >125 chars)
- routine_update.go: wrap long --file flag help text
- routine_manifest_test.go: expand inline map literals to multi-line
- client.go: wrap ListRoutineRuns signature to fit 125-char limit
- Run gofmt -w and go fix on all files (codes.go, client.go, models.go formatting)

* fix(routines): remove unused ptrBool helper

golangci-lint flagged ptrBool as unused. The function had no call sites; the //go:fix inline directive does not exempt it from the unused linter.

* docs(routines): remove design spec from PR

The design spec is tracked separately in PR #8200; this PR focuses on the implementation only.

* fix(routines): close response bodies per page and preserve filter on pagination

- Extract getPage helper so resp.Body.Close runs per iteration in
  ListRoutines and ListRoutineRuns (defer-in-loop leaked FDs).
- Preserve the filter query param across pages in ListRoutineRuns;
  previously page 2+ only carried pageToken and dropped the filter.
- Correct dispatch command help text: the service always runs routines
  asynchronously, so the old 'waits and streams' wording was wrong.

* fix(routines): flatten command tree to remove duplicate 'routine routine'

The extension namespace 'ai.routine' already mounts the extension under
'azd ai routine'. Adding a 'routine' subcommand group on top of that
produced the redundant 'azd ai routine routine <cmd>' path.

Move --project-endpoint persistent flag and all subcommands directly onto
rootCmd so the correct usage is 'azd ai routine <cmd>'.

* fix(routines): align data-plane client with Foundry Routines TypeSpec

The first cut of the Routines client used header/route/field shapes that
did not match the TypeSpec being merged in azure-rest-api-specs#42779.
Realign the extension with the spec so requests round-trip cleanly:

* Preview header renamed from 'x-ms-foundry-features-opt-in' to
  'Foundry-Features' (the value 'Routines=V1Preview' was already correct).
* Async dispatch route renamed ':dispatch_async' -> ':dispatchAsync';
  the action segment is case-sensitive per spec.
* Dropped the non-existent ':enable' / ':disable' action routes;
  enable/disable now read the routine and PUT it back with 'enabled'
  flipped (idempotent: no-op if already at the target value).
* DispatchRoutineRequest wraps a discriminated 'payload' object whose
  'type' must match the routine's action type; --conversation-id was
  removed from dispatch (the spec does not expose it).
* Routine.Action is now a single discriminated object (not an 'actions'
  map keyed by name).
* RoutineAction.AgentName -> AgentID; the CLI flag is renamed to
  --agent-id accordingly.
* RoutineTrigger.Cron -> CronExpression to match the TypeSpec field.
* PagedRoutine pagination follows the absolute 'nextLink' URL from
  Azure.Core.Page<Routine> instead of re-deriving a continuation query.
* RoutineRun gains the additional fields documented in the spec
  (phase, trigger_type, attempt_source, action_type, triggered_at,
  dispatch_id, action_correlation_id, response_id, error_type,
  error_message); 'run list' now prints phase alongside status.
* EventRoutineTrigger fields aligned to the spec: connection_id, owner,
  repository, actions[]; removed 'assignee'.
* DispatchRoutineResponse drops the unused 'status' field.

Tests, mock manifests, and the E2E driver were updated to the new
contract (--agent-id, agent_id, cron_expression, single 'action').

Note: the live Foundry Routines preview endpoint still returns HTTP 500
on /routines?api-version=v1 even with the correct request shape; that is
an upstream service bug tracked separately.

* fix(routines): clarify comment for github_issue fields in RoutineTrigger

* fix(routines): address PR review feedback

- endpoint: reject project endpoints with an explicit port so the
  normalized URL cannot silently strip a non-default port
- routine create: only set Enabled from --enabled when the user
  explicitly passes the flag, so a manifest's enabled value is
  honored; default to enabled=true if neither source provides one
- routine create: explicitly reject --trigger github-issue (deferred
  for v1) instead of producing an incomplete github_issue trigger
- routine_helpers: boolStr now returns "unknown" for a nil pointer
  to avoid displaying "true" when the field is absent from the
  service response
- routine_manifest: surface applyUpdateFlags user-input errors as
  exterrors.Validation (CodeInvalidParameter) for consistent CLI
  error shapes

* chore(routines): add .golangci.yaml and AGENTS.md to align with sibling extensions

Other AI extensions (projects, agents, toolboxes, inspector) ship a
.golangci.yaml lint config and an AGENTS.md contributor guide. Add both
to azure.ai.routines so it follows the same convention, and register
the project-specific �xterrors word in cspell.yaml.

* fix(routines): use camelCase JSON tags to match Foundry service wire

The deployed Foundry Routines data plane applies a camelCase property
naming policy on the wire (e.g. `cronExpression`, `timeZone`,
`agentId`), even though the upstream TypeSpec / OpenAPI document still
emits snake_case. With snake_case JSON tags, `routine create` and
`update` always failed with errors like:

    triggers['default'].cronExpression must be provided for schedule routines
    exactly one of action.agentId or action.agentEndpointId must be provided

and routines read back from `show` / `list` would have empty
trigger/action fields because the camelCase wire payload did not
deserialize into snake_case-tagged Go fields.

Switch the JSON tags on `Routine`, `RoutineTrigger`, `RoutineAction`,
`RoutineRun`, `PagedRoutineRun`, and `DispatchRoutineResponse` to
camelCase so requests/responses round-trip cleanly against the deployed
service. YAML tags stay snake_case so user-facing `--file` manifests
keep the documented convention.

Verified against a live project endpoint: create/list/show now reach
the service correctly (residual `InternalServerError` from the
backend is unrelated and reproduces from raw curl with the same body).

* feat(routines): align with spec PR #43186 and fix HTTP/2 hang

Use azure-rest-api-specs PR #43186 (Foundry Routines TypeSpec) as the
single source of truth for the routines extension, applying every
spec change that does not break the currently deployed service, and
documenting each deliberate divergence inline and in AGENTS.md.

## Spec alignment

* `RoutineRun` and `DispatchRoutineResponse` gain the new `TaskID`
  field (wire `taskId`); the service already emits it. `dispatch`
  now prints `Task ID` after `Action Correlation ID`, and JSON
  output exposes the new field too.
* `RoutineTrigger` is restructured to match the spec's
  `GitHubIssueOpenedRoutineTrigger` shape: dropped `Owner` /
  `Actions[]`, added `Assignee`. The github trigger is still
  deferred at the CLI surface, so this is safe.
* Inline comments and a new AGENTS.md table call out each divergence
  the client deliberately keeps to stay compatible with the live
  service: camelCase wire naming (spec is snake_case), `agentId`
  field (spec renamed to `agent_name`), `:dispatchAsync` action
  segment (spec uses `:dispatch_async`), GET+PUT enable/disable
  fallback (spec adds dedicated routes which still 404),
  `value`/`nextLink` / `value`/`nextPageToken` paged shapes
  (spec uses `AgentsPagedResult<T>`), and `github_issue` wire
  value (spec renamed to `github_issue_opened`).

## CLI bug fix: HTTP/2 stream-reset hang

The pipeline now uses a custom `http.Client` with an explicit
`ResponseHeaderTimeout` (60s) and `TryTimeout` (30s), and azcore
retries are capped at 1. When the Foundry service returns an
HTTP/2 RST_STREAM (for example, the schedule-create
InternalServerError), the CLI now surfaces a `context deadline
exceeded` error within ~40 seconds instead of the previous ~6 minute
hang.

## Verified end-to-end against a live Foundry project

* timer create / show / list / update / disable / enable / dispatch
  (with `taskId` round-tripping) / run list / delete all succeed.
* schedule create still fails (service-side ISE) but now in under a
  minute instead of six.

* feat(routines): defer recurring/schedule trigger until service is ready

The Foundry data plane currently returns `InternalServerError` for any
`PUT /routines/{name}` request whose trigger is `schedule` (the wire
value behind the CLI's `--trigger recurring`). The CLI side is fully
implemented and verified correct via raw curl, so this is a service-side
issue, but it leaves the `recurring` trigger non-functional end-to-end.

Take the recurring trigger off the public CLI surface so users do not
hit the service hang:

* Drop the `--cron` flag from `routine create` and `routine update`.
* `--trigger recurring` is now rejected with the same "deferred"
  shape as `--trigger github-issue`: a clear error pointing the user
  at `--trigger timer` and explaining that recurring is gated on the
  Foundry service.
* `--trigger` help text and validation messages list only `timer`.

The underlying wire model still carries `cron_expression` /
`time_zone` and the `schedule` discriminator so re-enabling the
trigger when the service is ready is just a CLI flag-wiring change.
Unit tests around buildTrigger and applyUpdateFlags are updated
accordingly.

* fix(routines): enforce update-mode manifest merge, env-backed no-prompt in delete, and action-type flag validation

* fix(routines): replace unknown word 'misroute' in endpoint comment

* feat(routines): add exterrors unit tests and .gitignore for bin/

* style(routines): fix gofmt formatting in test files

* feat(routines): align client with spec PR #43186 routes and fields

The Foundry data plane now honors the routine spec from
azure-rest-api-specs#43186. Switch the client off the workarounds that the
first cut needed and onto the spec-shaped routes and wire format.

Tested by probing the live data plane on a Foundry project endpoint:

* Wire field naming: switch from camelCase to snake_case across Routine,
  RoutineTrigger, RoutineAction, RoutineRun, and DispatchRoutineResponse.
  Confirmed: service now rejects `agentId` / `agentName` (camel) with a
  400 `exactly one of agent_name or agent_endpoint_id must be provided`
  and only accepts `agent_name`.
* Enable / disable: switch from GET+PUT-with-enabled-flipped to the spec
  routes `POST /routines/{name}:enable` and `POST /routines/{name}:disable`.
  Confirmed: both routes return `UserError` / `NotFoundError` for missing
  routines (route exists; resource doesn't), instead of the empty 404 the
  routes used to return.
* Async dispatch: switch from `:dispatchAsync` (camelCase) to the spec route
  `:dispatch_async` (snake). Confirmed: the snake route is live; the camel
  form now returns an empty 404 (route gone).
* Schedule trigger: re-enable `--trigger recurring` / `--cron`. The
  original deferral was because every `schedule` PUT 500'd; with the spec
  wire format the schedule trigger passes service-side validation just like
  `timer` does. Re-add the `--cron` flag on `create` and `update`.

Kept divergent because the service has not caught up yet:

* `github_issue_opened` trigger value -- service still rejects it with
  `unrecognized type discriminator id`; CLI does not expose the github
  trigger yet, so the wire mapping keeps `github_issue`.
* `AgentsPagedResult<T>` envelope -- service still returns
  `value` + `nextLink` (routines) / `value` + `nextPageToken` (runs)
  rather than the spec's `data` / `last_id` / `has_more`.

Also:

* CLI flag `--agent-id` renamed to `--agent-name` to match the spec
  field name. Go field `RoutineAction.AgentID` renamed to `AgentName`.
* Drop now-stale `spec divergence` comments from the client, models, and
  AGENTS.md alignment table.

* fix(routines): address review feedback — op code, gRPC cancel, manifest errors, logging, docs

- routine_dispatch.go: use OpDispatchRoutine (not OpGetRoutine) when the
  inner GetRoutine call fails during dispatch validation
- exterrors/errors.go: IsCancellation now checks gRPC codes.Canceled in
  addition to context.Canceled, matching the agents implementation
- routine_manifest.go: distinguish os.IsNotExist from other os.ReadFile
  errors so permission-denied / is-a-directory get accurate messages
- client.go: add Logging.AllowedHeaders for MsCorrelationIdHeader to
  match agents observability parity
- AGENTS.md: rewrite exterrors section in present tense (package exists)
@microsoft-github-policy-service microsoft-github-policy-service Bot added no-recent-activity identity issues with no activity and removed no-recent-activity identity issues with no activity labels May 27, 2026
@kristenwomack
Copy link
Copy Markdown
Collaborator

Hi @huimiu,

Thanks for the iteration on this one. The error section and the :enable / :disable route fix look good.

Items to address:

First, the github-issue trigger says three different things in this spec. §1 and §4.1 list it as v1, §5.1 marks it deferred, and §9 leaves it out. Could you pick one and make the rest match? My vote is to drop it from §1 and §4.1, since the connection model isn't ready.

Second, dispatch default mode is the most user-facing flow in this whole surface, and the streaming behavior isn't written down. We don't say whether the stream is SSE or chunked, what the JSON output looks like when the stream aggregates, how long we poll before giving up, or what happens to partial output if something fails mid-stream. Could you add a subsection that spells that out? I don't want every integrator guessing.

Third, the run subgroup ships with only list in v1. Do we want azd ai agent routine list-runs until run show and run delete arrive, or are we OK with a subgroup that has one verb in it for now? Small call, but easier to make now than to rename later.

Fourth, --conversation-id on dispatch is preview, and it's not in TypeSpec yet. Marking it preview in --help is fine with me, and I want to be sure we're OK with shipping a flag the service might reject. Could you add a sentence on what the user will see if that happens?

Follow-ups, not blocking: @wbreza's review flags exit codes, the --cron dialect, --at without a timezone, 429/503 retry behavior, and 401 vs. 403. Could you open issues for the ones you don't fold in here so we don't lose them?

Thanks, Kristen

@kristenwomack
Copy link
Copy Markdown
Collaborator

Hey @huimiu, @therealjohn's already approved and @wbreza left a solid pass — I won't repeat any of that. Three things from me:

  1. github-issue trigger — the spec lists it in the v1 enum but the trigger table marks it deferred. Pick one. My vote is defer so v1 is just recurring + timer and we don't ship a half-wired trigger.
  2. run subgroup with only list — empty group feels off. @wbreza's routine list-runs suggestion is cleaner for v1. Let's go that route and add the run subgroup back when show/delete land.
  3. dispatch sync default — there's no sync route server-side, so "sync" is really a client-side poll. That's fine, but let's make sure the help text says so and we have a sane default timeout. What's the plan if the poll hangs?

Also +1 on @wbreza's call to nail down exit codes and the streaming contract before this becomes the implementation source of truth.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ext-agents azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes} extensions

Projects

None yet

8 participants