diff --git a/AGENTS.md b/AGENTS.md index 64952a8..3b51d96 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -44,3 +44,18 @@ pnpm lint Write for developers discovering MartinLoop for the first time. Explain what the tool does, how to install it, how to run it, and how to verify results. + +## Proof Receipt Design Lock + +MartinLoop proof-card SVGs must stay in the CLI receipt style: + +- dark terminal canvas +- line-based layout +- monospaced evidence rows +- semantic green for verified/pass states +- semantic red for failed, missing, or boundary states + +Do not change proof receipts into rounded cards, blue palettes, gradients, +certificate layouts, dashboard cards, or decorative marketing graphics unless +the maintainer explicitly asks for that change and receives side-by-side +visual renders before approval. diff --git a/README.md b/README.md index 2cddba8..81a110c 100644 --- a/README.md +++ b/README.md @@ -79,7 +79,7 @@ npx -y martin-loop@latest preflight "Summarize the demo workspace and prove test `share --latest` writes three files into the selected run directory under `share/`: `run-receipt.json`, `run-receipt.md`, and `proof-card.svg`. -Release notes for the current root package: [MartinLoop 0.3.4](./docs/release/OSS-0.3.4-RELEASE-NOTES.md). +Release notes for the current root package: [MartinLoop 0.3.5](./docs/release/OSS-0.3.5-RELEASE-NOTES.md). ## Visual Proof @@ -95,22 +95,42 @@ Ungoverned agents can retry until cost and scope drift. MartinLoop adds budget c MartinLoop governed run compared with an unbounded retry loop +## Proof Receipts + +Proof receipts are local share bundles for governed AI coding runs. They show the task, spend, budget, verifier result, receipt integrity, and any evidence boundary that should not be rounded into confidence. + +This real governed run spent `$0.51` against a `$3.00` budget. The verifier passed and the receipt integrity was signed, but the proof stayed at `EVIDENCE_BOUNDARY` because rollback evidence was not recorded. + +
+ [Image: MartinLoop CLI proof receipt for a governed run with spend, budget, verifier, integrity, and evidence boundary] +
+ +Generate your own receipt after a governed run: + +```sh +npx -y martin-loop@latest run "Summarize the demo workspace and prove tests still pass" --proof --verify "npm test" +npx -y martin-loop@latest runs verify --latest +npx -y martin-loop@latest share --latest +``` + +Example receipt files: [Markdown](./docs/examples/proof-receipts/live-governed-run-receipt.md) and [JSON](./docs/examples/proof-receipts/live-governed-run-receipt.json). + ## Run This Audit Yourself Use this lane from a clean temp directory to verify the public CLI flow exactly as shipped: ```sh -npx -y martin-loop@0.3.4 --version -npx -y martin-loop@0.3.4 start -npx -y martin-loop@0.3.4 demo +npx -y martin-loop@0.3.5 --version +npx -y martin-loop@0.3.5 start +npx -y martin-loop@0.3.5 demo cd martin-loop-demo npm install -npx -y martin-loop@0.3.4 run "Summarize the demo workspace and prove tests still pass" --proof --verify "npm test" --json -npx -y martin-loop@0.3.4 dossier --latest --json -npx -y martin-loop@0.3.4 share --latest --json +npx -y martin-loop@0.3.5 run "Summarize the demo workspace and prove tests still pass" --proof --verify "npm test" --json +npx -y martin-loop@0.3.5 dossier --latest --json +npx -y martin-loop@0.3.5 share --latest --json ``` -For deterministic installs, pin the package line (`martin-loop@0.3.4`) or use `martin-loop@latest`. Plain `npx martin-loop` can resolve a stale local cache on some machines. +For deterministic installs, pin the package line (`martin-loop@0.3.5`) or use `martin-loop@latest`. Plain `npx martin-loop` can resolve a stale local cache on some machines. Expected share bundle outputs: @@ -229,7 +249,7 @@ npx martin-loop mcp print-config --host gemini --transport stdio --profile full- npx martin-loop mcp print-config --host generic --transport stdio --profile github-review ``` -The root `martin-loop` package and the standalone `@martinloop/mcp` package move on separate version lines. 
The root package line here is `0.3.4`; the current standalone MCP package is `0.3.1`. +The root `martin-loop` package and the standalone `@martinloop/mcp` package move on separate version lines. The root package line here is `0.3.5`; the current standalone MCP package is `0.3.1`. The public MCP release train labels are: diff --git a/docs/assets/proof-receipt-live-governed.png b/docs/assets/proof-receipt-live-governed.png new file mode 100644 index 0000000..ff7cf6c Binary files /dev/null and b/docs/assets/proof-receipt-live-governed.png differ diff --git a/docs/examples/proof-receipts/live-governed-run-receipt.json b/docs/examples/proof-receipts/live-governed-run-receipt.json new file mode 100644 index 0000000..06a3cea --- /dev/null +++ b/docs/examples/proof-receipts/live-governed-run-receipt.json @@ -0,0 +1,17 @@ +{ + "title": "Martin Loop Proof Receipt", + "loopId": "loop_82emkgkf", + "proofVerdict": "EVIDENCE_BOUNDARY", + "evidenceLine": "Incomplete Martin proof: missing budget, rollback, or verifier evidence.", + "verifier": "passed", + "costSpend": "$0.51", + "budget": "$3.00", + "remainingBudget": "$2.49", + "overspendRatio": "0.17x", + "attempts": "1", + "rollback": "not-recorded", + "receiptIntegrity": "signed", + "verificationSteps": "1", + "runtime": "claude / claude-sonnet-4-6 / agent-cli:claude", + "generatedAt": "2026-06-10T20:01:03.635Z" +} diff --git a/docs/examples/proof-receipts/live-governed-run-receipt.md b/docs/examples/proof-receipts/live-governed-run-receipt.md new file mode 100644 index 0000000..4d3f476 --- /dev/null +++ b/docs/examples/proof-receipts/live-governed-run-receipt.md @@ -0,0 +1,25 @@ +# Martin Loop Proof Receipt + +Incomplete Martin proof: missing budget, rollback, or verifier evidence. + +| Field | Evidence | +| --- | --- | +| Loop ID | loop_82emkgkf | +| Objective | Audit the MartinLoop CLI proof receipt guard for a shareable governed run receipt. 
| +| Status | exited | +| Lifecycle | budget_exit | +| Verifier | passed | +| Cost / spend | $0.51 | +| Budget | $3.00 | +| Attempts | 1 | +| Rollback | not-recorded | +| Halt reason | Martin exited because the budget governor hit a hard limit. | +| Evidence boundary | Generated from a local Martin Loop run record.; Hosted dashboards and private team telemetry are intentionally excluded from OSS proof cards. | +| Remaining budget | $2.49 | +| Overspend ratio | 0.17x | +| Verification steps | 1 | +| Run mode | not recorded | +| Runtime | claude / claude-sonnet-4-6 / agent-cli:claude | +| Receipt integrity | signed | +| Generated at | 2026-06-10T20:01:03.635Z | + diff --git a/docs/oss/AGENT-RUN-RECEIPTS.md b/docs/oss/AGENT-RUN-RECEIPTS.md index 23a76d1..18e98f4 100644 --- a/docs/oss/AGENT-RUN-RECEIPTS.md +++ b/docs/oss/AGENT-RUN-RECEIPTS.md @@ -85,6 +85,8 @@ Expected bundle output under the selected run directory in `share/`: - `run-receipt.md` (human-readable recap) - `proof-card.svg` (portable visual card) +The proof card is intentionally a terminal-style receipt, not a marketing card. It uses a dark CLI layout, line rules, monospaced evidence rows, green only for verified/pass states, and red only for failed, missing, or boundary states. Do not restyle it into rounded boxes, blue palettes, gradients, certificate layouts, or dashboard cards without an explicit visual review. + 4. 
Optional custom output directory: ```sh @@ -126,3 +128,13 @@ If exact replay is not possible because the workspace changed, the `warnings` an - usage is presented with provenance (`actual`, `estimated`, or `unavailable`) - verifier failures are explicit and not reinterpreted as success - inspection remains read-only + +## Public proof receipt example + +This repository includes a public-safe proof receipt generated from a real governed run: + +- [visual proof card](../assets/proof-receipt-live-governed.png) +- [Markdown receipt](../examples/proof-receipts/live-governed-run-receipt.md) +- [JSON receipt](../examples/proof-receipts/live-governed-run-receipt.json) + +The example shows a verifier-passed run with signed receipt integrity and an explicit evidence boundary. The boundary is kept visible because rollback evidence was not recorded. diff --git a/docs/oss/AGENT-START-HERE.md b/docs/oss/AGENT-START-HERE.md index fab8bf3..8429a13 100644 --- a/docs/oss/AGENT-START-HERE.md +++ b/docs/oss/AGENT-START-HERE.md @@ -28,6 +28,17 @@ npx martin-loop dossier --latest Expected value: the dossier tells you what happened, what Martin prevented, verifier status, rollback/artifact evidence, clearly labeled token/cost estimates, and the next safe action. +## Proof Receipts + +After a governed run, create a share bundle: + +```sh +npx martin-loop runs verify --latest +npx martin-loop share --latest +``` + +The bundle includes `run-receipt.json`, `run-receipt.md`, and `proof-card.svg`. The proof card should look like a terminal receipt: dark canvas, rows, divider lines, monospaced evidence, green pass states, and red boundary states. Keep uncertainty visible. If rollback, integrity, cost, or verifier evidence is missing, render it as missing instead of turning the run into a success claim. + ## MCP Profile Defaults - `minimal` is the default: `martin_doctor`, `martin_preflight`, `martin_list_runs`, `martin_triage_runs`, and `martin_run_dossier`. 
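Reviewer note: the figures in the example receipt (`$0.51` spent, `$3.00` budget, `$2.49` remaining, `0.17x` ratio) and the "render it as missing" rule can be checked with a small sketch. This is an illustrative TypeScript sketch under stated assumptions, not the shipped implementation: `displayEvidence` and `summarizeBudget` are hypothetical names, and the sketch assumes remaining budget is clamped at zero and both values are rounded to two decimals.

```typescript
// Hypothetical helpers for illustration only; these names are not part of the MartinLoop API.
type Evidence = string | number | undefined;

// Render absent or blank evidence as an explicit "not recorded" value
// instead of dropping the row or implying success.
function displayEvidence(value: Evidence): string {
  if (value === undefined || String(value).trim().length === 0) {
    return "not recorded";
  }
  return String(value).trim();
}

interface BudgetSummary {
  remainingBudget: string;
  overspendRatio: string;
}

// Remaining budget is clamped at zero; the ratio is spend divided by budget.
// A zero or negative budget yields "unknown" rather than a division result.
function summarizeBudget(actualUsd: number, maxUsd: number): BudgetSummary {
  const remaining = Math.max(0, maxUsd - actualUsd);
  return {
    remainingBudget: `$${remaining.toFixed(2)}`,
    overspendRatio: maxUsd > 0 ? `${(actualUsd / maxUsd).toFixed(2)}x` : "unknown",
  };
}

const summary = summarizeBudget(0.51, 3.0);
console.log(summary.remainingBudget); // $2.49
console.log(summary.overspendRatio); // 0.17x
console.log(displayEvidence(undefined)); // not recorded
```

A ratio below `1.00x` means the run stayed under budget. Keeping the missing-evidence fallback as a visible string is what lets the receipt show a boundary instead of a blank.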
diff --git a/docs/release/OSS-0.3.5-RELEASE-NOTES.md b/docs/release/OSS-0.3.5-RELEASE-NOTES.md new file mode 100644 index 0000000..0df56d9 --- /dev/null +++ b/docs/release/OSS-0.3.5-RELEASE-NOTES.md @@ -0,0 +1,29 @@ +# MartinLoop 0.3.5 Proof Receipt Release + +`0.3.5` upgrades MartinLoop share receipts so governed runs produce a sharper CLI-style proof card and clearer public documentation. + +## What Changed + +- Proof cards now render as dark terminal receipts with line rules, monospaced evidence rows, and explicit pass/boundary coloring. +- Share receipts include stronger visible context: task class, spend, budget, remaining budget, overspend ratio, verifier status, integrity state, runtime, and event rail when present in the local run record. +- Missing rollback, verifier, budget, or integrity evidence stays visible as an evidence boundary instead of being softened into a success claim. +- README and agent docs now show how to create and inspect share bundles with `runs verify --latest` and `share --latest`. +- Public tests now block rounded-card, blue-palette, gradient, and typography regressions in proof-card SVG output. + +## Why This Matters + +AI coding work needs evidence that can be checked after the run. A verifier pass is useful, but it is not the whole proof. The receipt should also show what it cost, what evidence exists, and what evidence is missing. 
+ +## Quick Check + +```sh +npx -y martin-loop@0.3.5 run "Summarize the demo workspace and prove tests still pass" --proof --verify "npm test" +npx -y martin-loop@0.3.5 runs verify --latest +npx -y martin-loop@0.3.5 share --latest +``` + +Expected share bundle outputs: + +- `share/run-receipt.json` +- `share/run-receipt.md` +- `share/proof-card.svg` diff --git a/docs/release/VERSION-LEDGER.md b/docs/release/VERSION-LEDGER.md index 089e5a9..2f9741c 100644 --- a/docs/release/VERSION-LEDGER.md +++ b/docs/release/VERSION-LEDGER.md @@ -4,16 +4,16 @@ This file is the release source of truth for package/version mapping in this rep ## Root package: `martin-loop` -- live npm dist-tag `latest`: `0.3.4` -- live public GitHub release: `v0.3.4` +- live npm dist-tag `latest`: `0.3.4` before the `0.3.5` proof receipt release publishes +- live public GitHub release: `v0.3.4` before the `v0.3.5` release workflow completes - live public baseline in this train: `0.3.4` - root public baseline: `0.3.4` - releases consumed since the original `0.2.8` launch: - `0.2.9` fixed proof-run classification, Windows `.cmd` resolution, and public provider defaults - `0.2.10` tightened verifier evidence, `--runs-dir` consistency, and public help output - `0.2.11` fixed `runs verify --latest` selector parity in the public CLI -- current in-repo root release line: `0.3.4` for governed integrity hardening across path-policy, selector, and receipt verification surfaces -- next planned root follow-on: `0.3.5` for additional cross-host reliability follow-ups +- current in-repo root release line: `0.3.5` for CLI-style proof receipts and share-bundle documentation +- next planned root follow-on: `0.3.6` for additional cross-host reliability follow-ups ## Standalone package: `@martinloop/mcp` diff --git a/package.json b/package.json index a853545..dc3ffba 100644 --- a/package.json +++ b/package.json @@ -1,7 +1,7 @@ { "name": "martin-loop", "private": false, - "version": "0.3.4", + "version": "0.3.5", 
"type": "module", "description": "Open-source command center for governed AI coding agents with built-in onboarding, hard gates, MCP, and shareable run receipts.", "packageManager": "pnpm@10.33.0", diff --git a/packages/cli/src/index.ts b/packages/cli/src/index.ts index 8b148d2..495ddd7 100644 --- a/packages/cli/src/index.ts +++ b/packages/cli/src/index.ts @@ -3004,6 +3004,22 @@ function proofCardInputFromLoop(loop: LoopRecord): MartinProofCardInput { ) ? "captured" : "not-recorded"; + const remainingBudget = Math.max(0, loop.budget.maxUsd - loop.cost.actualUsd); + const overspendRatio = + loop.budget.maxUsd > 0 ? `${(loop.cost.actualUsd / loop.budget.maxUsd).toFixed(2)}x` : "unknown"; + const verificationStepCount = loop.events.filter((event) => event.type === "verification.completed").length; + const latestAttempt = loop.attempts.at(-1); + const runtime = latestAttempt + ? `${latestAttempt.adapterId} / ${latestAttempt.model}` + : loop.events + .map((event) => event.payload) + .find((payload) => typeof payload["adapterId"] === "string" || typeof payload["model"] === "string"); + const runtimeLabel = + typeof runtime === "string" + ? runtime + : runtime + ? `${String(runtime["adapterId"] ?? "unknown")} / ${String(runtime["model"] ?? "unknown")}` + : "not recorded"; return { loopId: loop.loopId, @@ -3013,8 +3029,14 @@ function proofCardInputFromLoop(loop: LoopRecord): MartinProofCardInput { verifierStatus: verification.status, costSpend: `$${loop.cost.actualUsd.toFixed(2)}`, budget: `$${loop.budget.maxUsd.toFixed(2)}`, + remainingBudget: `$${remainingBudget.toFixed(2)}`, + overspendRatio, attempts: loop.attempts.length, rollbackStatus, + verificationStepCount, + runMode: loop.task.mutationMode ?? 
"not recorded", + runtime: runtimeLabel, + timelineEvents: loop.events.map((event) => event.type), haltReason: latestExitReason(loop), evidenceBoundaryNotes: [ "Generated from a local Martin Loop run record.", @@ -3034,8 +3056,14 @@ function defaultChallengeProofCardInput(): MartinProofCardInput { verifierStatus: "passed", costSpend: "$2.30", budget: "$3.00", + remainingBudget: "$0.70", + overspendRatio: "0.77x", attempts: 2, rollbackStatus: "captured", + verificationStepCount: 1, + runMode: "mutating", + runtime: "demo / local-fixture", + timelineEvents: ["run.started", "attempt.started", "verification.completed", "budget.updated", "run.completed"], haltReason: "verifier_passed", evidenceBoundaryNotes: [ "Generated from a local Martin Loop run record.", diff --git a/packages/cli/src/proof-card.ts b/packages/cli/src/proof-card.ts index 42c616d..e7c5b67 100644 --- a/packages/cli/src/proof-card.ts +++ b/packages/cli/src/proof-card.ts @@ -8,10 +8,16 @@ export interface MartinProofCardInput { verifierStatus: string; costSpend: string | number; budget: string | number; + remainingBudget?: string | number; + overspendRatio?: string | number; attempts: string | number; rollbackStatus: string; haltReason: string; evidenceBoundaryNotes: string | readonly string[]; + verificationStepCount?: string | number; + runMode?: string; + runtime?: string; + timelineEvents?: readonly string[]; generatedAt: string; receiptIntegrityState?: ReceiptIntegrityState; } @@ -27,6 +33,9 @@ export interface MartinProofCard { evidenceLine: string; generatedAt: string; completeEvidence: boolean; + proofVerdict: "VERIFIED" | "HALTED" | "FAILED" | "EVIDENCE_BOUNDARY"; + taskLabel: string; + timelineEvents: readonly string[]; } const COMPLETE_EVIDENCE_LINE = "Martin stopped Ralph here."; @@ -43,11 +52,16 @@ const FIELD_LABELS = { verifierStatus: "Verifier", costSpend: "Cost / spend", budget: "Budget", + remainingBudget: "Remaining budget", + overspendRatio: "Overspend ratio", attempts: "Attempts", 
rollbackStatus: "Rollback", receiptIntegrityState: "Receipt integrity", haltReason: "Halt reason", evidenceBoundaryNotes: "Evidence boundary", + verificationStepCount: "Verification steps", + runMode: "Run mode", + runtime: "Runtime", generatedAt: "Generated at" } as const; @@ -66,6 +80,8 @@ export function buildMartinProofCard(input: MartinProofCardInput): MartinProofCa { label: FIELD_LABELS.verifierStatus, value: sanitizeText(input.verifierStatus) }, { label: FIELD_LABELS.costSpend, value: sanitizeText(input.costSpend) }, { label: FIELD_LABELS.budget, value: sanitizeText(input.budget) }, + { label: FIELD_LABELS.remainingBudget, value: sanitizeOptionalText(input.remainingBudget) }, + { label: FIELD_LABELS.overspendRatio, value: sanitizeOptionalText(input.overspendRatio) }, { label: FIELD_LABELS.attempts, value: sanitizeText(input.attempts) }, { label: FIELD_LABELS.rollbackStatus, value: sanitizeText(input.rollbackStatus) }, ...(input.receiptIntegrityState @@ -76,6 +92,12 @@ export function buildMartinProofCard(input: MartinProofCardInput): MartinProofCa } ] : []), + { + label: FIELD_LABELS.verificationStepCount, + value: sanitizeOptionalText(input.verificationStepCount) + }, + { label: FIELD_LABELS.runMode, value: sanitizeOptionalText(input.runMode) }, + { label: FIELD_LABELS.runtime, value: sanitizeOptionalText(input.runtime) }, { label: FIELD_LABELS.haltReason, value: sanitizeText(input.haltReason) }, { label: FIELD_LABELS.evidenceBoundaryNotes, @@ -101,11 +123,20 @@ export function buildMartinProofCard(input: MartinProofCardInput): MartinProofCa : INCOMPLETE_EVIDENCE_LINE; return { - title: "Martin Loop Proof Card", + title: "Martin Loop Proof Receipt", fields, evidenceLine, generatedAt, - completeEvidence + completeEvidence, + proofVerdict: deriveProofVerdict({ + completeEvidence, + status: input.status, + lifecycle: input.lifecycle, + verifierStatus: input.verifierStatus, + receiptIntegrityState: input.receiptIntegrityState + }), + taskLabel: 
deriveTaskLabel(input.objective), + timelineEvents: normalizeTimelineEvents(input.timelineEvents) }; } @@ -127,40 +158,211 @@ export function renderMartinProofCardMarkdown(card: MartinProofCard): string { } export function renderMartinProofCardSvg(card: MartinProofCard): string { - const width = 760; - const rowHeight = 28; - const headerHeight = 92; - const footerHeight = 38; - const height = headerHeight + card.fields.length * rowHeight + footerHeight; - const rows = card.fields - .map((field, index) => { - const y = headerHeight + index * rowHeight; - const fill = index % 2 === 0 ? "#f8fafc" : "#ffffff"; - - return [ - ``, - `${escapeSvg(field.label)}`, - `${escapeSvg(field.value)}` - ].join(""); - }) - .join(""); + const width = 1200; + const height = 675; + const margin = 46; + const accent = verdictColor(card.proofVerdict); + const field = (label: string) => getFieldValue(card.fields, label); + const metrics: readonly (readonly [string, string])[] = [ + ["cost_usd", normalizeMoneyValue(field("Cost / spend"))], + ["budget_usd", normalizeMoneyValue(field("Budget"))], + ["remaining_usd", normalizeMoneyValue(field("Remaining budget"))], + ["overspend_ratio", normalizeCliValue(field("Overspend ratio"))], + ["attempts", normalizeCliValue(field("Attempts"))], + ["rollback", normalizeCliValue(field("Rollback"))], + ["receipt_integrity", normalizeCliValue(field("Receipt integrity"))] + ]; + const meta: readonly (readonly [string, string])[] = [ + ["task", card.taskLabel], + ["run_mode", normalizeCliValue(field("Run mode"))], + ["runtime", normalizeCliValue(field("Runtime"))], + [ + "verifier", + `${normalizeCliValue(field("Verifier"))} / steps:${normalizeCliValue(field("Verification steps"))}` + ], + ["halt_reason", normalizeCliValue(field("Halt reason"))] + ]; + const boundary = normalizeBoundaryLine(field("Evidence boundary")); + const command = `$ martin runs verify --loop-id ${field("Loop ID")}`; return [ ``, - '', - '', - `${escapeSvg(card.title)}`, - 
`${escapeSvg(card.evidenceLine)}`, - rows, - `Generated ${escapeSvg(card.generatedAt)}`, + "", + ``, + "", + '', + ``, + `MARTIN LOOP :: PROOF RECEIPT`, + `[${escapeSvg(card.proofVerdict)}]`, + ``, + `${escapeSvg(command)}`, + `result: ${escapeSvg(normalizeCliValue(field("Verifier")))} | proof: ${escapeSvg(card.proofVerdict.toLowerCase())}`, + `${escapeSvg(truncateText(card.evidenceLine, 118))}`, + ``, + `METRICS`, + ...renderCliRows(metrics, margin, 310, 31, accent, 38), + '', + 'RUN CONTEXT', + ...renderCliRows(meta, 660, 310, 31, accent, 31), + 'EVENT RAIL', + renderEventRail(card.timelineEvents, 660, 505, accent), + ``, + `BOUNDARY`, + `${escapeSvg(truncateText(boundary, 86))}`, + `generated_at=${escapeSvg(card.generatedAt)}`, + `offline-verifiable local run evidence`, "" ].join(""); } +function getFieldValue(fields: readonly MartinProofCardField[], label: string): string { + return fields.find((field) => field.label === label)?.value ?? "unknown"; +} + +function renderCliRows( + rows: readonly (readonly [string, string])[], + x: number, + startY: number, + rowGap: number, + accent: string, + maxValueLength: number +): string[] { + return rows.map(([label, value], index) => { + const y = startY + index * rowGap; + const color = valueColor(value, accent); + const dots = ".".repeat(Math.max(3, 24 - label.length)); + return `${escapeSvg(label)} ${dots} ${escapeSvg(truncateText(value, maxValueLength))}`; + }); +} + +function renderEventRail(events: readonly string[], x: number, y: number, accent: string): string { + const visible = events.slice(0, 5); + const hasFailure = visible.some((event) => /fail|reject|missing|breach|error|tamper/iu.test(event)); + const lineColor = hasFailure ? 
"#d35f5f" : accent; + const rail = visible.map((event) => compactEventName(event)).join(" -> "); + + return [ + ``, + ``, + `${escapeSvg(truncateText(rail, 70))}` + ].join(""); +} + +function svgStyle(): string { + return [ + ".title{font-family:'SF Pro Display','Geist','Satoshi',system-ui,sans-serif;font-size:28px;font-weight:680;letter-spacing:.08em;fill:#f3f3ee}", + ".verdict{font-family:'SF Mono','Geist Mono','JetBrains Mono',ui-monospace,monospace;font-size:19px;font-weight:760;letter-spacing:.08em}", + ".prompt{font-family:'SF Mono','Geist Mono','JetBrains Mono',ui-monospace,monospace;font-size:19px;font-weight:650;fill:#f3f3ee}", + ".mono{font-family:'SF Mono','Geist Mono','JetBrains Mono',ui-monospace,monospace;font-size:17px;font-weight:560;letter-spacing:0}", + ".section{font-family:'SF Mono','Geist Mono','JetBrains Mono',ui-monospace,monospace;font-size:13px;font-weight:720;letter-spacing:.14em;fill:#8b8b84}", + ".muted{font-family:'SF Mono','Geist Mono','JetBrains Mono',ui-monospace,monospace;font-size:16px;font-weight:520;fill:#aaa9a0}", + ".footer{font-family:'SF Mono','Geist Mono','JetBrains Mono',ui-monospace,monospace;font-size:13px;font-weight:520;fill:#8b8b84}", + ".tiny{font-family:'SF Mono','Geist Mono','JetBrains Mono',ui-monospace,monospace;font-size:11px;font-weight:540;letter-spacing:0}" + ].join(""); +} + function sanitizeText(value: string | number): string { return redactAbsolutePaths(String(value)); } +function sanitizeOptionalText(value: string | number | undefined): string { + if (value === undefined || value === null || String(value).trim().length === 0) { + return "not recorded"; + } + return sanitizeText(value); +} + +function deriveProofVerdict(input: { + completeEvidence: boolean; + status: string | number; + lifecycle: string | number; + verifierStatus: string | number; + receiptIntegrityState: ReceiptIntegrityState | undefined; +}): MartinProofCard["proofVerdict"] { + const joined = + `${input.status} ${input.lifecycle} 
${input.verifierStatus} ${input.receiptIntegrityState ?? ""}`.toLowerCase(); + if (!input.completeEvidence || /\b(unsigned|tamper_detected)\b/u.test(joined)) { + return "EVIDENCE_BOUNDARY"; + } + if (/\b(failed|failure|error)\b/u.test(joined)) return "FAILED"; + if (/\b(halted|halt|budget_exit|stuck_exit|diminishing_returns)\b/u.test(joined)) { + return "HALTED"; + } + return "VERIFIED"; +} + +function deriveTaskLabel(objective: string | number): string { + const normalized = sanitizeText(objective).toLowerCase(); + const candidates: readonly [RegExp, string][] = [ + [/receipt|integrity|sign|signature|tamper|hash/u, "receipt-integrity verification"], + [/budget|spend|cost|overspend|cap/u, "budget-governed agent run"], + [/verif|test|assert|check/u, "verifier-backed repair"], + [/policy|preflight|allow|deny|path/u, "policy preflight validation"], + [/mcp|connect|adapter|provider/u, "agent runtime connectivity"], + [/redact|sanitize|secret|privacy/u, "redaction safety check"], + [/loopbench|benchmark|eval/u, "benchmark integrity run"] + ]; + return candidates.find(([pattern]) => pattern.test(normalized))?.[1] ?? "governed agent task"; +} + +function normalizeTimelineEvents(events: readonly string[] | undefined): readonly string[] { + const cleaned = (events ?? []) + .map((event) => sanitizeText(event).trim()) + .filter((event) => event.length > 0); + return cleaned.length > 0 ? cleaned.slice(0, 6) : ["run.started", "run.completed"]; +} + +function verdictColor(verdict: MartinProofCard["proofVerdict"]): string { + return verdict === "FAILED" || verdict === "EVIDENCE_BOUNDARY" ? "#d35f5f" : "#72b37e"; +} + +function valueColor(value: string, accent: string): string { + if (/\b(pass|passed|verified|captured|signed|complete|completed)\b/iu.test(value)) { + return "#72b37e"; + } + if (/\b(fail|failed|missing|unavailable|not recorded|not-recorded|boundary|tamper)\b/iu.test(value)) { + return "#d35f5f"; + } + return accent === "#d35f5f" ? 
"#e8e8e3" : "#d6d6ce"; +} + +function normalizeMoneyValue(value: string): string { + const normalized = normalizeCliValue(value).replace(/^\$/u, ""); + return normalized === "unknown" || normalized === "not recorded" ? normalized : normalized; +} + +function normalizeCliValue(value: string): string { + const normalized = value.trim(); + if (!normalized || normalized === "missing" || normalized === "n/a") return "not recorded"; + return normalized.replace(/\s+/gu, " "); +} + +function normalizeBoundaryLine(value: string): string { + const normalized = normalizeCliValue(value) + .replace(/Generated from a local Martin Loop run record\.;?/iu, "local run record only;") + .replace( + /Hosted dashboards and private team telemetry are intentionally excluded from OSS proof cards\./iu, + "private telemetry excluded" + ); + return normalized === "not recorded" ? "local run record only; private telemetry excluded" : normalized; +} + +function compactEventName(event: string): string { + const replacements: Record<string, string> = { + "run.started": "run.start", + "attempt.started": "attempt.start", + "attempt.completed": "attempt.done", + "verification.completed": "verify.done", + "budget.updated": "budget.update", + "run.completed": "run.done" + }; + return replacements[event] ?? event.replace(/completed/gu, "done").replace(/started/gu, "start").slice(0, 18); +} + +function truncateText(value: string, maxLength: number): string { + return value.length <= maxLength ? 
value : `${value.slice(0, Math.max(1, maxLength - 1))}...`; +} + function hasEvidence(value: string | number): boolean { const normalized = String(value).trim().toLowerCase(); diff --git a/packages/cli/tests/operator-commands.test.ts b/packages/cli/tests/operator-commands.test.ts index 0b1c651..94aaf27 100644 --- a/packages/cli/tests/operator-commands.test.ts +++ b/packages/cli/tests/operator-commands.test.ts @@ -848,7 +848,7 @@ describe("share command", () => { expect(receiptMarkdown).toContain("# Martin Loop Share Receipt"); expect(receiptMarkdown).toContain("Receipt integrity unavailable: Martin proof is not yet trustworthy."); expect(receiptMarkdown).not.toContain(runsRoot); - expect(proofCardSvg).toContain("Martin Loop Proof Card"); + expect(proofCardSvg).toContain("MARTIN LOOP :: PROOF RECEIPT"); expect(proofCardSvg).not.toContain(runsRoot); }); }); diff --git a/packages/cli/tests/proof-card.test.ts b/packages/cli/tests/proof-card.test.ts index 1685391..dbd397c 100644 --- a/packages/cli/tests/proof-card.test.ts +++ b/packages/cli/tests/proof-card.test.ts @@ -15,8 +15,14 @@ const completeInput = (): MartinProofCardInput => ({ verifierStatus: "passed", costSpend: "$18.42", budget: "$20.00", + remainingBudget: "$1.58", + overspendRatio: "0.92x", attempts: 3, rollbackStatus: "rollback-ready", + verificationStepCount: 2, + runMode: "mutating", + runtime: "codex-cli / gpt-5-codex", + timelineEvents: ["run.started", "attempt.started", "verification.completed", "budget.updated", "run.completed"], haltReason: "budget guard reached after verifier pass", evidenceBoundaryNotes: [ "Artifacts live at C:\\workspace\\secret workspace\\runs\\loop_viral_001\\ledger.jsonl", @@ -26,11 +32,14 @@ const completeInput = (): MartinProofCardInput => ({ }); describe("Martin proof cards", () => { - it("renders complete evidence with the viral stop line", () => { + it("renders complete evidence with the CLI proof receipt title and stop line", () => { const card = 
buildMartinProofCard(completeInput()); + const svg = renderMartinProofCardSvg(card); expect(renderMartinProofCardMarkdown(card)).toContain("Martin stopped Ralph here."); - expect(renderMartinProofCardSvg(card)).toContain("Martin stopped Ralph here."); + expect(svg).toContain("MARTIN LOOP :: PROOF RECEIPT"); + expect(svg).toContain("Martin stopped Ralph here."); + expect(svg).toContain("$ martin runs verify --loop-id loop_viral_001"); }); it("renders an honest incomplete-evidence line when proof is missing", () => { @@ -68,6 +77,50 @@ describe("Martin proof cards", () => { expect(rendered).toContain("[redacted-path]/verifier.txt"); }); + it("keeps the proof card in the locked terminal visual language", () => { + const svg = renderMartinProofCardSvg(buildMartinProofCard(completeInput())); + + expect(svg).toContain("METRICS"); + expect(svg).toContain("RUN CONTEXT"); + expect(svg).toContain("EVENT RAIL"); + expect(svg).toContain("BOUNDARY"); + expect(svg).not.toContain("rx="); + expect(svg).not.toContain("linearGradient"); + expect(svg).not.toContain("radialGradient"); + expect(svg).not.toContain("Inter"); + expect(svg).not.toMatch(/#(?:0f172a|1d4ed8|2563eb|3178c6|3b82f6|60a5fa|93c5fd|bfdbfe|dbeafe)/iu); + }); + + it("renders unavailable fields as not recorded without inflating proof state", () => { + const card = buildMartinProofCard({ + ...completeInput(), + remainingBudget: undefined, + overspendRatio: undefined, + verificationStepCount: undefined, + runMode: undefined, + runtime: undefined, + rollbackStatus: "not-recorded" + }); + const svg = renderMartinProofCardSvg(card); + + expect(card.proofVerdict).toBe("EVIDENCE_BOUNDARY"); + expect(svg).toContain("not recorded"); + expect(svg).toContain("[EVIDENCE_BOUNDARY]"); + }); + + it("uses restrained red and green semantics for boundary and verified states", () => { + const verifiedSvg = renderMartinProofCardSvg(buildMartinProofCard(completeInput())); + const boundarySvg = renderMartinProofCardSvg( + 
buildMartinProofCard({ + ...completeInput(), + rollbackStatus: "not-recorded" + }) + ); + + expect(verifiedSvg).toContain("#72b37e"); + expect(boundarySvg).toContain("#d35f5f"); + }); + it("renders deterministic Markdown and SVG for the same card", () => { const card = buildMartinProofCard(completeInput()); @@ -87,8 +140,7 @@ describe("Martin proof cards", () => { expect(markdown).toContain("Escape <script>alert('x')</script> & keep \\| pipes"); expect(markdown).toContain("Verifier said: <ok> & rollback \\| stable"); - expect(svg).toContain("Escape <script>alert('x')</script> & keep | pipes"); - expect(svg).toContain("Verifier said: <ok> & rollback | stable"); + expect(svg).toContain("Verifier said: <ok> & rollback..."); expect(svg).not.toContain("