Ship CLI proof receipts #114
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| { | ||
| "title": "Martin Loop Proof Receipt", | ||
| "loopId": "loop_82emkgkf", | ||
| "proofVerdict": "EVIDENCE_BOUNDARY", | ||
| "evidenceLine": "Incomplete Martin proof: missing budget, rollback, or verifier evidence.", | ||
| "verifier": "passed", | ||
| "costSpend": "$0.51", | ||
| "budget": "$3.00", | ||
| "remainingBudget": "$2.49", | ||
| "overspendRatio": "0.17x", | ||
| "attempts": "1", | ||
| "rollback": "not-recorded", | ||
| "receiptIntegrity": "signed", | ||
| "verificationSteps": "1", | ||
| "runtime": "claude / claude-sonnet-4-6 / agent-cli:claude", | ||
| "generatedAt": "2026-06-10T20:01:03.635Z" | ||
| } | ||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,25 @@ | ||||||
| # Martin Loop Proof Receipt | ||||||
|
|
||||||
| Incomplete Martin proof: missing budget, rollback, or verifier evidence. | ||||||
|
|
||||||
| | Field | Evidence | | ||||||
| | --- | --- | | ||||||
| | Loop ID | loop_82emkgkf | | ||||||
| | Objective | Audit the MartinLoop CLI proof receipt guard for a shareable governed run receipt. | | ||||||
| | Status | exited | | ||||||
| | Lifecycle | budget_exit | | ||||||
| | Verifier | passed | | ||||||
| | Cost / spend | $0.51 | | ||||||
| | Budget | $3.00 | | ||||||
| | Attempts | 1 | | ||||||
| | Rollback | not-recorded | | ||||||
| | Halt reason | Martin exited because the budget governor hit a hard limit. | | ||||||
| | Evidence boundary | Generated from a local Martin Loop run record.; Hosted dashboards and private team telemetry are intentionally excluded from OSS proof cards. | | ||||||
| | Remaining budget | $2.49 | | ||||||
| | Overspend ratio | 0.17x | | ||||||
| | Verification steps | 1 | | ||||||
| | Run mode | not recorded | | ||||||
|
Review comment: Inconsistent value format with the JSON example. Line 21 shows "not recorded" (space-separated), while the JSON example uses "not-recorded" (hyphenated) for the rollback field. Ensure consistent formatting across example formats.

Proposed fix for consistency:
-| Run mode | not recorded |
+| Run mode | not-recorded |
||||||
| | Runtime | claude / claude-sonnet-4-6 / agent-cli:claude | | ||||||
| | Receipt integrity | signed | | ||||||
| | Generated at | 2026-06-10T20:01:03.635Z | | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| # MartinLoop 0.3.5 Proof Receipt Release | ||
|
|
||
| `0.3.5` upgrades MartinLoop share receipts so governed runs produce a sharper CLI-style proof card and clearer public documentation. | ||
|
|
||
| ## What Changed | ||
|
|
||
| - Proof cards now render as dark terminal receipts with line rules, monospaced evidence rows, and explicit pass/boundary coloring. | ||
| - Share receipts include stronger visible context: task class, spend, budget, remaining budget, overspend ratio, verifier status, integrity state, runtime, and event rail when present in the local run record. | ||
| - Missing rollback, verifier, budget, or integrity evidence stays visible as an evidence boundary instead of being softened into a success claim. | ||
| - README and agent docs now show how to create and inspect share bundles with `runs verify --latest` and `share --latest`. | ||
| - Public tests now block rounded-card, blue-palette, gradient, and typography regressions in proof-card SVG output. | ||
|
|
||
| ## Why This Matters | ||
|
|
||
| AI coding work needs evidence that can be checked after the run. A verifier pass is useful, but it is not the whole proof. The receipt should also show what it cost, what evidence exists, and what evidence is missing. | ||
|
|
||
| ## Quick Check | ||
|
|
||
| ```sh | ||
| npx -y martin-loop@0.3.5 run "Summarize the demo workspace and prove tests still pass" --proof --verify "npm test" | ||
| npx -y martin-loop@0.3.5 runs verify --latest | ||
| npx -y martin-loop@0.3.5 share --latest | ||
| ``` | ||
|
|
||
| Expected share bundle outputs: | ||
|
|
||
| - `share/run-receipt.json` | ||
| - `share/run-receipt.md` | ||
| - `share/proof-card.svg` |
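
The release note's rule that missing rollback, verifier, budget, or integrity evidence stays visible as an evidence boundary can be sketched as a small check. This is an illustrative sketch only: `EvidenceSnapshot`, `missingEvidence`, and the thresholds for what counts as "missing" are assumptions, not the package's actual implementation. Only the values `"captured"`, `"not-recorded"`, and `"verified"` are taken from this pull request.

```typescript
// Hypothetical input shape; field names loosely follow the proof-card input in this PR.
interface EvidenceSnapshot {
  verifierStatus?: string;
  rollbackStatus?: string;
  budgetUsd?: number;
  receiptIntegrityState?: string;
}

// Returns the evidence categories that are absent, so a caller can surface
// them on the receipt instead of reporting an unqualified success.
// Assumption: the rules below are illustrative, not the project's real policy.
function missingEvidence(e: EvidenceSnapshot): string[] {
  const missing: string[] = [];
  if (e.verifierStatus === undefined || e.verifierStatus === "not-recorded") {
    missing.push("verifier");
  }
  if (e.rollbackStatus !== "captured") {
    missing.push("rollback");
  }
  if (e.budgetUsd === undefined || e.budgetUsd <= 0) {
    missing.push("budget");
  }
  if (e.receiptIntegrityState !== "verified") {
    missing.push("integrity");
  }
  return missing;
}
```

A non-empty result would correspond to an evidence-boundary receipt, and an empty one to a receipt with all four kinds of evidence present.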
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -3004,6 +3004,22 @@ function proofCardInputFromLoop(loop: LoopRecord): MartinProofCardInput { | |||||||||||||
| ) | ||||||||||||||
| ? "captured" | ||||||||||||||
| : "not-recorded"; | ||||||||||||||
| const remainingBudget = Math.max(0, loop.budget.maxUsd - loop.cost.actualUsd); | ||||||||||||||
| const overspendRatio = | ||||||||||||||
| loop.budget.maxUsd > 0 ? `${(loop.cost.actualUsd / loop.budget.maxUsd).toFixed(2)}x` : "unknown"; | ||||||||||||||
|
Comment on lines +3007 to +3009: Rename or recompute the new "overspend ratio" metric. This is … Also applies to: 3059-3060
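
The two options the comment names, renaming or recomputing, might look like the sketch below. The diff computes `actualUsd / maxUsd`, which reads as the share of budget used ($0.51 of $3.00 gives 0.17x) rather than spend above the cap. `BudgetSnapshot`, `budgetUtilization`, and the recomputed `overspendRatio` are hypothetical names and definitions chosen for illustration, not the change the maintainers adopted.

```typescript
// Hypothetical shape; only maxUsd and actualUsd come from the diff.
interface BudgetSnapshot {
  maxUsd: number;
  actualUsd: number;
}

// Option 1 (rename): keep the diff's arithmetic, but name it for what it measures.
function budgetUtilization({ maxUsd, actualUsd }: BudgetSnapshot): string {
  return maxUsd > 0 ? `${(actualUsd / maxUsd).toFixed(2)}x` : "unknown";
}

// Option 2 (recompute): keep the "overspend" name and report only spend above the cap.
// Assumption: overspend is defined here as (actual - max) / max, floored at zero.
function overspendRatio({ maxUsd, actualUsd }: BudgetSnapshot): string {
  if (maxUsd <= 0) return "unknown";
  const over = Math.max(0, actualUsd - maxUsd);
  return `${(over / maxUsd).toFixed(2)}x`;
}
```

With the example receipt's numbers, option 1 yields `0.17x` and option 2 yields `0.00x`, which makes the difference between the two readings visible.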
||||||||||||||
| const verificationStepCount = loop.events.filter((event) => event.type === "verification.completed").length; | ||||||||||||||
| const latestAttempt = loop.attempts.at(-1); | ||||||||||||||
| const runtime = latestAttempt | ||||||||||||||
| ? `${latestAttempt.adapterId} / ${latestAttempt.model}` | ||||||||||||||
| : loop.events | ||||||||||||||
| .map((event) => event.payload) | ||||||||||||||
| .find((payload) => typeof payload["adapterId"] === "string" || typeof payload["model"] === "string"); | ||||||||||||||
|
Comment on lines +3014 to +3016: If …
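
The runtime fallback on the commented lines can be read in isolation as the sketch below. It assumes that an event may carry no payload and guards that case before indexing; whether that can actually happen in `LoopRecord` is not shown in this diff. `AttemptLike`, `EventLike`, and `resolveRuntimeLabel` are hypothetical names introduced for the example.

```typescript
// Hypothetical shapes: the diff only shows adapterId, model, and event payloads.
type Payload = Record<string, unknown>;

interface AttemptLike {
  adapterId: string;
  model: string;
}

interface EventLike {
  type: string;
  payload?: Payload; // assumption: some events may have no payload
}

// Render a value as a label part, falling back to "unknown" for non-strings.
const labelPart = (value: unknown): string => (typeof value === "string" ? value : "unknown");

function resolveRuntimeLabel(attempts: AttemptLike[], events: EventLike[]): string {
  // Prefer the most recent attempt, as the diff does.
  const latest = attempts[attempts.length - 1];
  if (latest) {
    return `${latest.adapterId} / ${latest.model}`;
  }
  // Otherwise use the first event payload that names an adapter or a model.
  const payload = events
    .map((event) => event.payload)
    .find(
      (p): p is Payload =>
        p !== undefined && (typeof p["adapterId"] === "string" || typeof p["model"] === "string"),
    );
  if (!payload) return "not recorded";
  return `${labelPart(payload["adapterId"])} / ${labelPart(payload["model"])}`;
}
```

Returning a string from a single function, rather than a string-or-payload union that is resolved later, is a design choice of this sketch and keeps the "not recorded" case in one place.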
||||||||||||||
| const runtimeLabel = | ||||||||||||||
| typeof runtime === "string" | ||||||||||||||
| ? runtime | ||||||||||||||
| : runtime | ||||||||||||||
| ? `${String(runtime["adapterId"] ?? "unknown")} / ${String(runtime["model"] ?? "unknown")}` | ||||||||||||||
| : "not recorded"; | ||||||||||||||
|
|
||||||||||||||
| return { | ||||||||||||||
| loopId: loop.loopId, | ||||||||||||||
|
|
@@ -3013,8 +3029,14 @@ function proofCardInputFromLoop(loop: LoopRecord): MartinProofCardInput { | |||||||||||||
| verifierStatus: verification.status, | ||||||||||||||
| costSpend: `$${loop.cost.actualUsd.toFixed(2)}`, | ||||||||||||||
| budget: `$${loop.budget.maxUsd.toFixed(2)}`, | ||||||||||||||
| remainingBudget: `$${remainingBudget.toFixed(2)}`, | ||||||||||||||
| overspendRatio, | ||||||||||||||
| attempts: loop.attempts.length, | ||||||||||||||
| rollbackStatus, | ||||||||||||||
| verificationStepCount, | ||||||||||||||
| runMode: loop.task.mutationMode ?? "not recorded", | ||||||||||||||
| runtime: runtimeLabel, | ||||||||||||||
| timelineEvents: loop.events.map((event) => event.type), | ||||||||||||||
| haltReason: latestExitReason(loop), | ||||||||||||||
| evidenceBoundaryNotes: [ | ||||||||||||||
| "Generated from a local Martin Loop run record.", | ||||||||||||||
|
|
@@ -3034,8 +3056,14 @@ function defaultChallengeProofCardInput(): MartinProofCardInput { | |||||||||||||
| verifierStatus: "passed", | ||||||||||||||
| costSpend: "$2.30", | ||||||||||||||
| budget: "$3.00", | ||||||||||||||
| remainingBudget: "$0.70", | ||||||||||||||
| overspendRatio: "0.77x", | ||||||||||||||
| attempts: 2, | ||||||||||||||
| rollbackStatus: "captured", | ||||||||||||||
| verificationStepCount: 1, | ||||||||||||||
| runMode: "mutating", | ||||||||||||||
| runtime: "demo / local-fixture", | ||||||||||||||
| timelineEvents: ["run.started", "attempt.started", "verification.completed", "budget.updated", "run.completed"], | ||||||||||||||
| haltReason: "verifier_passed", | ||||||||||||||
| evidenceBoundaryNotes: [ | ||||||||||||||
| "Generated from a local Martin Loop run record.", | ||||||||||||||
|
|
||||||||||||||
Review comment:

Fix `live-governed-run-receipt.json` field names/value to match the proof-card schema

- `docs/examples/proof-receipts/live-governed-run-receipt.json` uses `receiptIntegrity`, `verificationSteps`, `rollback`, and `verifier`, but the proof-card input schema uses `receiptIntegrityState`, `verificationStepCount`, `rollbackStatus`, and `verifierStatus`.
- The JSON sets `receiptIntegrity` to `"signed"`, but `ReceiptIntegrityState` is only `"verified" | "unsigned" | "tamper_detected"`.
- The JSON omits fields shown in `docs/examples/proof-receipts/live-governed-run-receipt.md` (Objective, Status, Lifecycle, Halt reason, Evidence boundary, Run mode), so the JSON doesn't mirror the documented proof card.
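
A check along the lines of this comment could be sketched as a small lint over the example JSON. The field renames and the three `ReceiptIntegrityState` values are copied from the comment. `lintReceipt`, `RENAMED_FIELDS`, and the message wording are hypothetical and only illustrate the idea.

```typescript
// Allowed values as quoted in the review comment.
const RECEIPT_INTEGRITY_STATES: readonly string[] = ["verified", "unsigned", "tamper_detected"];

// Example-file key -> schema key, as listed in the review comment.
const RENAMED_FIELDS: Record<string, string> = {
  receiptIntegrity: "receiptIntegrityState",
  verificationSteps: "verificationStepCount",
  rollback: "rollbackStatus",
  verifier: "verifierStatus",
};

// Hypothetical helper: returns one message per mismatch, or an empty list.
function lintReceipt(receipt: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const oldKey of Object.keys(RENAMED_FIELDS)) {
    if (oldKey in receipt) {
      problems.push(`rename "${oldKey}" to "${RENAMED_FIELDS[oldKey]}"`);
    }
  }
  // Check the integrity value under either the schema name or the example's name.
  const integrity = receipt["receiptIntegrityState"] ?? receipt["receiptIntegrity"];
  if (typeof integrity === "string" && RECEIPT_INTEGRITY_STATES.indexOf(integrity) === -1) {
    problems.push(`integrity value "${integrity}" is not one of ${RECEIPT_INTEGRITY_STATES.join(", ")}`);
  }
  return problems;
}
```

Run against the example receipt in this pull request, a lint like this would report the four renamed keys plus the `"signed"` value. It does not cover the third point in the comment, the fields missing relative to the Markdown receipt.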
live-governed-run-receipt.jsonfield names/value to match the proof-card schemadocs/examples/proof-receipts/live-governed-run-receipt.jsonusesreceiptIntegrity,verificationSteps,rollback, andverifier, but the proof-card input schema usesreceiptIntegrityState,verificationStepCount,rollbackStatus, andverifierStatus.receiptIntegrityto"signed", butReceiptIntegrityStateis only"verified" | "unsigned" | "tamper_detected".docs/examples/proof-receipts/live-governed-run-receipt.md(Objective, Status, Lifecycle, Halt reason, Evidence boundary, Run mode), so the JSON doesn’t mirror the documented proof card.🤖 Prompt for AI Agents